Tutorial 1: How predictive maps are made

From field measurements and satellite data to continuous maps of land health indicators.

1.1 What is a predictive map?

A predictive map estimates a variable (e.g., soil organic carbon (SOC), erosion risk, vegetation condition) at locations where we have no direct measurement.
It is built by learning relationships between:
- What we observe in the field (soil samples, vegetation surveys), and
- Other spatial layers (rainfall, topography, land cover, etc.) that cover the whole region.
The result is a continuous surface where every pixel has a predicted value based on those relationships.
Predictive maps are always estimates, not a perfect representation of reality.

Field data (observations):
- Measurements at specific locations (e.g., SOC from Land Degradation Surveillance Framework (LDSF) plots, tree counts, erosion assessments).
- These points serve as the “ground truth” used to train and validate the model.
Satellite and other spatial data (covariates):
- Climate: rainfall, temperature, drought indices.
- Land: elevation, slope, landform, soils, land cover.
- Vegetation: Enhanced Vegetation Index (EVI), Normalized Difference Vegetation Index (NDVI), tree cover, fire history.
In a predictive model:
- Field measurements are the target (what we want to predict).
- Satellite and other layers are predictors (variables that explain how the target varies in space).

Step 1: Align data
- Extract predictor values (e.g., rainfall, EVI, elevation) at each field measurement location.
- Build a table where each row is a field sample and each column is a predictor.
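The alignment step can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual workflow: the rasters are toy nested lists, and every name and value (`RASTERS`, `X_MIN`, `CELL_SIZE`, the plot coordinates, the SOC figures) is invented for the example. A real workflow would read georeferenced rasters with a dedicated raster library.

```python
# Toy predictor rasters (3 rows x 4 columns); row 0 is the northern edge.
# All names and values are invented for illustration.
RASTERS = {
    "rainfall_mm": [[400, 420, 450, 480],
                    [380, 410, 440, 470],
                    [350, 390, 430, 460]],
    "elevation_m": [[310, 305, 300, 298],
                    [320, 312, 304, 299],
                    [335, 325, 310, 301]],
}
X_MIN, Y_MAX = 0.0, 300.0   # map coordinates of the grid's top-left corner
CELL_SIZE = 100.0           # pixel width and height in map units
N_ROWS, N_COLS = 3, 4


def pixel_index(x, y):
    """Convert map coordinates to the (row, col) of the containing pixel."""
    col = int((x - X_MIN) // CELL_SIZE)
    row = int((Y_MAX - y) // CELL_SIZE)
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError(f"Point ({x}, {y}) falls outside the raster extent")
    return row, col


def build_training_table(samples):
    """Return one record per field sample: the target plus predictor values."""
    table = []
    for sample in samples:
        row, col = pixel_index(sample["x"], sample["y"])
        record = {"plot_id": sample["plot_id"], "soc": sample["soc"]}
        for name, grid in RASTERS.items():
            record[name] = grid[row][col]
        table.append(record)
    return table


# Hypothetical field samples: location plus measured SOC.
field_samples = [
    {"plot_id": "A", "x": 50.0, "y": 250.0, "soc": 8.1},
    {"plot_id": "B", "x": 250.0, "y": 150.0, "soc": 11.4},
    {"plot_id": "C", "x": 350.0, "y": 20.0, "soc": 12.9},
]
training_table = build_training_table(field_samples)
for record in training_table:
    print(record)
```

Each printed record is one row of the training table: the field measurement (target) alongside the predictor values found at that location.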
Step 2: Fit a model
- Use statistical or machine-learning methods (e.g., regression, random forests, gradient boosting) to learn how changes in predictors (rainfall, slope, vegetation) are associated with changes in the target (SOC, erosion risk, etc.).
Step 3: Predict across the landscape
- Apply the trained model to every pixel in the region using the predictor layers.
- For each pixel, the model estimates the likely value of the target indicator.
- The output is a raster where each pixel holds the predicted value.
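The fit-and-predict steps can be illustrated with the simplest possible model. The sketch below assumes a single predictor (rainfall) and ordinary least-squares linear regression, chosen only because it fits in a few lines of standard-library Python; it stands in for the richer methods mentioned above, such as random forests or gradient boosting. All data values and function names are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_raster(grid, slope, intercept):
    """Apply the fitted model to every pixel of a predictor grid."""
    return [[slope * value + intercept for value in row] for row in grid]


# Hypothetical training table: rainfall extracted at four plots, with measured SOC.
rainfall_at_plots = [350.0, 400.0, 440.0, 480.0]
soc_at_plots = [8.0, 9.1, 9.7, 10.8]

slope, intercept = fit_linear(rainfall_at_plots, soc_at_plots)

# Toy rainfall raster covering the whole region (values invented).
rainfall_grid = [[400, 420, 450, 480],
                 [380, 410, 440, 470],
                 [350, 390, 430, 460]]

soc_map = predict_raster(rainfall_grid, slope, intercept)
for row in soc_map:
    print([round(value, 2) for value in row])
```

The output grid has the same shape as the predictor raster, with one predicted SOC value per pixel. With several predictors, the same pattern applies: the model is fitted on the table of field samples and then evaluated pixel by pixel on the stacked predictor layers.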

Good predictive maps must be tested, not just produced.
Common checks include:
- Holding back some field data as a test set that is not used for model training.
- Comparing model predictions against these held-out observations.
- Calculating summary statistics such as:
  - How close predictions are, on average, to observed values (e.g., root mean square error or mean absolute error).
  - Whether the model systematically underestimates or overestimates (bias) in certain environments.
Additional checks:
- Looking at maps of residuals (differences between predicted and observed values).
- Evaluating performance across different subregions or land types.
These tests help identify:
- Where the model is strong, and
- Where we should be more cautious in interpreting the map.
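As a rough sketch of these checks, the example below holds back part of the field data, then computes the root mean square error (RMSE), the mean error (bias), and the mean error per land type on a held-out set. The function names, the 25% test fraction, the land types, and all values are assumptions made for illustration. Residuals are taken here as predicted minus observed, so a positive value means overestimation.

```python
import math
import random


def split_holdout(records, test_fraction=0.25, seed=42):
    """Randomly split records into a training set and a held-out test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error: typical size of the prediction error."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(observed, predicted):
    """Mean of (predicted - observed): positive means overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def mean_error_by_group(observed, predicted, groups):
    """Mean error within each group (e.g., land type or subregion)."""
    errors = {}
    for o, p, g in zip(observed, predicted, groups):
        errors.setdefault(g, []).append(p - o)
    return {g: sum(values) / len(values) for g, values in errors.items()}


# Splitting eight hypothetical plot IDs into training and test sets.
train_ids, test_ids = split_holdout(["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"])
print("train:", train_ids, "test:", test_ids)

# Hypothetical held-out observations and the model's predictions for them.
observed = [8.0, 10.0, 12.0, 9.0]
predicted = [8.5, 9.5, 13.0, 9.0]
land_type = ["cropland", "cropland", "woodland", "woodland"]

print("RMSE:", round(rmse(observed, predicted), 3))
print("Mean error (bias):", mean_error(observed, predicted))
print("Mean error by land type:", mean_error_by_group(observed, predicted, land_type))
```

In this toy case the overall bias is small, but grouping the errors shows that the overestimation comes entirely from the woodland plots, which is the kind of pattern that signals where a map should be read with more caution.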

Raw model outputs often go through additional steps before reaching decision-makers:
- Masking areas where the model does not apply (e.g., water bodies, urban areas, non-target ecosystems).
- Smoothing or aggregation to reduce noise and match the scale of decision-making (e.g., administrative units or watersheds).
- Reclassification into categories (e.g., low/medium/high SOC, low/high erosion risk) to make interpretation easier.
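The post-processing steps listed above might look like the following on a toy grid. This sketch covers masking, aggregation by zone, and reclassification (smoothing is left out for brevity). The grids, zone names, and class thresholds (10 and 15) are hypothetical and are not values used by the project.

```python
def apply_mask(grid, valid):
    """Replace pixels flagged as invalid with None (no data)."""
    return [[value if ok else None for value, ok in zip(value_row, ok_row)]
            for value_row, ok_row in zip(grid, valid)]


def zone_means(grid, zones):
    """Average the valid pixels within each zone (e.g., an administrative unit)."""
    totals = {}
    for value_row, zone_row in zip(grid, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None:
                continue  # masked pixels do not contribute to the zone mean
            running_sum, count = totals.get(zone, (0.0, 0))
            totals[zone] = (running_sum + value, count + 1)
    return {zone: s / n for zone, (s, n) in totals.items()}


def reclassify(grid, low_max, medium_max):
    """Turn continuous values into low / medium / high classes."""
    def label(value):
        if value is None:
            return None
        if value < low_max:
            return "low"
        if value < medium_max:
            return "medium"
        return "high"
    return [[label(value) for value in row] for row in grid]


# Toy predicted SOC map, validity mask (False = e.g. water body), and zones.
predicted = [[5.0, 12.0, 18.0],
             [7.0, 14.0, 22.0]]
valid = [[True, True, False],
         [True, True, True]]
zones = [["north", "north", "north"],
         ["south", "south", "south"]]

masked = apply_mask(predicted, valid)
print("Zone means:", zone_means(masked, zones))
print("Classes:", reclassify(masked, low_max=10.0, medium_max=15.0))
```

Masking is applied first so that excluded pixels affect neither the zone averages nor the class map.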
For K4GGWA, these processed maps are then:
- Integrated into dashboards and reports.
- Linked with contextual information (climate, land use, communities).
- Used alongside uncertainty information (explained in Tutorial 3) to support responsible decisions.
The key message:
- Predictive maps are data-driven approximations built from field and satellite information.
- Their value depends on both the quality of the inputs and the rigour of the modelling and validation.