Tutorial 1: How predictive maps are made
From field measurements and satellite data to continuous maps of land health indicators.
1.1 What is a predictive map?
- A predictive map estimates a variable such as soil organic carbon (SOC), erosion risk, or vegetation condition at locations where we have no direct measurement.
- It is built by learning relationships between:
  - What we observe in the field (soil samples, vegetation surveys), and
  - Other spatial layers (rainfall, topography, land cover, etc.) that cover the whole region.
- The result is a continuous surface where every pixel has a predicted value based on those relationships.
- Predictive maps are always estimates, not a perfect representation of reality.
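Written as a formula, the idea above looks roughly like this. The notation is a generic illustration, not the specific model used for any K4GGWA map.

```latex
y(s) = f\big(\mathbf{x}(s)\big) + \varepsilon(s)
\qquad\text{and}\qquad
\hat{y}(s) = \hat{f}\big(\mathbf{x}(s)\big)
```

Here s is a location (pixel), y(s) is the true value of the indicator (e.g., SOC), x(s) = (x_1(s), ..., x_p(s)) are the covariate values at s, f is the unknown relationship, and ε(s) is the part of y(s) that the covariates cannot explain. The fitted model f̂ is learned from field plots where y is measured and is then applied to every pixel. Because f̂ only approximates f and ε(s) is unknown, ŷ(s) generally differs from y(s). This is why a predictive map is an estimate.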
1.2 Inputs: field data, satellites, and covariates
- Field data (observations):
  - Measurements at specific locations (e.g., SOC from Land Degradation Surveillance Framework (LDSF) plots, tree counts, erosion assessments).
  - These points are our “ground truth”, used to train and check the model.
- Satellite and other spatial data (covariates):
  - Climate: rainfall, temperature, drought indices.
  - Land: elevation, slope, landform, soils, land cover.
  - Vegetation: vegetation indices such as EVI (Enhanced Vegetation Index) and NDVI (Normalized Difference Vegetation Index), tree cover, fire history.
- In a predictive model:
  - Field measurements are the target (what we want to predict).
  - Satellite and other layers are the predictors (variables that explain how the target varies in space).
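Vegetation covariates such as NDVI and EVI are themselves calculated from satellite reflectance bands. The sketch below computes both for one pixel with the standard formulas. EVI uses the commonly cited MODIS coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1). The reflectance values are made up for illustration. In practice these layers are often obtained as ready-made products rather than computed by hand.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from surface reflectance (0-1 scale)."""
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float,
        g: float = 2.5, c1: float = 6.0, c2: float = 7.5, l_adj: float = 1.0) -> float:
    """Enhanced Vegetation Index; defaults are the commonly used MODIS coefficients.

    g: gain, c1/c2: aerosol resistance terms, l_adj: canopy background adjustment.
    """
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l_adj)


# Hypothetical reflectance values for one vegetated pixel.
nir, red, blue = 0.50, 0.10, 0.05
print(f"NDVI = {ndvi(nir, red):.3f}")  # NDVI = 0.667
print(f"EVI  = {evi(nir, red, blue):.3f}")  # EVI  = 0.580
```

Both indices rise as near-infrared reflectance grows relative to red, which is typical of denser green vegetation. EVI adds corrections that make it less prone than NDVI to saturating over dense canopies.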
1.3 Building the model
- Step 1: Align data
  - Extract predictor values (e.g., rainfall, EVI, elevation) at each field measurement location.
  - Build a table where each row is a field sample and each column is a predictor, plus one column for the measured target.
- Step 2: Fit a model
  - Use statistical or machine-learning methods (e.g., regression, random forests, gradient boosting) to learn:
    - How changes in predictors (rainfall, slope, vegetation) are associated with changes in the target (SOC, erosion risk, etc.).
- Step 3: Predict across the landscape
  - Apply the trained model to every pixel in the region using the predictor layers.
  - For each pixel, the model estimates the likely value of the target indicator.
  - The output is a raster where each pixel holds the predicted value.
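The three steps can be sketched end to end in plain Python, so the example runs without GIS libraries. The sketch rests on several assumptions:
- The predictor layers are tiny in-memory grids that are already aligned to the same pixels. Real rasters would be read with a library such as rasterio.
- Plots are given as pixel (row, col) positions rather than coordinates.
- The model is an ordinary least-squares linear regression, the simplest of the methods listed above. Random forests or gradient boosting would replace only the fitting step.

All layer names and values are hypothetical. The SOC values are synthetic and generated from a known rule, so the fitted coefficients can be checked.

```python
# Hypothetical 3 x 3 predictor layers on the same, already aligned pixel grid.
RAINFALL = [[400, 500, 600], [450, 550, 650], [500, 600, 700]]  # mm/yr
EVI = [[0.10, 0.15, 0.20], [0.12, 0.25, 0.30], [0.20, 0.22, 0.40]]
LAYERS = {"rainfall": RAINFALL, "evi": EVI}

# Hypothetical field plots, given as the (row, col) of the pixel each falls in.
PLOTS = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]
# Synthetic SOC "measurements" from a known rule, so the fit can be checked.
SOC = [2.0 + 0.01 * RAINFALL[r][c] + 10.0 * EVI[r][c] for r, c in PLOTS]


def extract(layers, plots):
    """Step 1: one row per plot, one column per predictor layer."""
    return [[layer[r][c] for layer in layers.values()] for r, c in plots]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(table, target):
    """Step 2: ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    x = [[1.0] + row for row in table]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(p)]
    return solve(xtx, xty)


def predict_map(layers, beta):
    """Step 3: apply the fitted model to every pixel, returning a raster of predictions."""
    grids = list(layers.values())
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[beta[0] + sum(b * g[r][c] for b, g in zip(beta[1:], grids))
             for c in range(cols)] for r in range(rows)]


table = extract(LAYERS, PLOTS)
beta = fit_linear(table, SOC)
soc_map = predict_map(LAYERS, beta)
print("coefficients (intercept, rainfall, evi):", [round(b, 4) for b in beta])
for row in soc_map:
    print([round(v, 2) for v in row])
```

The synthetic SOC values follow SOC = 2 + 0.01 × rainfall + 10 × EVI exactly, so the fitted coefficients come out very close to 2, 0.01 and 10. With real, noisy field data they would only approximate the underlying relationship. Solving the normal equations directly is adequate for a handful of predictors, but library routines based on QR or SVD decompositions are numerically more robust.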
1.4 Validation and quality checks
- Good predictive maps must be tested, not just produced.
- Common checks include:
  - Holding back some field data as a test set that is not used for model training.
  - Comparing model predictions against these held-out observations.
  - Calculating summary statistics such as:
    - How close predictions are, on average, to observed values (e.g., root mean square error, RMSE).
    - Whether the model systematically underestimates or overestimates (bias), overall or in certain environments.
- Additional checks:
  - Looking at maps of residuals (differences between predicted and observed values).
  - Evaluating performance across different subregions or land types.
- These tests help identify:
  - Where the model is strong, and
  - Where we should be more cautious in interpreting the map.
1.5 From model output to decision-ready maps
- Raw model outputs often go through additional steps before reaching decision-makers:
  - Masking areas outside the data domain (e.g., water bodies, urban areas, non-target ecosystems).
  - Smoothing or aggregation to reduce noise and match the scale of decision-making (e.g., administrative units or watersheds).
  - Reclassification into categories (e.g., low/medium/high SOC, low/high erosion risk) to make interpretation easier.
- For K4GGWA, these processed maps are then:
  - Integrated into dashboards and reports.
  - Linked with contextual information (climate, land use, communities).
  - Used alongside uncertainty information (explained in Tutorial 3) to support responsible decisions.
- The key message:
  - Predictive maps are data-driven approximations built from field and satellite information.
  - Their value depends on both the quality of the inputs and the rigour of the modelling and validation.
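The masking, reclassification and aggregation steps in 1.5 can be sketched on a small in-memory raster. Everything here is hypothetical: the SOC values, the water mask, the district zones and the class breaks. Real breaks would come from agreed thresholds or from the data distribution. Masked pixels are carried as `None`, so they drop out of both the categories and the zone averages.

```python
from statistics import fmean

# Hypothetical 3 x 4 SOC prediction raster (g/kg).
SOC_MAP = [
    [4.2, 6.8, 9.5, 12.1],
    [5.0, 7.7, 10.4, 13.3],
    [3.9, 8.2, 11.0, 14.6],
]
# Hypothetical mask: True where the pixel is a water body (outside the data domain).
WATER = [
    [False, False, False, True],
    [False, False, False, True],
    [False, False, False, False],
]
# Hypothetical administrative unit of each pixel.
DISTRICT = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
# Hypothetical class breaks: < 6 low, 6 to < 10 medium, >= 10 high.
BREAKS = [(6.0, "low"), (10.0, "medium"), (float("inf"), "high")]


def mask(values, excluded):
    """Set pixels outside the data domain to None so they are never reported."""
    return [[None if ex else v for v, ex in zip(row, ex_row)]
            for row, ex_row in zip(values, excluded)]


def reclassify(value):
    """Map a continuous prediction to a category using the class breaks."""
    if value is None:
        return None
    for upper, label in BREAKS:
        if value < upper:
            return label
    return None  # only reached for NaN values


def aggregate(values, zones):
    """Mean prediction per zone, ignoring masked pixels."""
    groups = {}
    for row, zone_row in zip(values, zones):
        for v, z in zip(row, zone_row):
            if v is not None:
                groups.setdefault(z, []).append(v)
    return {z: fmean(vs) for z, vs in sorted(groups.items())}


masked = mask(SOC_MAP, WATER)
classes = [[reclassify(v) for v in row] for row in masked]
district_means = aggregate(masked, DISTRICT)
for row in classes:
    print(row)
print({z: round(m, 2) for z, m in district_means.items()})
```

Masking before aggregating matters: if the two water pixels were left in, their values would distort district B's average even though they carry no meaningful SOC prediction.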