In this article we take a look at why we need comprehensive and systematic evaluation of rainfall models. We also examine a new model evaluation framework, with examples of the framework in action.

Imagine a typical scene from a detective novel:

Sirens scream past – like every Tuesday in this forsaken town. I was about to close up shop for the night when a worried young man stepped sheepishly into my office. I couldn’t understand him at first. But through his mumblings it became clear that something was wrong. Something was wrong with … The Rain.

Actually, the work of a hydrologist is much like that of a detective.

Rainfall models and the character of rain

We rely on models of ‘fake’ rain to assess the hydrological impacts of droughts, floods, land-use change and climate change. For example, to evaluate flood risk you can select a spatial rainfall model capable of generating long sequences of rainfall for the catchment.


Daily spatial rainfall field simulation over the Onkaparinga catchment

But to provide robust assessments, the simulated rainfall must reproduce observed rainfall characteristics in space and time, across a wide range of scales.

This is not a simple task. Many issues can arise, such as too little data or a model that is too simple.

When people think of ‘rain’ they think of one character, when actually there is a whole family. Some of the main characters include:

  • Daily rainfall amounts
  • Total annual rainfall
  • Inter-annual variability (i.e. year-to-year variability in the rainfall)
  • Wet/dry spell distributions
  • Seasonality
  • Spatial variability
  • Extremes
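
As a rough illustration, several of these characters can be computed from a single daily series in a few lines of Python. This is only a sketch: the function name, the wet-day threshold and the fixed 365-day year (leap days ignored) are assumptions made for the example, not part of any particular rainfall model.

```python
import numpy as np

def rainfall_characteristics(daily_mm, days_per_year=365, wet_threshold_mm=0.0):
    """Compute a few summary statistics from one daily rainfall series.

    Assumptions (illustrative only): every year has `days_per_year` days,
    and a day is 'wet' when rainfall exceeds `wet_threshold_mm`.
    """
    daily = np.asarray(daily_mm, dtype=float)
    n_years = daily.size // days_per_year
    # Drop any incomplete final year and arrange the data as (year, day).
    years = daily[: n_years * days_per_year].reshape(n_years, days_per_year)
    wet = years > wet_threshold_mm
    annual_totals = years.sum(axis=1)
    return {
        "mean_wet_day_amount": years[wet].mean(),          # daily amounts
        "mean_wet_days_per_year": wet.sum(axis=1).mean(),  # wet/dry occurrence
        "mean_annual_total": annual_totals.mean(),         # total annual rainfall
        "sd_annual_total": annual_totals.std(ddof=1),      # inter-annual variability
        "annual_total_p5": np.percentile(annual_totals, 5),  # lower tail
    }
```

Seasonality, spell lengths and spatial variability would need further statistics (for example monthly groupings or multi-site correlations) that are left out here for brevity.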

This Family of Rain Characters is complex. Each has its own personality. The problem is that when there is trouble reproducing an ‘observed rainfall’ characteristic, any one of these characters (or perhaps all of them) could be a culprit.

It is challenging because they are all interlinked. When we try to isolate ‘who’ caused what effect, they can provide alibis for each other! For example, an issue with low variability between years could actually be an issue with seasonality instead.

Imagine how our detective would tackle this challenge:

It smelt fishy to me. When dealing with a bunch of low lives like the Rainfall family you need to be thorough. It may be tempting to only interrogate a few key players and repeat offenders (inter-annual variability and wet-dry pattern come to mind). But going with your gut won’t cut it in this case. It occurred to me that any analysis of these slippery characters needs to be comprehensive and systematic. They need to be lined up side-by-side and interrogated to figure out who is pulling the strings and who is in cahoots.

In reality, past evaluations of rainfall models have presented performance in descriptive terms (e.g. words like ‘satisfactory’ or ‘well’). They have often used only a selected set of statistics, sites or time periods. Such evaluations are not systematic.

A new model evaluation framework

To address these issues, members of this research group have developed a new framework for evaluating rainfall model performance. The framework uses quantitative criteria to assess model performance across a comprehensive range of observed statistics of interest.

The framework is comprehensive. It plainly summarises performance across a range of time scales (years/months/days), and spatial scales (sites/fields). By using quantitative criteria (defined a priori) the evaluation is made transparent and avoids the need to frame performance results in purely descriptive terms.

These features of the framework help to identify model strengths and weaknesses, and to untangle the origin of deficiencies.

The framework in action

Let’s look at applying the framework to evaluate the performance of a rainfall model in simulating 100 realisations of daily rainfall for 73 years. We’ll look at this rainfall across 19 sites for a range of statistics, scales and seasons. The problem has many dimensions and needs to be tackled in a comprehensive and systematic fashion.

The performance criteria of the framework are applied first, to assess the performance of each individual statistic of interest for each site/scale. Then, the individual analyses can be summarised to provide an overview of model performance across a range of model properties.
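
The summarising step can be sketched in Python as a simple tally. The function name and the input layout (one category label per site) are assumptions made for this example rather than the framework's actual code.

```python
from collections import Counter

CATEGORIES = ("Good", "Fair", "Poor")

def summarise_categories(labels):
    """Return the percentage of sites in each performance category.

    `labels` is a hypothetical list with one 'Good'/'Fair'/'Poor' entry
    per site for a single statistic.
    """
    if not labels:
        raise ValueError("at least one site label is required")
    counts = Counter(labels)
    # Counter returns 0 for categories that never occur.
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}
```

For example, `summarise_categories(["Good", "Good", "Fair", "Poor"])` gives 50% ‘Good’, 25% ‘Fair’ and 25% ‘Poor’.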

A short summary table is presented below (Figure 1) to illustrate this concept. In the table, ‘Good’ performance is displayed in green, ‘Fair’ in yellow, and ‘Poor’ in red, according to the applied quantitative performance criteria. Figure 1 shows that the majority of sites and months are categorised as ‘Good’ in simulating:

  • mean wet day amounts
  • standard deviation of wet day amounts
  • the mean number of wet days
  • the mean total monthly rainfall
  • the standard deviation of monthly total rainfall.

Figure 1 – Comparison of performance (adapted from Bennett et al. 2016). The quantitative performance criteria for each individual statistic are:

  • Good (green) – less than 10% of observations fall outside the simulation’s 90% probability limits
  • Fair (yellow) – the observed statistic lies within the 99.7% limits, or the absolute relative difference between the simulated and observed mean is 5% or less
  • Poor (red) – otherwise
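
A minimal Python sketch of how such criteria might be applied to one statistic at one site is given below. It makes two assumptions that are not stated in the caption: the probability limits are estimated as empirical percentiles of the simulated realisations, and ‘Good’ is simplified to a single observed value lying inside the 90% limits. It is one reading of the criteria, not the authors’ implementation.

```python
import numpy as np

def classify_statistic(observed, simulated):
    """Classify one statistic as 'Good', 'Fair' or 'Poor'.

    observed  : the statistic computed from the observed record
    simulated : the same statistic computed from each simulated realisation
    """
    simulated = np.asarray(simulated, dtype=float)
    # Empirical 90% and 99.7% probability limits (an assumption of this sketch).
    lo90, hi90 = np.percentile(simulated, [5.0, 95.0])
    lo997, hi997 = np.percentile(simulated, [0.15, 99.85])

    if lo90 <= observed <= hi90:
        return "Good"

    # Absolute relative difference between simulated mean and observed value.
    if observed != 0:
        rel_diff = abs(simulated.mean() - observed) / abs(observed)
    else:
        rel_diff = float("inf")

    if lo997 <= observed <= hi997 or rel_diff <= 0.05:
        return "Fair"
    return "Poor"
```

Applying this function to every statistic, site and month would produce the grid of colours that a summary table like Figure 1 displays.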


However, looking at the annual scale, the majority of sites are categorised as ‘Fair’ or ‘Poor’ in simulating the lower tail of the total annual rainfall distribution and variability in annual totals.

The ‘Poor’ performance is due to an over-estimation of the annual total rainfall in the lower tail, by 15% on average (see Figure 2).

This under-prediction of variability in aggregate totals is a known issue for many rainfall simulators [2], [3]. It is often attributed to a lack of model persistence between months or years. However, in this case, the comprehensive evaluation framework demonstrated that the model performance for year-to-year and month-to-month persistence was categorised as ‘Good’. Instead, the lack of variability in the number of wet days simulated annually (see the last row of Figure 1) was identified as the likely cause of the ‘Poor’ performance in simulating variability in total annual rainfall.
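
The link between wet-day counts and annual variability can be illustrated with a toy experiment. If wet-day amounts X are independent, identically distributed and independent of the annual number of wet days N, the variance of the annual total T is Var(T) = E[N]·Var(X) + Var(N)·E[X]², so too little spread in N directly reduces the spread of T. The Python sketch below checks this numerically. The gamma-distributed amounts, the normally distributed wet-day counts and all parameter values are assumptions chosen for illustration; they are not taken from the model evaluated in the article.

```python
import numpy as np

def annual_total_std(mean_wet_days, sd_wet_days, n_years=5000, seed=0):
    """Standard deviation of simulated annual totals in a toy rainfall model."""
    rng = np.random.default_rng(seed)
    # Number of wet days per year: rounded normal, clipped to a valid range.
    n_wet = np.clip(
        np.rint(rng.normal(mean_wet_days, sd_wet_days, n_years)), 0, 365
    ).astype(int)
    # Wet-day amounts: gamma with shape 0.7 and scale 8 mm (illustrative values).
    totals = np.array([rng.gamma(0.7, 8.0, int(n)).sum() for n in n_wet])
    return totals.std(ddof=1)

# Same mean number of wet days, with and without year-to-year variation in it.
sd_fixed = annual_total_std(mean_wet_days=120, sd_wet_days=0)
sd_varying = annual_total_std(mean_wet_days=120, sd_wet_days=20)
```

In this toy setting `sd_varying` is clearly larger than `sd_fixed`, which mirrors the article’s diagnosis: persistence can be well simulated while annual totals still lack variability.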

Figure 2 – At-site annual totals for all sites: (left) standard deviations and (right) lower tail (5th percentile), with 90% probability limits shown. Bar charts indicate performance as a percentage of sites. Adapted from Bennett et al. 2016.

This ability to identify model strengths and weaknesses via systematic and comprehensive evaluation is the key advantage of the framework.

For more on applying the full framework, read the full journal article [1].


[1] BENNETT, B., THYER, M., LEONARD, M., LAMBERT, M., & BATES, B. (2016). A comprehensive and systematic evaluation framework for a parsimonious daily rainfall field model. Journal of Hydrology. Available online 27 December 2016.

[2] MEHROTRA, R., & SHARMA, A. (2007). A semi-parametric model for stochastic generation of multi-site daily rainfall exhibiting low-frequency variability. Journal of Hydrology, 335(1), 180-193.

[3] WILKS, D. S. (1999). Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agricultural and Forest Meteorology, 93(3), 153-169, DOI: 10.1016/S0168-1923(98)00125-7.