Dynamical downscaling of global climate model (GCM) data from ~150 km to 12 km resolution or finer
requires running a computationally expensive regional climate model (RCM) using GCM forcing data at the
lateral and surface boundaries. In addition, RCM output data (temperature, precipitation, etc.) are biased
with respect to in-situ observations because various physical processes are not adequately represented
in the model, in some cases because they act at sub-grid scales. Machine learning methods have the
potential to extract more abstract relationships between in-situ observations and GCM output, in addition
to using RCM simulation reference data, and thus could improve the representation and accuracy of downscaled
variables. In particular, precipitation is notoriously difficult to predict due to complex sub-grid scale
processes and local features such as orography. In this study, we explore and test different machine
learning approaches with the aim of improving the accuracy of regionally downscaled GCM output.
To test the effectiveness of methods in machine learning, we downscaled a variety of regional circulation
indices to monthly rainfall anomalies (mm/day) for a single location (Whenuapai, Auckland). The circulation
indices used were the M1 and Z1 Trenberth indices, which describe meridional and zonal flow across
New Zealand respectively, and the Southern Oscillation Index (SOI), which describes the atmospheric phase
of the El Niño-Southern Oscillation. These results were tested against a baseline multivariate linear regression.
With the help of NeSI's consultancy service, we developed a scalable pipeline to automatically run a variety
of experiments including varying the number of lagged circulation indices and training a large selection of
models. For all our conventional regression models (e.g., OLS, Gradient Boosting) the minimum root mean
square error (RMSE) was achieved using approximately 96 lagged months of the circulation indices, which in
turn explained approximately 10-15% of the variance in the rainfall anomalies. However, using a deep neural
network, we can explain approximately 50% of the variance in rainfall. This substantial improvement in
accuracy is a strong indication that deep neural networks can extract more abstract relationships from the
lagged history of circulation indices.
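
As an illustration of the lagged-index regression setup described above, the following is a minimal sketch
and not the pipeline used in this study. It builds a lagged feature matrix from monthly indices, fits an
ordinary least squares baseline, and reports the RMSE and explained variance on a held-out period. The data
are synthetic, the function names (make_lagged_features, fit_ols, and so on) are hypothetical, and only 3
lags are used here rather than the roughly 96 months reported above.

```python
import numpy as np


def make_lagged_features(indices, n_lags):
    """Return one row per target month holding that month and the previous
    n_lags - 1 months of every index.

    indices: array of shape (n_months, n_indices)
    result:  array of shape (n_months - n_lags + 1, n_lags * n_indices)
    """
    n_months = indices.shape[0]
    rows = [indices[t - n_lags + 1 : t + 1].ravel()
            for t in range(n_lags - 1, n_months)]
    return np.asarray(rows)


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def explained_variance(y_true, y_pred):
    """Fraction of the target variance captured by the prediction."""
    return float(1.0 - np.var(y_true - y_pred) / np.var(y_true))


# Synthetic stand-ins (assumption): three monthly indices playing the role
# of M1, Z1 and SOI, and a rainfall anomaly that depends on the current
# value of the first index and the previous month's value of the second.
rng = np.random.default_rng(0)
n_months, n_lags = 600, 3
idx = rng.standard_normal((n_months, 3))
rain = 0.5 * idx[:, 0] + 0.1 * rng.standard_normal(n_months)
rain[1:] -= 0.3 * idx[:-1, 1]

X = make_lagged_features(idx, n_lags)
y = rain[n_lags - 1:]          # align targets with the last month of each row

# Chronological split, so the test period lies entirely after training.
n_train = 450
coef = fit_ols(X[:n_train], y[:n_train])
y_hat = predict_ols(coef, X[n_train:])

test_rmse = rmse(y[n_train:], y_hat)
test_ev = explained_variance(y[n_train:], y_hat)
print(f"RMSE = {test_rmse:.3f} mm/day, explained variance = {test_ev:.2f}")
```

The split is chronological rather than random because lagged rows from neighbouring months share most of
their inputs, so a random split would leak information between the training and test sets.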
Since our initial results are promising, we have applied the trained model to data from past and future
climate model projections and compared the estimated climate change signal at the study site from
machine learning with that from dynamical regional climate model output. Future work will include
downscaling circulation and synoptic flow patterns, that is, two-dimensional synoptic fields, to both
daily and monthly gridded rainfall anomalies.