Working with the World Bank climate data


Tasks covered:
Introduction to the World Bank data repository
Introduction to the World Bank climate data
Introduction to the rWBclimate package 
Downloading World Bank data  
Computing linear, quadratic, and cubic regression models 
Adding regression model curves to an existing chart 
 

Project script: Working with World Bank Climate data.R


The World Bank Open Data Repository is a rich resource for many types of data analysis. It is a free, well-curated source of time-series, geospatial, and unit-level data [micro-data] for regions of the world or individual countries. This article explores the World Bank climate data using functions in the rWBclimate package.

The first section of the included R script downloads temperature data from the World Bank Open Data Repository for the United Kingdom [ISO code GBR] from 1900 to 2012. As a first step in analyzing this data, the code computes the correlation coefficient between temperature and year.

library(rWBclimate) # provides get_historical_temp()
gbr.historical <- get_historical_temp("GBR", "year") # yearly temperatures for the UK
cor(gbr.historical$data, gbr.historical$year) # correlation coefficient
[1] 0.5262769 

The command get_historical_temp() downloads temperature data for the designated geographical locator [region] and time scale [month, decade, or year]. The correlation coefficient between temperature and year is 0.5262769. While this is not a strong correlation, it suggests a moderate positive association between the two variables. Plotting the temperature data shows its behavior over the selected time span.
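Before plotting, it can help to see what cor() actually computes. The following is a minimal sketch on synthetic data [the values are made up and are not World Bank data; the names year, temp, r, r_manual, and ct are illustrative]. It checks cor() against the definition of the Pearson coefficient and uses cor.test() to attach a p-value and a confidence interval to the result.

```r
# Synthetic stand-in for the downloaded series (NOT World Bank values):
# a weak upward trend plus noise, one value per year.
set.seed(42)
year <- 1900:2012
temp <- 8 + 0.005 * (year - 1900) + rnorm(length(year), sd = 0.5)

# Pearson correlation as returned by cor()
r <- cor(temp, year)

# The same quantity from its definition:
# covariance divided by the product of the standard deviations
r_manual <- cov(temp, year) / (sd(temp) * sd(year))
stopifnot(isTRUE(all.equal(r, r_manual)))

# cor.test() adds a significance test and a confidence interval for r
ct <- cor.test(temp, year)
ct$p.value   # p-value for the null hypothesis of zero correlation
ct$conf.int  # 95% confidence interval for the correlation
```

The same calls should apply to gbr.historical$data and gbr.historical$year once the download has succeeded. The plot of the downloaded series follows.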

plot(gbr.historical$year, gbr.historical$data, type = 'l', xlab = 'Year',
     ylab = 'Temperature', main = 'UK temperatures 1900-2010')



The positive trend is visible, but the undulation in the plot may explain why the correlation is only 0.53. The script computes three regression models: a linear model, a model that includes a quadratic term, and a model that includes a quadratic and a cubic term.

# compute three regression models 
lmTemp <- lm(gbr.historical$data ~ gbr.historical$year) # linear model  
qdTemp <- update(lmTemp, . ~ . + I(gbr.historical$year^2)) # quadratic model 
cuTemp <- update(qdTemp, . ~ . + I(gbr.historical$year^3)) # cubic model

The linear model uses the year as a predictor of the temperature. The quadratic model uses the year plus the year squared to predict the temperature. The cubic model uses the year plus the year squared plus the year cubed to predict the temperature. The function below will evaluate the three models, given their coefficients, so that each model can be converted to a curve that can be added to the temperature plot. 
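As a hedged aside, the same three nested models can be written against a data frame with the data argument of lm(). The sketch below uses synthetic data [not World Bank values] and centers the year at 1950. The centering is an assumption made here to keep the powers of the predictor numerically well behaved; it is not what the project script does. The names d, yr, lin, quad, cub, and cub_poly are illustrative.

```r
# Synthetic series (NOT World Bank values) with a mild curved trend
set.seed(1)
d <- data.frame(year = 1900:2012)
d$yr <- d$year - 1950   # centered year: an assumption for numerical stability
d$data <- 8 + 0.01 * d$yr + 0.0002 * d$yr^2 + rnorm(nrow(d), sd = 0.4)

# Nested models built the same way as in the script, using update()
lin  <- lm(data ~ yr, data = d)          # linear model
quad <- update(lin,  . ~ . + I(yr^2))    # adds the quadratic term
cub  <- update(quad, . ~ . + I(yr^3))    # adds the cubic term

# The cubic model written with poly(); its orthogonal polynomial basis
# spans the same space, so the fitted values agree
cub_poly <- lm(data ~ poly(yr, 3), data = d)
same_fit <- isTRUE(all.equal(unname(fitted(cub)), unname(fitted(cub_poly))))
same_fit

# Each step adds exactly one coefficient
sapply(list(lin, quad, cub), function(m) length(coef(m)))
```

Note that the coefficients of cub_poly are not the plain polynomial coefficients, so they cannot be passed to a function that expects an intercept followed by the coefficients of x, x^2, and x^3. The project script's own evaluation function comes next.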

# a function to evaluate a polynomial given its coefficients, used to plot the regression models
polynomial <- Vectorize(function(x, ps) {
  n <- length(ps)
  sum(ps * x^(seq_len(n) - 1)) # ps[1] + ps[2]*x + ... + ps[n]*x^(n-1)
}, "x")

This function computes the predicted regression value for each input year and returns a vector of those values, one per year. When the function is passed to curve(), these values are drawn as a line over the existing temperature plot.
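A quick way to gain confidence in the helper is to evaluate it on a polynomial whose values are known and to compare it with predict() on a fitted model. The sketch below restates the function so that it runs on its own; the test polynomial and the synthetic x and y values are made up for illustration.

```r
# Same helper as in the script, restated so this block is self-contained
polynomial <- Vectorize(function(x, ps) {
  n <- length(ps)
  sum(ps * x^(seq_len(n) - 1))
}, "x")

# 1 + 2x + 3x^2 evaluated at x = 0, 1, 2
vals <- polynomial(c(0, 1, 2), c(1, 2, 3))
vals  # 1 6 17

# On a fitted model, the helper should reproduce predict()
set.seed(7)
x <- seq(-5, 5, by = 0.5)
y <- 2 - x + 0.5 * x^2 + rnorm(length(x), sd = 0.1)  # synthetic data
fit <- lm(y ~ x + I(x^2))
agree <- isTRUE(all.equal(polynomial(x, coef(fit)), unname(predict(fit))))
agree
```

The agreement with predict() relies on coef() returning the intercept first, followed by the coefficients in increasing order of power, which is how the models in this article are built. The complete plot from the project script follows.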

# plot the temperature data and the regression models 
plot(gbr.historical$year, gbr.historical$data, type = 'l', xlab = 'Year',
     ylab = 'Temperature', main = 'UK temperatures 1900-2010')
curve(polynomial(x, coef(lmTemp)), add = TRUE, col = 'red') # add the linear model  
curve(polynomial(x, coef(qdTemp)), add = TRUE, col = 'black', lty=2) # add the quadratic model 
curve(polynomial(x, coef(cuTemp)), add = TRUE, col = 'blue') # add the cubic model 
legend(1900, 9.7, legend = c("linear", "quadratic", "cubic"), lty = c(1,2,1), col = c('red','black','blue')) 


The regression curves help explain what is going on in this plot. The linear model shows a positive overall trend, but between 1945 and 1962 there is also a downward trend. The quadratic regression curve does not capture this downward trend, but it does reflect an acceleration in the upward trend after 1980. The cubic model fits the temperature data best. It also reflects the downward trend, indicating that this trend started near 1930 and became an upward trend again in 1960. While beyond the scope of this article, it would be an interesting project to study whether the reduction in industrial output during the Great Depression reversed any upward trend in temperatures, and whether the resumption of production for World War II reinvigorated and accelerated that trend.
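In a nested sequence of models, each added term can only lower [or leave unchanged] the residual sum of squares, so a claim that one model fits best is more convincing when it is backed by a comparison that accounts for the extra term. The sketch below shows one possible way to do this with anova(), adjusted R-squared, and AIC(). It uses synthetic data with a centered year [both are assumptions made for illustration, not the project script's data], so the numbers it prints say nothing about the UK series.

```r
# Synthetic series (NOT World Bank values) with a cubic-shaped trend
set.seed(3)
yr <- (1900:2012) - 1950   # centered year: an assumption for stability
temp <- 8.5 + 0.004 * yr - 0.0002 * yr^2 + 0.000008 * yr^3 +
  rnorm(length(yr), sd = 0.3)

lin  <- lm(temp ~ yr)
quad <- lm(temp ~ yr + I(yr^2))
cub  <- lm(temp ~ yr + I(yr^2) + I(yr^3))

# Sequential F-tests: does each added term reduce the residual
# sum of squares by more than chance alone would explain?
cmp <- anova(lin, quad, cub)
cmp

# Adjusted R-squared penalizes the extra coefficients
adj <- c(linear    = summary(lin)$adj.r.squared,
         quadratic = summary(quad)$adj.r.squared,
         cubic     = summary(cub)$adj.r.squared)
adj

# AIC: lower values indicate a better trade-off of fit and complexity
aic <- AIC(lin, quad, cub)
aic
```

Applied to the script's lmTemp, qdTemp, and cuTemp objects, the same three calls would show whether the cubic term earns its place in the UK model.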

The script repeats this analysis for the historical temperature data of the United States over the same time span.



This chart exhibits behavior similar to the UK plot above. The overall linear trend is positive: temperature increases with the year. There is a downward trend that matches the one in the UK data. The overall temperature plot appears to be tighter [high and low temperatures closer together for each year] and the cubic regression model fits the data more closely.
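Because the same steps are applied to each country, they can be wrapped in a small function. The sketch below is one possible way to do that and is not taken from the project script. The function name analyze_series is hypothetical, and it assumes a data frame with columns named year and data, which is how the downloaded data is accessed elsewhere in this article. It also swaps in poly() for the explicit I(year^2) and I(year^3) terms, so the fitted curves are obtained from fitted() instead of from the coefficients.

```r
# Hypothetical helper (not part of rWBclimate or the project script):
# correlation plus polynomial fits of the requested degrees
analyze_series <- function(df, degrees = 1:3) {
  fits <- lapply(degrees, function(k) lm(data ~ poly(year, k), data = df))
  names(fits) <- paste0("degree", degrees)
  list(correlation = cor(df$data, df$year), fits = fits)
}

# Synthetic stand-in with the assumed column names (NOT World Bank values)
set.seed(11)
fake <- data.frame(year = 1900:2012)
fake$data <- 11 + 0.006 * (fake$year - 1900) + rnorm(nrow(fake), sd = 0.4)

res <- analyze_series(fake)
res$correlation
sapply(res$fits, function(m) summary(m)$r.squared)

# With the package loaded and the service reachable, a call might look like
# (an assumption, not run here):
# us <- analyze_series(get_historical_temp("USA", "year"))
```

Each fitted curve can then be added to an existing plot with lines(df$year, fitted(fit)), in place of the curve() and polynomial() combination used above.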

The example script repeats these steps for the historical precipitation data for the UK and the US. Here are the two historical precipitation charts with regression curves.






Both of these historical precipitation charts exhibit a small upward trend [linear regression]. Neither chart shows the obvious undulation that is visible in the historical temperature charts. An interesting observation regarding the chart for US historical precipitation is the similarity of the quadratic and cubic regression models. The UK precipitation data supports undulation in the cubic regression but the US data does not.

An extension of this exercise could be a good resource for studying issues associated with climate change. The World Bank data can be used to compare different regions of the earth. Weather data can be combined with other World Bank data, such as population, financial, or geographic data. This exercise is simply a demonstration of what is available from the World Bank Open Data Repository. The references below include a more extensive tutorial on the rWBclimate package.






References

World Bank Open Data

World Bank Climate Data

rWBclimate package documentation [web page] 

rWBclimate package documentation [pdf file]

rWBclimate tutorial










