Business statistics with Excel and Tableau (2015)
9. Time Series Regression Methods
The smoothing methods we worked in the previous chapter are fine when there is no evidence of seasonality, for example with the sheep data. However, smoothing has only limited use for longer term prediction andanalysis. Data that is more interesting for us as analysts may be non-linear in trend, and also have seasonal peaks and troughs. Using regression, we can measure the quantitative effect of seasonality and trend, either together or separately. The result is a model which can be used for prediction.
Below we will:
1. use regression to quantify a trend in a time series
2. introduce a quadratic term to account for non-linearity in the time series
3. use dummy variables to measure seasonality
9.1 Quantifying a linear trend in a time series using regression
The plot below shows life expectancy at birth for Canadians.
Canadian life expectancy
This is pretty much a straight line as one might expect in a developed country with a large per capita expenditure on health care. Note that this is for both sexes: for women only we might expect to see evenbetter figures. I ran the regression with Year as the independent variable, explaining life expectancy. The result is below.
Life expectancy regression results
Notice that the adjusted R-squared value is close to unity, reflecting
the straight line that we see on the graph. The coefficients for the intercept and the independent variable provide this estimated regression equation
^yt = 58.47 + 0.000565 ∗ Year
The meaning is that for every year after 1960 that a child was born, his/her life expectancy increased by 0.000565 years, or about 5 hours.
9.2 Measuring seasonality
The dataset for TV sales (from Modern Business Statistics by Anderson Sweeney and Williams) contains sales by quarter for four years. There are therefore sixteen observations.
I created a column which I called index so that we can see sales in a consecutive fashion. The first quarter is Spring and the last quarter is winter. It is apparent that quarters which are divisible by four are higher thanothers, and so forth. Perhaps sales are higher in winter?
We can use the dummy variable method of Chapter 5 to determine whether this is true and also the extent of the difference. We will create dummies for Summer, Fall and Winter, leaving Spring as the reference level.Recall that we have four possible states of the season variable, and so we will need k-1, or 4-1 = 3 dummies. It usually doesn’t matter which state you choose as your reference level: just don’t forget which one youpicked. The dataset is below, with a column called Index, which we’ll use to measure the time trend.
TV Sales with the quarterly dummies
I want to find out two things:
whether sales are increasing over time. I can do this by including the index as an explanatory variable. Because the regression tool requires that all the independent variable be in one block, I have copied and pastedthe Index column to the right.
The regression output is
TV Sales regression out
Let’s walk through this output line by line. First notice that the p values for all independent variables are smaller than 0.05. Therefore all are significant.
The reference level is Spring, and therefore there is no coefficient for this quarter. Summer has a negative sign in front of its coefficient, meaning that sales for Summer are smaller than those for Spring. Fall is positive,so more TV sales are sold in the Fall than in the Spring. Not unexpectedly, Winter has the largest coefficient of all,
reflecting the plot. The index has a positive sign, meaning that average TV sales are increasing with time.
There are therefore two components in this series: a trend and a seasonal component. Youtube¹
** Making predictions** from these results. Let’s predict the sales for Fall in seven quarters time. This is
yˆ = 4.7 + 1.05875 + 7 ∗ 0.147 = 6.78
The observed value was 6.8, so the prediction was reasonable if slightly pessimistic.
9.3 Curvilinear data
The data above followed a linear trend, which is perfect for ordinary least squares, which assumes that the relationship between the dependent variable and the independent variable is linear. But some interesting datadoes not follow a neat linear trend. The plot below shows crude oil and gas prices over the period 1949 to 2003.
Crude and gas prices showing crude price curvilinearity
However, if we show the two series on the same graph, as on the plot I did in Tableau, with separate y axes for each type of fuel, we can see that the shapes are very close.
Crude and gas on different axes
The curvilinearity of crude prices is clear. The prices came to a peak
in the 1970s as a result of OPEC’s decision to restrict supply. In any event, crude prices cannot be described as linear. The solution is to add an extra term to the regression of price on time, and that is time squared. Thefirst few lines of the dataset are below. I have added a column called to represent the year, and added tsq which is just t squared.
Data with time squared
Now regress crude against time and time squared, and the pleasing result below appears
The curvilinear regression for crude
The adjusted r-squared is high at 0.88 and the two time predictors are highly significant statistically. Notice that the two predictors are quite different in size and also have opposite signs. When t is small, then the variablet dominates and the trend is upward. However, as t gets larger, then t-squared gets even larger still. The negative sign on t-squared pulls the regression line down.