Better Business Decisions from Data: Statistical Analysis for Professional Success (2014)
Part VI. Forecasts
Chapter 19. Time Series
Yesterday Rain, Today Rain, Tomorrow . . . ?
One of the most difficult areas of forecasting is in dealing with time series. Our profits have been x, y, and z over the past three years, so what will they be next year? Unfortunately, we might say, this is perhaps the area where forecasting is most necessary in the business and commercial world. In the United Kingdom, and probably elsewhere, documentation involving the sale of shares and other investment products has to carry the warning “past performance is no guide to future performance.”
The problem is essentially to forecast data relating to the next time period from knowledge of the data from previous time periods. In Chapter 14, I explained the technique of regression analysis applied to relationships between two variables, and showed how a mathematical expression could be derived to describe the relationship. A time series can be treated as a relationship between two variables, and a mathematical relationship can be obtained using this technique. The relationship can then give predicted values for future times.
There are a number of problems associated with this approach. First, we will be extrapolating a correlation beyond the range of values within which it has been found to apply. Of course, as I have previously said, we have to extrapolate one way or another, and we are really looking for the least undesirable procedure. The second problem is that we are fitting the data to a straight line or a smooth curve with no justification. This is very different from the use of these correlation methods in establishing relationships between basic physical properties, which commonly vary smoothly with each other in line with well-defined laws. The third problem is that we have no way of knowing for certain how accurate our forecast will be.
With regard to the accuracy of our forecast, we can argue that provided our extension into the future is modest, in relation to the extent of our past data, the error is unlikely to be great. We can also put a value on the maximum possible precision of the forecast values. We will know the reliability of the correlation, and this is a measure of the precision of the correlation’s estimate of the existing data points. The estimate of the future data points cannot be more precise than this, so we have a measure of the best level of precision to be expected. Put another way, the correlation cannot be better at forecasting future values than it is at predicting the known past values.
If you look back at the time series discussed in Chapter 14 and shown plotted in Figure 14-10, you will recall that a simple linear regression yielded a correlation coefficient of r = 0.70. By using moving averages, the correlation coefficient was increased to 0.99. This indicates strong evidence for the rising trend indicated by the data, but the rising trend alone is of little value in forecasting the monthly performance in the short term. In the long term, there is the problem of having to extrapolate well beyond the range of the existing data.
Autocorrelation provides a means of examining whether there are correlations between data from different times in the past. Pairs of data from different times are selected and compared. If the comparison shows a significant relationship, then there is evidence that the past values can be used to forecast future values.
We would expect the daily temperature to bear some relation to the temperature of the previous day. A correlation between the two over a number of days would provide us with a basis for forecasting the next day’s temperature. It would not be perfect, but it would have a degree of success. If we considered the average monthly temperature and produced a correlation with the value for the corresponding month of the previous year, we would have better success. Indeed, this is the approach used in setting up projected monthly temperatures for various locations.
Of course, not everything that we have to deal with has the repeatability of weather and climate, but recognizable cyclic variations are not uncommon in data relating to business activities. We can use the following data showing the monthly profits of a company, in thousands of dollars, say, to illustrate a practical application. The data are plotted in Figure 19-1(a).
Figure 19-1. An example of autocorrelation
A correlation between each value and the value for the previous month can be obtained by a simple linear regression analysis of the following two sets of data:
The equation of the regression line, following the procedure from Chapter 14, is calculated to be
Current = –0.45 × Previous + 2.16
and is shown plotted in Figure 19-1(b). The predicted value for the next month—January of the following year—is obtained by inserting the value for December of 1.6 for the previous month. The prediction is 1.4. This would not be reliable, because the correlation coefficient for the data is –0.49, which is not significant at the 5% level. (See Chapter 14 for a selection of significance levels for the product moment correlation coefficient.) That is to say, the gradient of the regression line, –0.45, is not significantly different from zero.
The possibility of a seasonal effect can be examined by using the three-month previous values. The two sets of data are now as follows:
The equation of the regression line is calculated to be
Current = 0.79 × Previous + 0.33
which, with the value of 1.6 for the previous month (December), gives a predicted value of 1.6 for the next month (January). The data is shown in Figure 19-1(b). The correlation coefficient is 0.83, which is significant at the 1% level. Clearly this is a better forecast than the previous one.
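The lagged regression described above can be sketched in Python. This is a minimal illustration rather than the calculation behind the figures quoted in the text: the function names (`lagged_pairs`, `linear_fit`, `forecast_next`) and the `profits` values are hypothetical, because the chapter's data table is not reproduced here. The sketch also assumes that the forecast for the next period is made from the value `lag` periods before it.

```python
from math import sqrt

def lagged_pairs(series, lag):
    """Pair each value with the value `lag` periods earlier (lag >= 1)."""
    previous = series[:-lag]
    current = series[lag:]
    return previous, current

def linear_fit(x, y):
    """Least-squares slope, intercept and correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

def forecast_next(series, lag):
    """Regress current values on lagged values and predict the next period."""
    previous, current = lagged_pairs(series, lag)
    slope, intercept, r = linear_fit(previous, current)
    # Assumption: the next period is predicted from the value `lag` periods before it.
    prediction = slope * series[-lag] + intercept
    return prediction, r

# Hypothetical monthly profits (thousands of dollars), NOT the chapter's data.
profits = [1.0, 2.0, 1.5, 1.1, 2.1, 1.4, 1.0, 2.2, 1.5, 1.2, 2.0, 1.6]

for lag in (1, 3):
    prediction, r = forecast_next(profits, lag)
    print(f"lag {lag}: r = {r:.2f}, forecast = {prediction:.2f}")
```

As a check on the arithmetic in the text, inserting 1.6 into the two quoted equations gives –0.45 × 1.6 + 2.16 = 1.44 and 0.79 × 1.6 + 0.33 ≈ 1.59, which round to the stated 1.4 and 1.6. Whether a correlation coefficient is significant still has to be judged against tabulated values such as those referred to in Chapter 14; the sketch does not perform that test.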
In Chapter 14, we showed an example of a time series and described how the use of a moving average has a smoothing effect on the shape of the graph. If we assume that the undulations in the graph are due to random effects, rather than meaningful effects, we could decide that the graph of moving averages would provide a means of forecasting future values. To achieve some improvement, we could then take the view that the more recent data points are more relevant in predicting the future than the older ones. We could therefore apply a weighting procedure in calculating the moving averages. This thinking leads us to the method of exponential smoothing.
In exponential smoothing, the forecast values for the time periods are calculated successively, starting with the earliest. The forecast for each next period is obtained by adding a proportion, α (Greek letter alpha), of the current actual value to a proportion (1 – α) of the previous, similarly produced, forecast value. The proportion α lies between 0 and 1: a value of 1 makes the new forecast equal to the current actual value, and a value of 0 makes the new forecast equal to the previous forecast. The formula is
F_{t+1} = αD_t + (1 – α)F_t
where F_{t+1} = forecast for the next period
F_t = previous forecast applied to current period
D_t = current actual value
α = weighting factor
Because each forecast depends directly on the previous forecast, it consequently depends on all previous forecasts, though the dependency is greater the more recent the forecast.
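The recurrence can be sketched in a few lines of Python. This is an illustration only: the function name `exponential_smoothing` and the `sales` figures are hypothetical, and the starting forecast of 50 is simply an assumed value for the first period.

```python
def exponential_smoothing(actuals, alpha, initial_forecast):
    """Return the forecasts for periods 1..n+1 given actual values for periods 1..n.

    Each new forecast is alpha * (current actual) + (1 - alpha) * (previous forecast).
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [initial_forecast]
    for actual in actuals:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical sales figures for six periods (not taken from the text).
sales = [52, 48, 55, 51, 49, 53]
forecasts = exponential_smoothing(sales, alpha=0.2, initial_forecast=50)
print("forecast for period 7:", round(forecasts[-1], 2))
```

Expanding the recurrence shows why the method is called exponential: the most recent actual value carries weight α, the one before it α(1 – α), the one before that α(1 – α)², and so on, so the influence of older values decays geometrically.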
The following is a small example, using a weighting factor of 0.2, to illustrate the procedure. Supposed sales figures are shown for six successive periods. Also shown is the forecast value for the seventh period.
The error in each period is the difference between the actual sales figure, Dt, and the forecast, Ft, which was calculated from the previous sales figure. The overall error is usually quantified by the mean squared error.
In carrying out this procedure, we had to make two choices. First, we had to decide on the weighting factor. A large value gives more weight to recent sales, whereas a small value gives more weight to earlier sales. Second, because we had no previous forecast value, we had to decide what to use as the first value for Ft. The example used the value 50, this being the actual sales figure in the previous period.
To achieve an acceptable forecast, the overall error needs to be minimized; but with two somewhat arbitrary choices to be made, it is not easy to achieve this manually. There are, of course, computer programs readily available that can rapidly run through a range of scenarios to work toward a minimum mean-squared error.
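A simple version of such a search can be sketched as follows. The sketch is an assumption about how a program might proceed, not a description of any particular package: it holds the starting forecast fixed and tries evenly spaced values of α, keeping the one with the smallest mean squared error. The names `mean_squared_error`, `best_alpha`, and `steps`, and the `sales` figures, are hypothetical.

```python
def mean_squared_error(actuals, alpha, initial_forecast):
    """Mean of the squared differences between each actual value and its forecast."""
    forecast = initial_forecast
    total = 0.0
    for actual in actuals:
        total += (actual - forecast) ** 2
        # Update the forecast for the next period.
        forecast = alpha * actual + (1 - alpha) * forecast
    return total / len(actuals)

def best_alpha(actuals, initial_forecast, steps=100):
    """Try alpha = 0, 1/steps, ..., 1 and return the value with the lowest error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda a: mean_squared_error(actuals, a, initial_forecast),
    )

# Hypothetical sales figures (not taken from the text).
sales = [52, 48, 55, 51, 49, 53]
alpha = best_alpha(sales, initial_forecast=50)
print("alpha:", alpha, "MSE:", round(mean_squared_error(sales, alpha, 50), 3))
```

With only a handful of data points, the value of α found this way is fitted to very little evidence, so it is best treated as a starting point rather than a settled answer.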
The method as described is referred to as single exponential smoothing, just one weighting factor being used. The method works well when the data are approximately constant as time progresses and the up and down variations are random. In many situations, however, the data points will show a trend, either increasing or decreasing with time. Double exponential smoothing is then required. A second constant, β (Greek letter beta), is introduced to adjust for the trend in each previous interval. The first smoothing constant is applied to the trend-adjusted values in a way similar to that of single exponential smoothing.
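One common variant of double exponential smoothing, usually called Holt's linear method, can be sketched as follows. Because the text does not give the formulas, this is an assumed formulation: the running level is smoothed with α, the running trend is smoothed with β, and the starting level and trend are taken from the first two observations. The function name `holt_forecast` and the example figures are hypothetical.

```python
def holt_forecast(actuals, alpha, beta, horizon=1):
    """Forecast `horizon` periods ahead using a smoothed level and a smoothed trend."""
    if len(actuals) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: first value as the level, first difference as the trend.
    level = actuals[0]
    trend = actuals[1] - actuals[0]
    for actual in actuals[1:]:
        previous_level = level
        # alpha is applied to the trend-adjusted previous level.
        level = alpha * actual + (1 - alpha) * (previous_level + trend)
        # beta smooths the change in level from one period to the next.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend

# Hypothetical sales showing a steady decline (not taken from the text).
sales = [100, 96, 93, 88, 85, 81]
print("next period:", round(holt_forecast(sales, alpha=0.5, beta=0.3), 2))
```

On data that lie exactly on a straight line, this formulation reproduces the line, so its forecasts continue the trend rather than lagging behind it as single exponential smoothing would.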
In addition to trend, time series often show periodic variation, which could be daily, monthly, seasonal, or annual. To include effects of periodic variation, a third smoothing constant, γ (Greek letter gamma), can be included to give triple exponential smoothing.
Exponential smoothing is essentially a trial-and-error procedure but is readily dealt with by the computer software that is available. It is worth pointing out, however, that there are a number of variants of the method, so not all computer programs produce the same results.
Notice that in exponential smoothing, unlike regression, no regard is taken of the expected shape of the fitted curve. The forecast is in essence based on the most recent value modified in accordance with how well each previous value would have forecast the next one in the series.
Lawton Plumbing Supplies was located in an industrial park on the edge of the town. It was a small business run by the owner, Bill Lawton, supplying tools and plumbing consumables to local tradesmen and DIY enthusiasts. Kitchen and bathroom accessories were also stocked for sale to the general public.
Rising prices of copper and brass seriously affected the value of inventory, and much of it had become slow-moving because of the trend toward greater use of plastic pipework and fittings. Bill was nevertheless conscious of the need to retain his customers by always having what they needed in stock. He realized that his inventory control and forward-ordering practices were a mess and needed sorting out.
He talked to various colleagues about the matter, and it was suggested to him that he should spend some time examining his sales records and employ a rational routine guided perhaps by some form of time series analysis.
Armed with a book on statistics from the local library, Bill studied the possibilities. Because of the trend of decreasing copper and brass sales, and the increasing trend of plastic sales, he decided that exponential smoothing looked the most useful method. It promised the ability to deal with random fluctuations and an underlying trend. There might also be a benefit from the incorporation of a cyclic variation, because sales of pipework increased in the winter when many householders suffered from frozen pipes and central-heating faults.
At this stage, he needed help. Through his many contacts, he located a local IT expert who ran a computer repair business. For a modest fee, Bill had a suitable package installed on his computer and received several short tutorial sessions.
Bill became quite fascinated by the process and used the technique to analyze sales records for much of his stock. He appreciated that the benefits would not be immediate but would improve with time, although it was quickly apparent that the system was recognizing the trends he was most concerned about. He was also shrewd enough to understand that no statistical analysis was going to give precise answers and that his practical experience in the business would still be required. Customer retention would always demand that safety margins be incorporated in his forward planning.