How to do a Quick and Basic Time Series Analysis with ARIMA

Richard Mei
5 min read · Sep 7, 2020

Time series are found everywhere, but how do you know when you have one? And what would you do with it?

It’s really simple to see! You have a time series when you are working with data tracked at equally spaced time intervals. A lot of people commonly think of prices, but you can also have a time series when looking at birth rates, temperatures, heart rates, and more. Now that we know how to identify a time series, we can do an analysis to try to forecast whatever we are measuring. Our basic analysis takes only 5 simple steps: Visualize, Transform, Configure, Model, Forecast.

Visualization

The visualization step is self-explanatory: just take a look at the series by graphing the data with time as the independent variable and your measured value as the dependent one. We are looking at the data not only because you should always visualize your data and do some EDA, but because you’ll probably be able to notice some patterns, like trends, seasonality, or randomness. Whether or not you see them, it’s time to transform the data.
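To make this step concrete, here is a minimal, self-contained sketch of the visualization. The DataFrame, column name, and dates are synthetic stand-ins (the post's actual data isn't included), so treat this as a template rather than the original analysis:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for real price data so the sketch is runnable
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-01", periods=250, freq="B")
data = pd.DataFrame(
    {"Open": 50 + np.cumsum(rng.normal(0.05, 0.5, size=250))}, index=dates
)

# Time on the x-axis, the measured value on the y-axis
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(data.index, data["Open"])
ax.set_xlabel("Date")
ax.set_ylabel("Open price")
ax.set_title("Daily open prices")
```

In a notebook you would simply call `data['Open'].plot()` and eyeball the result for trend, seasonality, or noise.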

Transformation (Stationarity)

The transformation step is where we test whether or not our data is stationary and deal with it appropriately. A series is either stationary or non-stationary, which corresponds to whether the series doesn’t or does have a unit root. It’s a confusing sentence, but to clarify: you don’t want a unit root, because no unit root means you have stationary data.

Your data is non-stationary when you can see that the mean, variance, or covariance is not constant over time. A quick example using the prices of an ETF is shown below.

The non-stationary series doesn’t have a constant mean: if we split the series in half, the means of the two halves would be different. To fix this, I subtract the previous data point from each data point, which means the transformed series starts one period after the original. This method is called differencing, and I did it simply by using a built-in Pandas function:

# log first (to stabilize variance), then take the first difference;
# [1:] drops the initial NaN produced by .diff()
data['Stationary'] = np.log(data['Open']).diff(periods=1)[1:]

Moreover, you can see I also took the log of the data, because it seemed to me the data had unequal variance. Taking the log is one of many ways to deal with unequal variance. The new data doesn’t look perfect, but those are the basics of transforming a series appropriately.

Configure

The next step is the configuration stage. Many know this as looking at the Auto-Correlation Function (ACF) and the Partial Auto-Correlation Function (PACF). The ACF measures the correlation between the series and its lagged values, while the PACF measures the correlation at a given lag after removing the effects of the shorter lags in between. Here’s a great resource for going into more depth.

The importance of the ACF and PACF comes from being able to pick the number of autoregressive (AR) and moving average (MA) terms for the model. These two terms come from the idea that the time series is made up of an autoregressive part (the next value depends partly on previous values), plus a moving-average part (a weighted average of current and past noise terms), plus noise. A time series can have any number of these terms, and to figure out how many, we look at the ACF and PACF plots.

From looking at these plots of the stationary series from above, we can see that we would only need 1 AR term and 1 MA term for our model. To give a little understanding of the graph, the blue area is the range in which a lag is not significant. Since all the lag terms except for the first are not significant, we know to use only 1 of each term. Again, this resource is great to dig into.

Model

After picking the terms, it’s time to make the model! The model to make is the ARIMA model, which you can fit in a lot of programs, but here’s the code for how to do it in Python using the statsmodels library.

from statsmodels.tsa.arima_model import ARIMA

model = ARIMA(np.log(data['Open']), order = (1,1,1)).fit()
model.summary()

I gave the function the logarithmic data and the order parameter (p, d, q), representing the number of AR terms, the degree of differencing, and the number of MA terms respectively. We can see the AR and MA terms near the bottom of the results have P > |z| values of 0.000, meaning they are significant. We can try other orders and compare the different models by looking at the AIC and/or BIC, like we do with other models.

Forecast

The final step after making the model is to forecast. It is key to remember that if you applied any transformations to your data, you have to transform the forecasts back. In this case, I took the log of and differenced my data. The ARIMA method took care of undoing the differencing, so all I need to do is take the exponential of my forecast data.

forecast = model.forecast(10)
np.exp(forecast[0])
array([57.77318638, 57.79870571, 57.82646788, 57.85638661, 57.88837926,
       57.92236671, 57.95827314, 57.99602592, 58.03555545, 58.07679506])

Conclusion

Those are the steps to doing a basic time series analysis. Some steps that were not in this post are splitting the data into train and test sets and checking the error. If I had done the split, I would’ve used Root Mean Square Error in cents as the measure of error. There are also other models, like SARIMA, SARIMAX, and FBProphet, that you should definitely check out if you want to explore time series!
