IT'S A GIRL! Time Series Forecasting with Deep Learning

 An exercise to sharpen my skills. Because, why not. 

Got this data from the Deep Learning for Time Series Forecasting mini-crash course by Jason Brownlee. It's a dataset of daily female births in the year 1959. The units are a count of births per day, and there are a total of 365 observations. The source of the dataset is credited to Newton (1988).

The time series, when plotted, looks like the plot below. One glance gives a hint of seasonality, as suggested by the oscillations in the graph. The series could well be non-stationary, so to be on the safe side of things I'll assume it isn't stationary. 


Before I dive straight into deep learning, it's better to check out simple models first and use one as a baseline, mainly because, well, sometimes you don't need a complicated model to solve something that can be handled by a simple one. It saves a lot of energy, both mine and my laptop's. So for this post, I'll be using a simple ARIMA model as the baseline (its differencing step lets us get away with possibly non-stationary data). Non-stationarity can also be checked by inspecting the residuals, but I won't be getting into that here.

ARIMA

A popular and widely used statistical method for time series forecasting is the so-called ARIMA model. ARIMA stands for AutoRegressive Integrated Moving Average. It is a class of model that captures a suite of different standard temporal structures in time series data.

The acronym spells out exactly what the model does:

  • AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
  • I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
  • MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

A quick look at the autocorrelation plot of the series hinted that a good lag to start with is 3 (which, by the way, was suggested in the mini-course. Aha!).

Using an ARIMA model with an AR order of 3, a degree of differencing of 1, and an MA order of 0, I arrive at the following MAE, aka the MAE to beat.

Base Model MAE to beat : 5.556

So yeah, that's the MAE to beat! Note that I did not do any grid search on this yet. So that's a TODO for me. 
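
For reference, the baseline can be reproduced with something along these lines (a minimal sketch using statsmodels with a walk-forward evaluation; the file name and the exact setup are my assumptions, not necessarily what I ran):

    import pandas as pd
    from sklearn.metrics import mean_absolute_error
    from statsmodels.tsa.arima.model import ARIMA

    # assumed file name: the CSV holds a date column and the daily birth count
    series = pd.read_csv('daily-total-female-births.csv', header=0, index_col=0)
    values = series.values.flatten().astype('float32')

    # chronological split: first 70% for training, last 30% for testing
    split = int(len(values) * 0.7)
    train, test = values[:split], values[split:]

    # walk-forward validation: refit ARIMA(3, 1, 0) and forecast one step at a time
    history, predictions = list(train), []
    for obs in test:
        fit = ARIMA(history, order=(3, 1, 0)).fit()  # AR order 3, differencing 1, MA order 0
        predictions.append(fit.forecast()[0])
        history.append(obs)

    print('Baseline MAE: %.3f' % mean_absolute_error(test, predictions))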

Time Series Forecasting Using Deep Learning Methods

I'll be exploring three deep learning models: the Multilayer Perceptron, the Convolutional Neural Network, and the Long Short-Term Memory network. I used Keras, the open-source neural network library written in Python that runs on top of TensorFlow (or, historically, Theano).

I will only be saving the best model, meaning the weights from the epoch with the lowest MAE, in all my runs, using callbacks in Keras. A callback is a powerful tool to customize the behavior of a Keras model during training, evaluation, or inference.
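
In Keras this boils down to a ModelCheckpoint callback, roughly like so (the file name and the monitored metric are my assumptions):

    from tensorflow.keras.callbacks import ModelCheckpoint

    # keep only the weights of the epoch with the lowest validation MAE
    checkpoint = ModelCheckpoint(
        'best_model.h5',       # assumed file name
        monitor='val_mae',     # assumes MAE is one of the compiled metrics
        mode='min',
        save_best_only=True,
        verbose=1,
    )

    # passed later to model.fit(..., callbacks=[checkpoint])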

I first divided the dataset into a 70% training and 30% test set, being careful this time, as I am dealing with time series data, where the order matters: the test set should always come at later timesteps than the training set. I used Adam as my optimizer and the Huber loss to compute the loss. I have also chosen MAE as the metric, as always, due to its interpretability.
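
Concretely, the split and the framing of the series as a supervised learning problem looked more or less like this (the three-lag window is an assumption carried over from the ACF discussion above):

    import numpy as np
    import pandas as pd

    # same CSV as in the baseline (assumed file name)
    values = pd.read_csv('daily-total-female-births.csv', header=0,
                         index_col=0).values.flatten().astype('float32')

    def split_series(values, train_frac=0.7):
        """Chronological split: the test set always comes after the training set."""
        cut = int(len(values) * train_frac)
        return values[:cut], values[cut:]

    def to_supervised(values, n_lags=3):
        """Turn a 1-D series into (samples, n_lags) inputs and next-step targets."""
        X, y = [], []
        for i in range(n_lags, len(values)):
            X.append(values[i - n_lags:i])
            y.append(values[i])
        return np.array(X), np.array(y)

    train, test = split_series(values)
    X_train, y_train = to_supervised(train)
    X_test, y_test = to_supervised(test)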


1. Multilayer Perceptron model
The code looks a bit like this. I used a simple fully connected neural network with 100 units. 
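
(This is a sketch from memory; everything except the 100-unit fully connected layer, such as the lag window and epoch count, is an assumption.)

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.losses import Huber

    n_lags = 3  # assumed window size

    model = Sequential([
        Dense(100, activation='relu', input_shape=(n_lags,)),  # the 100-unit hidden layer
        Dense(1),                                              # one-step-ahead forecast
    ])
    model.compile(optimizer='adam', loss=Huber(), metrics=['mae'])

    # X_train, y_train, X_test, y_test come from the windowing step above;
    # checkpoint is the ModelCheckpoint callback defined earlier
    model.fit(X_train, y_train,
              validation_data=(X_test, y_test),
              epochs=200, verbose=0,
              callbacks=[checkpoint])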

Best MAE: 5.136


Here's the time series plot with the train (green) and test (orange) predictions.


2. Convolutional Neural Network

The convolutional neural network is a specialized type of neural network model designed for working with two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data. Central to the convolutional neural network is the convolutional layer that gives the network its name. This layer performs an operation called a “convolution”.

In the context of a convolutional neural network, a convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Given that the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel. 

There are three types of layers in a Convolutional Neural Network:

  • Convolutional Layers - comprised of filters and feature maps.
  • Pooling Layers - down-sample the previous layer's feature map.
  • Fully-Connected Layers - a normal flat feed-forward neural network layer.
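
Putting those three layer types together for a one-dimensional series, my model looked roughly like this (the filter count, kernel size, and pooling size are assumptions; only the general shape matches what I ran):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
    from tensorflow.keras.losses import Huber

    n_lags = 3  # assumed window size, same as for the MLP

    model = Sequential([
        Conv1D(64, kernel_size=2, activation='relu',
               input_shape=(n_lags, 1)),        # convolutional layer
        MaxPooling1D(pool_size=2),              # pooling layer
        Flatten(),
        Dense(50, activation='relu'),           # fully connected layer
        Dense(1),
    ])
    model.compile(optimizer='adam', loss=Huber(), metrics=['mae'])

    # Conv1D expects 3-D input of shape (samples, timesteps, features), so the
    # windowed arrays need a trailing feature dimension:
    # X_train_cnn = X_train.reshape((X_train.shape[0], n_lags, 1))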

Best MAE: 5.122

As always, here's what the plot looks like. Red for test, blue for training.

3. Long Short Term Memory Neural Network

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. 

Recurrent neural networks are different from traditional feed-forward neural networks. 

"Recurrent networks … have an internal state that can represent context information. … [they] keep information about past inputs for an amount of time that is not fixed a priori, but rather depends on its weights and on the input data.

A recurrent network whose inputs are not fixed but rather constitute an input sequence can be used to transform an input sequence into an output sequence while taking into account contextual information in a flexible way."

— Yoshua Bengio, et al., Learning Long-Term Dependencies with Gradient Descent is Difficult, 1994.

Much of the success of LSTMs lies in their being one of the first architectures to overcome the technical problems that plagued recurrent neural networks: vanishing and exploding gradients.

I just used a single LSTM layer with 100 units in the hidden layer. 
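
In Keras that is about as short as it sounds (again a sketch; everything except the single 100-unit LSTM layer is an assumption):

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.losses import Huber

    n_lags = 3  # assumed window size

    model = Sequential([
        LSTM(100, activation='relu', input_shape=(n_lags, 1)),  # single LSTM layer, 100 units
        Dense(1),
    ])
    model.compile(optimizer='adam', loss=Huber(), metrics=['mae'])

    # like the CNN, the LSTM expects input shaped (samples, timesteps, features)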

Best MAE: 5.002


Here's what the plot looks like. Red for test, blue for training.


If I had to be very technical about it, the clear winner would be the LSTM. BUT all the other MAEs are not that far behind either. 

Moral of the story: even the simplest model can be really cool. 

Lots of TODOs here, like more fine-tuning. But this will do for now! This post is based on the mini crash course in Deep Learning at https://machinelearningmastery.com/ - really cool site.
