IT'S A GIRL! Time Series Forecasting with Deep Learning
An exercise to sharpen my skills. Because, why not.
Got this data from the Deep Learning for Time Series Forecasting mini-crash course by Jason Brownlee. It's a dataset of daily female births in the year 1959. The units are a count of births, and there are 365 observations in total. The source of the dataset is credited to Newton (1988).
The time series, when plotted, looks like the plot below. One glance hints at seasonality, suggested by the oscillations in the graph. The series looks non-stationary at a glance, but to be on the safe side we shouldn't take that on faith without checking.
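For reference, here is a minimal sketch of loading and plotting the series. The CSV file name and the "Date"/"Births" column names are assumptions based on the commonly distributed version of this dataset, not something confirmed by the course download.

```python
# A minimal sketch of loading and plotting the daily-births series.
# File name and column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

series = pd.read_csv("daily-total-female-births.csv",
                     index_col="Date", parse_dates=True)["Births"]
print(series.describe())  # 365 daily counts for 1959

series.plot(title="Daily Female Births, 1959")
plt.show()
```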
ARIMA
The acronym spells out exactly what the model does (a minimal fitting sketch follows the list):
- AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
- I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from an observation at the previous time step) in order to make the time series stationary.
- MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
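To make those pieces concrete, here is a minimal statsmodels sketch, assuming the same CSV as above; the (5, 1, 0) order is just an arbitrary starting point, not a tuned configuration from this post.

```python
# A minimal sketch of fitting an ARIMA model to this series with statsmodels.
# The (p, d, q) order is an assumed starting point, not a tuned value.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("daily-total-female-births.csv",
                     index_col="Date", parse_dates=True)["Births"]

model = ARIMA(series, order=(5, 1, 0))  # AR lags, differencing, MA lags
fit = model.fit()
print(fit.summary())
print(fit.forecast(steps=1))  # one-step-ahead forecast
```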
CNN
The convolutional neural network is a specialized type of neural network model designed for working with two-dimensional image data, but it can also be used with one-dimensional and three-dimensional data. Central to the convolutional neural network is the convolutional layer that gives the network its name. This layer performs an operation called a "convolution".
In the context of a convolutional neural network, a convolution is a linear operation that involves the multiplication of a set of weights with the input, much like a traditional neural network. Given that the technique was designed for two-dimensional input, the multiplication is performed between an array of input data and a two-dimensional array of weights, called a filter or a kernel.
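As a toy illustration of that operation, here is a hand-rolled 1-D convolution; the input values and kernel weights below are made up purely for the example.

```python
# A tiny sketch of the convolution operation: slide a 1-D kernel (the weights)
# across the input and take a dot product at each position. Values are made up.
import numpy as np

signal = np.array([3.0, 5.0, 2.0, 8.0, 6.0])
kernel = np.array([0.5, 1.0, 0.5])  # the filter / kernel of learned weights

feature_map = np.array([
    np.dot(signal[i:i + len(kernel)], kernel)
    for i in range(len(signal) - len(kernel) + 1)
])
print(feature_map)  # [ 7.5  8.5 12. ] -> one output per kernel position
```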
There are three types of layers in a Convolutional Neural Network, tied together in the sketch after this list:
- Convolutional Layers - comprised of filters and feature maps.
- Pooling Layers - down-sample the previous layer's feature maps.
- Fully-Connected Layers - the normal flat feed-forward neural network layer.
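Here is a minimal Keras sketch that stacks the three layer types on this series. The sliding-window framing, filter count, window length, and training settings are all assumptions, not the exact model from this post.

```python
# A minimal 1D-CNN forecaster: frame the series as sliding windows, then stack
# a convolutional layer, a pooling layer, and a fully-connected output.
# Window length, filter count, and epochs are assumptions, not tuned values.
import numpy as np
import pandas as pd
from tensorflow import keras

def to_supervised(values, n_in=7):
    """Turn a 1-D series into [n_in past values] -> [next value] pairs."""
    X, y = [], []
    for i in range(len(values) - n_in):
        X.append(values[i:i + n_in])
        y.append(values[i + n_in])
    return np.array(X)[..., np.newaxis], np.array(y)

births = pd.read_csv("daily-total-female-births.csv")["Births"].to_numpy(dtype=float)
X, y = to_supervised(births)

model = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1], 1)),
    keras.layers.Conv1D(64, kernel_size=3, activation="relu"),  # convolutional layer
    keras.layers.MaxPooling1D(pool_size=2),                     # pooling layer
    keras.layers.Flatten(),
    keras.layers.Dense(1),                                      # fully-connected output
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=50, verbose=0)
```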
LSTM
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.
Recurrent neural networks are different from traditional feed-forward neural networks:
"Recurrent networks … have an internal state that can represent context information. … [they] keep information about past inputs for an amount of time that is not fixed a priori, but rather depends on its weights and on the input data.
A recurrent network whose inputs are not fixed but rather constitute an input sequence can be used to transform an input sequence into an output sequence while taking into account contextual information in a flexible way."
— Yoshua Bengio, et al., Learning Long-Term Dependencies with Gradient Descent is Difficult, 1994.
The success of LSTMs lies in their being among the first architectures to overcome the technical problems that plague recurrent neural networks: vanishing and exploding gradients.
I just used a single LSTM layer with 100 units in the hidden layer.
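As a sketch, the model definition looks roughly like this, assuming the same sliding-window framing as in the CNN example; the 100 units come from the sentence above, while the window length, optimizer, and epochs are assumptions.

```python
# A minimal sketch of the single-LSTM-layer model: 100 units as stated above;
# the look-back window and training settings are assumptions.
from tensorflow import keras

n_in = 7  # assumed look-back window
model = keras.Sequential([
    keras.layers.Input(shape=(n_in, 1)),
    keras.layers.LSTM(100, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
# model.fit(X, y, epochs=50, verbose=0)  # X, y framed as in the CNN sketch above
```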
Best MAE: 5.002
Lots of TODOs here, like more fine-tuning. But this will do for now! This post is part of the mini crash course in Deep Learning at https://machinelearningmastery.com/ - a really cool site.