Using machine learning for time series forecasting involves many things you need to take care of. Some of these are common to all machine learning; others are specific to time series forecasting. To get started, let’s briefly go over the basics of building a machine learning model.
The general process for adding machine learning to your system is as follows:
- Define the problem that you’re trying to solve
- Check whether you have any data that can be used to solve that problem. If not, collect it
- Decide which machine learning models to try out
- Arrange your datasets
- Train your models, finding the parameters either by careful inspection or by a search strategy
- Evaluate the model
- Integrate the model with your system
- Build reports that calculate the model’s effect on the relevant business metrics and compare against the earlier state of the system – whether that was an older model or no model at all
- Iterate on the model and continuously improve
But, due to the nature of the data in time series forecasting, we need to be careful. First and foremost, when building machine learning into your time series forecasting system, you need to take precautions to ensure data persistence and accuracy. Here are some of the common things you should look out for:
Data splits require care in any machine learning problem, but in the case of time series data, you need to make sure the splits are based on time.
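As a minimal sketch of what that means, here’s a time-based split over a plain Python list of (timestamp, value) pairs (the series, field layout, and 80/20 ratio are all illustrative):

```python
from datetime import date, timedelta

# Hypothetical daily series: (day, observed_value) pairs, already in time order.
series = [(date(2023, 1, 1) + timedelta(days=i), float(i)) for i in range(100)]

def time_based_split(rows, train_fraction=0.8):
    """Split a chronologically sorted series at a cut-off point in time.
    Everything before the cut-off is train, everything after is test,
    so the model never sees 'future' observations during training."""
    rows = sorted(rows, key=lambda r: r[0])  # make sure time order holds
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

train, test = time_based_split(series)

# Every training timestamp precedes every test timestamp -- the property
# a random split would violate.
assert max(t for t, _ in train) < min(t for t, _ in test)
```

A random shuffle before splitting would leak future observations into training, which is exactly what this layout prevents.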
During feature engineering, you need to take care of features based on time. Maybe your business problem has seasonality that affects relevant business metrics. Maybe older data is just not relevant anymore; then you’ll need to age your data properly. Maybe there are special days of the week or holidays that affect your business metrics. Even something as trivial as day-to-day weather could affect you.
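A sketch of what such time-based features might look like, using only the standard library (the holiday set and feature names are made up for illustration):

```python
from datetime import date

# Illustrative holiday list -- in practice this would come from a business calendar.
HOLIDAYS = {date(2023, 12, 25), date(2024, 1, 1)}

def time_features(day):
    """Derive simple calendar features a forecasting model can use."""
    return {
        "day_of_week": day.weekday(),   # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,
        "month": day.month,             # crude seasonality signal
        "is_holiday": day in HOLIDAYS,
    }

features = time_features(date(2023, 12, 25))
```

Weather or data-aging weights would be added the same way: as extra columns derived per timestamp.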
When testing the chosen model, we need to decide on a testing strategy. Do we do a simple test by comparing predictions with actual values? Should we do walk-forward validation? If we do walk-forward validation, how do we scale it when datasets are large? Do we walk forward on an hourly or daily basis? We can decide the granularity based on our belief about how the granularity affects changes in behavior. And we need to test those beliefs by checking model performance via A/B tests.
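To make walk-forward validation concrete, here’s a minimal sketch where a naive last-value forecast stands in for a real model (the series, minimum training window, and error metric are illustrative):

```python
from statistics import mean

# Hypothetical daily observations in time order.
values = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0, 16.0]

def walk_forward_mae(series, min_train=3):
    """Walk forward one step at a time: train on everything seen so far,
    predict the next point, record the absolute error. The 'model' here is
    a naive last-value forecast, standing in for anything more elaborate."""
    errors = []
    for i in range(min_train, len(series)):
        history = series[:i]
        prediction = history[-1]  # naive forecast: repeat the last value
        errors.append(abs(series[i] - prediction))
    return mean(errors)

mae = walk_forward_mae(values)
```

The loop step is where granularity enters: stepping daily instead of hourly multiplies how many model refits you can afford on large datasets.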
Because of these precautions, you need to be able to test your assumptions when developing a machine learning model for time series forecasting. We wrote some pseudocode to give you an idea of a possible structure for a test harness that can be used to try out various models.
An Example of Machine Learning for Time Series Forecasting
//pseudo code starts here
# We'll be using this convention and process for the datasets:
#   - train -> dataset used to train the model
#   - valid -> dataset used to check the "goodness" of the model during parameter search
#   - test  -> dataset used to check the final "goodness" of the selected model

class TestHarness:

    def __init__(self, data_root, model_algo, model_parameter_search_space):
        # data_root: the root from which data is read in. Use consistent
        # file-naming conventions so this test harness can be re-used with
        # different input data (maybe for different dates etc.)
        # model_algo: a function that will return the model we need to train
        # model_parameter_search_space: a dictionary containing the search
        # space boundaries for the model:
        #   - the key is the parameter name
        #   - the value can differ depending on how you want to search, e.g.:
        #       - a list containing the search values
        #       - a function defining the search strategy
        self.data_root = data_root
        self.model_algo = model_algo
        self.model_parameter_search_space = model_parameter_search_space
        # ... Initialize any other parameter

    def load_data(self):
        # Read in the train and test datasets and maintain data frames in
        # attributes. The train/valid split will be done in train_model.

    def train_metric(self, actual, predicted):
        # The metric defining the "goodness" of models during the training
        # process. This is used to search over the model parameter space.

    def test_metric(self, actual, predicted):
        # The metric defining the "goodness" of models at final test. This
        # could be the same as or different from the train metric. The train
        # metric can also be model-specific, based on what's easier for
        # each model to optimize for.

    def train_model(self, model_algo, model_parameter_search_space):
        # Train a model after searching through the search space. You might
        # also push the intermediate models, their parameters, and train
        # metric values into permanent storage to help with debugging.
        # Particular care needs to be taken here to split the train/valid
        # data for a time series forecasting model: don't do random splits,
        # do time-based splits.

    def save_model(self, model):
        # Save the model and metric value in permanent storage for future
        # reference and debugging.

    def test_model(self, model):
        # Evaluate the model. This can be as simple as checking your metric
        # on the test set directly, or doing a walk-forward test.

    def run(self):
        # The main method defining the workflow of this test harness:
        self.load_data()
        trained_model = self.train_model(self.model_algo, self.model_parameter_search_space)
        self.save_model(trained_model)
        self.test_model(trained_model)
//pseudo code ends here
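To make the harness idea concrete, here’s a toy, fully runnable version in which the “model” is just the mean of the training window and the split is time-based (the class and method names are illustrative, not a prescribed API):

```python
from statistics import mean

class ToyHarness:
    """A tiny concrete stand-in for the test harness sketched above.
    The 'model' is a naive mean forecast; the split is time-based."""

    def __init__(self, series):
        self.series = series  # chronologically ordered values

    def train_model(self):
        cut = int(len(self.series) * 0.8)  # time-based 80/20 split
        self.train, self.test = self.series[:cut], self.series[cut:]
        return mean(self.train)  # the 'model' is just one number

    def test_model(self, model):
        # Final goodness: mean absolute error on the held-out tail.
        return mean(abs(actual - model) for actual in self.test)

harness = ToyHarness([10.0, 11.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0])
model = harness.train_model()
score = harness.test_model(model)
```

Swapping in a real algorithm means replacing `train_model`’s body with an actual parameter search while keeping the workflow unchanged.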
Important Metrics for Your Machine Learning Model
After reading through this code you should have some idea about how we can use a test harness to develop a machine learning model. Such a test harness can be used to quickly iterate over various algorithms of your choice.
I’d now like to cover a few other things that weren’t discussed earlier:
Metrics are important – training metrics, testing metrics, and business metrics alike. Just as well-defined incident management metrics need to be chosen carefully, choosing your ML model metrics correctly is essential to building an effective machine learning model for time series forecasting.
While developing a model, you may want to have different metrics for training as well as testing.
- During training, you may want to use metrics that are easier to optimize for the particular machine learning algorithm.
- For testing, you would want something that can be compared across models. This metric should not be dependent on the algorithm, but should be dependent on your problem.
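For instance, mean absolute error depends only on the problem’s units, never on the algorithm, so it can rank otherwise unrelated models (the model names and predictions below are made up):

```python
from statistics import mean

actuals = [100.0, 110.0, 120.0, 115.0]

# Hypothetical predictions from two different algorithms on the same test set.
predictions = {
    "model_a": [98.0, 112.0, 118.0, 117.0],
    "model_b": [105.0, 105.0, 125.0, 110.0],
}

def mae(actual, predicted):
    """Mean absolute error: defined purely by the problem's units,
    so it can be compared across otherwise unrelated algorithms."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

scores = {name: mae(actuals, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)
```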
Sometimes you’ll want to add metrics during testing beyond the primary testing metric. These extra metrics can be used for reporting, and they help you compare the models you tested against different business concerns:
- When your model is not sure about something, saying “we’re not sure” might be more important.
- When a business wants to take preventive action, it might be more important to say something might have happened.
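One simple way to support both cases is a rough prediction interval built from past residuals, with an explicit “not sure” answer when the band is too wide (the residuals, z-multiplier, and width threshold are all illustrative assumptions):

```python
from statistics import stdev

# Hypothetical residuals (actual - predicted) from a validation period.
residuals = [-2.0, 1.0, 0.5, -1.5, 2.0, -0.5, 1.5, -1.0]

def forecast_with_interval(point_forecast, residuals, z=2.0):
    """Attach a rough uncertainty band to a point forecast using the spread
    of past residuals (roughly a 95% band if residuals are normal-ish)."""
    spread = z * stdev(residuals)
    return point_forecast - spread, point_forecast + spread

def report(low, high, max_width=10.0):
    """Say 'not sure' instead of committing when the band is too wide."""
    if high - low > max_width:
        return "not sure"
    return f"between {low:.1f} and {high:.1f}"

low, high = forecast_with_interval(100.0, residuals)
```

For preventive action, a business might act whenever the upper bound crosses a threshold, even if the point forecast does not.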
When people talk about adding machine learning to a system, they usually talk about developing the models and skip the part about actually using them in the system. You need to consider how you’ll integrate the model into your system before you start developing it. You also need business metrics to decide whether adding machine learning is beneficial to your business. These have to be metrics that business people can look at and understand – not technical numbers like accuracy, precision, or latency, but things like product sales, user engagement, or views – data relevant to business people.
For a machine learning model for time series forecasting, saving the following into permanent storage is required:
- Intermediate models
- Train, test metrics
- Parameters used to train the model
- Data used to train the model. It’s possible that the whole dataset cannot be stored again. At least make sure that references to it are stored in a manner that a re-run using the same parameters will result in the same data being pulled in.
The metrics need to be in a format that is readable by a machine. This way, they can be read via code and checked to see whether there were any problems.
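A minimal sketch of machine-readable persistence, writing each artifact as JSON (the file names, the S3 path inside the data reference, and the directory layout are assumptions, not a required convention):

```python
import json
import pathlib
import tempfile

# Hypothetical artifacts from one training run.
parameters = {"window": 7, "learning_rate": 0.1}
metrics = {"train_mae": 1.8, "test_mae": 2.3}
data_reference = {"path": "s3://my-bucket/sales/2024-01-01/", "query_date": "2024-01-01"}

run_dir = pathlib.Path(tempfile.mkdtemp())  # stand-in for permanent storage

def save_run(run_dir, parameters, metrics, data_reference):
    """Persist everything needed to reproduce or debug a run. The dataset
    itself may be too large to copy, so store a reference that lets a
    re-run with the same parameters pull in the same data."""
    for name, payload in [("parameters", parameters),
                          ("metrics", metrics),
                          ("data_reference", data_reference)]:
        (run_dir / f"{name}.json").write_text(json.dumps(payload, indent=2))

save_run(run_dir, parameters, metrics, data_reference)
reloaded = json.loads((run_dir / "metrics.json").read_text())
```

Because the metrics land as JSON, a monitoring job can load them and flag problems without a human reading reports.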
Software engineers are used to versioning the code base. But when adding machine learning to a system, you need to treat all of the above as items to version as well. For example, you could create a bucket in S3 and use directories to represent different versions.
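One possible sketch of that layout, building an S3-style key with a timestamped version directory (the bucket name, path scheme, and timestamp format are purely illustrative):

```python
from datetime import datetime, timezone

def versioned_key(bucket, model_name, version=None):
    """Build an S3-style key where each version of a model's artifacts
    lives in its own directory. Defaults to a UTC timestamp version."""
    if version is None:
        version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"s3://{bucket}/models/{model_name}/{version}/"

key = versioned_key("my-ml-artifacts", "sales_forecast", version="20240101T000000Z")
```

Models, parameters, metrics, and data references from one run would then all be stored under the same version prefix.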
After designing your test harness, you can move over to testing various different models.
Effective use of machine learning behind time series data can improve engineering practices from CI/CD to incident response – giving people the information they need when they need it. Check out the Incident Management Buyer’s Guide to learn all about the other tools you need in order to level up your incident management practices.