Moving from Univariate to Multivariate Time Series Prediction with LSTM: Addressing Optimization Challenges


Long Short-Term Memory (LSTM) neural networks have become a powerhouse in the world of time series prediction. Many of us start our journey with LSTMs by creating models for single-variable forecasting, often yielding impressive results in straightforward prediction scenarios. However, as I venture into the realm of complex multivariate time series data, I encounter a unique set of challenges.

The primary challenge arises when I attempt to extend my trusted univariate models to handle multivariate data. Unfortunately, my efforts often fall short of delivering the expected improvements across all output variables. The crux of the issue lies largely in the choice of loss function used for training.

In this brief blog post, I will dive into a common hurdle faced when transitioning from univariate to multivariate LSTM models. I'll place particular emphasis on the pivotal role of the loss function and propose a comprehensive solution: adjustments to the model architecture, the loss function, and the training process, all aimed at improving predictions for every output variable.

The Challenge: Univariate to Multivariate Transition


The challenge at hand revolves around applying the same methodology used for univariate models to the multivariate world. While I can reuse the same LSTM architecture, doing so doesn't guarantee success with multiple output variables: some outputs show the desired improvement in prediction accuracy while others lag behind.

The main culprit behind this discrepancy in performance is the choice of the loss function. In univariate models, we have a single output variable as the target, and a standard loss function, such as Mean Squared Error (MSE), often works well. However, in multivariate cases, using the same loss function doesn't effectively account for the nuances of each output variable. Consequently, some variables may exhibit improved predictions, while others remain suboptimal.
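To make the problem concrete, here is a minimal, hypothetical sketch (the numbers are invented for illustration). With a single aggregated MSE, one poorly predicted, large-scale variable dominates the scalar loss, and the scalar alone cannot tell you which variables are failing:

import torch
import torch.nn as nn

# Hypothetical predictions and targets: 4 samples, 3 output variables.
preds   = torch.tensor([[0.9, 0.1, 5.0],
                        [1.1, 0.0, 4.8],
                        [0.8, 0.2, 5.3],
                        [1.0, 0.1, 4.9]])
targets = torch.tensor([[1.0, 0.0, 2.0]] * 4)

overall = nn.MSELoss()(preds, targets)          # one scalar for everything
per_var = ((preds - targets) ** 2).mean(dim=0)  # one MSE per variable

print(overall)  # ≈ 3.02 — a single number, no hint of which variable is off
print(per_var)  # ≈ [0.015, 0.015, 9.035] — variable 3 is clearly the problem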
 
To overcome this challenge and boost the accuracy of predictions for every output variable in multivariate time series data, I need to introduce several crucial adjustments:

1. Model Architecture

    I start by modifying the LSTM model's architecture. Instead of using a single output layer for all variables, I opt for an output layer dedicated to each variable. This significant change empowers the model to learn individual patterns for each output variable, ultimately leading to more accurate predictions.

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # One dedicated linear head per output variable, rather than a single
        # shared output layer.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(output_size)]
        )

    def forward(self, x):
        out, _ = self.lstm(x)  # out: (batch, seq_len, hidden_size)
        # Concatenate the per-variable heads into (batch, seq_len, output_size)
        out = torch.cat([head(out) for head in self.heads], dim=-1)
        return out
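
As a quick sanity check on the architecture above, a dummy forward pass (with sizes chosen purely for illustration) confirms that the concatenated heads produce one output column per variable:

# Hypothetical dimensions for illustration only.
input_size, hidden_size, num_layers, output_size = 8, 64, 2, 3

model = LSTMModel(input_size, hidden_size, num_layers, output_size)
dummy_batch = torch.randn(32, 1, input_size)  # (batch, seq_len=1, features)
print(model(dummy_batch).shape)               # torch.Size([32, 1, 3])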


2. Loss Function


The next critical modification involves the loss function. Instead of computing a single loss for all output variables, I calculate per-variable losses. This tailored approach allows me to assess the model's performance on each variable independently.

criterion = nn.MSELoss(reduction='none')  # 'none' keeps element-wise losses: one per sample and per output variable
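
With reduction='none', the criterion returns a tensor shaped like the predictions rather than a scalar. Averaging over the batch and time dimensions then yields one loss per variable, as this hypothetical shape check shows:

# Hypothetical tensors shaped (batch, seq_len, output_size).
outputs = torch.randn(32, 1, 3)
targets = torch.randn(32, 1, 3)

elementwise  = criterion(outputs, targets)   # shape (32, 1, 3)
per_variable = elementwise.mean(dim=(0, 1))  # shape (3,): one MSE per variable
print(per_variable)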

3. Training Process


During the training phase, I derive the total loss as the sum of individual losses. By optimizing this total loss, the model gains the ability to improve predictions for every output variable.

Original Training Loop:


# In the original setup, criterion is the default nn.MSELoss(), which
# averages all output variables into one scalar before backpropagation.
for epoch in range(num_epochs):
    outputs = model(scaled_train_features.view(-1, 1, input_size))
    loss = criterion(outputs, scaled_train_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


Revised Training Loop:


for epoch in range(num_epochs):
    outputs = model(scaled_train_features.view(-1, 1, input_size))
    # Element-wise loss, shape (batch, 1, output_size): one value per variable
    loss = criterion(outputs, scaled_train_target.view(-1, 1, output_size))
    # Sum the per-variable losses for each sample, then average over the batch
    total_loss = torch.sum(loss, dim=2).mean()
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
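
A side benefit of the element-wise loss is that each variable's progress can be monitored separately. A hypothetical logging snippet, placed inside the loop right after the loss is computed, might look like this:

# Inside the training loop, after `loss = criterion(...)`:
with torch.no_grad():
    per_variable = loss.mean(dim=(0, 1))  # one MSE value per output variable
    if epoch % 10 == 0:
        print(f"epoch {epoch}: per-variable MSE = {per_variable.tolist()}")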

 

Example Plots

Here are some plots from my training. Sorry for the horrible coloring.

Before:

[Plot: predictions at epoch 1]

[Plot: predictions at epoch 10]

After:

[Plot: predictions at epoch 1]

[Plot: predictions at epoch 10]

Conclusion

I've highlighted the crucial role the loss function plays in this transition and provided a practical solution that involves adjustments to the model architecture, the loss function, and the training process. These refinements significantly enhance prediction accuracy for every output variable, equipping me to harness the full potential of LSTMs for intricate time series forecasting tasks and delivering more reliable results for data-driven decision-making.