Understanding PLS: A Computational Walkthrough
I used PLS some time ago and recently wanted to improve on that work, so I first needed to understand it better. Here is the reading roadmap I put together:
PLS Mathematical Understanding Roadmap
1. Core PLS Algorithm (NIPALS)
- Article: “The NIPALS algorithm for PLS” by Wold et al. (about 5-7 pages)
- Focus: Step-by-step walkthrough of the iterative algorithm
2. Loading and Score Vectors
- Chapter 2 in “PLS Path Modeling with R” by Sanchez (10-15 pages)
- Focus: Mathematical interpretation of loadings and scores
3. Deflation Procedures
- Section 3 in “PLS Regression Methods” by Abdi (4-6 pages)
- Focus: Matrix deflation and orthogonalization techniques
4. Prediction Equations
- Article: “A mathematical view of PLS regression formulas” by Mevik & Wehrens (7-8 pages)
- Focus: Derivation of the final prediction equations
5. Various PLS Variants
- Paper: “A survey of partial least squares (PLS) methods, with emphasis on the two-block case” by Rosipal & Krämer (sections 2.1-2.3, about 10 pages)
- Focus: Mathematical differences between PLSR, SIMPLS, and other algorithms
6. Optimization Perspective
- Article: “PLS viewed through the lens of convex optimization” in Journal of Machine Learning Research (sections 2-3, about 8 pages)
- Focus: Understanding PLS as an optimization problem
Based on those readings, the rest of this post briefly explains how I think PLS works from a computational standpoint.
What is PLS and Why Do We Need It?
PLS (Partial Least Squares) addresses a common problem in data science, namely how to handle situations where:
- You have many predictor variables (X) that are highly correlated
- You want to predict one or more response variables (Y)
- The number of observations is potentially smaller than the number of variables
Traditional regression methods like OLS (Ordinary Least Squares) break down in these scenarios due to multicollinearity. PLS solves this by finding a low-dimensional representation that captures the essential relationships between X and Y.
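To make this concrete, here is a minimal sketch (a toy construction of my own with NumPy and scikit-learn, not taken from the readings above) of a setting with more correlated predictors than observations: the OLS normal equations are singular, yet a two-component PLS model fits well.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 20, 50                          # fewer observations than predictors
latent = rng.normal(size=(n, 2))       # two hidden factors drive both X and y
X = latent @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = latent @ np.array([1.5, -2.0]) + 0.05 * rng.normal(size=n)

# X.T @ X has rank at most n (= 20), far below p (= 50), so the OLS normal
# equations are singular; PLS instead summarizes X with 2 latent components.
pls = PLSRegression(n_components=2).fit(X, y)
print(pls.score(X, y))                 # in-sample R^2, expected to be close to 1
```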
Key Components of PLS
Let’s break down the key variables and components involved in PLS modeling:
Input Matrices
- X: Matrix of predictor variables (n × p), where n is the number of observations and p is the number of predictor variables
- Y: Matrix of response variables (n × q), where q is the number of response variables
Latent Variables
- T: X-scores (n × a), where a is the number of components
- U: Y-scores (n × a)
Loadings
- P: X-loadings (p × a)
- Q: Y-loadings (q × a)
Weights
- W: X-weights (p × a), used to calculate the scores
Inner Relationship
- B: Diagonal matrix of regression coefficients between U and T
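For orientation, the same objects show up under scikit-learn's own attribute names on a fitted PLSRegression model. A quick shape check on random data of my own (the mapping to the notation above is in the comments):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))                   # n = 30 observations, p = 6 predictors
Y = rng.normal(size=(30, 2))                   # q = 2 responses

pls = PLSRegression(n_components=3).fit(X, Y)  # a = 3 components
print(pls.x_weights_.shape)                    # W: (p, a) -> (6, 3)
print(pls.x_loadings_.shape)                   # P: (p, a) -> (6, 3)
print(pls.y_loadings_.shape)                   # Q: (q, a) -> (2, 3)
print(pls.x_scores_.shape)                     # T: (n, a) -> (30, 3)
print(pls.y_scores_.shape)                     # U: (n, a) -> (30, 3)
```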
The Computational Process of PLS
Now, let’s walk through how PLS works computationally:
1. Initialization
We start by standardizing both X and Y matrices (mean-centering and often scaling to unit variance):
\[X_0 = X - \mathbf{1}\bar{X}^T\] \[Y_0 = Y - \mathbf{1}\bar{Y}^T\]
Where \(\mathbf{1}\) is a vector of ones and \(\bar{X}\) and \(\bar{Y}\) are the means of each column.
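In code this preprocessing is a one-liner per matrix. A minimal NumPy sketch with toy shapes of my own (scaling to unit variance would additionally divide by the column standard deviations):

```python
import numpy as np

def center(M):
    """Column-wise mean-centering; returns the centered matrix and the column means."""
    mean = M.mean(axis=0)
    return M - mean, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # n = 5 observations, p = 3 predictors
Y = rng.normal(size=(5, 2))      # q = 2 responses
X0, X_mean = center(X)           # X_0 = X - 1 * mean(X)^T
Y0, Y_mean = center(Y)           # Y_0 = Y - 1 * mean(Y)^T
```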
2. The NIPALS Algorithm (for each component a = 1, 2, …, A)
Step 1: Initialize u
We start by selecting an initial value for the Y-score vector u, typically the first column of Y:
\[\mathbf{u}_a = \mathbf{Y}_{a-1}[:,1]\]
Step 2: Calculate X-weights
The weights represent the direction in X-space that has maximum covariance with u:
\[\mathbf{w}_a = \frac{\mathbf{X}_{a-1}^T \mathbf{u}_a}{||\mathbf{X}_{a-1}^T \mathbf{u}_a||}\]
The normalization ensures that the weights have unit length.
Step 3: Calculate X-scores
The X-scores are a projection of X onto the weights:
\[\mathbf{t}_a = \mathbf{X}_{a-1} \mathbf{w}_a\]
Step 4: Calculate Y-weights
The Y-weights give the direction in Y-space most strongly associated with t:
\[\mathbf{q}_a = \frac{\mathbf{Y}_{a-1}^T \mathbf{t}_a}{||\mathbf{Y}_{a-1}^T \mathbf{t}_a||}\]
Step 5: Update Y-scores
The Y-scores are a projection of Y onto the Y-weights:
\[\mathbf{u}_a^{new} = \mathbf{Y}_{a-1} \mathbf{q}_a\]
Step 6: Check convergence
If \(\mathbf{u}_a^{new} \approx \mathbf{u}_a\) (within some tolerance), continue to step 7. Otherwise, set \(\mathbf{u}_a = \mathbf{u}_a^{new}\) and return to step 2.
Step 7: Calculate X-loadings
The X-loadings represent how the original X variables load onto the component:
\[\mathbf{p}_a = \frac{\mathbf{X}_{a-1}^T \mathbf{t}_a}{\mathbf{t}_a^T \mathbf{t}_a}\]
Step 8: Calculate inner relationship
The inner relationship links the X-scores to the Y-scores:
\[b_a = \frac{\mathbf{u}_a^T \mathbf{t}_a}{\mathbf{t}_a^T \mathbf{t}_a}\]
Step 9: Deflate X and Y
We remove the explained variance from both X and Y:
\[\mathbf{X}_a = \mathbf{X}_{a-1} - \mathbf{t}_a \mathbf{p}_a^T\] \[\mathbf{Y}_a = \mathbf{Y}_{a-1} - b_a \mathbf{t}_a \mathbf{q}_a^T\]
Step 10: Store results and iterate
Store the vectors \(\mathbf{w}_a\), \(\mathbf{t}_a\), \(\mathbf{p}_a\), \(\mathbf{q}_a\), \(\mathbf{u}_a\), and \(b_a\) and continue to the next component.
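Putting Steps 1 through 10 together, here is how I would transcribe the loop into NumPy. This is a sketch of my reading of the steps above (function and variable names are mine), not a reference implementation:

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS on mean-centered X (n x p) and Y (n x q).
    Returns the weights W, scores T, loadings P, Y-loadings Q,
    Y-scores U, and the inner-relation coefficients b."""
    Xa, Ya = X.copy(), Y.copy()
    n, p = X.shape
    q = Y.shape[1]
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    P = np.zeros((p, n_components))
    Q = np.zeros((q, n_components))
    U = np.zeros((n, n_components))
    b = np.zeros(n_components)
    for a in range(n_components):
        u = Ya[:, 0]                                 # Step 1: initialize u
        for _ in range(max_iter):
            w = Xa.T @ u
            w /= np.linalg.norm(w)                   # Step 2: X-weights (unit length)
            t = Xa @ w                               # Step 3: X-scores
            qa = Ya.T @ t
            qa /= np.linalg.norm(qa)                 # Step 4: Y-weights (unit length)
            u_new = Ya @ qa                          # Step 5: updated Y-scores
            if np.linalg.norm(u_new - u) < tol:      # Step 6: convergence check
                u = u_new
                break
            u = u_new
        pa = Xa.T @ t / (t @ t)                      # Step 7: X-loadings
        b[a] = (u @ t) / (t @ t)                     # Step 8: inner relationship
        Xa = Xa - np.outer(t, pa)                    # Step 9: deflate X
        Ya = Ya - b[a] * np.outer(t, qa)             #          deflate Y
        W[:, a], T[:, a], P[:, a] = w, t, pa         # Step 10: store results
        Q[:, a], U[:, a] = qa, u
    return W, T, P, Q, U, b
```

On the centered matrices from the initialization step, something like `nipals_pls(X0, Y0, n_components=2)` would return everything that is needed later for prediction.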
The Meaning Behind Each Component
Let’s understand what each calculated variable actually means:
X-weights (w)
- What they are: Directions in the X-space that maximize covariance with Y-scores
- Why they matter: They determine how the original variables are combined to form the PLS components
- Similarity to other models: Unlike PCA weights, which maximize variance in X alone, PLS weights take Y into account (a short comparison sketch follows this section)
X-scores (t)
- What they are: Projections of X onto the weight vectors
- Why they matter: They represent the new, compressed variables that capture the most relevant information in X for predicting Y
- Similarity to other models: Similar to principal components in PCA, but optimized for prediction rather than just explaining variance
X-loadings (p)
- What they are: Coefficients that express how the original X variables contribute to each component
- Why they matter: Help interpret what each PLS component represents
- Similarity to other models: Analogous to loadings in PCA but oriented toward prediction capability
Y-weights (q)
- What they are: Directions in Y-space that have maximum covariance with X-scores
- Why they matter: They show how response variables relate to the latent components
- Similarity to other models: No direct analog in PCA; unique to two-block methods like PLS
Y-scores (u)
- What they are: Projections of Y onto the Y-weight vectors
- Why they matter: They represent a compressed version of Y that is maximally correlated with the X-scores
- Similarity to other models: No direct analog in simpler models
Inner Relationship (b)
- What it is: Regression coefficient between X-scores and Y-scores
- Why it matters: Quantifies how strongly each X-component predicts its corresponding Y-component
- Similarity to other models: Conceptually similar to regression coefficients, but operates in the latent space
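To illustrate the X-weights point about PCA versus PLS, here is a small comparison of my own construction: the highest-variance direction in X is deliberately unrelated to y, so the first PCA direction chases it while the first PLS weight vector largely ignores it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 200
noise_dir = 5.0 * rng.normal(size=n)    # high-variance column, unrelated to y
signal = rng.normal(size=n)             # low-variance columns that actually drive y
X = np.column_stack([noise_dir, signal, signal + 0.1 * rng.normal(size=n)])
y = 3.0 * signal + 0.1 * rng.normal(size=n)

pca_dir = PCA(n_components=1).fit(X).components_[0]                   # maximizes variance of X
pls_dir = PLSRegression(n_components=1).fit(X, y).x_weights_[:, 0]    # maximizes covariance with y
print(np.round(pca_dir, 2))   # dominated by the first (noisy) column
print(np.round(pls_dir, 2))   # dominated by the two signal columns
```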
What the Flow Chart Looks Like
Inference in PLS: How New Data Flows Through the Model
When we want to use a trained PLS model to make predictions on new data, the data follows a specific path through the model. Let’s walk through this process step by step:
Step 1: Data Preparation
First, any new data must be preprocessed in exactly the same way as the training data:
- Mean-centering using the training data means
- Scaling (if applied during training) using the training data standard deviations
This ensures that the new data exists in the same mathematical space as the training data.
Step 2: Projection onto Latent Space
The centered (and possibly scaled) new data is projected onto the PLS components:
- We calculate the scores for the new observations using the weight matrices from training
- This projection transforms the original variables into the latent variables
The key equation is: \[\mathbf{T}_{new} = \mathbf{X}_{new,centered} \mathbf{W}(\mathbf{P}^T\mathbf{W})^{-1}\]
Where:
- \(\mathbf{X}_{new,centered}\) is the centered new data
- \(\mathbf{W}\) is the weight matrix from training
- \(\mathbf{P}\) is the loading matrix from training
- The term \((\mathbf{P}^T\mathbf{W})^{-1}\) adjusts for the non-orthogonality of the weights
Step 3: Prediction in Latent Space
Once we have the scores, we use the inner relationship to predict the response:
- The scores are multiplied by the diagonal matrix of regression coefficients \(\mathbf{B}\)
- This gives us the predicted scores in Y-space
Step 4: Transformation Back to Original Space
Finally, we transform the predictions back to the original Y-space:
- The predicted scores are multiplied by the Y-loadings
- We add back the Y means to get the final predictions
The complete prediction equation is: \[\mathbf{Y}_{pred} = \mathbf{1}\bar{\mathbf{Y}}^T + \mathbf{T}_{new}\mathbf{B}\mathbf{Q}^T\]
Where:
- \(\bar{\mathbf{Y}}\) is the mean vector of the response variables
- \(\mathbf{T}_{new}\) are the scores for the new data
- \(\mathbf{B}\) is the diagonal matrix of regression coefficients
- \(\mathbf{Q}\) is the matrix of Y-loadings
This entire process can be combined into a single regression-like equation: \[\mathbf{Y}_{pred} = \mathbf{1}\bar{\mathbf{Y}}^T + \mathbf{X}_{new,centered} \mathbf{W}(\mathbf{P}^T\mathbf{W})^{-1}\mathbf{B}\mathbf{Q}^T\]
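In code, the whole inference path collapses to a few matrix products. Here is a NumPy sketch assuming the training-time quantities (W, P, Q, the inner coefficients b, and the column means) were kept around, for instance from the NIPALS sketch earlier; the function name is mine:

```python
import numpy as np

def pls_predict(X_new, X_mean, Y_mean, W, P, Q, b):
    """Predict Y for new observations from a fitted PLS model."""
    Xc = X_new - X_mean                     # Step 1: center with the training means
    R = W @ np.linalg.inv(P.T @ W)          # adjust for non-orthogonal weights
    T_new = Xc @ R                          # Step 2: scores of the new observations
    B = np.diag(b)                          # Step 3: inner relationship (diagonal)
    return Y_mean + T_new @ B @ Q.T         # Step 4: back to the original Y-space
```

If scaling was applied during training, the new data would also be divided by the training standard deviations before the projection, and the predictions rescaled accordingly.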
The beauty of PLS is that this seemingly complex transformation actually simplifies the prediction process by focusing on the most relevant aspects of the data.
Walk-Through with a Concrete Example
Let’s trace the flow of data through a PLS model using a simple example with actual numbers. We’ll use a small dataset with:
- 5 observations
- 3 predictor variables (X)
- 2 response variables (Y)
Step 0: Our Raw Data
Here’s our raw data:
X matrix (5×3):
X = [
[4.0, 2.0, 0.0],
[2.0, 5.0, 1.0],
[7.0, 3.0, 2.0],
[3.0, 4.0, 1.5],
[6.0, 1.0, 0.5]
]
Y matrix (5×2):
Y = [
[9.0, 5.0],
[7.0, 6.5],
[15.0, 9.0],
[8.5, 7.0],
[11.0, 4.5]
]
Step 1: Mean-Centering the Data
First, we calculate the means:
X_means = [4.4, 3.0, 1.0]
Y_means = [10.1, 6.4]
Then we center the data:
X₀ (centered X):
X₀ = [
[-0.4, -1.0, -1.0],
[-2.4, 2.0, 0.0],
[2.6, 0.0, 1.0],
[-1.4, 1.0, 0.5],
[1.6, -2.0, -0.5]
]
Y₀ (centered Y):
Y₀ = [
[-1.1, -1.4],
[-3.1, 0.1],
[4.9, 2.6],
[-1.6, 0.6],
[0.9, -1.9]
]
Step 2: Extract the First Component (a=1)
Initialize u₁
We use the first column of Y₀:
u₁ = [-1.1, -3.1, 4.9, -1.6, 0.9]
Calculate X-weights (w₁)
w₁ = [0.733, -0.456, 0.504]
Calculate X-scores (t₁)
t₁ = [-0.3412, -2.6712, 2.4098, -1.2302, 1.8328]
Calculate Y-weights (q₁)
q₁ = [0.952, 0.307]
Calculate new Y-scores (u₁)
u₁ = [-1.454, -2.900, 5.622, -1.337, 0.069]
Calculate X-loadings (p₁)
p₁ = [0.974, -0.552, 0.068]
Calculate inner relationship (b₁)
b₁ = 1.317
Deflate X and Y
X₁ = [
[-0.068, -1.188, -0.977],
[0.202, 0.526, 0.182],
[0.253, 1.330, 0.836],
[-0.202, 0.321, 0.584],
[-0.185, -0.988, -0.625]
]
Y₁ = [
[-0.672, -1.262],
[0.250, 1.180],
[1.878, 1.626],
[-0.057, 1.097],
[-1.398, -2.641]
]
Step 3: Extract the Second Component (a=2)
We repeat the same process with the deflated matrices X₁ and Y₁. For brevity, I’ll just provide the final results:
w₂ = [0.106, 0.813, 0.581]
t₂ = [-1.515, 0.564, 1.587, 0.545, -1.180]
p₂ = [0.106, 0.813, 0.581]
q₂ = [0.544, 0.839]
b₂ = 1.551
Step 4: Making Predictions
Now, let’s say we have a new observation:
X_new = [5.0, 2.5, 1.2]
Center the new data:
X_new_centered = [0.6, -0.5, 0.2]
Calculate scores:
T_new = X_new_centered × W(PᵀW)⁻¹
where:
W = [
[0.733, 0.106],
[-0.456, 0.813],
[0.504, 0.581]
]
P = [
[0.974, 0.106],
[-0.552, 0.813],
[0.068, 0.581]
]
Computing this:
T_new = [0.7686, -0.0074]
Predict Y:
Y_pred = Y_means + T_new × B × Qᵀ
where:
B = [
[1.317, 0],
[0, 1.551]
]
Q = [
[0.952, 0.307],
[0.544, 0.839]
]
Computing:
Y_pred = [11.058, 6.701]
So our PLS model predicts Y values of [11.058, 6.701] for the new X observation [5.0, 2.5, 1.2].
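As a sanity check on a hand-worked example like this, the same numbers can be pushed through an off-the-shelf implementation. A scikit-learn sketch is below; scale=False keeps the preprocessing to mean-centering only, as in this walkthrough, though signs and rounding of intermediate quantities can still differ from the hand calculations.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.array([[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [7.0, 3.0, 2.0],
              [3.0, 4.0, 1.5], [6.0, 1.0, 0.5]])
Y = np.array([[9.0, 5.0], [7.0, 6.5], [15.0, 9.0], [8.5, 7.0], [11.0, 4.5]])

pls = PLSRegression(n_components=2, scale=False).fit(X, Y)
print(pls.predict(np.array([[5.0, 2.5, 1.2]])))   # compare with the prediction above
```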
Interpretation of Our Example
In this example:
- The first component (accounting for the largest covariance between X and Y):
- Is strongly positively influenced by X variable 1 (weight 0.733)
- Is negatively influenced by X variable 2 (weight -0.456)
- Is positively influenced by X variable 3 (weight 0.504)
- Strongly predicts the first Y variable (loading 0.952) and moderately predicts the second Y variable (loading 0.307)
- The second component (capturing remaining covariance):
- Is dominated by X variables 2 and 3 (weights 0.813 and 0.581)
- Has a smaller contribution from X variable 1 (weight 0.106)
- Predicts both Y variables (loadings 0.544 and 0.839)
- The prediction process:
- New observations are projected into the PLS latent space
- These projections are used to predict Y through the inner relationship
- The predictions are then transformed back to the original Y-space
This example demonstrates how PLS creates a low-dimensional representation (we reduced from 3 X-variables to 2 components) that effectively captures the relationship between X and Y, even in this small dataset.
Summary of the Data Flow
The key insight from following this numerical example is that PLS:
- Iteratively finds directions (weights) in X-space that are most predictive of Y
- Projects data onto these directions to create scores
- Establishes relationships between these projections
- Uses these relationships to make predictions for new data
This step-by-step walkthrough with actual numbers shows how data flows through the PLS model, transforming from raw observations to predictions via a series of mathematically optimized projections.