Skip to content

Stacking

Description

Stacking is a popular ensemble learning technique that improves predictive performance by combining the outputs of multiple base models. It does this by training a higher-level model on the predictions of these base models.

Workflow

  1. Split the training data into two parts:

    • The first part is used to train the base models.
    • The second part is used to generate a new dataset of predictions from the base models.
  2. Train multiple base models on the first part of the training data.

  3. Generate predictions using the trained base models on the second part of the training data, creating a new dataset of predictions.
  4. Train a higher-level model (also called a meta-model or blender) using this new dataset of predictions as input features.
  5. Make final predictions on test data using the trained higher-level model.

The higher-level model is typically a simple algorithm, such as linear regression, logistic regression, or a decision tree. Its goal is to learn how to optimally combine the predictions of the base models, leading to improved overall accuracy.