Noise Injection Regularization
Description
Noise injection is a regularization technique commonly used to improve the generalization of machine learning models. By adding a small amount of noise to the input data, weights, or activation functions, noise injection helps prevent overfitting: the perturbation forces the model to be less reliant on specific patterns in the training data, encouraging it to learn more robust, general features that apply across different datasets.
This approach is particularly useful in neural networks, where noise can be injected at several stages:
- Input noise: Adds noise directly to the input data, helping the model become more robust to variations in the input
- Weight noise: Perturbs the weights during training, encouraging the model to generalize better
- Activation noise: Adds noise to the activation functions, leading to smoother decision boundaries and reducing overfitting (see the sketch after this list for weight and activation noise)
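The Example below injects noise at the input stage. For the other two stages, a minimal sketch in PyTorch might look like the following; the GaussianNoise module, the add_weight_noise helper, and the noise standard deviations are illustrative assumptions, not part of any standard API:

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Adds zero-mean Gaussian noise to activations, but only in training mode."""
    def __init__(self, std=0.05):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std > 0:
            return x + torch.randn_like(x) * self.std
        return x

# Activation noise: place the noise layer between existing layers
net = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    GaussianNoise(std=0.05),  # perturbs hidden activations on each forward pass
    nn.Linear(128, 10),
)

# Weight noise (crude form): perturb the parameters slightly at the start of each
# training step; more careful variants snapshot the weights and restore them after
# the gradient update
def add_weight_noise(model, weight_std=0.01):
    with torch.no_grad():
        for p in model.parameters():
            p.add_(torch.randn_like(p) * weight_std)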
Example
import torch
from torch.nn.utils import clip_grad_norm_
from torch.optim import AdamW

def train(model, dataloader, grad_clip=1.0, noise_factor=0.01, lr=5e-5, epochs=3):
    optimizer = AdamW(model.parameters(), lr=lr)
    for e in range(epochs):
        model.train()
        total_loss = 0
        for batch in dataloader:
            optimizer.zero_grad()
            input_ids = batch["input_ids"]
            # Input noise: perturb the token IDs with scaled Gaussian noise,
            # then round and clamp back into the valid vocabulary range
            noise = torch.randn_like(input_ids, dtype=torch.float) * noise_factor
            noisy_inputs = input_ids.float() + noise
            noisy_inputs = noisy_inputs.long().clamp(min=0, max=model.config.vocab_size - 1)
            # Train the model to reconstruct the clean tokens from the noisy inputs
            outputs = model(input_ids=noisy_inputs, labels=input_ids)
            loss = outputs.loss
            loss.backward()
            clip_grad_norm_(model.parameters(), grad_clip)  # keep the gradient norm bounded
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {e+1}, Loss: {total_loss/len(dataloader):.4f}")
Info
What does clip_grad_norm_ do?
It limits (clips) the size of gradients after backpropagation, rescaling them whenever their combined norm exceeds a threshold. This prevents exploding gradients (when gradients become too large), helping the model train more stably and avoiding NaN losses.
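A small illustration of the effect (the gradient values are arbitrary): when the combined gradient norm exceeds max_norm, every gradient is rescaled by the same factor; otherwise the gradients are left untouched:

import torch
from torch.nn.utils import clip_grad_norm_

w = torch.nn.Parameter(torch.ones(3))
w.grad = torch.tensor([3.0, 4.0, 0.0])  # gradient norm is 5.0

total_norm = clip_grad_norm_([w], max_norm=1.0)
print(total_norm)  # tensor(5.) -- the norm before clipping
print(w.grad)      # approximately [0.6, 0.8, 0.0] -- rescaled to norm 1.0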