Adaptive Dropout Regularization [Dropout]
Description
When fine-tuning pre-trained neural networks, regularization has to be tuned carefully: too much blocks task-specific adaptation, too little invites overfitting.
Adaptive dropout regularization starts fine-tuning with a higher dropout rate and gradually decreases it over the course of training. This lets the model adapt to the new task while still keeping enough regularization to prevent overfitting.
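One simple way to realize this, matching the schedule used in the example below, is a linear decay from the initial rate toward zero (the function name here is illustrative):

def dropout_for_epoch(initial_dropout, epoch, epochs):
    # Linear decay: full rate at epoch 0, approaching zero by the final epoch.
    return initial_dropout * (1 - epoch / epochs)

# With initial_dropout=0.1 and epochs=3: 0.1000, 0.0667, 0.0333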
Info
This approach can prevent overfitting more effectively than standard fixed-rate dropout. Aggressive regularization early in fine-tuning suppresses redundant or noise-sensitive co-adaptations, while the decaying rate preserves the network's capacity to learn complex, task-specific patterns. The result is a model that generalizes better with less sacrifice of performance on critical features.
Example
import torch.nn as nn
from torch.optim import AdamW
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def fine_tune(model_id, dataloader, initial_dropout=0.1, epochs=3):
    model = GPT2LMHeadModel.from_pretrained(model_id)
    # The dataloader is assumed to yield batches already tokenized with this tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained(model_id)
    optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

    for epoch in range(epochs):
        model.train()
        total_loss = 0

        # Linearly decay the dropout rate: initial_dropout at epoch 0, approaching zero by the last epoch
        current_dropout = initial_dropout * (1 - epoch / epochs)

        # Apply the scheduled rate to every nn.Dropout module in the model
        for module in model.modules():
            if isinstance(module, nn.Dropout):
                module.p = current_dropout

        # Standard training loop; batches must include labels so outputs.loss is populated
        for batch in dataloader:
            optimizer.zero_grad()
            outputs = model(**batch)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        print(
            f"Epoch {epoch + 1}, "
            f"Loss: {total_loss / len(dataloader):.4f}, "
            f"Dropout: {current_dropout:.4f}"
        )
    return model