Quality Filtering [NLP] [Augmentation]

Description

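Augmented text can drift in meaning or become disfluent. A quality filter keeps only the augmented samples that stay semantically close to the original text (similarity at or above a threshold) and still read fluently (perplexity at or below a threshold).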

Example

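def quality_filter(
    augmented_texts,
    original_text,
    similarity_threshold=0.8,
    perplexity_threshold=100
):
    """Return the augmented texts that pass both quality checks.

    semantic_similarity and calculate_perplexity are not defined in this
    snippet; they are assumed to be provided elsewhere (for example, by an
    embedding model and a language model, respectively).
    """
    filtered_texts = []

    for candidate_text in augmented_texts:
        # Higher similarity means the candidate preserves the original meaning.
        similarity_score = semantic_similarity(original_text, candidate_text)
        # Lower perplexity means the candidate reads as more fluent.
        perplexity_score = calculate_perplexity(candidate_text)

        # Keep the candidate only if it passes both checks.
        if similarity_score >= similarity_threshold and perplexity_score <= perplexity_threshold:
            filtered_texts.append(candidate_text)

    return filtered_texts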
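
The function above calls semantic_similarity and calculate_perplexity without defining them. The following is a minimal, standard-library-only sketch of what such helpers might look like, so the filter can be tried end to end. Everything in it is an assumption for illustration: the bag-of-words cosine similarity is a lexical stand-in for a real embedding-based similarity, the add-one-smoothed unigram model is a stand-in for a real language model, and tokenize and make_perplexity_fn are hypothetical helper names that do not come from the original snippet.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def semantic_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words counts (lexical stand-in, not true semantics)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no overlap with anything
    return dot / (norm_a * norm_b)


def make_perplexity_fn(reference_corpus):
    """Build a calculate_perplexity function from an add-one-smoothed unigram model."""
    counts = Counter(tok for doc in reference_corpus for tok in tokenize(doc))
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserved for unseen words

    def calculate_perplexity(text):
        tokens = tokenize(text)
        if not tokens:
            return float("inf")  # treat empty text as maximally implausible
        log_prob = sum(
            math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
        )
        return math.exp(-log_prob / len(tokens))

    return calculate_perplexity


# Hypothetical usage: fit the toy model on a tiny reference corpus.
original = "the cat sat on the mat"
calculate_perplexity = make_perplexity_fn([original, "a dog sat on the rug"])
```

With helpers like these, the appropriate thresholds depend entirely on the scorer: perplexities from a toy unigram model are not on the same scale as those from a neural language model, so similarity_threshold and perplexity_threshold would need to be recalibrated on a few known-good and known-bad samples.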