Word Error Rate (WER) [STT] [NLP]

Description

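Word Error Rate (WER) is a common metric for evaluating the performance of a speech recognition or machine translation system. It compares the system output (the hypothesis) against a ground-truth transcript (the reference) at the word level.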

WER is 0 when the hypothesis matches the reference exactly, and it grows as the two texts diverge. Values usually fall between 0 and 1, but WER has no upper bound: inserted words count as errors, so a hypothesis that is much longer than the reference can score above 1. A WER of 0.8 means that the number of word errors equals 80% of the number of words in the reference.

Formula

\[ WER = \frac{S + D + I}{N} \]
  • \(S\) = number of substitutions
  • \(D\) = number of deletions
  • \(I\) = number of insertions
  • \(N\) = number of words in the reference (ground truth)
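
The counts \(S\), \(D\) and \(I\) are usually taken from a minimum edit distance (Levenshtein) alignment between the reference and hypothesis word sequences. The following is a minimal sketch of that idea in plain Python, not the implementation used by any particular library. It assumes whitespace tokenization and no text normalization, and the function names `wer_counts` and `word_error_rate` are hypothetical.

```python
def wer_counts(reference: str, hypothesis: str):
    """Return (S, D, I, N) from a minimum edit distance word alignment."""
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, s, d, ins)          # match, no error
            else:
                diagonal = (cost + 1, s + 1, d, ins)  # substitution
            cost, s, d, ins = dp[i - 1][j]
            deletion = (cost + 1, s, d + 1, ins)
            cost, s, d, ins = dp[i][j - 1]
            insertion = (cost + 1, s, d, ins + 1)
            # Keep the cheapest option; ties prefer the diagonal step.
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])

    _, s, d, ins = dp[len(ref)][len(hyp)]
    return s, d, ins, len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, with N the number of reference words."""
    s, d, ins, n = wer_counts(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return (s + d + ins) / n


# Two extra words against a one-word reference give a WER above 1.
print(word_error_rate("hello", "hello there world"))  # 2.0
```

Several alignments can share the same minimum cost, so the split between \(S\), \(D\) and \(I\) may differ between implementations even when the resulting WER is identical.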

Example

from jiwer import wer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jump over lazy dog"

error_rate = wer(reference, hypothesis)
print(f"WER: {error_rate:.2f}")  # WER: 0.22
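
The printed value can be checked by hand. In this pair, "jumps" is recognized as "jump" (one substitution), the second "the" is missing from the hypothesis (one deletion), nothing is inserted, and the reference has 9 words:

\[ WER = \frac{S + D + I}{N} = \frac{1 + 1 + 0}{9} = \frac{2}{9} \approx 0.22 \]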