Fix inflated WER by filtering eos token before metric computation #3030
Mr-Neutr0n wants to merge 2 commits into speechbrain:develop from
Conversation
The eos (end-of-sequence) token was being included in the predicted sequences passed to the WER/CER metric computation, causing inflated error rates. For example, a perfectly predicted utterance would show ~33% WER because the decoded eos token was counted as an insertion. This fix uses the existing filter_seq2seq_output utility to strip eos tokens from predicted sequences before decoding and metric evaluation.

Applied to all affected SLU recipes:
- fluent-speech-commands/direct
- timers-and-such/direct, decoupled, multistage
- SLURP/direct, NLU

Fixes speechbrain#2863
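To see where the "~33% WER" figure comes from, here is a minimal, self-contained sketch (not SpeechBrain's metric code) of word error rate as edit distance: one stray decoded eos on a three-word reference adds one insertion, giving 1/3.

```python
# Minimal WER sketch: WER = (S + D + I) / N, where S/D/I are
# substitutions, deletions, insertions and N is the reference length.
# Not the SpeechBrain implementation; for illustration only.

def wer(ref, hyp):
    """Word error rate via standard Levenshtein distance over word lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

ref = "turn on lights".split()          # hypothetical 3-word utterance
clean = "turn on lights".split()        # perfect prediction
with_eos = "turn on lights ⁇".split()   # eos decoded as ⁇ by SentencePiece

print(wer(ref, clean))     # 0.0
print(wer(ref, with_eos))  # one insertion on N=3 words -> ~0.333
```

This is why even a perfectly predicted utterance reports nonzero WER when the eos token leaks into the hypothesis.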
Hey @Mr-Neutr0n. Thanks a lot for looking at this issue. I really appreciate it! I have been digging on my side, but I fear the issue is not really the beam searcher but rather the tokenizer. It seems to me that we end up having [...]. I need to have a look and debug, as it is a bit concerning to me.
@Adel-Moumen thanks for digging into this, really appreciate it! You raise a solid point: if [...]. That said, you're right that the tokenizer setup is the more fundamental concern. If [...], I'm happy to hold off on this PR until you've had a chance to debug the tokenizer side. Or, if you'd prefer, we could keep this as a defensive check while the underlying tokenizer config gets sorted out; whichever makes more sense to you.
Closing — looks like the root cause goes deeper than what this PR addresses. Thanks for the thorough analysis @Adel-Moumen. |
Summary
- Fixes #2863 ("eos is counted as token"): WER was inflated because the eos (end-of-sequence) token was included in predicted sequences during metric computation.
- The decoded eos token (appearing as ⁇) was being counted as insertion errors, causing e.g. ~33% WER on perfectly predicted utterances.
- Uses the filter_seq2seq_output utility from speechbrain.decoders.utils to strip eos tokens from predicted sequences before decoding and WER/CER evaluation.

Affected recipes
All SLU recipes that use seq2seq decoding with tokenizer.decode_ids() were affected:
- recipes/fluent-speech-commands/direct/train.py
- recipes/timers-and-such/direct/train.py
- recipes/timers-and-such/direct/train_with_wav2vec2.py
- recipes/timers-and-such/decoupled/train.py
- recipes/timers-and-such/multistage/train.py
- recipes/SLURP/direct/train.py
- recipes/SLURP/direct/train_with_wav2vec2.py
- recipes/SLURP/NLU/train.py

Root cause
The beam searcher returns predicted token sequences that may contain the eos token (index 0). When these sequences are passed directly to tokenizer.decode_ids(), the eos token ID gets decoded by SentencePiece as ⁇, which is then included in the WER computation as extra tokens (insertions).

Fix
Each affected recipe now wraps the token sequence with filter_seq2seq_output(utt_seq, eos_id=self.hparams.eos_index) before passing it to decode_ids(). This truncates the sequence at the first eos occurrence, matching the expected behavior described in the issue.

Test plan
- Decoded predictions no longer contain the eos token for the fluent-speech-commands recipe.
- wer_test.txt no longer shows ⁇ insertions.