
Fix inflated WER by filtering eos token before metric computation#3030

Closed
Mr-Neutr0n wants to merge 2 commits into speechbrain:develop from Mr-Neutr0n:fix-wer-eos-token-issue-2863

Conversation

@Mr-Neutr0n
Contributor

Summary

  • Fixes #2863 ("Fluent Speech Commands WER wrong since eos is counted as token"): WER was inflated because the eos (end-of-sequence) token was included in predicted sequences during metric computation
  • The decoded eos token was being counted as an insertion error, causing e.g. ~33% WER on perfectly predicted utterances
  • Uses the existing `filter_seq2seq_output` utility from `speechbrain.decoders.utils` to strip eos tokens from predicted sequences before decoding and WER/CER evaluation

Affected recipes

All SLU recipes that use seq2seq decoding with `tokenizer.decode_ids()` were affected:

  • recipes/fluent-speech-commands/direct/train.py
  • recipes/timers-and-such/direct/train.py
  • recipes/timers-and-such/direct/train_with_wav2vec2.py
  • recipes/timers-and-such/decoupled/train.py
  • recipes/timers-and-such/multistage/train.py
  • recipes/SLURP/direct/train.py
  • recipes/SLURP/direct/train_with_wav2vec2.py
  • recipes/SLURP/NLU/train.py

Root cause

The beam searcher returns predicted token sequences that may contain the eos token (index 0). When these sequences are passed directly to `tokenizer.decode_ids()`, SentencePiece decodes the eos token ID into an extra piece of text, which the WER computation then counts as insertions.
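To make the failure mode concrete, here is a small self-contained illustration. The utterance and the WER implementation are hypothetical (a textbook edit-distance WER, not SpeechBrain's metric code); they only show how a single leaked eos token inflates WER on an otherwise perfect hypothesis:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions) / len(ref),
    computed with a standard dynamic-programming edit distance."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / m

ref = "turn on lights".split()          # hypothetical 3-word utterance
hyp_with_eos = ref + ["<eos>"]          # decoded eos leaks in as an extra word

print(wer(ref, ref))           # 0.0   -- perfect prediction
print(wer(ref, hyp_with_eos))  # 0.333... -- one insertion over three words
```

With one insertion against a three-word reference, WER jumps from 0% to ~33%, matching the numbers reported in the issue.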

Fix

Each affected recipe now wraps the token sequence with `filter_seq2seq_output(utt_seq, eos_id=self.hparams.eos_index)` before passing it to `decode_ids()`. This truncates the sequence at the first eos occurrence, matching the expected behavior described in the issue.
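A minimal sketch of the truncate-at-first-eos semantics this relies on. The real utility lives in `speechbrain.decoders.utils`; this standalone version is only an illustration of the expected behavior, not the library code:

```python
def filter_seq2seq_output(seq, eos_id=0):
    """Return the prefix of `seq` up to (and excluding) the first eos_id."""
    try:
        return list(seq[: seq.index(eos_id)])
    except ValueError:
        # No eos in the sequence: keep everything.
        return list(seq)

# Beam-search hypothesis with eos_id = 0 appended (pad shares id 0 here)
print(filter_seq2seq_output([12, 7, 42, 0, 0, 0]))  # [12, 7, 42]
print(filter_seq2seq_output([12, 7, 42]))           # [12, 7, 42]
```

In the recipes this is applied per utterance before `tokenizer.decode_ids()`, with `eos_id=self.hparams.eos_index` taken from the hyperparameters.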

Test plan

  • Verify that WER computation no longer includes the eos token for the fluent-speech-commands recipe
  • Verify that the fix does not affect training loss (only evaluation metrics)
  • Run the fluent-speech-commands recipe and confirm wer_test.txt no longer shows insertions

The eos (end-of-sequence) token was being included in the predicted
sequences passed to the WER/CER metric computation, causing inflated
error rates. For example, a perfectly predicted utterance would show
~33% WER because the decoded eos token was counted as insertions.

This fix uses the existing filter_seq2seq_output utility to strip eos
tokens from predicted sequences before decoding and metric evaluation.

Applied to all affected SLU recipes:
- fluent-speech-commands/direct
- timers-and-such/direct, decoupled, multistage
- SLURP/direct, NLU

Fixes speechbrain#2863
@Adel-Moumen
Collaborator

Hey @Mr-Neutr0n. Thanks a lot for looking at this issue. I do really appreciate it!

I have been digging on my side, but I fear the issue is not really the beam searcher but rather the tokenizer. It seems to me that we end up having bos=eos=unk=pad=0, which causes issues while decoding the token indices into text. The weird thing is that in the beam searcher we do hyps = undo_padding(best_hyps, best_lens), so we should remove the appended eos token (since the len is (len(hyp) - 1) / max_len).

I need to have a look and debug as it is a bit concerning to me.
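The length-based unpadding discussed above can be sketched like this. This is a hypothetical stand-in for SpeechBrain's `undo_padding` (which trims each padded row by a relative length), shown only to illustrate why eos=pad=0 is fragile:

```python
def undo_padding(batch, rel_lens):
    """Trim each padded row to round(len(row) * rel_len) tokens
    (a simplified stand-in for the library behavior)."""
    return [row[: int(round(len(row) * rel))]
            for row, rel in zip(batch, rel_lens)]

hyp = [12, 7, 42, 0, 0, 0]  # tokens + eos + padding, with eos = pad = 0

# If the relative length was computed *including* the eos (4/6), the
# trailing 0 survives trimming, and downstream code cannot tell whether
# it is an eos or leftover padding:
print(undo_padding([hyp], [4 / 6]))  # [[12, 7, 42, 0]]

# Only a length that excludes the eos (3/6) yields a clean hypothesis:
print(undo_padding([hyp], [3 / 6]))  # [[12, 7, 42]]
```

This is consistent with the observation that the (len(hyp) - 1) / max_len convention should strip the eos in theory, while a shared eos/pad index of 0 makes any off-by-one invisible in the decoded output.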

@hxrikp1729

@Adel-Moumen thanks for digging into this, really appreciate it!

You raise a solid point — if bos=eos=unk=pad=0, that's definitely a deeper issue since undo_padding wouldn't properly strip the eos if it can't distinguish it from padding. The fix here is more of a symptom-level treatment: filtering eos before WER computation so the metric isn't inflated by the trailing token.

That said, you're right that the tokenizer setup is the more fundamental concern. If eos and pad share the same index, undo_padding can't reliably tell when the actual sequence ends vs padding, which would affect more than just WER.

I'm happy to hold off on this PR until you've had a chance to debug the tokenizer side. Or if you'd prefer, we could keep this as a defensive check while the underlying tokenizer config gets sorted out — whichever makes more sense to you.

@Mr-Neutr0n
Contributor Author

Closing — looks like the root cause goes deeper than what this PR addresses. Thanks for the thorough analysis @Adel-Moumen.

@Mr-Neutr0n Mr-Neutr0n closed this Feb 16, 2026
