Fix inflated WER by filtering eos token before metric computation #3030
Mr-Neutr0n wants to merge 2 commits into speechbrain:develop from
Conversation
The eos (end-of-sequence) token was being included in the predicted sequences passed to the WER/CER metric computation, causing inflated error rates. For example, a perfectly predicted utterance would show ~33% WER because the decoded eos token was counted as an insertion. This fix uses the existing filter_seq2seq_output utility to strip eos tokens from predicted sequences before decoding and metric evaluation.

Applied to all affected SLU recipes:
- fluent-speech-commands/direct
- timers-and-such/direct, decoupled, multistage
- SLURP/direct, NLU

Fixes speechbrain#2863
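To see where the "~33% WER" figure comes from, here is a minimal, self-contained sketch (not SpeechBrain's metric code) of word error rate as edit distance: one stray decoded eos on a three-word reference adds one insertion, giving 1/3.

```python
# Minimal WER sketch: WER = (S + D + I) / N, where S/D/I are
# substitutions, deletions, insertions and N is the reference length.
# Not the SpeechBrain implementation; for illustration only.

def wer(ref, hyp):
    """Word error rate via standard Levenshtein distance over word lists."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

ref = "turn on lights".split()          # hypothetical 3-word utterance
clean = "turn on lights".split()        # perfect prediction
with_eos = "turn on lights ⁇".split()   # eos decoded as ⁇ by SentencePiece

print(wer(ref, clean))     # 0.0
print(wer(ref, with_eos))  # one insertion on N=3 words -> ~0.333
```

This is why even a perfectly predicted utterance reports nonzero WER when the eos token leaks into the hypothesis.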
Hey @Mr-Neutr0n. Thanks a lot for looking at this issue. I really appreciate it! I have been digging on my side, but I fear the issue is not really the beam searcher but rather the tokenizer. It seems to me that we end up having [...]. I need to have a look and debug, as it is a bit concerning to me.
@Adel-Moumen thanks for digging into this, really appreciate it! You raise a solid point: if [...]. That said, you're right that the tokenizer setup is the more fundamental concern. If [...], I'm happy to hold off on this PR until you've had a chance to debug the tokenizer side. Or, if you'd prefer, we could keep this as a defensive check while the underlying tokenizer config gets sorted out; whichever makes more sense to you.
Closing — looks like the root cause goes deeper than what this PR addresses. Thanks for the thorough analysis @Adel-Moumen. |
Summary
- Fixes #2863 ("eos is counted as token"): WER was inflated because the eos (end-of-sequence) token was included in predicted sequences during metric computation.
- The decoded eos token (appearing as ⁇) was being counted as insertion errors, causing e.g. ~33% WER on perfectly predicted utterances.
- Uses the filter_seq2seq_output utility from speechbrain.decoders.utils to strip eos tokens from predicted sequences before decoding and WER/CER evaluation.

Affected recipes
All SLU recipes that use seq2seq decoding with tokenizer.decode_ids() were affected:
- recipes/fluent-speech-commands/direct/train.py
- recipes/timers-and-such/direct/train.py
- recipes/timers-and-such/direct/train_with_wav2vec2.py
- recipes/timers-and-such/decoupled/train.py
- recipes/timers-and-such/multistage/train.py
- recipes/SLURP/direct/train.py
- recipes/SLURP/direct/train_with_wav2vec2.py
- recipes/SLURP/NLU/train.py

Root cause
The beam searcher returns predicted token sequences that may contain the eos token (index 0). When these sequences are passed directly to tokenizer.decode_ids(), the eos token ID gets decoded by SentencePiece as ⁇, which is then included in the WER computation as extra tokens (insertions).

Fix
Each affected recipe now wraps the token sequence with filter_seq2seq_output(utt_seq, eos_id=self.hparams.eos_index) before passing it to decode_ids(). This truncates the sequence at the first eos occurrence, matching the expected behavior described in the issue.

Test plan
- Decoded predictions no longer contain the eos token for the fluent-speech-commands recipe.
- wer_test.txt no longer shows ⁇ insertions.