server : fix handling of the ignore_eos flag #14710

Merged: 1 commit into master on Jul 16, 2025

Conversation

ggerganov (Member)

fix #14695 (comment)

  • Remove slot_params.ignore_eos
  • When ignore_eos is passed in the request, all EOG tokens are ignored by adding a -INF logit bias for each of them to the sampler (see the sketch below)
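
For illustration, a minimal sketch of why a -INF bias removes a token from sampling; the type and function below are stand-ins, not llama.cpp's actual sampler code:

#include <cstdint>
#include <vector>

// Illustrative stand-in; llama.cpp's real sampler types differ.
struct token_bias {
    int32_t token;
    float   bias;
};

// Adding a bias of -INFINITY to a logit before softmax drives that
// token's probability to zero, so the sampler can never select it.
// This is how ignore_eos suppresses every EOG token for the request.
static void apply_logit_bias(std::vector<float> & logits,
                             const std::vector<token_bias> & biases) {
    for (const auto & b : biases) {
        logits[b.token] += b.bias; // -INF + anything stays -INF
    }
}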

JohannesGaessler (Collaborator)

The changes seem correct to me, and I can confirm that I am now getting the expected number of tokens when setting ignore_eos.

ggerganov merged commit 538cc77 into master on Jul 16, 2025
52 of 56 checks passed
Comment on lines +476 to +481
for (llama_token i = 0; i < llama_vocab_n_tokens(vocab); i++) {
    if (llama_vocab_is_eog(vocab, i)) {
        // give every end-of-generation token a -INF logit bias so the
        // sampler can never select it
        //SRV_DBG("%s: added %s logit bias = %f\n", __func__, common_token_to_piece(ctx, i).c_str(), -INFINITY);
        params.sampling.logit_bias.push_back({i, -INFINITY});
    }
}
Member

If this is done for every token during generation, I suspect it is going to have a significant performance impact.

ggerganov (Member, Author)

It's done once per completion request, at the beginning, upon processing the input json parameters.

Collaborator

If performance is a concern, we could provide a list of EoG tokens to iterate over, instead of iterating over all tokens and checking whether each one is EoG (sketched below). That said, I think iterating over all tokens once per request is going to be negligible compared to iterating over all tokens once per generated token, as is already done for sampling.
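
A sketch of that idea, using hypothetical accessors over a precomputed EoG list (these are not existing llama.cpp API; they are named here only to illustrate the shape of the change):

// Hypothetical accessors -- NOT existing llama.cpp API:
//   int32_t     llama_vocab_eog_count(const llama_vocab * vocab);
//   llama_token llama_vocab_eog_get  (const llama_vocab * vocab, int32_t i);
for (int32_t i = 0; i < llama_vocab_eog_count(vocab); i++) {
    // only touch the handful of EoG tokens instead of the full vocab
    params.sampling.logit_bias.push_back({llama_vocab_eog_get(vocab, i), -INFINITY});
}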

Member

On my system this takes about 0.3 ms for a 150k-vocab model, so I suppose it is not that bad.

ggerganov (Member, Author)

I fixed this anyway: #14721
