Skip to content

fix: prioritise human-initiated builds over prebuilds #18933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 22, 2025

Conversation

johnstcn
Copy link
Member

@johnstcn johnstcn commented Jul 21, 2025

Continues from #18882

  • Reverts extraneous changes
  • Adds explicit ORDER BY initiator_id = $PREBUILDS_USER_ID to AcquireProvisionerJob
  • Improves test added for above PR

blink-so bot and others added 5 commits July 21, 2025 12:31
…ueue

This change implements a priority queue system for provisioner jobs to ensure
that human-initiated workspace builds are processed before prebuild jobs,
improving user experience during high queue periods.

Changes:
- Add priority column to provisioner_jobs table (1=human, 0=prebuild)
- Update AcquireProvisionerJob query to order by priority DESC, created_at ASC
- Set priority in workspace builder based on initiator (PrebuildsSystemUserID)
- Expose priority field in API and SDK
- Add comprehensive test for priority queue behavior

Co-authored-by: kylecarbs <7122116+kylecarbs@users.noreply.github.com>
@johnstcn johnstcn self-assigned this Jul 21, 2025
@@ -26,6 +26,8 @@ WHERE
-- they are aliases and the code that calls this query already relies on a different type
AND provisioner_tagset_contains(@provisioner_tags :: jsonb, potential_job.tags :: jsonb)
ORDER BY
-- Ensure that human-initiated jobs are prioritized over prebuilds.
potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea, I like the simplicity of prioritizing human-initiated jobs.
That said, I’m a bit concerned about the risk of prebuild starvation under heavy load. If I understand the query correctly, we’re first ordering by whether the job is human-initiated (higher priority), and then by created_at within each group. That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.

Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue? I know we don’t have an explicit enqueued_at field, but maybe created_at could serve that purpose. That way, we could still prioritize human jobs, but allow older prebuilds to eventually be picked up and get processed, avoiding starvation while maintaining the intended prioritization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.

Correct. The assumption here is that there will never be a constant stream of human-initiated jobs. If this is the case, then the deployment is undersized and requires more provisioners.

Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue?

I'm not sure the added complexity is worthwhile here. How likely is it for users to be creating a constant stream of provisioner jobs similar in volume to that of the prebuilds reconciler?

Also, we need to decide how long is 'too long' for a prebuild to wait? 1 minute? 10 minutes? Does it need to be configurable?

Finally, after the initial waiting period elapses, users will still end up in the situation where they will be stuck waiting on a bunch of provisioner jobs (as the prebuilds reconciler creates provisioner jobs in batches).

My take is that if you are running into this problem in the first place, you need more provisioners.

I do like the suggestion of leveraging coder_workspace_tags to conditionally send prebuilds to a separate set of provisioners, but it appears that tfparse is not able to evaluate that right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same starvation concern in the original PR, but your point convinced me @johnstcn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update that query that tells us the queue position too?

Copy link
Member Author

@johnstcn johnstcn Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof maybe yea Done

coderabbitai[bot]

This comment was marked as resolved.

@coder coder deleted a comment from coderabbitai bot Jul 21, 2025
@johnstcn johnstcn merged commit c4b69bb into main Jul 22, 2025
35 checks passed
@johnstcn johnstcn deleted the cj/fix-database-AcquireProvisionerJob-prebuild-priority branch July 22, 2025 12:03
@github-actions github-actions bot locked and limited conversation to collaborators Jul 22, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants