-
Notifications
You must be signed in to change notification settings - Fork 949
fix: prioritise human-initiated builds over prebuilds #18933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: prioritise human-initiated builds over prebuilds #18933
Conversation
…ueue This change implements a priority queue system for provisioner jobs to ensure that human-initiated workspace builds are processed before prebuild jobs, improving user experience during high queue periods. Changes: - Add priority column to provisioner_jobs table (1=human, 0=prebuild) - Update AcquireProvisionerJob query to order by priority DESC, created_at ASC - Set priority in workspace builder based on initiator (PrebuildsSystemUserID) - Expose priority field in API and SDK - Add comprehensive test for priority queue behavior Co-authored-by: kylecarbs <7122116+kylecarbs@users.noreply.github.com>
…bs are picked up first
@@ -26,6 +26,8 @@ WHERE | |||
-- they are aliases and the code that calls this query already relies on a different type | |||
AND provisioner_tagset_contains(@provisioner_tags :: jsonb, potential_job.tags :: jsonb) | |||
ORDER BY | |||
-- Ensure that human-initiated jobs are prioritized over prebuilds. | |||
potential_job.initiator_id = 'c42fdf75-3097-471c-8c33-fb52454d81c0'::uuid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice idea, I like the simplicity of prioritizing human-initiated jobs.
That said, I’m a bit concerned about the risk of prebuild starvation under heavy load. If I understand the query correctly, we’re first ordering by whether the job is human-initiated (higher priority), and then by created_at
within each group. That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.
Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue? I know we don’t have an explicit enqueued_at
field, but maybe created_at
could serve that purpose. That way, we could still prioritize human jobs, but allow older prebuilds to eventually be picked up and get processed, avoiding starvation while maintaining the intended prioritization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means prebuild jobs will only be picked up once all human-initiated jobs have been processed.
Correct. The assumption here is that there will never be a constant stream of human-initiated jobs. If this is the case, then the deployment is undersized and requires more provisioners.
Would it make sense to consider an ordering strategy that also takes into account how long a job has been waiting in the queue?
I'm not sure the added complexity is worthwhile here. How likely is it for users to be creating a constant stream of provisioner jobs similar in volume to that of the prebuilds reconciler?
Also, we need to decide how long is 'too long' for a prebuild to wait? 1 minute? 10 minutes? Does it need to be configurable?
Finally, after the initial waiting period elapses, users will still end up in the situation where they will be stuck waiting on a bunch of provisioner jobs (as the prebuilds reconciler creates provisioner jobs in batches).
My take is that if you are running into this problem in the first place, you need more provisioners.
I do like the suggestion of leveraging coder_workspace_tags
to conditionally send prebuilds to a separate set of provisioners, but it appears that tfparse
is not able to evaluate that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same starvation concern in the original PR, but your point convinced me @johnstcn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update that query that tells us the queue position too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof maybe yea Done
Continues from #18882
ORDER BY initiator_id = $PREBUILDS_USER_ID
toAcquireProvisionerJob