Skip to content

CI windows debugging: Bug 21398 ractor lock ordering issue hack #13851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

jhawthorn
Copy link
Member

No description provided.

luke-gruber and others added 5 commits July 8, 2025 15:38
…d_wakeup()

In rb_ractor_sched_wait() (ex: Ractor.receive), we acquire
RACTOR_LOCK(cr) and then thread_sched_lock(cur_th). However, on wakeup
if we're a dnt, in thread_sched_wait_running_turn() we acquire
thread_sched_lock(cur_th) after condvar wakeup and then RACTOR_LOCK(cr).
This lock inversion can cause a deadlock with rb_ractor_wakeup_all()
(ex: port.send(obj)), where we acquire RACTOR_LOCK(other_r) and then
thread_sched_lock(other_th).

So, the error happens:

nt 1:   Ractor.receive
            rb_ractor_sched_wait() after condvar wakeup in thread_sched_wait_running_turn():
              - thread_sched_lock(cur_th) (condvar) # acquires lock
              - rb_ractor_lock_self(cr) # deadlock here: tries to acquire, HANGS

nt 2: port.send
            ractor_wakeup_all()
              - RACTOR_LOCK(port_r) # acquires lock
              - thread_sched_lock # tries to acquire, HANGS

One solution would be to rework `thread_sched_wait_running_turn()` with DNT's. I didn't
do this because it would be a bigger architectural change. What I changed is to unlock
RACTOR_LOCK before calling rb_ractor_sched_wakeup() in a pthread env. In a non-pthread
env it's safe to hold this lock, and we should.

Script that reproduces issue:

```ruby
require "async"
class RactorWrapper
  def initialize
    @Ractor = Ractor.new do
      Ractor.recv # Ractor doesn't start until explicitly told to
      # Do some calculations
      fib = ->(x) { x < 2 ? 1 : fib.call(x - 1) + fib.call(x - 2) }
      fib.call(20)
    end
  end

  def take_async
    @ractor.send(nil)
    Thread.new { @ractor.value }.value
  end
end

Async do |task|
  10_000.times do |i|
    task.async do
      RactorWrapper.new.take_async
      puts i
    end
  end
end
exit 0
```

Fixes [Bug #21398]
Copy link

launchable-app bot commented Jul 10, 2025

Tests Failed

✖️no tests failed ✔️61861 tests passed(4 flakes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants