Skip to content

Use concurrent set for global symbol table #13948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

peterzhu2118
Copy link
Member

@peterzhu2118 peterzhu2118 commented Jul 18, 2025

This PR switches the global symbol table and allows for symbol lookup and creation of dynamic symbols without needing the global VM lock. Creating static symbols still require the global VM lock but I will be investigating how to make that lock-free in the future.

Here's a microbenchmark demonstrating the speedup:

ARGV[0].to_i.times.map do
  Ractor.new do
    1_000_000.times do |i|
      "hello#{i}".to_sym
    end
  end
end.map(&:value)

Results:

chart

Raw results:

Ractor count Branch (s) Master (s)
1 0.364 0.401
2 0.555 1.149
3 0.583 3.890
4 0.680 3.288
5 0.789 5.107

rb_objspace_garbage_object_p expects only GC managed objects to be passed
in. We should skip the check if curr_key is a special constant.
We don't need to delay the freeing of the fstr for the symbol if we store
the hash of the fstr in the dynamic symbol and we use compare-by-identity
for removing the dynamic symbol from the sym_set.
If the object is garbage, then calling cmp on it may crash.
Benchmark:

    ARGV[0].to_i.times.map do
      Ractor.new do
        1_000_000.times do |i|
          "hello#{i}".to_sym
        end
      end
    end.map(&:value)

Results:

| Ractor count | Branch (s) | Master (s) |
|--------------|------------|------------|
| 1            | 0.364      | 0.401      |
| 2            | 0.555      | 1.149      |
| 3            | 0.583      | 3.890      |
| 4            | 0.680      | 3.288      |
| 5            | 0.789      | 5.107      |
If we create a key but don't insert it (due to other Ractor winning the
race), then it would leak memory if we don't free it. This introduces a
new function to free that memory for this case.
@peterzhu2118 peterzhu2118 requested a review from jhawthorn July 18, 2025 15:20
Copy link

Tests Failed

✖️no tests failed ✔️62102 tests passed(1 flake)

concurrent_set.c Outdated
struct concurrent_set *set = RTYPEDDATA_GET_DATA(set_obj);

struct concurrent_set_probe probe;
VALUE hash = set->funcs->hash(key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) I doubt it makes a significant difference, but I think hash won't change and could go above retry:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch 👍 I copied the code from rb_concurrent_set_find_or_insert which also has this issue so I fixed it in both places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. That didn't work because we only load set later (even though we know a new set will still have the same funcs).

I don't think this change is worth making 😅

Since the hash should never change, we can calculate it once before the
retry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants