YJIT: Allow parallel scanning for JIT-compiled code (attempt 2) #13843

Merged
merged 2 commits into from
Jul 14, 2025

Conversation

wks
Contributor

@wks wks commented Jul 10, 2025

Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC threads work in parallel during a GC. Currently, when YJIT is enabled and two GC threads scan two iseq objects simultaneously, both threads attempt to borrow CodeBlock::mem_block, which results in a panic.

We make two changes to YJIT in order to support parallel GC.

We now set the YJIT code memory to writable in bulk before the reference-updating phase, and reset it to executable in bulk after the reference-updating phase. Previously, YJIT lazily set memory pages writable while updating object references embedded in JIT-compiled machine code, and set the memory back to executable by calling mark_all_executable. That approach is inherently unfriendly to parallel GC because it marks the whole CodeBlock as executable, which races with other GC threads that are still updating other iseq objects. It also incurs overhead from frequent system calls. With the permission of all code memory changed once before and once after the reference-updating phase, multiple GC threads can perform raw memory writes in parallel. We should also see a performance improvement during moving GC because of the reduced number of mprotect system calls.
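
To make the difference in permission changes concrete, here is a toy model of the two schemes. It is only a sketch: `CodeRegion`, `Perm`, and the `syscalls` counter are hypothetical names that stand in for real page-protection calls, and it assumes the old flow marked everything executable once per iseq. It does not call mprotect and is not YJIT code.

```rust
// Toy model only: `CodeRegion`, its fields, and the counter are hypothetical
// and merely simulate page-protection calls such as mprotect.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Perm {
    Executable,
    Writable,
}

struct CodeRegion {
    pages: Vec<Perm>,
    syscalls: usize, // number of simulated permission changes
}

impl CodeRegion {
    fn new(num_pages: usize) -> Self {
        CodeRegion { pages: vec![Perm::Executable; num_pages], syscalls: 0 }
    }

    // Lazy scheme: flip a single page on demand before writing to it.
    fn write_lazy(&mut self, page: usize) {
        if self.pages[page] != Perm::Writable {
            self.pages[page] = Perm::Writable;
            self.syscalls += 1;
        }
    }

    // One simulated call that covers the whole region.
    fn set_all(&mut self, perm: Perm) {
        for p in self.pages.iter_mut() {
            *p = perm;
        }
        self.syscalls += 1;
    }

    // Bulk scheme: the write itself never changes permissions.
    fn write_raw(&self, page: usize) {
        assert_eq!(self.pages[page], Perm::Writable);
    }
}

// Assumed old flow: per iseq, flip pages lazily, then mark all executable.
fn lazy_update(region: &mut CodeRegion, iseqs: &[Vec<usize>]) -> usize {
    for pages in iseqs {
        for &p in pages {
            region.write_lazy(p);
        }
        region.set_all(Perm::Executable);
    }
    region.syscalls
}

// New flow: one change before the phase, raw writes, one change after.
fn bulk_update(region: &mut CodeRegion, iseqs: &[Vec<usize>]) -> usize {
    region.set_all(Perm::Writable);
    for pages in iseqs {
        for &p in pages {
            region.write_raw(p);
        }
    }
    region.set_all(Perm::Executable);
    region.syscalls
}

fn main() {
    // Each inner vector lists the code pages touched by one iseq.
    let iseqs = vec![vec![0, 1], vec![1, 2], vec![3]];
    let lazy = lazy_update(&mut CodeRegion::new(4), &iseqs);
    let bulk = bulk_update(&mut CodeRegion::new(4), &iseqs);
    println!("lazy: {lazy} simulated calls, bulk: {bulk} simulated calls");
}
```

In this model the bulk scheme always costs two simulated calls regardless of how many iseqs are updated, while the lazy scheme grows with the number of iseqs and pages touched.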

We also move RefCell one level down into VirtualMemory. We noticed that some fields of VirtualMemory, such as region_start, are never modified once VirtualMemory is constructed. We change the type of the field CodeBlock::mem_block from Rc<RefCell<T>> to Rc<T>, and push the RefCell into VirtualMemory. We extract the mutable fields of VirtualMemory into a dedicated struct VirtualMemoryMut, and store them in a field VirtualMemory::mutable, which is a RefCell<VirtualMemoryMut>. After this change, methods that access only immutable fields of VirtualMemory, particularly base_ptr() which reads region_start, no longer need to borrow any RefCell. Methods that access mutable fields still need to borrow VirtualMemory::mutable, but the number of borrow operations is strictly smaller than before because borrows previously done in callers (such as CodeBlock::write_mem) are moved into methods of VirtualMemory (such as VirtualMemory::write_bytes).
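
A minimal sketch of the resulting layout, for illustration only. The names VirtualMemory, VirtualMemoryMut, region_start, mutable, base_ptr, write_bytes, and CodeBlock::mem_block come from the description above; everything else (the `bytes` field, `read_byte`, the use of `usize` for addresses, and all signatures) is a simplifying assumption and does not match the real YJIT definitions. The sketch is single-threaded and says nothing about how the real code shares the value across GC threads.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simplified stand-ins: the real YJIT types have more fields and
// different signatures.
struct VirtualMemoryMut {
    bytes: Vec<u8>, // hypothetical: stands in for the mapped code memory
}

struct VirtualMemory {
    region_start: usize,                // never changes after construction
    mutable: RefCell<VirtualMemoryMut>, // only mutable state is behind the RefCell
}

impl VirtualMemory {
    fn new(region_start: usize, size: usize) -> Self {
        VirtualMemory {
            region_start,
            mutable: RefCell::new(VirtualMemoryMut { bytes: vec![0; size] }),
        }
    }

    // Reads an immutable field, so no RefCell borrow is involved.
    fn base_ptr(&self) -> usize {
        self.region_start
    }

    // The borrow happens inside the method rather than in the caller.
    fn write_bytes(&self, offset: usize, data: &[u8]) {
        let mut m = self.mutable.borrow_mut();
        m.bytes[offset..offset + data.len()].copy_from_slice(data);
    }

    // Hypothetical helper used only to observe the write.
    fn read_byte(&self, offset: usize) -> u8 {
        self.mutable.borrow().bytes[offset]
    }
}

struct CodeBlock {
    mem_block: Rc<VirtualMemory>, // previously Rc<RefCell<VirtualMemory>>
}

fn main() {
    let cb = CodeBlock { mem_block: Rc::new(VirtualMemory::new(0x1000, 16)) };

    // Hold a mutable borrow, as a writer in the middle of an update would.
    let guard = cb.mem_block.mutable.borrow_mut();
    // base_ptr() still works because it never touches the RefCell.
    let absolute = cb.mem_block.base_ptr() + 8;
    drop(guard);

    cb.mem_block.write_bytes(8, &[0xAA]);
    println!("address {:#x} holds {:#x}", absolute, cb.mem_block.read_byte(8));
}
```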

Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC
threads work in parallel during a GC.  Currently, when YJIT is enabled
and two GC threads scan two iseq objects simultaneously, both threads
attempt to borrow `CodeBlock::mem_block`, which results in a panic.

This commit makes one part of the change.

We now set the YJIT code memory to writable in bulk before the
reference-updating phase, and reset it to executable in bulk after the
reference-updating phase.  Previously, YJIT lazily set memory pages
writable while updating object references embedded in JIT-compiled
machine code, and set the memory back to executable by calling
`mark_all_executable`.  This approach is inherently unfriendly to
parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it
sets the whole `CodeBlock` as executable which races with other GC
threads that are updating other iseq objects.  It also has performance
overhead due to the frequent invocation of system calls.  We now set the
permission of all the code memory in bulk before and after the reference
updating phase.  Multiple GC threads can now perform raw memory writes
in parallel.  We should also see performance improvement during moving
GC because of the reduced number of `mprotect` system calls.
@matzbot matzbot requested a review from a team July 10, 2025 09:31
This is the second part of making YJIT work with parallel GC.

During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need
to resolve offsets in `Block::gc_obj_offsets` into absolute addresses
before reading or updating the fields.  This needs the base address
stored in `VirtualMemory::region_start` which was previously behind a
`RefCell`.  When multiple GC threads scan multiple iseqs simultaneously
(which is possible for some GC modules such as MMTk), it will panic
because the `RefCell` is already borrowed.
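
As a rough illustration of the failure mode with the old layout, the sketch below resolves offsets through a `RefCell` and shows the lookup failing while a mutable borrow is outstanding. `OldVirtualMemory` and `resolve_offset` are hypothetical names, the addresses are made up, and the example is single-threaded. It uses `try_borrow` so the conflict is observable without a panic, whereas a plain `borrow()` would panic at that point.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical pre-change layout: every field, including the immutable
// region_start, sits behind one RefCell.
struct OldVirtualMemory {
    region_start: usize,
}

// Resolve a GC object offset to an absolute address. Returns None when the
// RefCell is already mutably borrowed; a plain borrow() would panic instead.
fn resolve_offset(mem: &Rc<RefCell<OldVirtualMemory>>, offset: u32) -> Option<usize> {
    let vm = mem.try_borrow().ok()?;
    Some(vm.region_start + offset as usize)
}

fn main() {
    let mem = Rc::new(RefCell::new(OldVirtualMemory { region_start: 0x1000 }));
    let gc_obj_offsets: [u32; 2] = [0x10, 0x28];

    // No outstanding borrow: the offsets resolve normally.
    for off in gc_obj_offsets {
        println!("{:#x}", resolve_offset(&mem, off).unwrap());
    }

    // Simulate another user of the memory block holding a mutable borrow.
    let writer = mem.borrow_mut();
    assert!(resolve_offset(&mem, gc_obj_offsets[0]).is_none());
    drop(writer);
}
```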

We notice that some fields of `VirtualMemory`, such as `region_start`,
are never modified once `VirtualMemory` is constructed.  We change the
type of the field `CodeBlock::mem_block` from `Rc<RefCell<T>>` to
`Rc<T>`, and push the `RefCell` into `VirtualMemory`.  We extract
mutable fields of `VirtualMemory` into a dedicated struct
`VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`
which is a `RefCell<VirtualMemoryMut>`.  After this change, methods that
access immutable fields in `VirtualMemory`, particularly `base_ptr()`
which reads `region_start`, will no longer need to borrow any `RefCell`.
Methods that access mutable fields will need to borrow
`VirtualMemory::mutable`, but the number of borrowing operations is
strictly smaller than before because borrowing operations previously done
in callers (such as `CodeBlock::write_mem`) are moved into methods of
`VirtualMemory` (such as `VirtualMemory::write_bytes`).
@wks wks force-pushed the feature/ruby-yjit-parallel-scan2 branch from 8a013c7 to 39e9a6d Compare July 10, 2025 09:36
@wks
Contributor Author

wks commented Jul 10, 2025

@XrXr @peterzhu2118 After discussing, we decided to address the architecture problem that forced the GC code to go through a RefCell borrow before accessing the field VirtualMemory::region_start which is actually immutable. This PR moves RefCell down into VirtualMemory so that it only governs its mutable fields. Unlike our previous PR, this PR does not change the size of Block::gc_obj_offsets and does not increase memory usage.

I will provide the results of yjit-bench later.

@wks
Contributor Author

wks commented Jul 10, 2025

master: ruby 3.5.0dev (2025-07-08T07:26:18Z master c913a635d7) +YJIT +PRISM [x86_64-linux]
pr2: ruby 3.5.0dev (2025-07-08T08:31:58Z feature/ruby-yjit-.. ea5a7b5dbc) +YJIT +PRISM [x86_64-linux]

-----------------  -----------  ----------  --------  ----------  -----------  ----------
bench              master (ms)  stddev (%)  pr2 (ms)  stddev (%)  pr2 1st itr  master/pr2
hexapdf            N/A          N/A         N/A       N/A         N/A          N/A       
ruby-lsp           N/A          N/A         N/A       N/A         N/A          N/A       
activerecord       69.7         0.4         68.5      0.2         1.018        1.019     
chunky-png         200.7        0.3         198.5     0.2         1.010        1.011     
erubi-rails        312.6        0.2         301.4     0.3         0.757        1.037     
liquid-c           26.6         7.7         24.7      8.0         1.058        1.073     
liquid-compile     20.5         3.4         19.0      2.5         1.012        1.080     
liquid-render      34.8         6.0         33.1      6.5         0.999        1.053     
lobsters           362.1        0.9         356.6     1.1         0.951        1.015     
mail               58.9         8.8         57.4      9.1         1.017        1.026     
psych-load         689.7        0.4         652.6     0.3         1.055        1.057     
railsbench         793.9        1.0         800.2     1.4         1.001        0.992     
rubocop            55.0         4.2         50.6      4.1         1.018        1.086     
sequel             25.4         0.6         25.0      0.4         1.017        1.015     
binarytrees        90.2         0.5         85.6      5.1         1.007        1.053     
blurhash           75.8         0.3         75.9      0.3         1.001        1.000     
erubi              77.2         0.2         77.1      0.2         1.003        1.001     
etanni             123.9        0.2         121.0     0.1         0.992        1.024     
fannkuchredux      108.1        69.2        109.4     69.3        0.994        0.988     
fluentd            189.3        2.6         182.7     5.1         1.043        1.036     
graphql            15.2         1.3         14.7      0.4         1.051        1.034     
graphql-native     133.2        0.2         127.6     0.1         1.070        1.044     
lee                368.7        0.5         367.9     1.1         0.995        1.002     
matmul             109.8        0.5         110.0     0.6         0.997        0.999     
nbody              22.0         0.5         21.8      0.6         0.991        1.009     
nqueens            25.6         0.4         25.2      0.4         1.002        1.015     
optcarrot          725.0        1.4         716.5     1.4         1.014        1.012     
protoboeuf         20.6         1.9         17.4      1.2         1.048        1.182     
protoboeuf-encode  17.0         3.5         16.7      3.2         1.025        1.018     
rack               19.1         0.7         18.8      0.7         1.026        1.017     
ruby-json          144.1        0.8         146.9     0.3         1.034        0.981     
rubyboy            693.4        0.2         703.0     0.2         0.989        0.986     
rubykon            260.6        1.5         258.8     1.4         1.004        1.007     
sudoku             76.6         0.1         76.2      0.1         1.010        1.005     
tinygql            148.1        0.1         146.9     0.1         1.015        1.008     
30k_ifelse         49.1         0.3         48.9      0.2         1.009        1.004     
30k_methods        25.7         0.4         25.7      0.4         1.004        0.998     
cfunc_itself       11.3         4.8         11.3      4.8         0.991        1.001     
fib                15.2         0.5         15.1      0.2         1.028        1.009     
getivar            5.8          61.5        5.8       61.4        1.005        1.002     
keyword_args       11.4         5.2         11.4      5.1         1.007        1.002     
loops-times        135.9        0.2         135.6     0.2         0.997        1.002     
object-new         23.9         15.1        23.4      15.0        1.018        1.022     
respond_to         3.8          10.3        3.8       10.8        0.988        1.001     
ruby-xor           10.5         0.9         10.4      0.6         1.023        1.009     
setivar            3.1          88.0        3.1       88.2        1.006        1.001     
setivar_object     21.3         20.2        21.1      20.3        0.999        1.006     
setivar_young      21.1         20.4        21.0      20.5        0.998        1.003     
str_concat         13.1         1.7         12.5      1.8         1.017        1.047     
throw              9.1          0.8         9.1       1.1         0.999        1.003     
-----------------  -----------  ----------  --------  ----------  -----------  ----------

@XrXr
Member

XrXr commented Jul 14, 2025

Thank you!

@XrXr XrXr merged commit 3a47f4e into ruby:master Jul 14, 2025
92 of 94 checks passed