YJIT: Allow parallel scanning for JIT-compiled code (attempt 2) #13843
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC threads work in parallel during a GC. Currently, when YJIT is enabled and two GC threads scan two iseq objects simultaneously, both threads attempt to borrow `CodeBlock::mem_block`, which results in a panic. This commit makes one part of the change needed to fix that.

We now set the YJIT code memory to writable in bulk before the reference-updating phase, and reset it to executable in bulk after the reference-updating phase. Previously, YJIT lazily set memory pages writable while updating object references embedded in JIT-compiled machine code, and set the memory back to executable by calling `mark_all_executable`. That approach is inherently unfriendly to parallel GC because (1) it borrows `CodeBlock::mem_block`, and (2) it sets the whole `CodeBlock` as executable, which races with other GC threads that are updating other iseq objects. It also has performance overhead due to the frequent invocation of system calls.

With the bulk permission change, multiple GC threads can perform raw memory writes in parallel. We should also see a performance improvement during moving GC because of the reduced number of `mprotect` system calls.
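To make the system-call argument concrete, here is a minimal toy model of the two strategies. It is only a sketch: `ToyCodeMemory`, `Perm`, `update_lazily`, and `update_in_bulk` are hypothetical names invented for illustration, not YJIT's API, and a counter stands in for real `mprotect` calls. It also assumes the lazy strategy re-marks everything executable once per scanned object.

```rust
// Toy model only: none of these names exist in YJIT.
// `protect_calls` counts simulated `mprotect` invocations.

#[derive(Clone, Copy, PartialEq, Debug)]
enum Perm {
    Executable,
    Writable,
}

struct ToyCodeMemory {
    page_size: usize,
    bytes: Vec<u8>,
    perms: Vec<Perm>, // one entry per page
    protect_calls: usize,
}

impl ToyCodeMemory {
    fn new(num_pages: usize, page_size: usize) -> Self {
        ToyCodeMemory {
            page_size,
            bytes: vec![0; num_pages * page_size],
            perms: vec![Perm::Executable; num_pages],
            protect_calls: 0,
        }
    }

    /// Simulates a single permission-changing call over pages `first..=last`.
    fn protect(&mut self, first: usize, last: usize, perm: Perm) {
        for p in &mut self.perms[first..=last] {
            *p = perm;
        }
        self.protect_calls += 1;
    }

    /// Writes one byte; panics if the containing page is not writable.
    fn write_byte(&mut self, offset: usize, value: u8) {
        let page = offset / self.page_size;
        assert_eq!(self.perms[page], Perm::Writable, "page is not writable");
        self.bytes[offset] = value;
    }
}

/// Lazy strategy (assumed shape): flip each page on first touch, then
/// mark the whole region executable after every object.
fn update_lazily(mem: &mut ToyCodeMemory, objects: &[Vec<usize>]) -> usize {
    let before = mem.protect_calls;
    for offsets in objects {
        for &off in offsets {
            let page = off / mem.page_size;
            if mem.perms[page] != Perm::Writable {
                mem.protect(page, page, Perm::Writable);
            }
            mem.write_byte(off, 0xAA);
        }
        let last = mem.perms.len() - 1;
        mem.protect(0, last, Perm::Executable);
    }
    mem.protect_calls - before
}

/// Bulk strategy: one call before all updates, one call after.
fn update_in_bulk(mem: &mut ToyCodeMemory, objects: &[Vec<usize>]) -> usize {
    let before = mem.protect_calls;
    let last = mem.perms.len() - 1;
    mem.protect(0, last, Perm::Writable);
    for offsets in objects {
        for &off in offsets {
            mem.write_byte(off, 0xBB);
        }
    }
    mem.protect(0, last, Perm::Executable);
    mem.protect_calls - before
}

fn main() {
    // Three "iseq objects", each with byte offsets of embedded references.
    let objects = vec![vec![10, 5000], vec![20, 9000], vec![4100]];
    let mut lazy = ToyCodeMemory::new(4, 4096);
    let mut bulk = ToyCodeMemory::new(4, 4096);
    println!("lazy protect calls: {}", update_lazily(&mut lazy, &objects));
    println!("bulk protect calls: {}", update_in_bulk(&mut bulk, &objects));
}
```

In this model the bulk strategy always costs two permission changes regardless of how many objects are scanned, while the lazy one grows with the number of objects and pages touched. The model is single-threaded; it illustrates the call count only, not the race described above.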
This is the second part of making YJIT work with parallel GC. During GC, `rb_yjit_iseq_mark` and `rb_yjit_iseq_update_references` need to resolve offsets in `Block::gc_obj_offsets` into absolute addresses before reading or updating the fields. This needs the base address stored in `VirtualMemory::region_start`, which was previously behind a `RefCell`. When multiple GC threads scan multiple iseqs simultaneously (which is possible for some GC modules such as MMTk), YJIT panics because the `RefCell` is already borrowed.

We notice that some fields of `VirtualMemory`, such as `region_start`, are never modified once `VirtualMemory` is constructed. We change the type of the field `CodeBlock::mem_block` from `Rc<RefCell<T>>` to `Rc<T>`, and push the `RefCell` into `VirtualMemory`. We extract the mutable fields of `VirtualMemory` into a dedicated struct `VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`, which is a `RefCell<VirtualMemoryMut>`.

After this change, methods that access immutable fields in `VirtualMemory`, particularly `base_ptr()` which reads `region_start`, no longer need to borrow any `RefCell`. Methods that access mutable fields still need to borrow `VirtualMemory::mutable`, but the number of borrowing operations becomes strictly fewer than before because borrowing operations previously done in callers (such as `CodeBlock::write_mem`) are moved into methods of `VirtualMemory` (such as `VirtualMemory::write_bytes`).
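The restructuring can be sketched roughly as follows. This is a heavily simplified illustration, not the actual YJIT code: the type and field names mirror those mentioned above, but the `mapped_bytes` field, the `usize` address type, the method bodies, `OldVirtualMemory`, `resolve_offsets`, and `old_layout_read_base` are all assumptions made for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Old layout (simplified): every field sits behind the outer RefCell.
struct OldVirtualMemory {
    region_start: usize,
}

/// Reading the base address in the old layout requires a borrow, which
/// fails while someone else holds a mutable borrow.
fn old_layout_read_base(mem: &Rc<RefCell<OldVirtualMemory>>) -> Option<usize> {
    mem.try_borrow().ok().map(|m| m.region_start)
}

// New layout (simplified): only the mutable state sits behind a RefCell.
struct VirtualMemoryMut {
    mapped_bytes: usize, // hypothetical mutable field
}

struct VirtualMemory {
    region_start: usize, // never modified after construction
    mutable: RefCell<VirtualMemoryMut>,
}

impl VirtualMemory {
    fn new(region_start: usize) -> Self {
        VirtualMemory {
            region_start,
            mutable: RefCell::new(VirtualMemoryMut { mapped_bytes: 0 }),
        }
    }

    /// Reads only immutable state, so no RefCell borrow is needed.
    fn base_ptr(&self) -> usize {
        self.region_start
    }

    /// The borrow of the mutable part happens here, not in the caller.
    fn write_bytes(&self, offset: usize, len: usize) {
        let mut m = self.mutable.borrow_mut();
        m.mapped_bytes = m.mapped_bytes.max(offset + len);
    }

    fn mapped_bytes(&self) -> usize {
        self.mutable.borrow().mapped_bytes
    }
}

struct CodeBlock {
    mem_block: Rc<VirtualMemory>, // was Rc<RefCell<...>> in the old layout
}

/// Resolves relative offsets into absolute addresses without any borrow.
fn resolve_offsets(cb: &CodeBlock, gc_obj_offsets: &[u32]) -> Vec<usize> {
    gc_obj_offsets
        .iter()
        .map(|&off| cb.mem_block.base_ptr() + off as usize)
        .collect()
}

fn main() {
    let cb = CodeBlock { mem_block: Rc::new(VirtualMemory::new(0x1000)) };

    // Even while the mutable part is borrowed, offset resolution works.
    let guard = cb.mem_block.mutable.borrow_mut();
    println!("{:x?}", resolve_offsets(&cb, &[8, 16]));
    drop(guard);

    // In the old layout the same read is rejected during a mutable borrow.
    let old = Rc::new(RefCell::new(OldVirtualMemory { region_start: 0x1000 }));
    let old_guard = old.borrow_mut();
    println!("{:?}", old_layout_read_base(&old));
    drop(old_guard);
}
```

The sketch is single-threaded and only demonstrates the borrow-flag behaviour: `try_borrow` is used in the old-layout helper so the conflict shows up as `None` instead of the panic that a plain `borrow()` would raise.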
8a013c7 to 39e9a6d
@XrXr @peterzhu2118 After discussing, we decided to address the architecture problem that forced the GC code to go through a I will provide the results of yjit-bench later.
Thank you!
Some GC modules, notably MMTk, support parallel GC, i.e. multiple GC threads work in parallel during a GC. Currently, when YJIT is enabled and two GC threads scan two iseq objects simultaneously, both threads attempt to borrow `CodeBlock::mem_block`, which results in a panic.

We make two changes to YJIT in order to support parallel GC.

We now set the YJIT code memory to writable in bulk before the reference-updating phase, and reset it to executable in bulk after the reference-updating phase. Previously, YJIT lazily set memory pages writable while updating object references embedded in JIT-compiled machine code, and set the memory back to executable by calling `mark_all_executable`. That approach is inherently unfriendly to parallel GC because it sets the whole `CodeBlock` as executable, which races with other GC threads that are updating other iseq objects. It also has performance overhead due to the frequent invocation of system calls. With the bulk permission change, multiple GC threads can perform raw memory writes in parallel. We should also see a performance improvement during moving GC because of the reduced number of `mprotect` system calls.

We also move `RefCell` one level down to `VirtualMemory`. We notice that some fields of `VirtualMemory`, such as `region_start`, are never modified once `VirtualMemory` is constructed. We change the type of the field `CodeBlock::mem_block` from `Rc<RefCell<T>>` to `Rc<T>`, and push the `RefCell` into `VirtualMemory`. We extract the mutable fields of `VirtualMemory` into a dedicated struct `VirtualMemoryMut`, and store them in a field `VirtualMemory::mutable`, which is a `RefCell<VirtualMemoryMut>`. After this change, methods that access immutable fields in `VirtualMemory`, particularly `base_ptr()` which reads `region_start`, no longer need to borrow any `RefCell`. Methods that access mutable fields still need to borrow `VirtualMemory::mutable`, but the number of borrowing operations becomes strictly fewer than before because borrowing operations previously done in callers (such as `CodeBlock::write_mem`) are moved into methods of `VirtualMemory` (such as `VirtualMemory::write_bytes`).