-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
GH-132657: Add lock-free set contains implementation #132290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This makes for longer code vs using the custom LOAD_*/STORE_* macros. However, I think this makes the code more clear.
🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Misc/NEWS.d/next/Core_and_Builtins/2025-07-11-19-57-27.gh-issue-132657.vwDuO2.rst
Outdated
Show resolved
Hide resolved
Objects/setobject.c
Outdated
} | ||
Py_ssize_t ep_hash = ep->hash; | ||
if (ep_hash == hash) { | ||
if (PyUnicode_CheckExact(startkey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization was introduced to avoid the check on mutating tables and the incref/decref on the startkey
. Since that is not relevant for the frozenset, we can could perhaps remove this fast path. (there will still be a minor gain because unicode_eq is used directly, but the PyUnicode_CheckExact
check also takes time for the non-unicode cases).
See eendebakpt@93035c4, text Hacked up version...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Based on my benchmarking, the difference is small. So, removing that special case seems better.
This roughly follows what was done for dictobject to make a lock-free lookup operation. On a benchmark running
set.__contains__
in a tight loop, this is 1.5x faster on my computer. In the bm_deepcopy benchmark, the gains are very modest, between 1 to 2% faster. On the "set_contains" scaling benchmark, the results are much better. Also, the multi-threaded scaling of "copy" and "deepcopy" seem to be measurably improved.Summary of changes:
set_lookkey()
intoset_do_lookup()
which now takes a function pointer that does the entry comparison. This is similar to dictobject anddo_lookup()
. In an optimized build, the comparison function is inlined and there should be no performance cost to this.set_do_lookup
to return a status separately from the entry value.set_compare_frozenset()
and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)FT_ATOMIC_*
macros as needed for atomic loads and storesmemcpy()
.so->table
to NULL while the change is a happening. Assign the real table array after the change is done.Free-threading scaling benchmark results from the attached scripts (result for 6 cores in parallel). This is a modified version of the
ftscalingbenchmark.py
script.ftscaling_set.py.txt