Skip to content

GH-132657: Add lock-free set contains implementation #132290

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Apr 8, 2025

This roughly follows what was done for dictobject to make a lock-free lookup operation. On a benchmark running set.__contains__ in a tight loop, this is 1.5x faster on my computer. In the bm_deepcopy benchmark, the gains are very modest, between 1 to 2% faster. On the "set_contains" scaling benchmark, the results are much better. Also, the multi-threaded scaling of "copy" and "deepcopy" seem to be measurably improved.

Summary of changes:

  • refactor set_lookkey() into set_do_lookup() which now takes a function pointer that does the entry comparison. This is similar to dictobject and do_lookup(). In an optimized build, the comparison function is inlined and there should be no performance cost to this.
  • change set_do_lookup to return a status separately from the entry value.
  • add set_compare_frozenset() and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)
  • use FT_ATOMIC_* macros as needed for atomic loads and stores
  • use a deferred free on the set table array, if shared (only on free-threaded build, normal build always does an immediate free)
  • for free-threaded build, use explicit for loop to zero the table, rather than memcpy().
  • when mutating the set, assign so->table to NULL while the change is a happening. Assign the real table array after the change is done.

Free-threading scaling benchmark results from the attached scripts (result for 6 cores in parallel). This is a modified version of the ftscalingbenchmark.py script.

base this PR
dict_contains 4.0x faster 4.0x faster
tuple_contains 5.4x faster 5.3x faster
list_contains 7.1x faster 6.1x faster
frozenset_contains 1.0x faster 5.9x faster
frozenset_contains_dunder 6.4x faster 3.9x faster
set_contains 1.0x slower 5.4x faster
set_contains_dunder 1.4x faster 5.6x faster
shallow_copy 1.9x faster 3.7x faster
deepcopy 2.5x faster 3.5x faster

ftscaling_set.py.txt

@nascheme nascheme changed the title Add lock-free set contains implemention GH-132657: Add lock-free set contains implementation Jul 12, 2025
@nascheme nascheme added performance Performance or resource usage topic-free-threading 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Jul 12, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 12, 2025
}
Py_ssize_t ep_hash = ep->hash;
if (ep_hash == hash) {
if (PyUnicode_CheckExact(startkey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization was introduced to avoid the check on mutating tables and the incref/decref on the startkey. Since that is not relevant for the frozenset, we can could perhaps remove this fast path. (there will still be a minor gain because unicode_eq is used directly, but the PyUnicode_CheckExact check also takes time for the non-unicode cases).

See eendebakpt@93035c4, text Hacked up version...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Based on my benchmarking, the difference is small. So, removing that special case seems better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-free-threading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants