GH-132657: Add lock-free set contains implementation #132290

nascheme · 2025-04-08T19:08:25Z

This roughly follows what was done for dictobject to make a lock-free lookup operation. On a benchmark running set.__contains__ in a tight loop, this is 1.5x faster on my computer. In the bm_deepcopy benchmark, the gains are very modest, between 1 and 2% faster. On the "set_contains" scaling benchmark, the results are much better. Also, the multi-threaded scaling of "copy" and "deepcopy" seems to be measurably improved.
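For anyone who wants to try a similar tight-loop measurement, here is a minimal sketch. It is not the script behind the numbers above; the helper name `time_contains`, the set size, and the loop count are assumptions chosen for illustration.

```python
import timeit


def time_contains(n_items=1000, n_loops=100_000):
    """Return average seconds per set.__contains__ call (hypothetical helper)."""
    s = set(range(n_items))
    contains = s.__contains__  # bind the method once to skip attribute lookup in the loop
    key = n_items // 2         # a key that is present in the set
    total = timeit.timeit(
        "contains(key)",
        globals={"contains": contains, "key": key},
        number=n_loops,
    )
    return total / n_loops


if __name__ == "__main__":
    print(f"{time_contains() * 1e9:.1f} ns per lookup")
```

Running this on a base build and on a build with this change gives two per-lookup times to compare; absolute values will vary by machine.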

Summary of changes:

refactor set_lookkey() into set_do_lookup() which now takes a function pointer that does the entry comparison. This is similar to dictobject and do_lookup(). In an optimized build, the comparison function is inlined and there should be no performance cost to this.
change set_do_lookup to return a status separately from the entry value.
add set_compare_frozenset() and use it if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key).
use FT_ATOMIC_* macros as needed for atomic loads and stores
use a deferred free on the set table array, if shared (only on free-threaded build, normal build always does an immediate free)
for the free-threaded build, use an explicit for loop to zero the table, rather than memset().
when mutating the set, assign so->table to NULL while the change is happening. Assign the real table array after the change is done.
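The first two items (a lookup routine that takes the comparison as a callback and returns a status separately from the entry) can be illustrated with a toy Python model. This is only a sketch of the idea, not the C code in Objects/setobject.c: the names `Status`, `compare_exact`, `do_lookup`, and `insert` are hypothetical, and plain linear probing stands in for the real probe sequence.

```python
from enum import Enum


class Status(Enum):
    FOUND = 1
    NOT_FOUND = 0


def compare_exact(entry, key, key_hash):
    """Comparison callback: match on cached hash, then identity or equality."""
    entry_key, entry_hash = entry
    return entry_hash == key_hash and (entry_key is key or entry_key == key)


def do_lookup(table, key, key_hash, compare):
    """Linear-probe a power-of-two table and return (status, entry).

    The status is returned separately from the entry, so callers never
    have to infer the outcome from the entry value itself.
    """
    mask = len(table) - 1
    i = key_hash & mask
    for _ in range(len(table)):
        entry = table[i]
        if entry is None:  # an empty slot ends the probe sequence
            return Status.NOT_FOUND, None
        if compare(entry, key, key_hash):
            return Status.FOUND, entry
        i = (i + 1) & mask
    return Status.NOT_FOUND, None


def insert(table, key):
    """Populate the toy table (no resizing or duplicate check; keep it non-full)."""
    key_hash = hash(key)
    mask = len(table) - 1
    i = key_hash & mask
    while table[i] is not None:
        i = (i + 1) & mask
    table[i] = (key, key_hash)
```

A different callback (for example, one that skips work that is only needed for mutable sets) can be passed to the same `do_lookup` without touching the probing logic, which is the point of the refactor described above.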

Free-threading scaling benchmark results from the attached scripts (result for 6 cores in parallel). This is a modified version of the ftscalingbenchmark.py script.

| Benchmark | base | this PR |
| --- | --- | --- |
| dict_contains | 4.0x faster | 4.0x faster |
| tuple_contains | 5.4x faster | 5.3x faster |
| list_contains | 7.1x faster | 6.1x faster |
| frozenset_contains | 1.0x faster | 5.9x faster |
| frozenset_contains_dunder | 6.4x faster | 3.9x faster |
| set_contains | 1.0x slower | 5.4x faster |
| set_contains_dunder | 1.4x faster | 5.6x faster |
| shallow_copy | 1.9x faster | 3.7x faster |
| deepcopy | 2.5x faster | 3.5x faster |
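To make the "Nx faster" figures easier to read, here is a rough sketch of how such a scaling factor could be computed. It is not the attached script: the function names are hypothetical, and the formula assumes each thread repeats the full single-thread workload, so perfect scaling on 6 threads would print "6.0x faster".

```python
import threading
import time


def scaling_label(one_thread_s, many_threads_s, n_threads):
    """Format a result like the table rows, e.g. '4.0x faster'.

    Assumption: every thread repeats the full single-thread workload,
    so perfect scaling gives speedup == n_threads.
    """
    speedup = one_thread_s * n_threads / many_threads_s
    if speedup >= 1:
        return f"{speedup:.1f}x faster"
    return f"{1 / speedup:.1f}x slower"


def measure(func, n_threads=6):
    """Time func once alone, then once per thread in parallel."""
    start = time.perf_counter()
    func()
    one = time.perf_counter() - start

    threads = [threading.Thread(target=func) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    many = time.perf_counter() - start  # includes thread start-up cost
    return scaling_label(one, many, n_threads)


def set_contains():
    """Workload: repeated membership tests on a shared-size set."""
    s = set(range(1000))
    for _ in range(100):
        for i in range(1000):
            i in s


if __name__ == "__main__":
    print(measure(set_contains))
```

On a build with the GIL enabled, the threads cannot run the workload in parallel, so a result near "1.0x" is expected there; the scaling only shows up on a free-threaded build.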

ftscaling_set.py.txt

Issue: copy.copy and copy.deepcopy scale poorly with free-threading #132657

Objects/setobject.c

This makes for longer code vs using the custom LOAD_*/STORE_* macros. However, I think this makes the code more clear.

bedevere-bot · 2025-07-12T15:37:54Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

Misc/NEWS.d/next/Core_and_Builtins/2025-07-11-19-57-27.gh-issue-132657.vwDuO2.rst

eendebakpt · 2025-07-13T19:30:30Z

Objects/setobject.c

+    }
+    Py_ssize_t ep_hash = ep->hash;
+    if (ep_hash == hash) {
+        if (PyUnicode_CheckExact(startkey)


This optimization was introduced to avoid the check on mutating tables and the incref/decref on the startkey. Since that is not relevant for the frozenset, we could perhaps remove this fast path. (There will still be a minor gain because unicode_eq is used directly, but the PyUnicode_CheckExact check also takes time for the non-unicode cases.)

See eendebakpt@93035c4, text Hacked up version...

Okay. Based on my benchmarking, the difference is small. So, removing that special case seems better.

wip: lock-free set contains

ff1d60d

kumaraditya303 reviewed Apr 13, 2025

Objects/setobject.c (outdated)

nascheme mentioned this pull request Jul 11, 2025

gh-132657: Avoid locks and refcounts in frozenset operations #136107

Open

nascheme added 8 commits July 11, 2025 16:52

Use FT_ATOMIC_* macros.

55ab02a

This makes for longer code vs using the custom LOAD_*/STORE_* macros. However, I think this makes the code more clear.

Increase items and loops for set test.

7df8f02

Re-order some atomic store operations.

157cd60

Add and use set_compare_frozenset().

4c3596c

Merge 'origin/main' into set_lockfree_contains

6efe562

Fix _PyMem_FreeDelayed() calls, need size.

8ff7dbd

Fix frozenset contains method.

87278ef

Add NEWS.

b2affbf

nascheme changed the title ~~Add lock-free set contains implemention~~ GH-132657: Add lock-free set contains implementation Jul 12, 2025

bedevere-app bot mentioned this pull request Jul 12, 2025

copy.copy and copy.deepcopy scale poorly with free-threading #132657

Open

Re-generate clinic output.

70a1c1f

nascheme added the performance, topic-free-threading, and 🔨 test-with-buildbots labels Jul 12, 2025

bedevere-bot removed the 🔨 test-with-buildbots label Jul 12, 2025

eendebakpt reviewed Jul 13, 2025

nascheme added 2 commits July 14, 2025 10:48

Better markup in NEWS.

64b17af

Remove unicode case for set_compare_frozenset.

6c339f4
