-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
gh-130819: Update tarfile.py#_create_gnu_long_header
to align with GNU Tar
#130820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
ce2e670
to
24b90cf
Compare
Misc/NEWS.d/next/Library/2025-03-04-03-14-44.gh-issue-130819.Dphgb6.rst
Outdated
Show resolved
Hide resolved
24b90cf
to
283b34e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you motivate the choice for this? namely is there a real benefit between having an explicit user+mode rather than letting the "defaults"? And more importantly, can you cite the relevant manpage / specs where we can find this?
Note: whether this is accpeted or not, this should be treated as a feature request and not a bug IMO. As such, a What's New entry will need to be created, unless the motivation behind this change is not sufficient (in which case we would close the issue as "not planned")
Lib/tarfile.py
Outdated
info["mode"] = 0o100644 | ||
info["uname"] = "root" | ||
info["gname"] = "root" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the specs are these decided?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Oh btw, please reply on the issue instead of the PR (I'll repost my comment above) |
80b8591
to
5282dd6
Compare
I have made the requested changes; please review again |
Thanks for making the requested changes! @picnixz: please review the changes made to this pull request. |
Lib/tarfile.py
Outdated
_unames = {} # Cached mappings of uid=0 -> uname | ||
_gnames = {} # Cached mappings of gid=0 -> gname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we keep per-instance caches instead of per-class caches, even for 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, sorry but there seems no available TarFile
object for _create_gnu_long_header
to cache the querying result:
- a
TarFile
instance does have cache members ofself._unames: Dict[uid, uname]
- however, across the calling stack of
TarInfo.tobuf() -> TarInfo.create_gnu_header() -> TarInfo._create_gnu_long_header()
, there's noTarFile
argument.
If we add a TarFile
into TarInfo.tobuf(...)
, then this PR may break existing subclasses of TarInfo
. Is it indeed necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, to make this cache safe for the free-threading version of CPython, I've replaced the _unames = {}
with _name_uid0 = None
(and _gnames = {}
with _name_gid0 = None
).
Do you developers have any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@picnixz Any idea about where to put the cache object?
47861a1
to
ca20885
Compare
ca20885
to
02bbde5
Compare
I have made the requested changes; please review again. |
Thanks for making the requested changes! @picnixz: please review the changes made to this pull request. |
@@ -1708,6 +1708,15 @@ sysconfig | |||
(Contributed by Xuehai Pan in :gh:`131799`.) | |||
|
|||
|
|||
tarfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be moved in whatsnew/3.15.rst now
_name_uid0 = None # Cached uname of uid=0 | ||
_name_gid0 = None # Cached gname of gid=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant before is: why using a class variable? the issue is that once we deduce uid=0
, we're stuck with it for the entire Python process.
EDIT: I didn't see you comment, my bad. Then we need to think of another solution because storing them in TarFile
feels wrong. What we can do is to add a private attribute in TarInfo and populate it from TarFile. When writing, if the attribute is not set, we populate it eagerly (and thus subclasses of TarInfo will be slower but they won't be broken). Or instead, we can even just dump them with the legacy way (namely without aligning with GNU Tar). Only default TarFile and TarInfo objects will be having this new feature.
More generally, we should be able to set cached contextual information on TarInfo objects coming from a TarFile.
The latest
tarfile
may still generate a file slightly different with the one made by GNU Tar, whenever a path name is longer than 100 bytes. So this PR tries to avoid the difference.More details are in #130819 .