Skip to content

gh-110495: ensure ntpath.realpath() result is a final, normalized pathname. #110751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

moonsikpark
Copy link
Contributor

@moonsikpark moonsikpark commented Oct 12, 2023

By not breaking out of the loop when _readlink_deep() is successful, we can ensure that the result of ntpath.realpath() is a final, normalized pathname.

@moonsikpark
Copy link
Contributor Author

@eryksun can you review this?

@encukou
Copy link
Member

encukou commented Mar 8, 2024

cc @serhiy-storchaka (os.path expert) & @barneygale (with the reproducer in the issue, pathlib.Path('C:\\testdir\\symlink2\\filelink.txt').resolve() also shows the symlink rather than file)

@moonsikpark moonsikpark force-pushed the gh-110495-ensure-ntpath-realpath-final-normalized-pathname branch from 1f8656e to 3ae56f0 Compare March 19, 2024 05:41
@eryksun
Copy link
Contributor

eryksun commented Jun 17, 2024

I'm not having much luck getting this to work with a common seen set. I think an independent set is needed in _getfinalpathname_nonstrict(). For example:

    def _getfinalpathname_nonstrict(path):
        # These error codes indicate that we should stop resolving the
        # path and return the value we currently have.
        allowed_winerror = (
            1,    # ERROR_INVALID_FUNCTION
            2,    # ERROR_FILE_NOT_FOUND
            3,    # ERROR_DIRECTORY_NOT_FOUND
            5,    # ERROR_ACCESS_DENIED
            21,   # ERROR_NOT_READY (e.g. drive with no media)
            32,   # ERROR_SHARING_VIOLATION (e.g. NTFS paging file)
            50,   # ERROR_NOT_SUPPORTED
            53,   # ERROR_BAD_NETPATH
            65,   # ERROR_NETWORK_ACCESS_DENIED
            67,   # ERROR_BAD_NET_NAME (e.g. remote server unavailable)
            87,   # ERROR_INVALID_PARAMETER
            123,  # ERROR_INVALID_NAME
            161,  # ERROR_BAD_PATHNAME
            1920, # ERROR_CANT_ACCESS_FILE
            1921, # ERROR_CANT_RESOLVE_FILENAME (e.g. can't follow symlink)
        )

        # Non-strict algorithm is to find as much of the target
        # directory as we can and join the rest.
        tail = path[:0]
        seen = set()
        while path and normcase(path) not in seen:
            seen.add(normcase(path))
            try:
                path = _getfinalpathname(path)
                return join(path, tail) if tail else path
            except OSError as ex:
                if ex.winerror not in allowed_winerror:
                    raise
                # Split off the base name, and break if it's root.
                parent_path, name = split(path)
                if not name:
                    break
                try:
                    # Try resolving the final component as a link. If
                    # successful, continue resolving the real path.
                    new_path = _readlink_deep(path)
                    if new_path != path:
                        path = new_path
                        continue
                except OSError:
                    pass
                # For some error cases, try to get the real name from
                # the directory entry.
                if ex.winerror in (1, 5, 32, 50, 87, 1920, 1921):
                    try:
                        name = _findfirstfile(path)
                    except OSError:
                        pass
                path = parent_path
                tail = join(name, tail) if tail else name
        if path:
            return join(path, tail) if tail else path
        return tail

@eryksun
Copy link
Contributor

eryksun commented Jun 18, 2024

Some of the checks in test_realpath_broken_symlinks() that validate an incomplete resolution of the final path would need to be modified.

# bpo-38453: We no longer recursively resolve segments of relative
# symlinks that the OS cannot resolve.
self.assertPathEqual(ntpath.realpath(r"broken1"),
ABSTFN + r"\broken\bar")
self.assertPathEqual(ntpath.realpath(r"broken1\baz"),
ABSTFN + r"\broken\bar\baz")
self.assertPathEqual(ntpath.realpath("broken2"),
ABSTFN + r"\self\self\missing")
self.assertPathEqual(ntpath.realpath("broken3"),
ABSTFN + r"\subdir\parent\subdir\parent\missing")

New checks:

            self.assertPathEqual(ntpath.realpath(r"broken1"),
                                 ABSTFN + r"\missing\bar")
            self.assertPathEqual(ntpath.realpath(r"broken1\baz"),
                                 ABSTFN + r"\missing\bar\baz")
            self.assertPathEqual(ntpath.realpath("broken2"),
                                 ABSTFN + r"\missing")
            self.assertPathEqual(ntpath.realpath("broken3"),
                                 ABSTFN + r"\missing")

The corresponding bytes path checks would also have to be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants