Open
Labels
stdlib (Python modules in the Lib dir), topic-pathlib, type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Surprisingly (contrary to its name and to being a generator), `Path.iterdir()` does not stream directory entries: it reads all directory entries into memory before yielding the first one.
This can cause excessive memory usage when "iterating" over very large directories.
Users would expect `.iterdir()` to operate in a streaming way by default, like UNIX `find` or `readdir()`, e.g. by streaming the results of the underlying system calls such as `getdents64()` on Linux.
But it does `entries = list(scandir_it)` instead:
cpython/Lib/pathlib/__init__.py, lines 835 to 843 in c419af9:

    def iterdir(self):
        """Yield path objects of the directory contents.

        The children are yielded in arbitrary order, and the
        special entries '.' and '..' are not included.
        """
        root_dir = str(self)
        with os.scandir(root_dir) as scandir_it:
            entries = list(scandir_it)

This should be documented, and can hopefully be fixed without too much breakage.
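
For illustration, here is a minimal sketch of what a streaming variant could look like when written against `os.scandir()` directly. The name `iter_dir_streaming` is hypothetical and not part of pathlib, and this is not a proposed patch; it only shows entries being yielded as the scandir iterator produces them, rather than after a `list(...)` call.

```python
import os
from pathlib import Path


def iter_dir_streaming(directory):
    """Yield Path objects for the entries of `directory` one at a time.

    Hypothetical helper, not part of pathlib. Entries are yielded as
    os.scandir() produces them instead of being collected into a list first.
    """
    with os.scandir(directory) as scandir_it:
        for entry in scandir_it:
            # entry.path is the directory path joined with the entry name.
            yield Path(entry.path)
```

One trade-off of this approach is that the directory handle stays open until the generator is exhausted, closed, or garbage collected, so callers that stop early may want to call `.close()` on the generator.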
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux