Skip to content

BUG: numpy.loadtxt reads only 50000 lines when skip_rows >= max_rows #28319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 18, 2025

Conversation

l09rin
Copy link
Contributor

@l09rin l09rin commented Feb 11, 2025

Fixed wrong behavior of loadtxt when reading files in chunks.

Misnamed variable (skiprows instead of skiplines) in function _read in numpy/lib/npyio_impl.py caused the function to skip lines at every chunk read.

Closes #28315

…iable skiplines as skiprows; added test in numpy/lib/tests/test_loadtxt.py
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Feb 12, 2025
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think we could just put it in as is also. I noted a few nits, some of which would be nice to address.

I wonder if we should flag this as a larger fix in the release notes (i guess as a "change" see doc/release/upcoming_changes/README.txt), but we don't really have a habit around it, so maybe not.
(Mainly because it previously did silently return wrong things, and who knows if someone even worked around that in some weird way...)

@l09rin
Copy link
Contributor Author

l09rin commented Feb 17, 2025

Thanks, I think we could just put it in as is also. I noted a few nits, some of which would be nice to address.

Thank you for pointing out, I applied all the suggested changes!

I wonder if we should flag this as a larger fix in the release notes (i guess as a "change" see doc/release/upcoming_changes/README.txt), but we don't really have a habit around it, so maybe not. (Mainly because it previously did silently return wrong things, and who knows if someone even worked around that in some weird way...)

Unfortunately I don't know that much about these policies, I have no clue on that...

Personal curiosity: why does style matter?

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistent style helps readability, you can argue its a nit sometimes, but many nits make for a less nice experience reading the code. (And sure, maybe these nits don't bother you, but it does bother some.)

Let's not worry about a release note. I don't think there was any way to work around this bug (besides just not using skip_rows).


Will squash-merge later after tests passed.

@seberg
Copy link
Member

seberg commented Feb 18, 2025

Just for the record, macOS tests ran into the matmul warning, the TSAN test just hung (freebsd is a general failure).

@seberg seberg merged commit 9159ed1 into numpy:main Feb 18, 2025
65 of 68 checks passed
@l09rin
Copy link
Contributor Author

l09rin commented Feb 18, 2025

Indeed I see... I thought there was some technical reason besides that...
Thanks for the explanation :)

charris pushed a commit to charris/numpy that referenced this pull request Feb 21, 2025
…umpy#28319)

* fixed bug in function _read in numpy/lib/_npyio_impl.py, misnamed variable skiplines as skiprows; added test in numpy/lib/tests/test_loadtxt.py

* fixed sintax in test_loadtxt.py

* changed use of mkstemp with use of tmpdir provided by pytest

* fixed bug in use of tmpdir in loadtxt test

* Update numpy/lib/tests/test_loadtxt.py

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>

* Update file numpy/lib/tests/test_loadtxt.py

* Update file numpy/lib/tests/test_loadtxt.py

* Update numpy/lib/tests/test_loadtxt.py

---------

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: numpy.loadtxt reads only 50000 lines when skip_rows >= max_rows
3 participants