BUG: Unexpected read_csv parse_dates behavior #57512

@kounoupis

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

#!/opt/apps/anaconda3/bin/python

import pandas
from io import StringIO

if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
    ''')

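    # Request str for both columns, including the one that parse_dates targets.
    columns = {'Ticker': str, 'Last Update Timestamp': str}

    # Combining dtype (str) with parse_dates on the same column triggers the bug.
    df = pandas.read_csv(csv, usecols=columns.keys(), dtype=columns, parse_dates=['Last Update Timestamp'])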

    print(pandas.__version__)
    print(df)

Issue Description

When parse_dates is combined with dtype, read_csv does not return the date column as datetime objects. Instead, it converts the column into integers (int64) that, at first glance, are not even valid epoch values.

This used to work correctly with pandas 1.4.0.

The output of the above example is:

2.2.0
  Ticker Last Update Timestamp
0    AAA   1706547859000000000
1    AAA   1706588397000000000
2   ABEQ   1707402831000000000
3   ABEQ   1707231897000000000
4   ABEQ   1707810791000000000
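
A side note on the integers above: they appear to be the parsed timestamps expressed as nanoseconds since the Unix epoch, not arbitrary values. The sketch below checks this using only the standard library. The helper name `decode_ns` is hypothetical, and treating the values as UTC nanoseconds is an assumption that the issue itself does not confirm.

```python
from datetime import datetime, timezone


def decode_ns(raw: int) -> str:
    """Format an integer as a timestamp, assuming nanoseconds since the Unix epoch (UTC)."""
    seconds = raw // 1_000_000_000  # drop the nanosecond part
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Use the same layout as the CSV input so the values can be compared by eye.
    return moment.strftime("%m/%d/%Y %H:%M:%S")


# First two values from the buggy output, and the matching CSV rows.
print(decode_ns(1706547859000000000))  # CSV row: 01/29/2024 17:04:19
print(decode_ns(1706588397000000000))  # CSV row: 01/30/2024 04:19:57
```

If the decoded strings match the CSV rows, the dates were parsed and the datetime dtype was then lost in a later conversion step.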

Expected Behavior

#!/opt/apps/anaconda3/bin/python

import pandas
from io import StringIO

if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
    ''')

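    # Unused here; kept so that the read_csv call is the only difference from the repro.
    columns = {'Ticker': str, 'Last Update Timestamp': str}

    # Without usecols/dtype, parse_dates yields a datetime column as expected.
    df = pandas.read_csv(csv, parse_dates=['Last Update Timestamp'])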

    print(pandas.__version__)
    print(df)

Output:

> ./date.py
2.2.0
  Ticker Last Update Timestamp
0    AAA   2024-01-29 17:04:19
1    AAA   2024-01-30 04:19:57
2   ABEQ   2024-02-08 14:33:51
3   ABEQ   2024-02-06 15:04:57
4   ABEQ   2024-02-13 07:53:11
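
Until the combination of dtype and parse_dates behaves as expected, one possible workaround is to read the column as str and convert it afterwards with pandas.to_datetime. This is a minimal sketch, not a fix for the underlying bug. The helper name `read_with_explicit_dates` is hypothetical, and the format string is an assumption based on the sample rows above.

```python
from io import StringIO

import pandas as pd

CSV_TEXT = """Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
"""


def read_with_explicit_dates(text: str) -> pd.DataFrame:
    """Read both columns as str, then convert the timestamp column explicitly."""
    columns = {"Ticker": str, "Last Update Timestamp": str}
    # No parse_dates here, so dtype and date parsing cannot interact.
    df = pd.read_csv(StringIO(text), usecols=list(columns), dtype=columns)
    # Assumed format, taken from the sample data: MM/DD/YYYY HH:MM:SS.
    df["Last Update Timestamp"] = pd.to_datetime(
        df["Last Update Timestamp"], format="%m/%d/%Y %H:%M:%S"
    )
    return df


df = read_with_explicit_dates(CSV_TEXT)
print(df.dtypes)
print(df)
```

Passing an explicit format also avoids any ambiguity between month-first and day-first dates.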

Installed Versions

/opt/apps/anaconda3/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : f538741
python : 3.11.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-9-amd64
Version : #1 SMP Debian 5.10.70-1 (2021-09-30)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.0.3
pip : 24.0
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.21.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.0
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : 2.0.24
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.1.1
xlrd : None
zstandard : 0.22.0
tzdata : 2023.4
qtpy : 2.4.1
pyqt5 : None
None

Labels

Bug; Datetime (Datetime data dtype); Dtype Conversions (Unexpected or buggy dtype conversions); IO CSV (read_csv, to_csv)
