-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import math
import typing
import uuid
import pandas
import sqlalchemy
import sqlalchemy.engine
def insert_callback(table: pandas.io.sql.SQLTable, conn: sqlalchemy.engine.base.Connection, keys: list[str], data_iter: typing.Iterable[tuple]) -> None:
data = [list(d) for d in data_iter]
print(f"{data=}")
print(f"{keys=}")
print(f"{[c.name for c in table.table.columns]=}")
sql = f"""
INSERT INTO [{table.name}]({", ".join(f"[{k}]" for k in keys)})
VALUES ({", ".join(f":{i}" for i, _ in enumerate(keys))})
""".strip()
for values in data:
conn.execute(sqlalchemy.text(sql), {str(i): v for i, v in enumerate(values)})
input = {
1: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
2: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
3: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
4: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
5: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
}
df = pandas.DataFrame.from_dict(input, "index")
conn_url = sqlalchemy.engine.URL.create("mssql+pyodbc", query={"odbc_connect": "DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;Trusted_Connection=yes;TrustServerCertificate=yes"})
engine = sqlalchemy.create_engine(conn_url, echo=True)
with engine.connect() as conn:
with conn.begin():
df.to_sql(f"insert_callback_{str(uuid.uuid4())}", conn, index=True, method=insert_callback)
conn.rollback()
Issue Description
When calling pandas.DataFrame.to_sql with index set to True, it gives the the wrong name for the SQL column used for the index.
E.x.: In the code I provided, it creates a column called index as the index but it passed ('index', '') as the index column name in the list of columns.
Below is the output from sql alchemy
2024-06-26 16:02:23,136 INFO sqlalchemy.engine.Engine SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
2024-06-26 16:02:23,136 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-06-26 16:02:23,137 INFO sqlalchemy.engine.Engine SELECT schema_name()
2024-06-26 16:02:23,137 INFO sqlalchemy.engine.Engine [generated in 0.00014s] ()
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine SELECT CAST('test max support' AS NVARCHAR(max))
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine [generated in 0.00013s] ()
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine SELECT 1 FROM fn_listextendedproperty(default, default, default, default, default, default, default)
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine [generated in 0.00011s] ()
2024-06-26 16:02:23,139 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-06-26 16:02:23,151 INFO sqlalchemy.engine.Engine SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE ([INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) OR [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max))) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))
2024-06-26 16:02:23,151 INFO sqlalchemy.engine.Engine [generated in 0.00032s] ('BASE TABLE', 'VIEW', 'insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06', 'dbo')
2024-06-26 16:02:23,153 INFO sqlalchemy.engine.Engine
CREATE TABLE [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06] (
[index] BIGINT NULL,
[('A', 'X')] FLOAT(53) NULL,
[('A', 'Y')] FLOAT(53) NULL,
[('B', 'X')] FLOAT(53) NULL,
[('B', 'Y')] FLOAT(53) NULL
)
2024-06-26 16:02:23,153 INFO sqlalchemy.engine.Engine [no key 0.00014s] ()
2024-06-26 16:02:23,155 INFO sqlalchemy.engine.Engine CREATE INDEX [ix_insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06_index] ON [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06] ([index])
2024-06-26 16:02:23,155 INFO sqlalchemy.engine.Engine [no key 0.00012s] ()
data=[[1, None, None, None, None], [2, None, None, None, None], [3, None, None, None, None], [4, None, None, None, None], [5, None, None, None, None]]
keys=["('index', '')", "('A', 'X')", "('A', 'Y')", "('B', 'X')", "('B', 'Y')"]
[c.name for c in table.table.columns]=['index', "('A', 'X')", "('A', 'Y')", "('B', 'X')", "('B', 'Y')"]
2024-06-26 16:02:23,158 INFO sqlalchemy.engine.Engine INSERT INTO [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06]([('index', '')], [('A', 'X')], [('A', 'Y')], [('B', 'X')], [('B', 'Y')])
VALUES (?, ?, ?, ?, ?)
2024-06-26 16:02:23,158 INFO sqlalchemy.engine.Engine [generated in 0.00016s] (1, None, None, None, None)
2024-06-26 16:02:23,160 INFO sqlalchemy.engine.Engine ROLLBACK
Expected Behavior
The sql column name of the index should be in the list of keys/columns provided to the call back.
Installed Versions
import pandas as pd
pd.show_versions()
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.12.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.1.0
pip : 24.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2024.6.0
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.3
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.30
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None