query_data_frame: severe performance problem converting _time into datetime64 #187

@max0x7ba

Problem description:
I query a dataset from a local InfluxDB instance. The Data Explorer UI queries and plots it in 0.02 seconds.

When I load exactly the same dataset into a pandas.DataFrame using the query_data_frame function, it takes 184 seconds. cProfile attributes 157 of those seconds to pandas.to_datetime, which PandasDateTimeHelper calls for each and every row's timestamp. The problem seems to originate from the query receiving the _time column as text in the format %Y-%m-%dT%H:%M:%SZ, which must then be parsed. Complete cProfile output.
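To illustrate the cost pattern described above, here is a minimal sketch in plain pandas. It does not reproduce the influxdb_client internals; the sample strings and variable names are made up for illustration. It only contrasts one pandas.to_datetime call per value with a single call on the whole column using an explicit format.

```python
import pandas as pd

# Hypothetical sample data: timestamps as text in the %Y-%m-%dT%H:%M:%SZ format.
raw = ["2021-01-01T00:00:00Z", "2021-01-01T00:00:01Z", "2021-01-01T00:00:02Z"]

# Per-value parsing: one pandas.to_datetime call per timestamp.
# This is the slow pattern when repeated for every row of a large result.
per_value = pd.Series([pd.to_datetime(s) for s in raw])

# Whole-column parsing: a single call with an explicit format.
# utc=True marks the result as UTC, matching the trailing "Z".
whole_column = pd.to_datetime(
    pd.Series(raw), format="%Y-%m-%dT%H:%M:%SZ", utc=True
)
```

Both approaches yield the same timestamps; the difference is only in how many times the parser is invoked.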

If I understand the data flow correctly, InfluxDB converts the _time column, stored as a uint64 number of nanoseconds, into a string in ISO datetime format, and influxdb_client then parses that string back into a uint64 number of nanoseconds. This round trip looks totally unnecessary.

How do I get rid of these unnecessary conversions for the _time column, please?

Is there a way to receive the _time column as a uint64 number of nanoseconds, which I could cast to datetime64 with one call to pandas.to_datetime on the entire column?
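For reference, the pandas side of that single-call conversion could look like the sketch below. It assumes the column has already arrived as integer nanoseconds since the Unix epoch; how to make the client return it that way is exactly the open question here, so the input data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column: uint64 nanoseconds since the Unix epoch.
ns = pd.Series(
    np.array([1609459200000000000, 1609459201000000000], dtype="uint64")
)

# One call converts the entire column; no string parsing is involved.
# The cast to int64 is a precaution, since datetime64 values are
# stored as signed 64-bit integers.
times = pd.to_datetime(ns.astype("int64"), unit="ns", utc=True)
```

The first value corresponds to 2021-01-01T00:00:00Z and the second is one second later.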

Specifications:

  • Client Version: 1.13.0
  • InfluxDB Version: 2.0.3
  • Platform: Ubuntu 18.04.5 LTS, amd64
