I am finding the python client to be very very slow I'm querying the

Python client is slow: takes 20min to query data that takes 6 seconds using a cURL command about influxdb-client-python HOT 13 CLOSED

influxdata commented on May 14, 2024

Python client is slow: takes 20min to query data that takes 6 seconds using a cURL command

from influxdb-client-python.

Comments (13)

bednar commented on May 14, 2024

Hi @joranbeasley,

Could you share what your data looks like?

One possible speedup is to install ciso8601, which makes date parsing considerably faster.

pip install ciso8601

https://github.com/influxdata/influxdb-client-python#installation
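A quick way to check whether ciso8601 is importable in the current environment, and to get a rough feel for its parsing speed, might look like the sketch below. The sample timestamp and the repeat count are arbitrary choices for illustration, and the comparison is against the standard library's datetime.fromisoformat, which is not necessarily the parser the client falls back to, so treat the numbers as indicative only.

```python
import timeit
from datetime import datetime

try:
    import ciso8601  # optional C extension, installed via `pip install ciso8601`
    fast_parse = ciso8601.parse_datetime
except ImportError:
    fast_parse = None  # not installed in this environment

# Arbitrary sample timestamp, chosen only for illustration.
SAMPLE = "2024-05-14T12:30:45.123456+00:00"
REPEATS = 10_000


def time_parser(parse):
    """Return the seconds needed to parse SAMPLE REPEATS times."""
    return timeit.timeit(lambda: parse(SAMPLE), number=REPEATS)


baseline_seconds = time_parser(datetime.fromisoformat)
print(f"datetime.fromisoformat: {baseline_seconds:.4f}s")

if fast_parse is None:
    print("ciso8601 is not installed")
else:
    print(f"ciso8601.parse_datetime: {time_parser(fast_parse):.4f}s")
```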

Regards

ojdo commented on May 14, 2024

For me, the current best workaround is still @joranbeasley's idea of combining query_raw() with pandas. For simple cases with a single table, it looks roughly like this:

import pandas as pd
from influxdb_client import InfluxDBClient
from io import BytesIO

def perform_simple_query(
        query_api,
        organization: str,
        query: str,
        field: str
    ) -> pd.DataFrame:
    """Perform simple query against InfluxDB query API.
    
    Left as an exercise: generalize to results with multiple groups
    """
    response = query_api.query_raw(
        query=query,
        org=organization,
    )
    try:
        df = pd.read_csv(
            BytesIO(response.data),
            skiprows=[0, 1, 2]  # group header rows
        )
    except pd.errors.EmptyDataError:
        return pd.DataFrame()

    df.rename(columns={'_value': field}, inplace=True)
    df.drop(['Unnamed: 0', '_field', '_start', '_stop', 'result',
            '_measurement', 'table'], axis=1, inplace=True)  # customize as needed
    df['_time'] = pd.to_datetime(df['_time'])
    df.set_index('_time', inplace=True)
    return df
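To see what the skiprows step does without a running InfluxDB server, the same parsing can be tried on a hand-written payload. The bytes below are a hypothetical sample in the annotated-CSV layout (three annotation rows, then a header row with an empty first column); the measurement, field name, and values are made up for illustration.

```python
from io import BytesIO

import pandas as pd

# Hypothetical annotated-CSV payload, hand-written for illustration only.
SAMPLE = (
    b"#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,"
    b"dateTime:RFC3339,double,string,string\r\n"
    b"#group,false,false,true,true,false,false,true,true\r\n"
    b"#default,_result,,,,,,,\r\n"
    b",result,table,_start,_stop,_time,_value,_field,_measurement\r\n"
    b",,0,2024-05-14T00:00:00Z,2024-05-14T01:00:00Z,"
    b"2024-05-14T00:10:00Z,21.5,temp,awair\r\n"
    b",,0,2024-05-14T00:00:00Z,2024-05-14T01:00:00Z,"
    b"2024-05-14T00:20:00Z,21.7,temp,awair\r\n"
    b"\r\n"
)

# Skip the three annotation rows so the fourth row becomes the header.
df = pd.read_csv(BytesIO(SAMPLE), skiprows=[0, 1, 2])

# The empty first header cell is named 'Unnamed: 0' by pandas.
df = df.drop(columns=["Unnamed: 0", "result", "table", "_start",
                      "_stop", "_field", "_measurement"])
df["_time"] = pd.to_datetime(df["_time"])
df = df.set_index("_time").rename(columns={"_value": "temp"})
print(df)
```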

bednar commented on May 14, 2024

Hi @pjayathissa,

Thanks for opening the issue. Could you please share some information about what your data looks like (cardinality, amount, ...)?

Regards

pjayathissa commented on May 14, 2024

Attached is a zip file of the CSV that was extracted with a cURL command, with the result appended to a CSV file:
awaircopy.csv.zip

bednar commented on May 14, 2024

Thanks @pjayathissa, I am currently investigating this issue...

bednar commented on May 14, 2024

Hi @pjayathissa,

I prepared a fixed version in the branch fix/pandas-performance.

If you would like to test it, install the client via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@fix/pandas-performance

Regards

joranbeasley commented on May 14, 2024

This is still very, very slow:

# This test returns 194k rows of data.
import pandas

query_api().query(qs, org=org)  # ~28s
query_api().query_data_frame(qs, org=org)  # ~31.5s

def custom_query_dataframe(qs, org):
    http_resp = query_api().query_raw(qs, org=org)
    # Consume the three annotation rows; I don't need them (I think... not sure about "groups").
    headers = [http_resp.readline() for _ in range(3)]
    df = pandas.read_csv(http_resp)
    return df.drop(columns=df.columns[:2])  # drop two leading columns I don't need

custom_query_dataframe(qs, org=org)  # ~1-2s
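For anyone who wants to reproduce this kind of comparison, a minimal timing helper could look like the sketch below. The name timed is hypothetical, and the workload is a stand-in (summing a range) so that the snippet runs without an InfluxDB server.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


# Stand-in workload; replace with the query calls being compared.
total, seconds = timed("sum of range", sum, range(1_000_000))
```

With a live client, the same helper could wrap each variant in turn, for example timed("query_raw + pandas", custom_query_dataframe, qs, org=org).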

franz101 commented on May 14, 2024

Painfully slow here as well with the latest dev version.

bednar commented on May 14, 2024

Hi @franz101,

Could you please share some information about what your data looks like (cardinality, amount, an example, ...)?

Which version of Python are you using?

Regards

idubinets commented on May 14, 2024

I can confirm that query_api is super slow! A query returning 200k records takes 62 seconds.

Do you have a solution or workaround?

bednar commented on May 14, 2024

@idubinets, see #371 (comment)

sarjarapu commented on May 14, 2024

@bednar's recommendation of installing the ciso8601 dependency improved my query execution time from 1.23s to 0.34s. Thank you.

calumroy commented on May 14, 2024

@ojdo Thanks.
When a raw query returns multiple tables, I found that simply splitting the response on the characters '\r\n\r\n' and processing each part worked.

response = query_api.query_raw(
    query=query,
    org=organization,
)
# Tables in the response are separated by a blank line.
frames = []
for df_table in response.data.split(b'\r\n\r\n'):
    if df_table == b'':
        continue
    try:
        frames.append(pd.read_csv(
            BytesIO(df_table),
            skiprows=[0, 1, 2]  # group header rows
        ))
    except pd.errors.EmptyDataError:
        continue  # skip tables that contain no rows
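The splitting approach can be tried offline on a hand-written payload. In the sketch below, the two-table sample bytes and the helper name split_tables are hypothetical, and concatenating the per-table frames with pd.concat is one possible way to combine them, not something the thread prescribes.

```python
from io import BytesIO

import pandas as pd

# Hypothetical annotation and header rows, hand-written for illustration.
ANNOTATIONS = (
    b"#datatype,string,long,dateTime:RFC3339,double,string\r\n"
    b"#group,false,false,false,false,true\r\n"
    b"#default,_result,,,,\r\n"
    b",result,table,_time,_value,_field\r\n"
)
# Two tables, each followed by a blank line.
SAMPLE = (
    ANNOTATIONS
    + b",,0,2024-05-14T00:10:00Z,21.5,temp\r\n"
    + b"\r\n"
    + ANNOTATIONS
    + b",,1,2024-05-14T00:10:00Z,40.2,humidity\r\n"
    + b"\r\n"
)


def split_tables(payload: bytes) -> list:
    """Split a raw response on blank lines and parse each non-empty table."""
    frames = []
    for chunk in payload.split(b"\r\n\r\n"):
        if not chunk.strip():
            continue  # trailing empty chunk after the last table
        try:
            frames.append(pd.read_csv(BytesIO(chunk), skiprows=[0, 1, 2]))
        except pd.errors.EmptyDataError:
            continue  # table without any rows
    return frames


frames = split_tables(SAMPLE)
combined = pd.concat(frames, ignore_index=True)
print(combined[["table", "_time", "_value", "_field"]])
```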
