Comments (13)

bednar commented on May 14, 2024

Hi @joranbeasley,

Could you share what your data looks like?

One possible speedup is to install ciso8601, which makes date parsing much faster:

pip install ciso8601
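
A quick way to confirm that ciso8601 is importable in the environment that runs the client is sketched below. It assumes nothing about the client's internals: the names `parse_timestamp` and `backend` and the sample timestamp are made up for illustration, and the standard-library fallback is only there so the snippet runs either way.

```python
from datetime import datetime, timezone

try:
    # Optional C extension; only present after `pip install ciso8601`.
    import ciso8601
    parse_timestamp = ciso8601.parse_datetime
    backend = "ciso8601"
except ImportError:
    # Standard-library fallback so the sketch still runs without it.
    parse_timestamp = datetime.fromisoformat
    backend = "datetime.fromisoformat"

# Hypothetical timestamp, written with an explicit UTC offset.
ts = parse_timestamp("2024-05-14T10:00:00+00:00")
print(backend, ts)
```

If the printed backend is not "ciso8601", the package is not visible to this interpreter, for example because it was installed into a different virtual environment.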

Regards

from influxdb-client-python.

ojdo commented on May 14, 2024

For me, the current best workaround is still @joranbeasley's idea to go for query_raw() + pandas roughly like this for simple cases with a single table:

import pandas as pd
from influxdb_client import InfluxDBClient
from io import BytesIO

def perform_simple_query(
        query_api,
        organization: str,
        query: str,
        field: str
    ) -> pd.DataFrame:
    """Perform simple query against InfluxDB query API.
    
    Left as an exercise: generalize to results with multiple groups
    """
    response = query_api.query_raw(
        query=query,
        org=organization,
    )
    try:
        df = pd.read_csv(
            BytesIO(response.data),
            skiprows=[0, 1, 2]  # group header rows
        )
    except pd.errors.EmptyDataError:
        return pd.DataFrame()

    df.rename(columns={'_value': field}, inplace=True)
    df.drop(['Unnamed: 0', '_field', '_start', '_stop', 'result',
            '_measurement', 'table'], axis=1, inplace=True)  # customize as needed
    df['_time'] = pd.to_datetime(df['_time'])
    df.set_index('_time', inplace=True)
    return df
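
To see what skiprows=[0, 1, 2] throws away, here is a small standard-library sketch that reads the annotation rows of a response instead of skipping them. The `SAMPLE` bytes are a hypothetical, shortened response body, `read_annotations` is an illustrative name, and the exact set and order of annotation rows may differ on a real server.

```python
import csv

# Hypothetical response body: three annotation rows, then a header row
# whose first column is unnamed, then the data rows.
SAMPLE = (
    b"#datatype,string,long,dateTime:RFC3339,double,string\r\n"
    b"#group,false,false,false,false,true\r\n"
    b"#default,_result,,,,\r\n"
    b",result,table,_time,_value,_field\r\n"
    b",,0,2024-05-14T10:00:00Z,21.5,temp\r\n"
    b",,0,2024-05-14T10:01:00Z,21.7,temp\r\n"
)


def read_annotations(data: bytes) -> dict:
    """Map each annotation name (e.g. 'datatype') to its per-column values."""
    annotations = {}
    for row in csv.reader(data.decode("utf-8").splitlines()):
        # Annotation rows are the ones whose first cell starts with '#'.
        if row and row[0].startswith("#"):
            annotations[row[0].lstrip("#")] = row[1:]
    return annotations


print(read_annotations(SAMPLE)["datatype"])
```

The empty first cell of the header row is why pandas reports a column named 'Unnamed: 0', which the function above then drops.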

bednar commented on May 14, 2024

Hi @pjayathissa,

Thanks for opening the issue. Could you please share some information about what your data looks like (cardinality, amount, ...)?

Regards

pjayathissa commented on May 14, 2024

Attached is a zip file of the CSV that I extracted by running the query with cURL and appending the result to a CSV file:
awaircopy.csv.zip

bednar commented on May 14, 2024

Thanks @pjayathissa, I am currently investigating this issue...

bednar commented on May 14, 2024

Hi @pjayathissa,

I prepared a fixed version in the branch fix/pandas-performance.

If you would like to test it, install the client via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@fix/pandas-performance

Regards

joranbeasley commented on May 14, 2024

This is still very, very slow:

import pandas

# this test returns 194k rows of data
query_api().query(qs, org=org)  # ~28s
query_api().query_data_frame(qs, org=org)  # ~31.5s

def custom_query_dataframe(qs, org):
    http_resp = query_api().query_raw(qs, org=org)
    # consume the first three lines: stuff I don't need (I think; not sure about "groups")
    headers = [http_resp.readline() for _ in range(3)]
    df = pandas.read_csv(http_resp)
    return df.drop(columns=df.columns[:2])  # drop the first two columns, which I don't need

custom_query_dataframe(qs, org=org)  # ~1-2s
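
For anyone who wants to reproduce this kind of comparison on their own data, a minimal timing helper could look like the sketch below. The name `timed` is hypothetical, and the `sum` call is only a stand-in workload; replace it with the query call being measured.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; swap in the query call being compared.
result, seconds = timed(sum, range(1_000_000))
print(f"{seconds:.3f}s")
```

Reporting the row count together with the elapsed time for each variant makes the numbers easier to compare across reports.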

franz101 commented on May 14, 2024

Painfully slow here as well, with the latest dev version.

bednar commented on May 14, 2024

Hi @franz101,

Could you please share some information about what your data looks like (cardinality, amount, an example, ...)?

Which version of Python do you use?

Regards

idubinets commented on May 14, 2024

I can confirm that query_api is very slow: a query returning 200k records takes 62 seconds.

Do you have a solution or workaround?

bednar commented on May 14, 2024

@idubinets, see #371 (comment)

sarjarapu commented on May 14, 2024

@bednar's recommendation to install the ciso8601 dependency improved my query execution time from 1.23s to 0.34s. Thank you.

calumroy commented on May 14, 2024

@ojdo Thanks.
When a raw query returns multiple tables, I found that simply splitting the response on the characters '\r\n\r\n' and processing each chunk separately worked:

import pandas as pd
from io import BytesIO

response = query_api.query_raw(
    query=query,
    org=organization,
)
frames = []  # one DataFrame per returned table
for df_table in response.data.split(b'\r\n\r\n'):
    if not df_table.strip():
        continue  # skip the empty chunk left after the final separator
    try:
        df = pd.read_csv(
            BytesIO(df_table),
            skiprows=[0, 1, 2]  # group header rows
        )
    except pd.errors.EmptyDataError:
        continue  # this chunk had no data rows
    frames.append(df)
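
The splitting idea can be tried without a server using only the standard library. The sketch below is an illustration under assumptions: `SAMPLE` is a hypothetical two-table response body, `split_tables` is a made-up helper name, and annotation rows are dropped by their '#' prefix rather than by a fixed row count.

```python
import csv

# Hypothetical raw response containing two tables separated by a blank line.
SAMPLE = (
    b"#datatype,string,long,dateTime:RFC3339,double\r\n"
    b"#group,false,false,false,false\r\n"
    b"#default,_result,,,\r\n"
    b",result,table,_time,_value\r\n"
    b",,0,2024-05-14T10:00:00Z,21.5\r\n"
    b"\r\n"
    b"#datatype,string,long,dateTime:RFC3339,string\r\n"
    b"#group,false,false,false,false\r\n"
    b"#default,_result,,,\r\n"
    b",result,table,_time,_value\r\n"
    b",,1,2024-05-14T10:00:00Z,ok\r\n"
    b"\r\n"
)


def split_tables(data: bytes) -> list:
    """Return one list of row dicts per table found in the raw response."""
    tables = []
    for chunk in data.split(b"\r\n\r\n"):
        # Keep only the header row and data rows of this chunk.
        lines = [
            line for line in chunk.decode("utf-8").splitlines()
            if line and not line.startswith("#")
        ]
        if len(lines) < 2:
            continue  # empty chunk, or a header row with no data
        tables.append(list(csv.DictReader(lines)))
    return tables


tables = split_tables(SAMPLE)
print(len(tables), tables[1][0]["_value"])
```

Keeping the tables separate avoids mixing columns of different types, such as the double and string `_value` columns in this sample.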
