Comments (13)
Hi @joranbeasley,
Could you share what your data looks like?
One possible speed-up is to install ciso8601, which makes date parsing considerably faster:
pip install ciso8601
Regards
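To illustrate why an optional dependency like this can help, here is a minimal sketch of a timestamp parser that prefers ciso8601 when it is installed and otherwise falls back to the standard library. The function name parse_timestamp is hypothetical, and this is not the client's actual code, just an illustration of the pattern. The fallback is an assumption of this sketch and, on older Python versions, does not handle nanosecond-precision fractions.

```python
from datetime import datetime, timezone

try:
    # Fast C implementation, used only if it is installed.
    import ciso8601
    _parse = ciso8601.parse_datetime
except ImportError:
    def _parse(value):
        # Standard-library fallback; older Python versions do not accept
        # a trailing "Z", so it is rewritten as an explicit UTC offset.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_timestamp(value):
    """Parse an RFC 3339 timestamp string into an aware datetime."""
    return _parse(value)


parsed = parse_timestamp("2021-03-04T05:06:07Z")
```

Either branch should yield the same timezone-aware datetime for a plain second-precision timestamp; the difference is only in parsing speed.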
from influxdb-client-python.
For me, the current best workaround is still @joranbeasley's idea to use query_raw() + pandas, roughly like this for simple cases with a single table:
import pandas as pd
from influxdb_client import InfluxDBClient
from io import BytesIO


def perform_simple_query(
    query_api,
    organization: str,
    query: str,
    field: str
) -> pd.DataFrame:
    """Perform simple query against InfluxDB query API.

    Left as an exercise: generalize to results with multiple groups
    """
    response = query_api.query_raw(
        query=query,
        org=organization,
    )
    try:
        df = pd.read_csv(
            BytesIO(response.data),
            skiprows=[0, 1, 2]  # group header rows
        )
    except pd.errors.EmptyDataError:
        return pd.DataFrame()
    df.rename(columns={'_value': field}, inplace=True)
    df.drop(['Unnamed: 0', '_field', '_start', '_stop', 'result',
             '_measurement', 'table'], axis=1, inplace=True)  # customize as needed
    df['_time'] = pd.to_datetime(df['_time'])
    df.set_index('_time', inplace=True)
    return df
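For anyone wondering what the three skipped rows and the 'Unnamed: 0' column are: as far as I understand, the raw response is annotated CSV, where the first rows are annotations starting with '#' and every data row has an empty leading column. Here is a small standard-library sketch of the same idea, without pandas. The sample payload and the name parse_single_table are made up for illustration.

```python
import csv

# Hypothetical annotated CSV payload (values invented for illustration).
SAMPLE = (
    "#datatype,string,long,dateTime:RFC3339,double,string\r\n"
    "#group,false,false,false,false,true\r\n"
    "#default,_result,,,,\r\n"
    ",result,table,_time,_value,_field\r\n"
    ",,0,2021-01-01T00:00:00Z,21.5,temp\r\n"
    ",,0,2021-01-01T00:01:00Z,21.7,temp\r\n"
)


def parse_single_table(text):
    """Return the data rows as dicts, skipping the '#' annotation rows."""
    lines = [
        line for line in text.splitlines()
        if line and not line.startswith("#")
    ]
    rows = []
    for row in csv.DictReader(lines):
        row.pop("", None)  # drop the empty leading annotation column
        rows.append(row)
    return rows


rows = parse_single_table(SAMPLE)
```

Filtering on the '#' prefix instead of a fixed row count is a design choice of this sketch; it avoids hard-coding that exactly three annotation rows are present.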
from influxdb-client-python.
Hi @pjayathissa,
Thanks for opening the issue. Could you please share some information about what your data looks like (cardinality, amount, ...)?
Regards
from influxdb-client-python.
Attached is a zip file of the CSV that I extracted by running a cURL request and appending the result to a CSV file:
awaircopy.csv.zip
from influxdb-client-python.
Thanks @pjayathissa, I am currently investigating this issue...
from influxdb-client-python.
Hi @pjayathissa,
I prepared a fixed version in the branch fix/pandas-performance.
If you would like to test it, install the client via:
pip install git+https://github.com/influxdata/influxdb-client-python.git@fix/pandas-performance
Regards
from influxdb-client-python.
This is still very, very slow:
import pandas

# this test returns 194k rows of data
query_api().query(qs, org=org)  # ~28s
query_api().query_data_frame(qs, org=org)  # ~31.5s


def custom_query_dataframe(qs, org):
    httpResp = query_api().query_raw(qs, org=org)
    # the first three rows are stuff I don't need (I think? not sure about "groups")
    headers = [httpResp.readline() for _ in range(3)]
    df = pandas.read_csv(httpResp)
    return df.drop(columns=df.columns[:2])  # some extra columns I don't need


custom_query_dataframe(qs, org=org)  # ~1-2s
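If you want to reproduce a comparison like this on your own data, a small timing helper is enough. This is only a sketch: the name timed is hypothetical, and the call at the end uses a stand-in function rather than a real query.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; replace with the query call you want to measure.
result, elapsed = timed(sum, range(1000))
```

Wrapping each variant the same way (query, query_data_frame, and the raw approach) keeps the measurements comparable.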
from influxdb-client-python.
Painfully slow here as well with the latest dev version
from influxdb-client-python.
Hi @franz101,
Could you please share some information about what your data looks like (cardinality, amount, an example, ...)?
Which version of Python do you use?
Regards
from influxdb-client-python.
I can confirm that query_api is super slow! A query returning 200k records takes 62 seconds.
Do you have a solution or workaround?
from influxdb-client-python.
@idubinets, see #371 (comment)
from influxdb-client-python.
@bednar's recommendation of installing the ciso8601 dependency improved my query execution time from 1.23s to 0.34s. Thank you
from influxdb-client-python.
@ojdo Thanks.
When multiple tables are returned by a raw query, I found that simply splitting on the characters '\r\n\r\n' and processing each part worked.
response = query_api.query_raw(
    query=query,
    org=organization,
)
dfs = []  # one DataFrame per returned table
df_tables = response.data.split(b'\r\n\r\n')
for df_table in df_tables:
    if df_table != b'':
        try:
            df = pd.read_csv(
                BytesIO(df_table),
                skiprows=[0, 1, 2]  # group header rows
            )
        except pd.errors.EmptyDataError:
            continue  # skip chunks without data
        dfs.append(df)
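The same splitting idea can be tried out without a server or pandas. Below is a minimal standard-library sketch that splits a payload on the blank line between tables and parses each chunk separately. The payload and the name split_tables are invented for illustration, and it assumes each table carries its own '#' annotation rows.

```python
import csv

# Hypothetical raw payload with two tables separated by a blank line.
SAMPLE = (
    b"#datatype,string,long,double\r\n"
    b"#group,false,false,false\r\n"
    b"#default,_result,,\r\n"
    b",result,table,_value\r\n"
    b",,0,1.0\r\n"
    b",,0,2.0\r\n"
    b"\r\n"
    b"#datatype,string,long,double\r\n"
    b"#group,false,false,false\r\n"
    b"#default,_result,,\r\n"
    b",result,table,_value\r\n"
    b",,1,3.0\r\n"
    b"\r\n"
)


def split_tables(payload):
    """Split the payload on blank lines and parse each table on its own."""
    tables = []
    for chunk in payload.split(b"\r\n\r\n"):
        lines = [
            line for line in chunk.decode("utf-8").splitlines()
            if line and not line.startswith("#")
        ]
        if not lines:
            continue  # trailing empty chunk
        tables.append(list(csv.DictReader(lines)))
    return tables


tables = split_tables(SAMPLE)
```

Each entry in tables is a list of row dicts for one table, so differing column sets between tables do not interfere with each other.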
from influxdb-client-python.