synergetic-engineering / odata-influxdb Goto Github PK

View Code? Open in Web Editor NEW

31.0 8.0 16.0 156 KB

An OData compliant API for accessing data stored in influxdb

License: MIT License

Python 100.00%

odata-server influxdb odata-influxdb

odata-influxdb's Introduction

influxdb_odata

This project allows you to use the OData REST API to access InfluxDB data.

This enables software such as Power BI, Excel, SAP, Drupal, and LINQPad to access InfluxDB data. (Note: only some of these have been tested.)

Requirements:

python: Currently requires Python 2. Tested on latest 2.7.

pyslet: The OData functionality from the pyslet project is used in this project.

Usage:

Run the following command to generate a sample config file:

python server.py --makeSampleConfig

Update the dsn in the conf file to reflect your InfluxDB server location.

You can change the hostname/port for the API server by updating service_advertise_root, server_listen_interface, and server_listen_port in the conf file.

Start your odata endpoint server with python server.py.

Point an OData browser to http://hostname:8080/

Production:

The recommended production deployment is as follows:

The odata-influxdb service is stateless/sessionless. An XML file is generated upon starting the server to describe your InfluxDB metadata structure in a way that pyslet can understand. You can decrease server startup time drastically by disabling this feature in your .conf file ([metadata] -> autogenerate=no) after it has been generated once. You'll need to re-enable it if your InfluxDB structure changes. You can also keep this feature disabled if you need to hand-edit your .xml file to limit/change what is browseable to OData clients.

It is recommended that you run InfluxDB with auth enabled. Odata-influxdb passes through http basic auth credentials to your InfluxDB server. You can specify a user in your .conf file dsn settings. Example: [influxdb] dsn=influxdb://user:pass@localhost:8086

The default setting [influxdb] max_items_per_query=50 is set extremely conservatively. It is recommended to increase this value to as high as 1000 depending on your testing of response times.

Tests:

Run unit tests with python tests.py

OData layout:

Upon startup, the server pulls the metadata from your InfluxDB server (database names, measurement names, field keys, and tag keys).

Each measurement is set up as an OData table. All field keys and tag keys from the InfluxDB database are included in the table, but many values may be null depending on your InfluxDB setup. You can use OData $select query options to limit which columns are returned.

Filters

OData $filter spec is supported, but has some limitations.

Supported operators are:

gt (greater than, >)
ge (greater than or equal to, >=)
lt (less than, <)
le (less than or equal to, <=)
eq (equals, =)
ne (not equal to, !=)
and (boolean and)

Grouping

This project currently depends on pyslet currently gives us OData 2 support, which does not include grouping, so this project provies a non-standard implementation of grouping operations. Because of this, you cannot use GUI tools to form the grouping queries.

InfluxDB requires a WHERE clause on the time field when grouping by time.*
The release version of pyslet had a bug (now fixed) where you could not use a field called "time" so use "timestamp" to refer to InfluxDB's "time" field.*
When using aggregate functions without grouping by '*', only influxdb fields will be populated in the result, not tags. It is recommended to use influxdbgroupby=* in your queries, as it is not very expensive and allows flexibility in your OData client processing.

Example queries:

Group by day. Aggregate the mean of each field.

Query URL:

/db?$filter=timestamp ge datetime'2017-01-01T00:00:00' and timestamp le datetime'2017-03-01T00:00:00'&$top=1000&groupByTime=1h&aggregate=mean

Resulting InfluxDB query:

SELECT mean(*) FROM measurement 
  WHERE time >= '2017-01-01 AND time <= '2017-03-01'
  GROUP BY time(1d)

Group by day. Aggregate the mean of each field. Also group by all tag keys

Query URL:

/db?$filter=timestamp ge datetime'2017-01-01T00:00:00' and timestamp le datetime'2017-03-01T00:00:00'&$top=1000&groupByTime=1h&aggregate=mean&influxgroupby=*

Resulting InfluxDB query:

SELECT mean(*) FROM measurement 
  WHERE time >= '2017-01-01' AND time <= '2017-03-01' 
  GROUP BY *,time(1d)

odata-influxdb's People

Contributors

Stargazers

Watchers

Forkers

soundstripe area42 ale-jin simringuet robertpaulson90 crh rapel aesmin cidermole cluck veeruds yuxiang88 dorbye mipa-de anthodingo

odata-influxdb's Issues

Handle spaces in influxdb measurement names

InfluxDB is very lax about checking names of dbs and measurements (see influxdb issue#6011) and we can't possibly handle all cases, but spaces are probably common so we should handle that.

Not seeing Grouping statements in Influx queries

Hello again.

We've been attempting to use the recently added Grouping features, but we're not having any luck. It appears the requests are essentially ignored.

Here's one of our queries constructed per the example in the documentation:

/db__WaterTemperature?$filter=timestamp%20gt%20datetime%272017-06-28T15:57:06%27&$top=1000&groupByTime=5m&aggregate=mean&influxgroupby=*

The above processes without error but produces a list of all data points with their raw timestamps and no resampling of the data. We think we're constructing the URL properly.

In the logs for odata-influxdb, we see the full query URL come through. However, we don't see any grouping or averaging statements in the resulting Influx query. Below is a log entry produced by the above URL. It appears to be missing the GROUP BY and SELECT mean(*) Influx query terms.

Querying InfluxDB: SELECT * FROM "WaterTemperature" WHERE time > '2017-06-28 15:57:06' LIMIT 1000
##.###.##.### - - [28/Jun/2017 16:57:09] "GET /db__WaterTemperature?$filter=timestamp%20gt%20datetime%272017-06-28T15:57:06%27&$top=1000&groupByTime=5m&aggregate=mean&influxgroupby=* HTTP/1.1" 200 -

Are we overlooking something? To our knowledge, save for a documentation update on June 6, we're using the latest and greatest version of odata-influxdb. Any thoughts appreciated.

Handle spaces in key/field names

The current version doesn't support spaces in the names of the fields in a measurement. So a $select statement fails when keys with spaces are provided. I see a similar thread (closed issue #9 ) which handles spaces in measurement name. Any heads up on this would really help.

Add more tests

Unit test as much non-trivial code as possible

Implement mock responses for InfluxDB calls so it can be tested without having influx running (i.e. we can use CI)

All fields across all measurements advertised for each measurement?

After expanding our measurements recently, we've noticed a seeming oddity. If measurement Alpha has fields 'foo' and 'bar' while measurement Beta has fields 'bar' and 'baz', then the OData feed includes fields 'foo', 'bar', and 'baz' for both Alpha and Beta. Is this expected?

Methods renamed in pyslet.odata2.client and influxdb.client

Hello,

I was trying to run python server.py --makeSampleConfig and received the following error message:

Traceback (most recent call last): File "server.py", line 16, in <module> from influxdbds import InfluxDBEntityContainer File "/test/odata-influxdb/influxdbds.py", line 9, in <module> from pyslet.odata2.core import EntityCollection, CommonExpression, PropertyExpression, BinaryExpression, \ ImportError: cannot import name FormatExpand

It turns out that FormatExpand and FormatSelect were renamed in pyslet.odata2.client to format_expand and format_select, respectively.

So, I changed the import on line 10 of influxdbds.py and the corresponding calls on line 389 and line 392. Then, running python server.py --makeSampleConfig successfully created the sample.conf file.

Likewise, the FormatSysQueryOptions method was renamed in pyslet.odata2.client to format_sys_query_options. So, I updated the call on line 405 of influxdbds.py.

Additionally, from_DSN was renamed in influxdb.client to from_dsn. So, updated the respective calls in influxdbmeta.py (line 57) and influxdbds.py (line 50).

Werkzeug dependency not mentioned

Hi,

Thanks for making this!
I'm trying this on Windows 7 64Bit. Just installed Python and trying server.py:

python server.py --makeSampleConfig
Traceback (most recent call last):
  File "server.py", line 8, in <module>
    from werkzeug.wrappers import AuthorizationMixin, BaseRequest, Response
ImportError: No module named werkzeug.wrappers

This dependency is not mentioned in Readme.md

Reflect nested structure of influx database -> measurement

at the moment, to get the thing working the influx databases and measurements have been flattened into single resources from the odata api perspective

It'd be great if we could separate these out as nested resources to better reflect how influx is structured

How to form queries with time filters?

We're back. Our setup is working fairly well (thanks again), but we've hit a stumbling block.

As InfluxDB is a time-series database we'd like to execute time-based queries. So far we've had no luck. Of course, we're new to OData and there are limitations to the present state of this project, so it's entirely possible we're simply doing it wrong.

With the following URL that filters by ‘Location’, we get entries for that location that look like below:

http://odata.acme.com:8080/sensors__Water__sp__Temperature?$top=10&$filter=Location%20eq%20%27South%20Tank%27

<entry>
  <id>
    http://odata.acme.com:8080/sensors__Water__sp__Temperature(datetime'2017-01-12T01%3A00%3A24')
  </id>
  <title type="text"/>
  <updated>2017-05-26T19:43:33Z</updated>
  <link href="http://odata.acme.com:8080/sensors__Water__sp__Temperature(datetime'2017-01-12T01%3A00%3A24')" rel="edit"/>
  <category scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" term="InfluxDBSchema.sensors__Water__sp__Temperature"/>
  <content type="application/xml">
    <m:properties>
      <d:time>2017-01-12T01:00:24</d:time>
      <d:Location>South Tank</d:Location>
      <d:Value>27.67</d:Value>
      <d:Units>°C</d:Units>
    </m:properties>
  </content>
</entry>

This looks right and makes sense.

However, if we change the query to specifically reference the timestamp in the above entry, we see an error.

http://odata.acme.com:8080/sensors__Water__sp__Temperature?$top=10&$filter=time%20eq%20datetime%272017-01-12T01%3A24%3A35%27

<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code>InvalidSystemQueryOption</code>
  <message>
    Invalid System Query Option: $filter : ParserError: expected time literal at [4]
  </message>
</error>

Questions:

What is the proper way to specify a datetime literal in a query? According to the OData v2 spec, we think we're doing it correctly -- ex. datetime'2000-12-12T12:00'.
Is it possible there's a mismatch between odata-influxdb and our version of InfluxDB (1.1)?
Is the project missing support among its implemented filter operators (gt, lt, eq, ne) for time values? If so, how might we go about adding that?

Bonus Question:

Using odata-influxdb as currently implemented, is it possible to construct a generic query URI that returns data for the most recent x timespan without having to embed a reference date (e.g. Dear OData source, please give me all the data for the most recent two weeks in your datastore.)?

Problem loading data with fractional time stamps

Influx allows for time stamps down to the nano second

So far, we've just been rounding our data for ease of use, but when I tried loading up some data with the very precise time, there was an error in the server

I've created a measure IssueWithFractionalTimeStamps in our dev influxdb that

Traceback (most recent call last):
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\odata2\server.py", line 328, in __call__
    request, environ, start_response, response_headers)
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\odata2\server.py", line 1547, in HandleRequest
    request, environ, start_response, response_headers)
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\odata2\server.py", line 773, in HandleRequest
    response_headers)
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\odata2\server.py", line 1113, in ReturnEntityCollection
    data = str(doc)
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\xml20081126\structures.py", line 347, in __str__
    self.WriteXML(s, EscapeCharData7)
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\xml20081126\structures.py", line 548, in WriteXML
    for s in self.GenerateXML(escapeFunction, tab):
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\xml20081126\structures.py", line 544, in GenerateXML
    for s in self.root.GenerateXML(escapeFunction, '', tab, root=True):
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\xmlnames20091208.py", line 376, in GenerateXML
    child = children.next()
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\xml20081126\structures.py", line 1581, in GetCanonicalChildren
    iChild = children.next()
  File "C:\Projects\soundstripe\odata-influxdb\env\lib\site-packages\pyslet\odata2\core.py", line 3255, in GetChildren
    for entity in self.collection.iterpage():
  File "C:\Projects\soundstripe\odata-influxdb\influxdbds.py", line 211, in iterpage
    for e in self.itervalues():
  File "C:\Projects\soundstripe\odata-influxdb\influxdbds.py", line 106, in expand_entities
    for e in entityIterable:
  File "C:\Projects\soundstripe\odata-influxdb\influxdbds.py", line 129, in _generate_entities
    t = datetime.datetime.strptime(m['time'], '%Y-%m-%dT%H:%M:%SZ')
  File "c:\anaconda2\Lib\_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2017-05-08T23:09:47.685131794Z' does not match format '%Y-%m-%dT%H:%M:%SZ'

Guessing it'll just be a matter of adding to that line 129 in influxdbds.py: t = datetime.datetime.strptime(m['time'], '%Y-%m-%dT%H:%M:%SZ'), but worth adding a test into the suite for

note: not sure if the term 'fractional time stamp' is very correct

Any Python3 supports for this project in the future?

Hi Developing Team,

This is Zion here, I came across this great project and found out this is super suitable for our usage. However, I am using Python3 so I am wondering will there be future support for Python3 version? Even though I have already figured out the Python3 version code from my side with only 1 testcase failed. So far so good.

Or I can make a pull request to create a new branch like py3_version, for other people's convenience?

Anyway, thank you for your time and contribution!

Best regards,
Zion

Embedded service URL different from server URL

I think I found a little glitch in using the project when deployed in complex networking environments (e.g. virtual machine hosts/guests, Docker containers, NAT port forwarding, etc.).

In such environments I was able to access the base OData feed but not the entities within. I changed the code such that the URL embedded in the served XML is now configurable. I added an entry service_advertise_root in the [server] section of the configuration file and modified server.py:

def configure_app(c, doc):
    service_root = c.get('server', 'service_advertise_root')
    logger.info("Advertising service at %s" % service_root)
    app = ReadOnlyServer(serviceRoot=service_root)
    app.SetModel(doc)
    return app

Once I did the above, I was able to access our data with OData dev tools (e.g. Sesame) and ultimately able to connect up Power BI with our production system. In our usage Influx and odata-influxdb are both running in Docker containers with NAT port forwarding to the host machine.

We're trying to integrate InfluxDB with Power BI. Do the errors we're seeing mean OData v3 / v4 is required?

Technically this is a question more than an issue and more to the development team than to the project. I couldn't find a great way to submit this other than as an issue.

My organization is working with InfluxDB for time-series sensor data. It's been great thus far. We're also using Microsoft's Power BI tool for some of our analytics. Power BI does not natively support InfluxDB†, but it can work with a handful of generic data access formats including OData.

We were thrilled to find your odata-influxdb project. Yesterday I set up an Influx instance, Power BI, and your odata-influxdb project. Everything was going along pretty well until I discovered that the pyslet project only support OData v2, and Power BI only works with OData v3 / v4.

We are newbies at best in working with OData. Given that you have successfully wired up InfluxDB as an OData feed, we consider you something of domain experts. According to the developers, the pyslet project will not be supporting OData v3 for a while yet. Do you have any suggestions on pulling together more recent OData version support for InfluxDB? That is, what projects or starting points would you point us to in order to build that support as easily as possible? Would it be possible to patch onto your project some sort of translation from OData v2 to v3?

Thanks in advance for anything you can suggest.

† We've learned that Microsoft now has native InfluxDB support in their internal development schedule for Power BI. We just don't know when it will be available. It seems likely they will provide it as there's growing interest in supporting Influx. We're just trying to figure out a stopgap between then and now.

KeyError: u'boolean'

Hi,

I'm trying to use your connector, but as i run the server.py, i always get this error:
KeyError: u'boolean'

I'm using Influxdb 1.7, all requierments have been installed with pip install.
When i run tests.py, the result seems to be ok:

/usr/local/lib/python2.7/dist-packages/requests/init.py:83:RequestsDependencyWarning: Old version of cryptography ([1, 2, 3]) may cause slowdown.
warnings.warn(warning, RequestsDependencyWarning)
Querying InfluxDB: SELECT * FROM "measurement1"
...Querying InfluxDB: SELECT COUNT() FROM "measurement1"
Querying InfluxDB: SELECT COUNT() FROM "measurement1"
Querying InfluxDB: SELECT * FROM "measurement1" LIMIT 200
Querying InfluxDB: SELECT COUNT() FROM "measurement1"
Querying InfluxDB: SELECT COUNT() FROM "measurement1"
Querying InfluxDB: SELECT * FROM "measurement1" LIMIT 200 OFFSET 200
Querying InfluxDB: SELECT COUNT(*) FROM "measurement1"
..
Ran 8 tests in 0.257s

OK.

The config sample file is well generated, so i copied it to production.conf, then edit it to put my conf:

[server]
service_advertise_root = http://xxxxxx:8080
server_listen_interface = 0.0.0.0
server_listen_port = 8080

[metadata]
; set autogenerate to "no" for quicker startup of the server if you know your influxdb structure has not changed
autogenerate = yes
; metadata_file specifies the location of the metadata file to generate
metadata_file = metadata.xml

[influxdb]
; supported schemes include https+influxdb:// and udp+influxdb://
; user:pass in this dsn is used for generating metadata
dsn = https+influxdb://127.0.0.1:8086
max_items_per_query = 50
; authentication_required will pass through http basic auth username
; and password to influxdb
authentication_required = no

My influxdb is set to HTTPS, no auth required and listenning on 0.0.0.0:8086

Here is the output when i run server.py:

./usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
[...]
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
Traceback (most recent call last):
File "server.py", line 168, in
main()
File "server.py", line 161, in main
doc = load_metadata(c)
File "server.py", line 66, in load_metadata
metadata = generate_metadata(dsn)
File "/root/odata-influxdb/influxdbmeta.py", line 131, in generate_metadata
entity_sets, entity_types = entity_sets_and_types(i)
File "/root/odata-influxdb/influxdbmeta.py", line 122, in entity_sets_and_types
for m in db.measurements:
File "/root/odata-influxdb/influxdbmeta.py", line 87, in measurements
measurements.extend(m_dict(m) for m in rs.get_points())
File "/root/odata-influxdb/influxdbmeta.py", line 87, in
measurements.extend(m_dict(m) for m in rs.get_points())
File "/root/odata-influxdb/influxdbmeta.py", line 85, in m_dict
d['fields'] = self.fields(db['name'])
File "/root/odata-influxdb/influxdbmeta.py", line 70, in fields
return tuple(fields)
File "/root/odata-influxdb/influxdbmeta.py", line 69, in
) for f in fields)
File "/root/odata-influxdb/influxdbmeta.py", line 31, in get_edm_type
return influx_type_to_edm_type[influx_type]
KeyError: u'boolean'

The metadata.xml file is not generated.

In the influxdb log file, i can see request with a 200 return code:

127.0.0.1 - None [18/Sep/2019:09:50:11 +0200] "GET /query?q=SHOW+DATABASES HTTP/1.1" 200 123 "-" "python-requests/2.22.0" f108cc85-d9e8-11e9-84ba-005056a91a3c 1020
127.0.0.1 - None [18/Sep/2019:09:50:11 +0200] "GET /query?db=ORA&q=SHOW+MEASUREMENTS HTTP/1.1" 200 285 "-" "python-requests/2.22.0" f10c14cd-d9e8-11e9-84bc-005056a91a3c 13063
127.0.0.1 - None [18/Sep/2019:09:50:11 +0200] "GET /query?db=ORA&q=SHOW+FIELD+KEYS HTTP/1.1" 200 2741 "-" "python-requests/2.22.0" f10e6318-d9e8-11e9-84bd-005056a91a3c 27526

i have no idea on what to do here.... can you help ? :)

Implement authentication

Pass-through Auth (HTTP Basic)

This assumes that the final implementation will look something like this:
- End-User-->HTTPS Proxy (Nginx?)-->OData-InfluxDB-->HTTPS (or VPN) InfluxDB