Elastic Datashader

Introduction

Elastic Datashader combines the power of Elasticsearch with Datashader, so you can go from this:

Kibana Default Heatmap

To this:

Elastic Datashader Heatmap

Running

Setup

Poetry takes care of installing dependencies within the virtual environment. First install poetry.

python3 -m pip install poetry

Now we can create the virtual environment and install dependencies into it with

poetry install

Note that there are optional extras, described below, that can also be installed with --extras.

Locally

First enter the virtualenv created by poetry.

poetry shell

Uvicorn

First you need to install the localwebserver optional extra.

poetry install --extras localwebserver

uvicorn is now available within the virtualenv (you can re-enter it with poetry shell). Note that the log level for the datashader logger can be set in logging_config.yml or via the DATASHADER_LOG_LEVEL environment variable; the latter takes precedence.

DATASHADER_ELASTIC=http://user:password@localhost:9200 uvicorn elastic_datashader:app --reload --port 6002 --log-config deployment/logging_config.yml 

Docker

First build the Docker image by running make within the folder:

make

To run in production mode via Docker+Uvicorn:

$ docker run -it --rm=true -p 5000:5000 \
    --env DATASHADER_ELASTIC=http://user:password@host:9200 \
    --env DATASHADER_LOG_LEVEL=DEBUG \
    elastic_datashader:latest \
    --log-level=debug \
    -b :5000 \
    --workers 32

SSL Config Options

docker run -it --rm=true -p 5000:5000 \
    --env DATASHADER_ELASTIC=http://user:password@host:9200 \
    --env DATASHADER_LOG_LEVEL=DEBUG \
    elastic_datashader:latest \
    --log-level=debug \
    -b :5000 \
    --workers 32 \
    --certfile <path> \
    --keyfile <path> \
    --ca-certs <path>

Running behind NGINX

Run datashader as normal and use the following NGINX configuration snippet:

  location /datashader/ {
    proxy_pass http://ip-to-datashader-server:5000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-Port $server_port;
    proxy_set_header X-Forwarded-Proto $scheme;
  }

Testing

From within the virtualenv (poetry shell) just run the following.

pytest

Tweaks

Datashader layers will be generated faster if the Elasticsearch search.max_buckets setting is increased to 65536.
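
This setting can be changed at runtime via the cluster settings API; a minimal sketch with curl, assuming a local cluster without authentication:

curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search.max_buckets": 65536}}'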

Kibana

Integration with Kibana Maps can be found here. This code requires changes to code covered by the Elastic License. It is your responsibility to use this code in compliance with that license.

You can build a Kibana with Elastic-Datashader support:

cd kibana
make

API

The API is currently provisional and may change in future releases.

Get Tile

URL : /tms/{index-name}/{z}/{x}/{y}.png
Method : GET
Query Parameters :

Required:

  • geopoint_field=[alphanumeric] : the field to use for geopoint coordinates.

Optional:

  • geopfield_type=[alphanumeric] : the field type to use for the query (default: geo_point); this is needed because cross-cluster get_field_mapping doesn't work
  • timestamp_field=[string] : the field to use for time (default: @timestamp)
  • params=[json] : query/filter parameters from kibana.
  • cmap=[alphanumeric] : the colorcet map to use (default: bmy for heatmap and glasbey_category10 for colored points)
  • category_field=[alphanumeric] : the field to be used for coloring points/ellipses
  • category_type=[alphanumeric] : the type of the category_field (as found in Kibana Index Pattern)
  • category_format=[alphanumeric] : the format for numeric category fields (in NumeralJS format)
  • ellipses=[boolean] : if ellipse shapes should be drawn (default: false)
  • ellipse_major=[alphanumeric] : the field that contains the ellipse major axis size
  • ellipse_minor=[alphanumeric] : the field that contains the ellipse minor axis size
  • ellipse_tilt=[alphanumeric] : the field that contains the ellipse tilt degrees
  • ellipse_units=[alphanumeric] : the units for the ellipse axis (one of majmin_nm, semi_majmin_nm, or semi_majmin_m)
  • ellipse_search=[alphanumeric] : how far to search for ellipse when generating tiles (one of narrow, normal, or wide)
  • spread=[alphanumeric] : how large points should be rendered (one of large, medium, small, auto)
  • span_range=[alphanumeric] : the dynamic range to be applied to the alpha channel (one of flat, narrow, normal, wide, auto)
  • resolution=[alphanumeric] : the aggregation grid size (default: finest)
  • bucket_min=[numeric] : a filter to remove lower-count grid points (percentage of maximum records per grid point)
  • bucket_max=[numeric] : a filter to remove higher-count grid points (percentage of maximum records per grid point)

Params

{
  "lucene_query": "a lucene query",
  "timeFilters": {
     "from": "now-5h",
     "to": "now"
  },
  "filters": { ... filter information extracted from Kibana ... }
}
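
For example, a hypothetical tile request with curl, assuming a local server on port 6002, an index named my-index, and a geo_point field named location:

curl -G "http://localhost:6002/tms/my-index/3/2/1.png" \
  --data-urlencode "geopoint_field=location" \
  -o tile.png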

Get Legend

URL : /legend/{index-name}/fieldname
Method : GET

Required:

  • geopoint_field=[alphanumeric] : the field to use for geopoint coordinates.

Optional:

  • timestamp_field=[string] : the field to use for time (default: @timestamp)
  • params=[json] : query/filter parameters from kibana.
  • category_field=[alphanumeric] : the field to be used for coloring points/ellipses
  • category_type=[alphanumeric] : the type of the category_field (as found in Kibana Index Pattern)
  • category_format=[alphanumeric] : the format for numeric category fields (in NumeralJS format)
  • cmap=[alphanumeric] : the colorcet map to use (default: bmy for heatmap and glasbey_category10 for colored points)

Params

{
  "lucene_query": "a lucene query",
  "timeFilters": {
     "from": "now-5h",
     "to": "now"
  },
  "filters": { ... filter information extracted from Kibana ... },
  "extent": {
    "minLat": 0.0, "maxLat": 0.0,
    "minLon": 0.0, "maxLon": 0.0
  }
}

Returns:

[
  {"key": "xyz", "color": "acolor", "count": 100},
  {"key": "abc", "color": "acolor", "count": 105}
]
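
For example, a hypothetical legend request, assuming the same local server and an index with a category field named category:

curl -G "http://localhost:6002/legend/my-index/category" \
  --data-urlencode "geopoint_field=location" \
  --data-urlencode "category_field=category"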

Release Instructions

Releases

Draft New Release

Create a tag with a one-up build number, targeting the master branch.

[Publish Release]


Issues

Upgrade numpy to >=1.22

Dependabot reports that numpy <= 1.21.6 is affected by a comparison vulnerability; newer versions may also bring other useful optimizations.

Disable ciphers in tlsv1.3

We need to disable chacha20 and aes_128.

I couldn't get the TLS 1.3 ciphers to disable using --ciphers and had to disable all of TLS 1.3, which isn't desirable. This is likely because OpenSSL configures TLS 1.3 cipher suites separately from the pre-1.3 cipher list, which is all that --ciphers sets.

Also, in the Dockerfile I pegged

    pip install gunicorn==20.1.0 && \
    pip install uvicorn==0.22.0

because --ssl-version="TLSv1_2" wasn't working in the newer versions. These should be fixed.

Error when filter string contains "#"

When filtering against a field with values that include "#", Datashader could not render the layer.
Example: "#1234 -- Brown Fox"
When the query criteria was changed to "*Fox", Datashader rendered and returned results with no errors.

The following errors were seen (the list is not exhaustive):
JSONDecodeError('Unterminated string starting at line 1 column 236 (char 235)')
JSONDecodeError('Unterminated string starting at line 1 column 491 (char 490)')
JSONDecodeError('Unterminated string starting at line 1 column 728 (char 727)')
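
The unterminated-string errors suggest the params JSON is being truncated at the "#", consistent with an unencoded "#" being treated as a URL fragment delimiter; this is a hypothesis, not confirmed in the issue. A minimal sketch of percent-encoding such a value on the client side:

from urllib.parse import quote

# Percent-encode a filter value containing "#" so it survives as a query
# parameter instead of being cut off at the fragment delimiter.
raw_value = "#1234 -- Brown Fox"
encoded = quote(raw_value, safe="")
print(encoded)  # %231234%20--%20Brown%20Fox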

"custom" filters are not supported by Datashader

If you add a filter and use "Edit as Query DSL", it will be passed to Datashader with type equal to "custom" and the DSL itself in a field named "value". Code similar to this needs to be added:

elif f.get("meta", {}).get("type") == "custom" and f.get("meta", {}).get("key") is not None:
    # f is a single Kibana filter object; "filter" is the bool query being built.
    filter_key = f.get("meta", {}).get("key")
    if f.get("meta", {}).get("negate"):
        filter["must_not"].append({filter_key: f.get(filter_key)})
    else:
        filter["filter"].append({filter_key: f.get(filter_key)})

This code is just an example and likely would want additional protection and/or checking of corner cases.

Violations of use

I think you should remove this repository. Everything about this repository is in violation of the Elastic License. I would advise you to carefully read Section 2 of that license.

You can't make a portion of the Elastic License code available publicly (all those tar files in kibana and tile_layer.js). You can copy the repository, which is basically a fork, and that can be public since it will contain the Elastic License Agreement and the entire source as-is. Furthermore, you cannot really do anything with this repository: deploying it in any environment is against the license, as it contains derivative works.

Others will claim they thought this was Elastic License code, because the file headers of your new code interleaved with Elastic's claim it to be; but that argument will be fallacious, as anyone who reads the license can clearly tell this is not Elastic code, whatever the repository claims.

This is a cool thing you have done. I'm just trying to ensure that you are aware of the many violations in this repo and what the license says about people that breach section 2.

A breach or threatened breach, by You of Section 2 may cause
irreparable harm for which damages at law may not provide adequate relief, and
therefore Elastic shall be entitled to seek injunctive relief without being
required to post a bond

Datashader Kibana Layer Bug - Resolved In Newer Kibana Version

Creating this issue by request.

In Kibana version 7.10.*, if you set up a Datashader layer and type invalid KQL in the search bar (e.g. '())'), it crashes the map.

When testing this in versions 7.13 - 8.0, the error was caught and no longer crashed the map.

Peg dependency versions

Many of the dependencies listed in pyproject.toml give ranges of acceptable versions, or just have * for no version specifier. We should peg these so we get consistent results, and be explicit about upgrading to newer versions of dependencies. We may also want to consider committing the poetry.lock file to make sure the dependencies hash to the same value.

Upgrade to Python 3.11

We are still running on Python 3.9. Supposedly Python 3.11 benchmarks about 20% faster. Note that Python 3.10 and greater require OpenSSL 1.1.1.

Replace MD5 hashing with SHA for FIPS systems

Systems that are FIPS 140-2 compliant do not support MD5 hashing. The datashader server currently uses MD5 to hash request parameters and to create color palettes, which will throw an exception on FIPS systems. It should be easy enough to replace this with SHA1 or SHA256.
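
A minimal sketch of the proposed replacement, assuming the hash is only used for cache keys and palette seeds (the function name is hypothetical):

import hashlib

def param_hash(params: str) -> str:
    # SHA-256 is available on FIPS 140-2 systems, unlike MD5.
    return hashlib.sha256(params.encode("utf-8")).hexdigest()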

Update to PEP517/518

The Python ecosystem is moving from setup.py to pyproject.toml as specified by PEP517/518. This will provide more consistent build and test environments which can be leveraged in CI.

Update for Kibana 8.3.3

Include in this update receiving and passing along query parameters for logging and tracking:

  • X-Opaque-ID
  • User info data

Datashader instance driven by configuration, not user set / saved object captured

Currently, when a user creates a Datashader layer, it is prepopulated with a URL set in configuration, but the URL can be modified by the user. The Datashader instance used on a Map/Dashboard should always be the one that points to the same Elasticsearch as the Kibana instance, because it is the Datashader configuration, not the Kibana instance, that defines which ES instance is used.

ENHANCEMENTS:

  • Change the Kibana UI so the URL is not user editable.
  • Change the Map Component/Dashboard save so the URL is not captured in SavedObjects, but rather read from config when the Map/Dashboard is used.

Feature: improve auto scaling

Requested enhancement: would it be possible to make the dots a bit larger when there are only a few of that type in the field of view? Or make all of them slightly bigger so they can be seen more easily when there are only a few?

Add a /metrics endpoint

Some stats/metrics are currently displayed on the index page, but they are not formatted in a way that can be consumed by tools like Prometheus.
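
A minimal sketch of what such an endpoint could look like, assuming the server is FastAPI-based (the metric name and placeholder value are hypothetical):

from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def metrics() -> Response:
    # Prometheus exposition format is plain text; a real implementation
    # would report live counters rather than this placeholder value.
    body = (
        "# HELP datashader_tiles_rendered_total Tiles rendered since startup\n"
        "# TYPE datashader_tiles_rendered_total counter\n"
        "datashader_tiles_rendered_total 0\n"
    )
    return Response(content=body, media_type="text/plain; version=0.0.4")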

Remove calls to du

Disk cache space remaining is currently checked by shelling out to du and parsing the output. This can be done directly using os.scandir and the humanize package.
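
A minimal sketch of that replacement, with a hypothetical helper and an assumed cache path:

import os
from humanize import naturalsize

def dir_size_bytes(path: str) -> int:
    # Recursively sum file sizes with os.scandir, replacing the shell call to du.
    total = 0
    for entry in os.scandir(path):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total += dir_size_bytes(entry.path)
    return total

print(naturalsize(dir_size_bytes("./tms-cache")))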
