Elastic Datashader

Introduction

Elastic Datashader combines the power of Elasticsearch with Datashader, so you can go from this:

Kibana Default Heatmap

To this:

Elastic Datashader Heatmap

Running

Setup

Poetry takes care of installing dependencies within the virtual environment. First install poetry.

python3 -m pip install poetry

Now we can create the virtual environment and install dependencies into it with

poetry install

Note that there are optional extras, described below, that can also be installed with --extras.

Locally

First enter the virtualenv created by poetry.

poetry shell

Uvicorn

First you need to install the localwebserver optional extra.

poetry install --extras localwebserver

uvicorn is now available within the virtualenv (you can re-enter it with poetry shell). Note that the log level for the datashader logger can be set in logging_config.yml or via the DATASHADER_LOG_LEVEL environment variable; the latter takes precedence.

DATASHADER_ELASTIC=http://user:password@localhost:9200 uvicorn elastic_datashader:app --reload --port 6002 --log-config deployment/logging_config.yml 

Docker

First build the Docker image by running make within the folder:

make

To run in production mode via Docker+Uvicorn:

$ docker run -it --rm=true -p 5000:5000 \
    --env DATASHADER_ELASTIC=http://user:password@host:9200 \
    --env DATASHADER_LOG_LEVEL=DEBUG \
    elastic_datashader:latest \
    --log-level=debug \
    -b :5000 \
    --workers 32

SSL Config Options

docker run -it --rm=true -p 5000:5000 \
    --env DATASHADER_ELASTIC=http://user:password@host:9200 \
    --env DATASHADER_LOG_LEVEL=DEBUG \
    elastic_datashader:latest \
    --log-level=debug \
    -b :5000 \
    --workers 32 \
    --certfile <path> \
    --keyfile <path> \
    --ca-certs <path>

Running behind NGINX

Run datashader as normal and use the following NGINX configuration snippet:

  location /datashader/ {
    proxy_pass http://ip-to-datashader-server:5000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-Server $host;
    proxy_set_header X-Forwarded-Port $server_port;
    proxy_set_header X-Forwarded-Proto $scheme;
  }

Testing

From within the virtualenv (poetry shell) just run the following.

pytest

Tweaks

Datashader layers will be generated faster if the Elasticsearch search.max_buckets setting is increased to 65536.
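
This setting can be changed at runtime via the cluster settings API; a minimal sketch with curl, assuming a local cluster without authentication:

curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search.max_buckets": 65536}}'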

Kibana

Integration with Kibana Maps can be found here. This code requires changes to code covered by the Elastic License. It is your responsibility to use this code in compliance with that license.

You can build a Kibana with Elastic-Datashader support:

cd kibana
make

API

The API is currently provisional and may change in future releases.

Get Tile

URL : /tms/{index-name}/{z}/{x}/{y}.png
Method : GET
Query Parameters :

Required:

  • geopoint_field=[alphanumeric] : the field to use for geopoint coordinates.

Optional:

  • geopfield_type=[alphanumeric] : the field type to use for the query (default: geo_point); this is needed because cross-cluster get_field_mapping doesn't work
  • timestamp_field=[string] : the field to use for time (default: @timestamp)
  • params=[json] : query/filter parameters from kibana.
  • cmap=[alphanumeric] : the colorcet map to use (default: bmy for heatmap and glasbey_category10 for colored points)
  • category_field=[alphanumeric] : the field to be used for coloring points/ellipses
  • category_type=[alphanumeric] : the type of the category_field (as found in Kibana Index Pattern)
  • category_format=[alphanumeric] : the format for numeric category fields (in NumeralJS format)
  • ellipses=[boolean] : if ellipse shapes should be drawn (default: false)
  • ellipse_major=[alphanumeric] : the field that contains the ellipse major axis size
  • ellipse_minor=[alphanumeric] : the field that contains the ellipse minor axis size
  • ellipse_tilt=[alphanumeric] : the field that contains the ellipse tilt degrees
  • ellipse_units=[alphanumeric] : the units for the ellipse axis (one of majmin_nm, semi_majmin_nm, or semi_majmin_m)
  • ellipse_search=[alphanumeric] : how far to search for ellipse when generating tiles (one of narrow, normal, or wide)
  • spread=[alphanumeric] : how large points should be rendered (one of large, medium, small, auto)
  • span_range=[alphanumeric] : the dynamic range to be applied to the alpha channel (one of flat, narrow, normal, wide, auto)
  • resolution=[alphanumeric] : the aggregation grid size (default: finest)
  • bucket_min=[numeric] : a filter to remove lower-count grid points (percentage of maximum records per grid point)
  • bucket_max=[numeric] : a filter to remove higher-count grid points (percentage of maximum records per grid point)

Params

{
  "lucene_query": "a lucene query",
  "timeFilters": {
     "from": "now-5h",
     "to": "now"
  },
  "filters": { ... filter information extracted from Kibana ... }
}
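
For example, a hypothetical tile request with curl, assuming a local server on port 6002, an index named my-index, and a geo_point field named location:

curl -G "http://localhost:6002/tms/my-index/3/2/1.png" \
  --data-urlencode "geopoint_field=location" \
  -o tile.png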

Get Legend

URL : /legend/{index-name}/fieldname
Method : GET

Required:

  • geopoint_field=[alphanumeric] : the field to use for geopoint coordinates.

Optional:

  • timestamp_field=[string] : the field to use for time (default: @timestamp)
  • params=[json] : query/filter parameters from kibana.
  • category_field=[alphanumeric] : the field to be used for coloring points/ellipses
  • category_type=[alphanumeric] : the type of the category_field (as found in Kibana Index Pattern)
  • category_format=[alphanumeric] : the format for numeric category fields (in NumeralJS format)
  • cmap=[alphanumeric] : the colorcet map to use (default: bmy for heatmap and glasbey_category10 for colored points)

Params

{
  "lucene_query": "a lucene query",
  "timeFilters": {
     "from": "now-5h",
     "to": "now"
  },
  "filters": { ... filter information extracted from Kibana ... },
  "extent": {
    "minLat": 0.0, "maxLat": 0.0,
    "minLon": 0.0, "maxLon": 0.0
  }
}

Returns:

[
  {"key": "xyz", "color": "acolor", "count": 100},
  {"key": "abc", "color": "acolor", "count": 105}
]
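
For example, a hypothetical legend request, assuming the same local server and an index with a category field named category:

curl -G "http://localhost:6002/legend/my-index/category" \
  --data-urlencode "geopoint_field=location" \
  --data-urlencode "category_field=category"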

Release Instructions

Releases

Draft New Release

Create a tag with a one-up build number, targeting the master branch.

[Publish Release]


Issues

Upgrade numpy to >=1.22

Dependabot reports that numpy <= 1.21.6 is affected by a comparison vulnerability; newer versions may also bring other useful optimizations.

Disable ciphers in tlsv1.3

We need to disable chacha20 and aes_128.

I couldn't get the TLS 1.3 ciphers to disable using --ciphers and had to disable all of TLS 1.3, which isn't desirable. This is likely because OpenSSL configures TLS 1.3 cipher suites separately from the pre-1.3 cipher list, which is all that --ciphers sets.

Also, in the Dockerfile I pegged

    pip install gunicorn==20.1.0 && \
    pip install uvicorn==0.22.0

because --ssl-version="TLSv1_2" wasn't working in the newer versions. These should be fixed.

Error when filter string contains "#"

When filtering against a field with values that include "#", Datashader could not render the layer.
Example: "#1234 -- Brown Fox"
When the query criteria was changed to "*Fox", Datashader rendered and returned results with no errors.

The following errors were seen (the list is not exhaustive):
JSONDecodeError('Unterminated string starting at line 1 column 236 (char 235)')
JSONDecodeError('Unterminated string starting at line 1 column 491 (char 490)')
JSONDecodeError('Unterminated string starting at line 1 column 728 (char 727)')
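
The unterminated-string errors suggest the params JSON is being truncated at the "#", consistent with an unencoded "#" being treated as a URL fragment delimiter; this is a hypothesis, not confirmed in the issue. A minimal sketch of percent-encoding such a value on the client side:

from urllib.parse import quote

# Percent-encode a filter value containing "#" so it survives as a query
# parameter instead of being cut off at the fragment delimiter.
raw_value = "#1234 -- Brown Fox"
encoded = quote(raw_value, safe="")
print(encoded)  # %231234%20--%20Brown%20Fox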

"custom" filters are not supported by Datashader

If you add a filter and use "Edit as Query DSL", it will be passed to Datashader with type equal to "custom" and the DSL itself in a field named "value". Code similar to this needs to be added:

elif f.get("meta", {}).get("type") == "custom" and f.get("meta", {}).get("key") is not None:
    # f is a single Kibana filter object; "filter" is the bool query being built.
    filter_key = f.get("meta", {}).get("key")
    if f.get("meta", {}).get("negate"):
        filter["must_not"].append({filter_key: f.get(filter_key)})
    else:
        filter["filter"].append({filter_key: f.get(filter_key)})

This code is just an example and likely would want additional protection and/or checking of corner cases.

Violations of use

I think you should remove this repository. Everything about this repository is in violation of the Elastic License. I would advise you to carefully read Section 2 of that license.

You can't make a portion of the Elastic License code available publicly (all those tar files in kibana and tile_layer.js). You can copy the repository, which is basically a fork, and that can be public since it will contain the Elastic License Agreement and the entire source as-is. Furthermore, you cannot really do anything with this repository: deploying it in any environment is against the license, as it contains derivative works.

Others will claim they thought this was Elastic License code, because the file headers of your new code interleaved with Elastic's claim it to be; but that argument will be fallacious, as anyone who reads the license can clearly tell this is not Elastic code, whatever the repository claims.

This is a cool thing you have done. I'm just trying to ensure that you are aware of the many violations in this repo and what the license says about people that breach section 2.

A breach or threatened breach, by You of Section 2 may cause
irreparable harm for which damages at law may not provide adequate relief, and
therefore Elastic shall be entitled to seek injunctive relief without being
required to post a bond

Datashader Kibana Layer Bug - Resolved In Newer Kibana Version

Creating this issue by request.

In Kibana version 7.10.*, if you set up a Datashader layer and type invalid KQL in the search bar (e.g. '())'), it crashes the map.

When testing this in versions 7.13 - 8.0, the error was caught and no longer crashed the map.

Peg dependency versions

Many of the dependencies listed in pyproject.toml give ranges of acceptable versions, or just have * for no version specifier. We should peg these so we get consistent results, and be explicit about upgrading to newer versions of dependencies. We may also want to consider committing the poetry.lock file to make sure the dependencies hash to the same value.

Upgrade to Python 3.11

We are still running on Python 3.9. Supposedly Python 3.11 benchmarks about 20% faster. Note that Python 3.10 and greater require OpenSSL 1.1.1.

Replace MD5 hashing with SHA for FIPS systems

Systems that are FIPS 140-2 compliant do not support MD5 hashing. The datashader server currently uses MD5 to hash request parameters and to create color palettes, which will throw an exception on FIPS systems. It should be easy enough to replace this with SHA1 or SHA256.
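
A minimal sketch of the proposed replacement, assuming the hash is only used for cache keys and palette seeds (the function name is hypothetical):

import hashlib

def param_hash(params: str) -> str:
    # SHA-256 is available on FIPS 140-2 systems, unlike MD5.
    return hashlib.sha256(params.encode("utf-8")).hexdigest()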

Update to PEP517/518

The Python ecosystem is moving from setup.py to pyproject.toml as specified by PEP517/518. This will provide more consistent build and test environments which can be leveraged in CI.

Update for Kibana 8.3.3

Include in this update receiving and passing along query parameters for logging and tracking:

  • X-Opaque-ID
  • User info data

Datashader instance driven by configuration, not user set / saved object captured

Currently, when a user creates a Datashader layer, it is prepopulated with a URL set in configuration, but the URL can be modified by the user. The Datashader instance used on a Map/Dashboard should always be the one that points to the same Elasticsearch as the Kibana instance, because it is the Datashader configuration, not the Kibana instance, that defines which ES instance is used.

ENHANCEMENTS:

  • Change the Kibana UI so the URL is not user editable.
  • Change the Map Component/Dashboard save so the URL is not captured in SavedObjects, but rather read from config when the Map/Dashboard is used.

Feature: improve auto scaling

Requested enhancement: would it be possible to make the dots a bit larger when there are only a few of that type in the field of view? Or make all of them slightly bigger so they can be seen more easily when there are only a few?

Add a /metrics endpoint

Some stats/metrics are currently displayed on the index page, but they are not formatted in a way that can be consumed by tools like Prometheus.
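
A minimal sketch of what such an endpoint could look like, assuming the server is FastAPI-based (the metric name and placeholder value are hypothetical):

from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def metrics() -> Response:
    # Prometheus exposition format is plain text; a real implementation
    # would report live counters rather than this placeholder value.
    body = (
        "# HELP datashader_tiles_rendered_total Tiles rendered since startup\n"
        "# TYPE datashader_tiles_rendered_total counter\n"
        "datashader_tiles_rendered_total 0\n"
    )
    return Response(content=body, media_type="text/plain; version=0.0.4")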

Remove calls to du

Disk cache space remaining is currently checked by shelling out to du and parsing the output. This can be done directly using os.scandir and the humanize package.
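
A minimal sketch of that replacement, with a hypothetical helper and an assumed cache path:

import os
from humanize import naturalsize

def dir_size_bytes(path: str) -> int:
    # Recursively sum file sizes with os.scandir, replacing the shell call to du.
    total = 0
    for entry in os.scandir(path):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total += dir_size_bytes(entry.path)
    return total

print(naturalsize(dir_size_bytes("./tms-cache")))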
