aws / graph-notebook

Library extending Jupyter notebooks to integrate with Apache TinkerPop, openCypher, and RDF SPARQL.

Home Page: https://github.com/aws/graph-notebook

License: Apache License 2.0


graph-notebook's Introduction

Graph Notebook: easily query and visualize graphs

The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Using this open-source Python package, you can connect to any graph database that supports the Apache TinkerPop, openCypher, or RDF SPARQL graph models. These databases can be running locally on your desktop or in the cloud. Graph databases can be used to explore a variety of use cases, including knowledge graphs and identity graphs.

A colorful graph picture

Visualizing Gremlin queries

Gremlin query and graph

Visualizing openCypher queries

openCypher query and graph

Visualizing SPARQL queries

SPARQL query and graph

Instructions for connecting to the following graph databases:

Endpoint | Graph model | Query language
Gremlin Server | property graph | Gremlin
Blazegraph | RDF | SPARQL
Amazon Neptune | property graph or RDF | Gremlin, openCypher, or SPARQL
Neo4J | property graph | Cypher

We encourage others to contribute configurations they find useful. There is an additional-databases folder where more information can be found.

Features

Notebook cell 'magic' extensions in the IPython 3 kernel

%%sparql - Executes a SPARQL query against your configured database endpoint. Documentation

%%gremlin - Executes a Gremlin query against your database using web sockets. The results are similar to those a Gremlin console would return. Documentation

%%opencypher or %%oc - Executes an openCypher query against your database. Documentation

%%graph_notebook_config - Sets the executing notebook's database configuration to the JSON payload provided in the cell body.

%%graph_notebook_vis_options - Sets the executing notebook's vis.js options to the JSON payload provided in the cell body.

%%neptune_ml - Set of commands to integrate with NeptuneML functionality, as described here. Documentation

TIP 👉 %%sparql, %%gremlin, and %%oc share a suite of common arguments that can be used to customize the appearance of rendered graphs. Example usage of these arguments can also be found in the sample notebooks under 02-Visualization.
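For example, a visualization-customized Gremlin cell might look like the sketch below. It assumes the air-routes sample data (loadable via %seed) and uses the -p path-pattern and -g group-by arguments that also appear later in this document; adjust the values for your own data.

%%gremlin -p v,oute,inv -g region
g.V().has('code', 'ANC').outE().inV().path()

Here -p v,oute,inv tells the renderer that the path alternates vertex, outgoing edge, incoming vertex, and -g region colors the vertices by their region property.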

TIP 👉 There is syntax highlighting for the query language magic cells to help you structure your queries more easily.

Notebook line 'magic' extensions in the IPython 3 kernel

%gremlin_status - Obtain the status of Gremlin queries. Documentation

%sparql_status - Obtain the status of SPARQL queries. Documentation

%opencypher_status or %oc_status - Obtain the status of openCypher queries. Documentation

%load - Generate a form to submit a bulk loader job. Documentation

%load_ids - Get ids of bulk load jobs. Documentation

%load_status - Get the status of a provided load_id. Documentation

%cancel_load - Cancels a bulk load job. You can either provide a single load_id, or specify --all-in-queue to cancel all queued (and not actively running) jobs. Documentation

%neptune_ml - Set of commands to integrate with NeptuneML functionality, as described here. You can find a set of tutorial notebooks here. Documentation

%status - Check the Health Status of the configured host endpoint. Documentation

%seed - Provides a form to add data to your graph, using sets of insert queries instead of a bulk loader. Sample RDF and Property Graph data models are provided with this command. Alternatively, you can select a language type and provide a file path (or a directory path containing one or more of these files) to load the queries from.

%stream_viewer - Interactively explore the Neptune CDC stream (if enabled)

%graph_notebook_config - Returns a JSON payload that contains connection information for your host.

%graph_notebook_host - Set the host endpoint to send queries to.

%graph_notebook_version - Print the version of the graph-notebook package

%graph_notebook_vis_options - Print the Vis.js options being used for rendered graphs

TIP 👉 You can list all the magics installed in the Python 3 kernel using the %lsmagic command.

TIP 👉 Many of the magic commands support a --help option that provides additional information.
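For example, the following cells (a sketch; it assumes %seed supports --help, as many of the line magics above do) list the installed magics and print the options for %seed:

# list every line and cell magic registered in the kernel
%lsmagic

# print the arguments accepted by the %seed line magic
%seed --help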

Example notebooks

This project includes many example Jupyter notebooks. It is recommended to explore them. All of the commands and features supported by graph-notebook are explained in detail with examples within the sample notebooks. You can find them here. As this project has evolved, many new features have been added. If you are already familiar with graph-notebook but want a quick summary of new features added, a good place to start is the Air-Routes notebooks in the 02-Visualization folder.

Keeping track of new features

It is recommended to check the ChangeLog.md file periodically to keep up to date as new features are added.

Prerequisites

You will need:

  • Python 3.8.x-3.10.13
  • A graph database that provides one or more of:
    • A SPARQL 1.1 endpoint
    • An Apache TinkerPop Gremlin Server compatible endpoint
    • An endpoint compatible with openCypher

Installation

Begin by installing graph-notebook and its prerequisites, then follow the remaining instructions for either Jupyter Classic Notebook or JupyterLab.

# install the package
pip install graph-notebook

Jupyter Classic Notebook

# Enable the visualization widget
jupyter nbextension enable  --py --sys-prefix graph_notebook.widgets

# copy static html resources
python -m graph_notebook.static_resources.install
python -m graph_notebook.nbextensions.install

# copy premade starter notebooks
python -m graph_notebook.notebooks.install --destination ~/notebook/destination/dir

# create nbconfig file and directory tree, if they do not already exist
mkdir ~/.jupyter/nbconfig
touch ~/.jupyter/nbconfig/notebook.json

# start jupyter notebook
python -m graph_notebook.start_notebook --notebooks-dir ~/notebook/destination/dir

JupyterLab 3.x

# install jupyterlab
pip install "jupyterlab>=3,<4"

# copy premade starter notebooks
python -m graph_notebook.notebooks.install --destination ~/notebook/destination/dir

# start jupyterlab
python -m graph_notebook.start_jupyterlab --jupyter-dir ~/notebook/destination/dir

Loading magic extensions in JupyterLab

When attempting to run a line/cell magic on a new notebook in JupyterLab, you may encounter the error:

UsageError: Cell magic `%%graph_notebook_config` not found.

To fix this, run the following command, then restart JupyterLab.

python -m graph_notebook.ipython_profile.configure_ipython_profile

Alternatively, the magic extensions can be manually reloaded for a single notebook by running the following command in any empty cell.

%load_ext graph_notebook.magics

Upgrading an existing installation

# upgrade graph-notebook
pip install graph-notebook --upgrade

After the above command completes, rerun the commands given at Jupyter Classic Notebook or JupyterLab 3.x based on which flavour is installed.

Connecting to a graph database

Configuration options can be set using the %graph_notebook_config magic command. The command accepts a JSON object as an argument. The JSON object can contain any of the configuration options listed below. The command can be run multiple times to change the configuration. The configuration is stored in the notebook's metadata and will be used for all subsequent queries.

Configuration Option | Description | Default Value | Type
auth_mode | The authentication mode to use for Amazon Neptune connections | DEFAULT | string
aws_region | The AWS region to use for Amazon Neptune connections | your-region-1 | string
host | The host url to form a connection with | localhost | string
load_from_s3_arn | The ARN of the S3 bucket to load data from [Amazon Neptune only] | | string
neptune_service | The name of the Neptune service for the host url [Amazon Neptune only] | neptune-db | string
port | The port to use when creating a connection | 8182 | number
proxy_host | The proxy host url to route a connection through [Amazon Neptune only] | | string
proxy_port | The proxy port to use when creating proxy connection [Amazon Neptune only] | 8182 | number
ssl | Whether to make connections to the created endpoint with ssl or not [True/False] | False | boolean
ssl_verify | Whether to verify the server's TLS certificate or not [True/False] | True | boolean
sparql | SPARQL connection object | { "path": "sparql" } | string
gremlin | Gremlin connection object | { "username": "", "password": "", "traversal_source": "g", "message_serializer": "graphsonv3" } | string
neo4j | Neo4J connection object | { "username": "neo4j", "password": "password", "auth": true, "database": null } | string
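Note that %graph_notebook_config also works as a line magic with no cell body; as described in the feature list above, it returns the JSON payload currently in effect, which is a quick way to confirm what the notebook is connected to:

# print the connection configuration the notebook is currently using
%graph_notebook_config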

Gremlin Server

In a new cell in the Jupyter notebook, change the configuration using %%graph_notebook_config and modify the fields for host, port, and ssl. Optionally, modify traversal_source if your graph traversal source name differs from the default value, username and password if required by the graph store, or message_serializer for a specific data transfer format. For a local Gremlin server (HTTP or WebSockets), you can use the following command:

%%graph_notebook_config
{
  "host": "localhost",
  "port": 8182,
  "ssl": false,
  "gremlin": {
    "traversal_source": "g",
    "username": "",
    "password": "",
    "message_serializer": "graphsonv3"
  }
}

To set up a new local Gremlin Server for use with the graph notebook, check out additional-databases/gremlin server

Blazegraph

Change the configuration using %%graph_notebook_config and modify the fields for host, port, and ssl. For a local Blazegraph database, you can use the following command:

%%graph_notebook_config
{
  "host": "localhost",
  "port": 9999,
  "ssl": false,
  "sparql": {
    "path": "sparql"
  }
}

You can also make use of namespaces for Blazegraph by specifying the path that graph-notebook should use when querying your SPARQL endpoint, as shown below:

%%graph_notebook_config

{
  "host": "localhost",
  "port": 9999,
  "ssl": false,
  "sparql": {
    "path": "blazegraph/namespace/foo/sparql"
  }
}

This will result in the url localhost:9999/blazegraph/namespace/foo/sparql being used when executing any %%sparql magic commands.
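As a quick check that the namespace path is being picked up (assuming the foo namespace already contains some data), you can run a simple query:

%%sparql
# returns up to 10 triples from the configured endpoint
SELECT * WHERE { ?s ?p ?o } LIMIT 10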

To set up a new local Blazegraph database for use with the graph notebook, check out the Quick Start from Blazegraph.

Amazon Neptune

Change the configuration using %%graph_notebook_config and modify the defaults as they apply to your Neptune instance.

Neptune DB

%%graph_notebook_config
{
  "host": "your-neptune-endpoint",
  "neptune_service": "neptune-db",
  "port": 8182,
  "auth_mode": "DEFAULT",
  "load_from_s3_arn": "",
  "ssl": true,
  "ssl_verify": true,
  "aws_region": "your-neptune-region"
}

Neptune Analytics

%%graph_notebook_config
{
  "host": "your-neptune-endpoint",
  "neptune_service": "neptune-graph",
  "port": 443,
  "auth_mode": "IAM",
  "ssl": true,
  "ssl_verify": true,
  "aws_region": "your-neptune-region"
}

To set up a new Amazon Neptune cluster, check out the Amazon Web Services documentation.

When connecting the graph notebook to Neptune via a private endpoint, make sure your network is set up to communicate with the VPC that Neptune runs in. If not, you can follow this guide.

In addition to the above configuration options, you can also specify the following options:

Amazon Neptune Proxy Connection

%%graph_notebook_config
{
  "host": "clustername.cluster-ididididid.us-east-1.neptune.amazonaws.com",
  "neptune_service": "neptune-db",
  "port": 8182,
  "ssl": true,
  "proxy_port": 8182,
  "proxy_host": "host.proxy.com",
  "auth_mode": "IAM",
  "aws_region": "us-east-1",
  "load_from_s3_arn": ""
}

See also: Connecting to Amazon Neptune from clients outside the Neptune VPC using AWS Network Load Balancer

Authentication (Amazon Neptune)

If you are running a SigV4 authenticated endpoint, ensure that your configuration has auth_mode set to IAM:

%%graph_notebook_config
{
  "host": "your-neptune-endpoint",
  "neptune_service": "neptune-db",
  "port": 8182,
  "auth_mode": "IAM",
  "load_from_s3_arn": "",
  "ssl": true,
  "ssl_verify": true,
  "aws_region": "your-neptune-region"
}

Additionally, you should have the following Amazon Web Services credentials available in a location accessible to Boto3:

  • Access Key ID
  • Secret Access Key
  • Default Region
  • Session Token (OPTIONAL. Use if you are using temporary credentials)

These variables must follow a specific naming convention, as listed in the Boto3 documentation

A list of all locations checked for Amazon Web Services credentials can also be found here.
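As one option (a sketch with placeholder values; a shared ~/.aws/credentials file or an attached IAM role works just as well), the credentials can be set as environment variables from a notebook cell using IPython's %env magic before running any queries:

# placeholder values - substitute your own credentials
%env AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEYID
%env AWS_SECRET_ACCESS_KEY=example-secret-access-key
%env AWS_DEFAULT_REGION=us-east-1
# only needed when using temporary credentials
%env AWS_SESSION_TOKEN=example-session-token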

Neo4J

Change the configuration using %%graph_notebook_config and modify the fields for host, port, ssl, and neo4j authentication.

If your Neo4J instance supports multiple databases, you can specify a database name via the database field. Otherwise, leave the database field blank to query the default database.

For a local Neo4j Desktop database, you can use the following command:

%%graph_notebook_config
{
  "host": "localhost",
  "port": 7687,
  "ssl": false,
  "neo4j": {
    "username": "neo4j",
    "password": "password",
    "auth": true,
    "database": ""
  }
}

Ensure that you also specify the %%oc bolt option when submitting queries to the Bolt endpoint.
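For example, a minimal Cypher query cell submitted over Bolt might look like the sketch below (it assumes the configuration above and some data in the target database):

%%oc bolt
// return up to 10 nodes from the connected database
MATCH (n) RETURN n LIMIT 10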

To set up a new local Neo4J Desktop database for use with the graph notebook, check out the Neo4J Desktop User Interface Guide.

Building From Source

A pre-release distribution can be built from the graph-notebook repository via the following steps:

# 1) Clone the repository and navigate into the clone directory
git clone https://github.com/aws/graph-notebook.git
cd graph-notebook

# 2) Create a new virtual environment

# 2a) Option 1 - pyenv
pyenv install 3.10.13  # Only if not already installed; this can be any supported Python 3 version in Prerequisites
pyenv virtualenv 3.10.13 build-graph-notebook
pyenv local build-graph-notebook

# 2b) Option 2 - venv
rm -rf /tmp/venv
python3 -m venv /tmp/venv
source /tmp/venv/bin/activate

# 3) Install build dependencies
pip install --upgrade pip setuptools wheel twine
pip install "jupyterlab>=3,<4"

# 4) Build the distribution
python3 setup.py bdist_wheel

You should now be able to find the built distribution at

./dist/graph_notebook-4.2.0-py3-none-any.whl

And use it by following the installation steps, replacing

pip install graph-notebook

with

pip install ./dist/graph_notebook-4.2.0-py3-none-any.whl

Contributing Guidelines

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.


graph-notebook's Issues

cannot connect to public blazegraph endpoint

Hello. We have a public blazegraph endpoint set up at http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql. However, I cannot connect to it in graph-notebook. My setup is:

%%graph_notebook_config
{
  "host": "http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql",
  "port": 80,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": false,
  "aws_region": "us-east-1"
}

The query:

%%sparql
select * where {
    ?s ?p ?o
} limit 10

returns an error:

{'error': ConnectionError(MaxRetryError("HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: //kg-hub-rdf.berkeleybop.io/blazegraph/sparql:80/sparql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f90e0f09150>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))"))}

If you wish, you can verify our endpoint with this curl command:

curl -G --data-urlencode query='SELECT ?x WHERE {?x <https://w3id.org/biolink/vocab/category> <https://w3id.org/biolink/vocab/Gene>} limit 10' -H "Accept:application/sparql-results+json" http://kg-hub-rdf.berkeleybop.io/blazegraph/sparql

Documentation for specifying sparql paths on Blazegraph

Does PR #49 fix issues #39 and #45 ?
If so, can you please post documentation? I've tried:

%%graph_notebook_config
{
  "host": "http://kg-hub-rdf.berkeleybop.io",
  "port": 80,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": false,
  "aws_region": "us-east-1"
  "sparql": {
       "blazegraph/sparql"
   }
}

and

%%graph_notebook_config
{
  "host": "http://kg-hub-rdf.berkeleybop.io",
  "port": 80,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": false,
  "aws_region": "us-east-1"
  "sparql_path": "blazegraph/sparql"
}

But I receive syntax errors.

[BUG] Long bulk loads lock up all other cells

While %load is monitoring a load, no other cells can be run. They can be edited but not run. It would be nice if %load were more async, or had an option to not monitor the load status.

Add elementMap() support

With the 1.0.4.0 release and support for TinkerPop version 3.4.8, we should include support for the elementMap() step.

http://tinkerpop.apache.org/docs/current/reference/#elementmap-step

This visualization should add support for this step in two ways:

It will remove the need to use path patterns to determine the direction of an arrow as the in and out vertices of the edge are specified in the output. Below are some example queries which should be able to be visualized without path patterns.

g.V().has('code', 'ANC').outE().inV().path().by(elementMap())
g.V().has('code', 'ANC').outE().path().by(elementMap())

Additionally, we should also now be able to graph edges returned without the path object if elementMap() is specified, e.g.:

g.V().has('code', 'ANC').outE().elementMap()

Allow coloring of the results of a SPARQL query

Is your feature request related to a problem? Please describe.
I would like to be able to set the color of the results of my SPARQL query in the visualization

Describe the solution you'd like
When running a SPARQL query that is visualized, I would like it to color the results by label (i.e., by rdf:type). I would also like it to allow me to specify the property it will use to color by.

Create a guide to setup access via SageMaker

Is your feature request related to a problem? Please describe.
I'd like to connect to a Neptune cluster. The easiest thing for me would be to launch a SageMaker notebook in that VPC and work on it. However, it seems like installing the extensions is not as straight-forward as a pip install.

Describe the solution you'd like
A guide on how to install all the required dependencies, via a lifecycle configuration if possible.

Add flag to group by the result text

As a user it would be nice to be able to group all the results by the raw result returned instead of requiring a map value.

e.g.
g.V().has('code', 'ANC').out().path().by('region')

This should highlight based on the region text returned.

[BUG] Missing documentation on connecting to Neptune from MacOS

Describe the bug
There are some missing details for how to connect to Neptune from a MacOS device; we should add them to our doc on connecting to Neptune via SSH tunnel, found here.

One main piece that we are missing is that a host alias needs to be made in order to get things working properly.

Additional context
This is coming from a bug report from connectivity not working as found in #40

[BUG] %seed status bar does not finish

Describe the bug
When loading air-routes the %seed status bar does not get beyond about 60% filled in. The files load fine. This is just a comment on the UX (progress bar).

To Reproduce
Steps to reproduce the behavior:

  1. Load Gremlin airports from %seed

Expected behavior
Progress bar gets to 100% filled in

Cannot install: No module named 'graph_notebook'

Describe the bug
I cannot install your graph notebook. I receive the error No module named 'graph_notebook' when following your installation steps.

To Reproduce
Steps to reproduce the behavior:

  1. Create virtual environment with venv: python -m venv .env
  2. Activate the virtual environment (e.g., source .env/bin/activate).
  3. Upgrade pip to 20.3.1: pip install -U pip
  4. Install requirements: pip install -r requirements.txt
  5. Per instructions, install and enable the visualization widget: jupyter nbextension install --py --sys-prefix graph_notebook.widgets
    I receive the ERROR: ModuleNotFoundError: No module named 'graph_notebook'

I have tried running the jupyter nbextension install --py --sys-prefix graph_notebook.widgets command from the src directory and I received the same error.

Desktop (please complete the following information):

  • OS: OS X 11 (Big Sur)
  • Browser Safari
  • terminal: running zsh

Support for virtuoso sparql endpoint

Is your feature request related to a problem? Please describe.
I've been trying to have graph-notebook connect to a virtuoso sparql endpoint, without success.

Describe the solution you'd like
Support for virtuoso sparql endpoints. Or, if already possible, documentation about connection setup.

How to specify a Blazegraph namespace

The current Blazegraph config works for me in my setup.

%%graph_notebook_config
{
  "host": "1.2.3.4",
  "port": 80,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": false,
  "aws_region": "us-east-1"
}

but this seems to only connect to the default "kb" namespace, distinct from the named graph concept.

Is there a value in the config I can use to specify a namespace I create in Blazegraph?

I use things like

http://1.2.3.4/blazegraph/namespace/dev/sparql

for example for a namespace "dev".

Something like:

"namespace": "dev",

or even allow a specific SPARQL endpoint URL like

"endpoint": "blazegraph/namespace/dev/sparql"

JupyterLab support

Is your feature request related to a problem? Please describe.
Related to #54
Query outputs are not rendered properly in JupyterLab.

Describe the solution you'd like
All rendered widgets and magic commands which work in Notebooks should also work in labs. Ideally the extension would load automatically like it does in notebooks.

Additional context
Labs screenshot:
image

Note that there is not a tab output widget like there should be, and the table that is visible (you can see the first column) is not the fully formed Datatable which we use.

%load_status should accept parameters

Is your feature request related to a problem? Please describe.
Currently the %load_status command doesn't support the params which Neptune supports based on the Neptune loader documentation

Describe the solution you'd like
Parameters similar to other commands which expose these options using the line input on a line_magic.

[BUG] No documentation on how to connect local notebook to remote Neptune SSL

SSL Connection to remote Neptune not working
I am unable to figure out how can I specify the correct certificate SFSRootCAG2.pem when running queries against SSL-enabled Neptune.

To Reproduce
Steps to reproduce the behavior:

  1. I set up SSH tunnel via bastion to the Neptune cluster 'ssh -i keypairfilename.pem ec2-user@yourec2instanceendpoint -N -L 8182:yourneptuneendpoint:8182'

  2. I start graph-notebook as 'jupyter notebook notebook/destination_neptune'. This gives me the output Jupyter Notebook 6.1.5 is running at: http://localhost:8888/?token=13b2761a59217f9246aed1dab73e70c3ae42973c4339f328

  3. I open my notebook and run the following magic commands
    '%%graph_notebook_config
    {
    "host": "localhost",
    "port": 8182,
    "auth_mode": "DEFAULT",
    "iam_credentials_provider_type": "ROLE",
    "load_from_s3_arn": "",
    "aws_region": ,
    "ssl": true
    }'

  4. I run the command
    %%sparql
    SELECT * WHERE {?s ?p ?o} LIMIT 1

  5. It gives me the error
    {'error': SSLError(MaxRetryError('HTTPSConnectionPool(host='localhost', port=8182): Max retries exceeded with url: /sparql (Caused by SSLError(SSLCertVerificationError("hostname 'localhost' doesn't match either of '*.............

Expected behavior
I expect to be able to connect to a remote neptune that has ssl enabled.

Screenshots
None

Desktop (please complete the following information):

  • macOS 10.15.7 Catalina
  • Browser Chrome
  • Version 86.0.4240.198 (Official Build) (x86_64)

Additional context
None

Add ability to specify the field to display for Gremlin by label

Is your feature request related to a problem? Please describe.
When returning values from Gremlin the label is currently a fixed format. It would be nice to be able to specify the field to display the text by per label.

e.g. If I returned a dataset that contained

{T.id: 1, T.label: 'person', 'name': 'Dave'}
{T.id: 2, T.label: 'document', 'country': 'US'}

It would be nice to be able to specify that I wanted the person vertices to display the 'name' property and the document vertices to display the 'country'

[BUG] Graph tab not displaying

Describe the bug
When running gremlin queries, only the console tab renders. The graph tab is missing.

To Reproduce
I'm running the Jupyter notebook in a local env using Python 3.6.12 and graph-notebook version 2.0.10.

python -m venv env
source env/bin/activate
pip install -r requirements.txt

jupyter nbextension install --py --sys-prefix graph_notebook.widgets
jupyter nbextension enable  --py --sys-prefix graph_notebook.widgets
python -m graph_notebook.static_resources.install
python -m graph_notebook.nbextensions.install

jupyter notebook ./index.ipynb

requirements.txt:

notebook==5.7.10
tornado==4.5.3
graph-notebook==2.0.10

Expected behavior
The graph tab renders and displays a visual of the graph.

Screenshots
image

Desktop (please complete the following information):

  • OS: macOS 10.14.6
  • Browser: chrome
  • Version: 88

Additional context
I do get a dependency clash when running pip install:

graph-notebook 2.0.10 has requirement Jinja2==2.10.1, but you'll have jinja2 2.11.3 which is incompatible.

And also the following errors in the console when running the notebook:

Could not open static file ''
404 GET /static/components/react/react-dom.production.min.js (::1) 16.75ms referer=http://localhost:8888/tree?token={token}
404 GET /static/components/react/react-dom.production.min.js (::1) 1.76ms referer=http://localhost:8888/tree?token={token}

I've also tried running

jupyter nbextension enable graph_notebook.nbextensions --py --sys-prefix

after running

python -m graph_notebook.nbextensions.install

Not sure if it's needed but didn't have any effect on the issue.

Support for Tornado 6.x

Is your feature request related to a problem? Please describe.
Tornado 4.5.3 is older and it is tough to get it approved in higher-security environments.

Describe the solution you'd like
I would like to see tornado 6.x supported.

Describe alternatives you've considered
We're dealing with the existing requirement, but it presents internal complexities.

Additional context
Add any other context or screenshots about the feature request here.

[BUG] Neptune_ML widget error in 2.0.9

Describe the bug
Starting in version 2.0.9, the neptune_ml widget has an issue where the JSON values being passed in produce the following error:

{'error': JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)}

To Reproduce
Steps to reproduce the behavior:

  1. Run through the 01-Introduction-to-Node-Classification-Gremlin notebook
  2. When you get to the export step the error occurs

Additional context
This is not a problem in version 2.0.7

[BUG] seed command errors when file has blank line

Describe the bug
When creating a seed file, the command fails if that file has a blank line in it. The error message is not descriptive of the problem:

{
  "code": "MalformedQueryException",
  "requestId": "a832c019-8ae5-4445-b9bf-5f8d098b225a",
  "detailedMessage": "Failed to interpret Gremlin query: Query parsing failed at line 1, character position at 0, error message : mismatched input '<EOF>' expecting {EmptyStringLiteral, 'g'}"
}

The seed command should ignore blank lines in the seed files

[BUG] Specifying the GroupBy Field when just a vertex is returned

Describe the bug
The group by field is not working when the value being returned is a vertex. I am unable to group by id.

To Reproduce
Steps to reproduce the behavior:

%%gremlin -g id

g.V().path()

Expected behavior
I would have expected that this would give a different color to each node

Workbench visualization multiple edges between two vertices have labels overlap

When you have two vertices with different edge labels between them they are being shown as two overlapping edges. This causes an issue where you cannot read either edge label. See attached images.

DETAILS:
I. We need to show different edge labels between two vertices as different edges, or we need to concatenate them so that they don't overlap making them unreadable
Screen Shot 2020-08-18 at 3 18 56 PM
Screen Shot 2020-08-18 at 3 19 00 PM

[BUG] Failed %load commands hang notebook

Describe the bug
When running a %load command that fails, it continues to check the load status, which locks up the notebook.

To Reproduce
Steps to reproduce the behavior:

  1. Try to use the %load command on an import that fails
  2. After the "LOAD_FAILED" message appears, the cell keeps refreshing
  3. Attempt to run something in another cell. It is stuck on [*], waiting for the load cell
  4. To fix you have to stop the kernel and reload the notebook

Expected behavior
After the LOAD_FAILED message is received, I expect the retries to stop.

Support grouping by different properties by label

Is your feature request related to a problem? Please describe.
The current group by feature only allows you to specify a single property to group all vertices by. As a user I would like to be able to specify different properties by label to group by.

i.e. For airports I want to group by region, for country I want to group by continent.

Describe the solution you'd like
I'd like some sort of JSON type structure that would let me define something like:

{
  "airport": {"groupby": "region"},
  "country": {"groupby": "continent"}
}

[BUG] Graph tab doesn't render in Amazon SageMaker Studio - Jupyter Lab

Describe the bug
When trying to execute a .path() query in Jupyter Lab the Graph tab doesn't render, instead it shows
"Tab(children=(Output(layout=Layout(max_height='600px', overflow='scroll', width='100%')), Force(network=<graph…"

To Reproduce
Steps to reproduce the behavior:

  1. Go to Jupyter Lab
  2. Run a query with .path()

Current behavior
Screenshot taken from JupyterLab

image

Expected behavior
Screenshot taken from Jupyter

image

Allow Coloring by property in Gremlin Query Visualization

Is your feature request related to a problem? Please describe.
I would like to be able to set the color of the results of my Gremlin query in the visualization

Describe the solution you'd like
When running a Gremlin query that is visualized I would like it to color the results by label by default. I would also like it to allow me to specify the property it will use to color by. (e.g. I might want to color by age/sex/user type in a social network)

Make graph_notebook_config parameters optional for non-Neptune servers

Is your feature request related to a problem? Please describe.
Currently you must specify all the fields below when connecting to any graph database, even though a lot of them are Neptune and AWS specific (e.g. iam_credentials_provider_type, aws_region). When I connected to a local Gremlin server, I would get errors for not specifying all of the parameters even though the fields for them didn't make sense.

%%graph_notebook_config
{
  "host": "localhost",
  "port": 8182,
  "auth_mode": "DEFAULT",
  "iam_credentials_provider_type": "ROLE",
  "load_from_s3_arn": "",
  "ssl": false,
  "aws_region": "us-east-1"
}

Describe the solution you'd like
Make the following parameters optional to specify for non-Neptune endpoints:
auth_mode
iam_credentials_provider_type
load_from_s3_arn
aws_region

Describe alternatives you've considered
Make the parameters optional for all endpoints, which would be less validation logic, but for users connecting to Neptune there would be less validation feedback.

Add total execution time to %load command

Is your feature request related to a problem? Please describe.
When you use the %load command it would be nice to have the total execution time displayed so you know how long it ran

[BUG] --ignore-groups flag is not working

Describe the bug
If you have a result where the graph is grouped, adding the --ignore-groups flag does not remove the grouping.

To Reproduce
Use the --ignore-groups flag on a grouped result

[BUG] Incorrect Path pattern blocks graph tab

Describe the bug
When using the Neptune workbench to visualize queries if an incorrect path pattern is specified such as:

-p v,outv,inv

The results are shown but the graph tab is not available. This looks like the error is being silently swallowed by Jupyter. We should provide some sort of notification to the end user that the path pattern is wrong and then possibly ignore it when displaying the graph.

On closer investigation this appears to only be an issue when you include the edges in the path. If you change the traversal above to g.V().outE().inV().path() the Graph tab is not appearing for me

Sort node/edge properties by alphabetical order

Is your feature request related to a problem? Please describe.
Sort the node/edge properties by alphabetical order

Describe the solution you'd like
The node/edge properties on the details view are not sorted. I would like to see them sorted by alphabetical order to provide a consistent view of the items I am looking at.

Add option to allow the physics simulation duration to be set by the user

Today the Physics simulation is hard coded to never run for longer than 1.5 seconds (https://github.com/aws/graph-notebook/blob/main/src/graph_notebook/widgets/src/force_widget.ts#L142).

This makes sense in a lot of cases, as simulations can run for a very long time otherwise and need to be stopped at some point. However, some simulations really do need to run for longer (say 30 seconds or a minute) to achieve a pleasant looking result - especially when more than a few nodes and edges are part of the simulation. It would be nice to expose a way for a user to adjust the maximum simulation duration. Today you can jiggle vertices with the mouse pointer to get some additional computation but it is not sufficient.

Add ability to have %seed load from a specific file

Currently the %seed line magic loads data by looking for files in a specific location. Today a user could place one or more files into that location and load them using %seed. This requires using the correct file naming scheme and also finding the correct directory to put the files in. A nicer user experience would be to add a parameter allowing them to provide the name of a file to the %seed command. The assumption is that this would be a file on their local disk or the disk of a remote Jupyter server. I am not proposing the file be on an HTTP server or in an S3 bucket. As part of this we would need to briefly also document the syntax expected for Gremlin or SPARQL data in files loaded by %seed.

Something simple like %seed --file filename would be a good start. It may also make sense to add this to the widget at some time.

Allow variable injection into `sparql` and `gremlin` magics

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
A mechanism to inject things into cell magics to help parameterize queries. Such as

%%gremlin
g.V('${label}').out().path()

where label is a variable that exists in the notebook's namespace. We would then inject this value into the query. The ending query that would be submitted might look like:

g.V('Person').out().path()

Describe alternatives you've considered
Line magics have built-in parameter injection, however cell magics do not seem to have an equivalent for the cell component of the magic. Additionally, the same syntax will not work for cell magics because query languages can have {...} syntax which is valid.

Allow specifying the property shown on the node

Is your feature request related to a problem? Please describe.
I would like to be able to provide the property name that I would like displayed on the node.

Describe the solution you'd like
Currently, the property displayed on the node is fixed. I would like to be able to specify which property is displayed.

Cell magics cannot handle --help when the cell is empty

It is not possible to get help using a command like %%gremlin --help unless there is something else in the cell on another line. It would be nice if that was not required. Perhaps for argument processing we should move from directly using argparse to using @magic_arguments() decorators

Also an exception is displayed beneath the help text.

[BUG]Queries that return a map with a list as the key cause an error

Any Gremlin query that returns a map where the key is a list causes an error in the Python Gremlin client. The existing patch that handles the case when the key is a map needs to be extended to also handle lists. This is fixed in TinkerPop 3.5.0 (not yet released) but for now we could update the existing patch we have in place.

The area that needs changing is https://github.com/aws/graph-notebook/blob/65a69d5df36c7ee30d8f6fd16c30abbcf7dd917[…]gremlin/client_provider/graphsonV3d0_MapType_objectify_patch.py#L42

Using the airports data set from %seed you can see the error with this query

g.V('44').group().by(out().fold()).by(out().count())

{'error': TypeError("unhashable type: 'list'",)}

Note that even this does not fix all of the issues that the TinkerPop 3.5.0 fix addresses where a new HashableDict type is used. The other option is to back port those fixes until 3.5.0 releases.

Error installing widgets nbextensions

Hi. When running through the setup I hit an error while trying to install the widget nbextensions.

I'm running the Jupyter notebook in a local env using Python 3.6.12 and graph-notebook version 2.0.9.

Steps followed:

python -m venv env
source env/bin/activate
pip install -r requirements.txt

jupyter nbextension install --py --sys-prefix graph_notebook.widgets

requirements.txt

notebook==5.7.10
tornado==4.5.3
graph-notebook==2.0.9

Running the last command jupyter nbextension install --py --sys-prefix graph_notebook.widgets returns the error:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/{user}/repos/graph-notebook/env/lib/python3.6/site-packages/graph_notebook/widgets/nbextension/static'

running which jupyter returns /Users/{user}/repos/graph-notebook/env/bin/jupyter, so it's running in the local environment.

Any ideas?
Thanks.

Ability to return query results for use in other notebook cells

Is your feature request related to a problem? Please describe.
Customers want a way to receive query results from our magic commands for use in other notebook cells. For instance, some way to obtain the result from the below cell as a variable foo:

%%gremlin

g.V().out().path()

Add pre-commit hook check for Flake8 compatibility

Is your feature request related to a problem? Please describe.
One of the checks performed by the CI system is Flake8. We should add a script or pre-commit hook, along with information about how to run a Flake8 check, to ensure that this CI test won't fail.

[BUG] Selection behavior is not consistent

Describe the bug
When grouping is enabled the selection behavior is not consistent.

The border is different between the selected and searched vertices when you use the search bar as well as select a non-highlighted field.

When you have a query like the one below and select the edge vertex, the selection is highlighted differently than others. It is completely highlighted, the text changes to white, and the text does not change back when unselected.
g.V().has('code', 'ANC').outE().inV().path().by().by('dist')

Include metadata like query time when executing a magic command

Is your feature request related to a problem? Please describe.
Query time is not displayed in the result of queries such as %%sparql. This can be obtained by adding %time to a cell, but then we are timing the entire cell's execution and not the query itself.

Describe the solution you'd like
Another output tab for metadata such as time to complete the query, response size, etc for our %%sparql and %%gremlin magics

Describe alternatives you've considered
%time works for now, but it is going to record more than just the time to execute a query against whatever database is connected to.

Notebook variables in Gremlin cell magic

Is your feature request related to a problem? Please describe.
Related to a limitation. Often a value defined in the Python notebook needs to be used in a query, but today that is not possible, and the value needs to be copied and pasted in the cell query.

Describe the solution you'd like
A way of referencing a variable in the cell magic. Eg.: ${variable_name}

Describe alternatives you've considered
Not ideal, but string formatting a query and executing the query string.

query = f'g.V({variable_1})'
res = gremlin.execute(query)

Additional context
SQL and Spark notebook magics provide a similar feature.

Add ability to return details/errors from %load_status

Is your feature request related to a problem? Please describe.
It would be nice to have a switch (or switches) that would allow retrieving the details and the errors associated with a specific load id. This is available via the REST endpoint with the errors=true and details=true query parameters.
