neo4j-labs / llm-graph-builder
Neo4j graph construction from unstructured data
License: Apache License 2.0
On selection of LLM - user shall be able to process the files on basis of preferred LLM.
Need to create a common function to call the APIs. Some function name changes and logic formatting.
Can we make all environment variables uppercase,
and add a section to the backend readme on configuration and the env file?
Also call out in the file and readme what is optional and what can be overridden, e.g. from the client.
It should be more aligned with the usual style of config variables that we use elsewhere:
#OPENAI_API_KEY="sk-..."
DIFFBOT_API_KEY=""
NEO4J_URI=""
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD=""
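A minimal sketch of how the backend could read these upper-case variables and let a client override them. The `Settings` and `load_settings` names are hypothetical, and treating the two API keys as optional is an assumption made for illustration, not the project's actual behaviour:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    openai_api_key: Optional[str] = None   # assumed optional
    diffbot_api_key: Optional[str] = None  # assumed optional


def load_settings(env: Mapping[str, str] = os.environ,
                  overrides: Optional[Mapping[str, str]] = None) -> Settings:
    """Read upper-case variables; non-empty overrides (e.g. from the client) win."""
    merged = dict(env)
    # Empty override values are ignored so they cannot blank out a configured default.
    merged.update({k: v for k, v in (overrides or {}).items() if v})

    def get(name: str, default: Optional[str] = None) -> Optional[str]:
        # Treat an empty string the same as an unset variable.
        return merged.get(name) or default

    return Settings(
        neo4j_uri=get("NEO4J_URI", ""),
        neo4j_username=get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=get("NEO4J_PASSWORD", ""),
        openai_api_key=get("OPENAI_API_KEY"),
        diffbot_api_key=get("DIFFBOT_API_KEY"),
    )
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.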
I tried to run the app, it still creates duplicates of the file with the same name
and when trying to process the file I get a 422 error
backend | INFO: 172.18.0.1:44740 - "GET /sources_list HTTP/1.1" 200 OK
backend | INFO: 172.18.0.1:44732 - "GET /health HTTP/1.1" 200 OK
backend | INFO: 172.18.0.1:44748 - "GET /health HTTP/1.1" 200 OK
backend | INFO: 172.18.0.1:44754 - "GET /sources_list HTTP/1.1" 200 OK
backend | INFO: 172.18.0.1:35078 - "POST /sources HTTP/1.1" 200 OK
backend | INFO: 172.18.0.1:55262 - "POST /extract HTTP/1.1" 422 Unprocessable Entity
Handle Model Bug fix
Change failed Response Alert position from top center to bottom left
Add a Check for disable state of Generate graph button and dropdown
change the API name from /predict to /extract
spell out knowledge graph in the description
rename the body object in the docs to something more consistent and descriptive from Body_kg_creation_predict_post
also add metadata about the file to the :Source node (or equivalent if the graph transformer already creates a metadata node) in the graph
and in the response at least prepare the numeric processingTime, nodeCount and relationshipCount response fields, and status and errorMessage
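One possible shape for that response, as a sketch only. The field names processingTime, nodeCount, relationshipCount, status and errorMessage come from the request above; `ExtractResult`, `run_extraction`, the `fileName` field, the `extract` callable and the "Completed"/"Failed" status values are assumptions for illustration:

```python
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExtractResult:
    fileName: str                      # assumed field, not named in the request
    status: str                        # "Completed" or "Failed" (assumed values)
    nodeCount: int = 0
    relationshipCount: int = 0
    processingTime: float = 0.0        # seconds
    errorMessage: Optional[str] = None


def run_extraction(file_name, extract):
    """Call extract() (hypothetical: returns (nodes, relationships)) and wrap the outcome."""
    start = time.perf_counter()
    try:
        nodes, relationships = extract()
        result = ExtractResult(file_name, "Completed", len(nodes), len(relationships))
    except Exception as exc:
        # Report the failure in the payload instead of letting it escape.
        result = ExtractResult(file_name, "Failed", errorMessage=str(exc))
    result.processingTime = round(time.perf_counter() - start, 2)
    return asdict(result)
```

The same dictionary could be written onto the metadata node and returned to the client, so the table and the graph report the same numbers.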
Working on handling the bug found while testing
When all the files are processing, their respective records populate correctly in the table. But when, say, 3 files are processing and I upload a new large file that gets "New" status and does not start processing, the records in the UI table get shuffled.
On refresh the records return to their original data.
Create a setting panel.
Add settings for LLM Dropdown, Access Key and Secret Key, Embedding checkbox
When user uploads a file, hit an API [/sources] to post the file data.
Currently user is able to upload one file at a time. Allow users to add 5 files at a time.
Integrate the relationships created on file upload processing in the table.
As per our understanding, if the secret key and access key are already available in the source node, add a check for their existence; if they are there, show the file as available for processing/New.
Deploy Docker containers to Google Cloud Run, generate a URL
There seems to be a CORS issue.
-> ok seems to be related to the GH codespaces, need to make the backend URL public to make it work for the time being, should be resolved when running it with docker or deploying it elsewhere.
But also connecting to a wrong back-end? Not sure if you hard-coded it, but it should just connect to localhost:8000 on the machine where the backend is running or the configured base-URL.
Seems you have that hard-coded
https://github.com/neo4j-labs/llm-graph-builder/blob/main/frontend/src/components/DropZone.tsx#L10
https://github.com/neo4j-labs/llm-graph-builder/issues/new
Access to XMLHttpRequest at 'https://animated-space-broccoli-jpgjg6pg59qcp7pg-8000.app.github.dev/predict' from origin 'https://studious-dollop-979pxr45x3p4p4-5173.app.github.dev' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
POST https://animated-space-broccoli-jpgjg6pg59qcp7pg-8000.app.github.dev/predict net::ERR_FAILED 404 (Not Found)
my backend is running on: https://studious-dollop-979pxr45x3p4p4-8000.app.github.dev/docs
Later: Perhaps mid-term we can even serve them from the backend as static assets.
Source -> Document
HAS_CHILD relationship inverse (chunk)-[:PART_OF]->(:Document)
(:Document)-[:FIRST_CHUNK]->(:Chunk)
NEXT relationship between chunks of each document
The front end should indicate if the back-end is running.
Right now it shows the file-drop area if neo4j/the backend is connected but there should be a clearer indication.
Add a dropdown for user to select LLM of their choice.
If there is a separate connection information provided in the front-end it should pass that to the backend in a suitable way when making requests.
e.g. for processing files the connection information of the front-end (if available) should be passed on as an extra nested payload and be used in the processing.
Same for listing sources for the table, it should use the front-end connection information.
If the backend is configured with a neo4j connection but the front-end is not connected, it should still work, automatically using the backend's connection config inside the backend.
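The fallback described above could look roughly like this. It is a sketch under assumptions: the nested payload key `connection`, its `uri`/`username`/`password` fields and the `resolve_connection` name are hypothetical; only the NEO4J_* variable names come from the env file discussed earlier:

```python
import os
from typing import Mapping


def resolve_connection(payload: Mapping, env: Mapping[str, str] = os.environ) -> dict:
    """Prefer the front-end's nested connection info; else use the backend config."""
    conn = payload.get("connection") or {}
    if conn.get("uri"):
        # Use the client's settings as a unit; never mix in backend credentials.
        return {
            "uri": conn["uri"],
            "username": conn.get("username", "neo4j"),
            "password": conn.get("password", ""),
        }
    if env.get("NEO4J_URI"):
        return {
            "uri": env["NEO4J_URI"],
            "username": env.get("NEO4J_USERNAME", "neo4j"),
            "password": env.get("NEO4J_PASSWORD", ""),
        }
    raise ValueError("No Neo4j connection from the front-end or the backend config")
```

Treating the client's connection as all-or-nothing avoids sending the backend's stored password to a URI supplied by the client.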
by default should come from environment variable / .env file (using dotenv)
if not set you can use the logic you have here right now, but probably want to add a check and also test for localhost
https://github.com/neo4j-labs/llm-graph-builder/blob/main/frontend/src/utils/utils.ts#L1
When adding :Source nodes to the graph to represent the files, add a /sources/list endpoint that returns the list of sources ordered by updatedAt descending and returns all the metadata that was added/updated when creating the nodes
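The ordering part can be sketched in a few lines. This assumes each source is a dict of node properties and that updatedAt holds a naive `datetime`; `list_sources` and the `fileName` key in the usage below are hypothetical names:

```python
from datetime import datetime


def list_sources(sources):
    """Return source metadata dicts, newest updatedAt first.

    Entries without an updatedAt value sort last.
    """
    return sorted(
        sources,
        key=lambda s: s.get("updatedAt") or datetime.min,
        reverse=True,
    )
```

For example, `list_sources([{"fileName": "a.pdf", "updatedAt": datetime(2024, 1, 1)}, {"fileName": "b.pdf", "updatedAt": datetime(2024, 3, 1)}])` puts b.pdf first. In practice the ordering would more likely be done in the database query itself.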
Implement a frontend interface element allowing users to specify the S3 bucket name and credential details required for accessing resources.
s3-bucket (with path)
optional credentials (access-key, secret-key) and region (so that public buckets can be accessed without credentials)
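A sketch of how the backend might validate that input. The `s3://bucket/path` URL form, the `S3Source` type and the `parse_s3_source` function are assumptions for illustration; the request above only names the bucket with path, the optional credentials and the region:

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class S3Source:
    bucket: str
    prefix: str
    region: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None


def parse_s3_source(url, region=None, access_key=None, secret_key=None) -> S3Source:
    """Split s3://bucket/path and check that credentials are both set or both empty."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Expected a URL of the form s3://bucket/path")
    # Credentials are optional (public buckets), but half a pair is a mistake.
    if bool(access_key) != bool(secret_key):
        raise ValueError("Provide both access key and secret key, or neither")
    return S3Source(
        bucket=parsed.netloc,
        prefix=parsed.path.lstrip("/"),
        region=region,
        access_key=access_key or None,
        secret_key=secret_key or None,
    )
```

Rejecting a lone access key or secret key early gives the user a clear validation message instead of a later authentication failure.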
https://github.com/neo4j-labs/llm-graph-builder/blob/main/backend/src/main.py#L42-L43
Also add relationship count to the UI
Update the table with the API response.
Handle validations when user adds invalid url.
Handle state of banner.
Secret and Access key params confirmation.
you have an inconsistency in how you use BACKEND_URL -> url(): sometimes {url()}sources, sometimes {url()}/extract
I changed it now to always use a slash / i.e. {url()}/extract
so that you have to set the environment variable like this without a trailing slash: export BACKEND_API_URL="https://studious-dollop-979pxr45x3p4p4-8000.app.github.dev/"
ideally in url() we would remove trailing slashes
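The normalization suggested here is small enough to show in full. The front end itself is TypeScript; this Python sketch only illustrates the idea, and `join_url` is a hypothetical helper, not the project's url() function:

```python
def join_url(base: str, path: str) -> str:
    """Join base and path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With this, `join_url("https://example.test/", "extract")` and `join_url("https://example.test", "/extract")` both give `https://example.test/extract`, so it no longer matters whether the environment variable ends in a slash.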
Add some sort of feedback when user clicks on "Generate Graph". The button should show that the files are processing and then indicate completed once the job is done.