llm-graph-builder's People

Contributors

aashipandya, jexp, kartikpersistent, prakriti-solankey, praveshkumar1988, rakshita-arora

llm-graph-builder's Issues

Frontend- Code Cleanup

Need to create a common function to call the APIs, plus some function renames and logic formatting.

Backend configuration

Can we make all environment variables uppercase?

Also add a section to the backend readme on configuration and the env file.

Also call out in the file and the readme what is optional and what can be overridden, e.g. from the client.

It should be more aligned with the usual style of config variables that we use elsewhere:

#OPENAI_API_KEY="sk-..."
DIFFBOT_API_KEY=""
NEO4J_URI=""
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD=""
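
A minimal sketch of how the backend could read these variables and make the optional/overridable distinction explicit. Which variables count as required versus optional is an assumption here (the issue only asks for it to be documented), and `BackendConfig` and `load_config` are hypothetical names, not part of the existing code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BackendConfig:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    openai_api_key: Optional[str] = None   # assumed optional
    diffbot_api_key: Optional[str] = None  # assumed optional


def load_config(env: Mapping[str, str] = os.environ,
                overrides: Optional[Mapping[str, str]] = None) -> BackendConfig:
    """Build the config from the environment, letting non-empty
    client-supplied values (overrides) replace the backend's own."""
    merged = dict(env)
    # Empty override values are ignored so they cannot blank out a setting.
    merged.update({k: v for k, v in (overrides or {}).items() if v})

    # Assumption: the URI and password are required, everything else is not.
    missing = [k for k in ("NEO4J_URI", "NEO4J_PASSWORD") if not merged.get(k)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    return BackendConfig(
        neo4j_uri=merged["NEO4J_URI"],
        neo4j_username=merged.get("NEO4J_USERNAME") or "neo4j",
        neo4j_password=merged["NEO4J_PASSWORD"],
        openai_api_key=merged.get("OPENAI_API_KEY") or None,
        diffbot_api_key=merged.get("DIFFBOT_API_KEY") or None,
    )
```

Passing the environment as a plain mapping keeps the function easy to test and gives the readme a single place to point at when listing what is optional.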

Duplicate entities & PDF processing fails with 422

[Screenshot, 2024-02-07 16:16:08]

I tried to run the app; it still creates duplicate entries for files with the same name,

and when trying to process the file I get a 422 error:

backend   | INFO:     172.18.0.1:44740 - "GET /sources_list HTTP/1.1" 200 OK
backend   | INFO:     172.18.0.1:44732 - "GET /health HTTP/1.1" 200 OK
backend   | INFO:     172.18.0.1:44748 - "GET /health HTTP/1.1" 200 OK
backend   | INFO:     172.18.0.1:44754 - "GET /sources_list HTTP/1.1" 200 OK
backend   | INFO:     172.18.0.1:35078 - "POST /sources HTTP/1.1" 200 OK
backend   | INFO:     172.18.0.1:55262 - "POST /extract HTTP/1.1" 422 Unprocessable Entity

Model selection state management issue

Handle Model Bug fix
Change failed Response Alert position from top center to bottom left
Add a check for the disabled state of the Generate Graph button and dropdown

Backend API

change the API name from /predict to /extract

  • spell out knowledge graph in the description

  • rename the body object in the docs to something more consistent and descriptive from Body_kg_creation_predict_post

also add metadata about the file:

  • filename
  • file-size
  • file date, if available
  • store those on a :Source node (or equivalent if the graph transformer already creates a metadata node) in the graph

and in the response at least prepare the numeric processingTime, nodeCount, and relationshipCount fields, plus status and errorMessage
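
One possible shape for that response, as a rough sketch. The field names come from the issue; the class name `ExtractResponse`, the `failed` helper, the status strings, and the choice of seconds for processingTime are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExtractResponse:
    fileName: str
    fileSize: int                      # bytes (assumed unit)
    status: str                        # e.g. "Completed" or "Failed" (assumed values)
    processingTime: float = 0.0        # seconds (assumed unit)
    nodeCount: int = 0
    relationshipCount: int = 0
    errorMessage: Optional[str] = None
    fileDate: Optional[str] = None     # only set when the file date is available


def failed(file_name: str, file_size: int, message: str) -> dict:
    """Build the JSON-ready payload for a file that could not be processed."""
    return asdict(ExtractResponse(fileName=file_name, fileSize=file_size,
                                  status="Failed", errorMessage=message))
```

Keeping the numeric fields present with zero defaults means the client can rely on them being there even when processing fails.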

Bug: Issue on populating data of multiple files.

Working on a bug found while testing.
When all the files are processing, their records populate correctly in the table. But when, say, 3 files are being processed and I upload a new large file that stays in "New" status without starting processing, the records in the UI table get shuffled.
On refresh they return to their original data.

Bug Fixing for Frontend UI

  1. Fix the white space issue by dynamically adjusting the table height, so that it stays responsive even after changing the page size
  2. Fix Auto Page Shifting
  3. New items should be shown on first page rather than last
  4. If the File is Already Processed show it as Completed
  5. Removed the extra check for Disabling the Dropdown and Generate Graph Button
  6. Connection Modal should display if the user is not connected to a Neo4j database
  7. Add the Neo4j Favicon

Add Access key and secret key Check

As per our understanding: if the access key and secret key are already available on the source node, add a check for their existence; if they are there, show the file as available for processing (status New).

front-end-backend communication

There seems to be a CORS issue.

-> ok seems to be related to the GH codespaces, need to make the backend URL public to make it work for the time being, should be resolved when running it with docker or deploying it elsewhere.

But it is also connecting to the wrong back-end? Not sure if you hard-coded it, but it should just connect to localhost:8000 on the machine where the backend is running, or to the configured base URL.

Seems you have that hard-coded

https://github.com/neo4j-labs/llm-graph-builder/blob/main/frontend/src/components/DropZone.tsx#L10

  1. it should not be hard-coded but come from an .env file (also provide an example.env)
  2. it should not just be hidden inside a UI component but live in a proper backend/REST API component !!
  3. there should be a health check that validates that the backend is correctly running and indicates that to the user!
Access to XMLHttpRequest at 'https://animated-space-broccoli-jpgjg6pg59qcp7pg-8000.app.github.dev/predict' from origin 'https://studious-dollop-979pxr45x3p4p4-5173.app.github.dev' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.

POST https://animated-space-broccoli-jpgjg6pg59qcp7pg-8000.app.github.dev/predict net::ERR_FAILED 404 (Not Found)

my backend is running on: https://studious-dollop-979pxr45x3p4p4-8000.app.github.dev/docs

Later: Perhaps mid-term we can even serve them from the backend as static assets.

Data Model Cleanup

  • rename Source -> Document
  • invert the HAS_CHILD relationship: (:Chunk)-[:PART_OF]->(:Document)
  • add a single relationship to the first chunk: (:Document)-[:FIRST_CHUNK]->(:Chunk)
  • create a NEXT relationship between consecutive chunks of each document
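
To make the target model concrete, here is a small sketch that lists the relationships one document with ordered chunks would get. The function name `chunk_edges` and the tuple representation are hypothetical, and the direction of NEXT (from earlier to later chunk) is an assumption, since the issue does not state it.

```python
from typing import List, Tuple

# (start node id, relationship type, end node id)
Edge = Tuple[str, str, str]


def chunk_edges(document_id: str, chunk_ids: List[str]) -> List[Edge]:
    """Return the relationships for one document and its ordered chunks."""
    edges: List[Edge] = []
    if chunk_ids:
        # Exactly one FIRST_CHUNK relationship per document.
        edges.append((document_id, "FIRST_CHUNK", chunk_ids[0]))
    for chunk_id in chunk_ids:
        # PART_OF points from the chunk to its document.
        edges.append((chunk_id, "PART_OF", document_id))
    for previous, following in zip(chunk_ids, chunk_ids[1:]):
        # Assumed direction: earlier chunk -> later chunk.
        edges.append((previous, "NEXT", following))
    return edges
```

For a document with n chunks this yields one FIRST_CHUNK, n PART_OF, and n - 1 NEXT relationships, so the chunks can be walked in order starting from the document.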

Frontend Connection Status

The front end should indicate if the back-end is running.

Right now it shows the file-drop area if neo4j/the backend is connected but there should be a clearer indication.

Front-end pass neo4j connection information to backend

If there is a separate connection information provided in the front-end it should pass that to the backend in a suitable way when making requests.

e.g. for processing files the connection information of the front-end (if available) should be passed on as an extra nested payload and be used in the processing.

Same for listing sources for the table, it should use the front-end connection information.

If the backend is configured with a neo4j connection but the front-end is not connected, it should still work, with the backend automatically falling back to its own connection config.
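
The fallback rule could look roughly like this on the backend. This is a sketch only: the key names `uri`, `username`, `password` and the function `resolve_connection` are assumptions, not the project's actual payload format.

```python
from typing import Dict, Mapping, Optional

CONNECTION_KEYS = ("uri", "username", "password")  # assumed payload keys


def _complete(conn: Optional[Mapping[str, str]]) -> bool:
    """True when every connection field is present and non-empty."""
    return bool(conn) and all(conn.get(key) for key in CONNECTION_KEYS)


def resolve_connection(client: Optional[Mapping[str, str]],
                       backend_default: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Prefer the connection sent by the front-end; otherwise use the backend's own."""
    if _complete(client):
        return {key: client[key] for key in CONNECTION_KEYS}
    if _complete(backend_default):
        return {key: backend_default[key] for key in CONNECTION_KEYS}
    raise ValueError("No Neo4j connection provided by the client or configured in the backend")
```

The same helper could serve both the file-processing and the source-listing requests, so the two stay consistent about which database they talk to.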

Backend API

When adding :Source nodes to the graph to represent the files, add a /sources/list endpoint that returns the list of sources ordered by updatedAt descending, including all the metadata that was added/updated when creating the nodes:

  • fileName (for the time being this can be the id - unique constraint)
  • fileType
  • fileSize
  • createdAt
  • updatedAt
  • processingTime
  • status
  • errorMessage
  • nodeCount
  • relationshipCount
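
A sketch of the projection and ordering such an endpoint would return, working on plain dictionaries rather than on a live database. It assumes every node carries a comparable updatedAt value; `list_sources` is a hypothetical name, and in practice the ordering could equally be done in the Cypher query.

```python
from typing import Any, Dict, List, Mapping

SOURCE_FIELDS = (
    "fileName", "fileType", "fileSize", "createdAt", "updatedAt",
    "processingTime", "status", "errorMessage", "nodeCount", "relationshipCount",
)


def list_sources(nodes: List[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Project each source node onto the agreed fields, newest update first."""
    # Fields a node does not have come back as None instead of being dropped.
    rows = [{field: node.get(field) for field in SOURCE_FIELDS} for node in nodes]
    return sorted(rows, key=lambda row: row["updatedAt"], reverse=True)
```

Returning every field for every row, even when empty, keeps the table columns in the UI stable.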

Update the readme

  • instructions how to run / deploy / configure
  • link to the public google cloud run URL + link to neo4j workspace
  • list of features (upload, s3/gcs, connection to neo4j, file + chunk handling, extract entities with different models, create embeddings, create kNN graph)
  • screenshot or short animated gif
  • graph model
  • screenshot of the graph model in neo4j workspace + query that I shared

Backend URL handling

You have an inconsistency in how you use BACKEND_URL -> url(): sometimes {url()}sources, sometimes {url()}/extract.
I changed it now to always use a slash /, i.e. {url()}/extract,
so that you have to set the environment variable without a trailing slash, like this: export BACKEND_API_URL="https://studious-dollop-979pxr45x3p4p4-8000.app.github.dev"
Ideally url() would strip trailing slashes itself.
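
The normalization itself is small. The sketch below shows the logic in Python for illustration only; the front-end's url() helper would need the equivalent in TypeScript, and `api_url` is a hypothetical name.

```python
def api_url(base_url: str, path: str) -> str:
    """Join the base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Both spellings of the base URL and of the path give the same result.
print(api_url("http://localhost:8000/", "/extract"))
print(api_url("http://localhost:8000", "sources"))
```

With this in place the environment variable works with or without a trailing slash, and call sites no longer need to agree on who adds the separator.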

To Create Generate Button progress Bar

Add some sort of feedback when user clicks on "Generate Graph". The button should show that the files are processing and then indicate completed once the job is done.
