langchain-webscraper-demo

This is a small demo project illustrating how to create a chatbot that can query a scraped website. It uses LangChain to manage the chatbot's framework, Gradio for a user-friendly interface, OpenAI's gpt-3.5-turbo LLM, and ChromaDB as a vector store.

This project accompanies a blog post on my website.

Getting started

This project supports both pip and pipenv. I recommend using pipenv for the best (and least error-prone) experience.

Installation

Pip

If using pip, run

pip install -r requirements.txt

Pipenv

If using pipenv, run

pipenv install

followed by pipenv shell to start a shell with the installed packages.

Environment variables

Create a new .env file from the .env.example file and set OPENAI_API_KEY in it to your OpenAI API key. You can create an API key on OpenAI's platform, which requires an OpenAI developer account.
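For illustration, here is a minimal sketch of how a .env file like this can be read into the process environment using only the standard library. The project most likely relies on a library such as python-dotenv for this (an assumption; the README does not say), and the function name load_env_file is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Hypothetical helper, not part of the project. Blank lines and lines
    starting with '#' are skipped. Variables already set in the
    environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        raise FileNotFoundError(f"{path} not found; copy .env.example to .env first")
    loaded = {}
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop optional surrounding quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Calling load_env_file() before any OpenAI client is created makes OPENAI_API_KEY available through os.environ.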

Web scraping

To scrape a site, run

python scrape.py --site <site_url> --depth <int>

This will scrape the given URL and, recursively, all links found on each page, up to the specified depth. Only pages with the same origin as the given <site_url> are scraped, so for example scraping https://python.langchain.com/docs will only scrape pages at https://python.langchain.com.

The data will be stored in a new scrape/ directory.
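The same-origin, depth-limited crawl described above can be sketched roughly as follows. This is not the code in scrape.py: the names same_origin, crawl, and get_links are hypothetical, the breadth-first order is a choice made for this sketch, and it assumes a depth of 0 means "only the starting page".

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def same_origin(url_a, url_b):
    """True when both URLs share scheme, host, and port as written.

    Default ports are not normalised in this sketch.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)


def crawl(site_url, depth, get_links):
    """Visit site_url and same-origin links up to `depth` hops away.

    `get_links(url)` is a hypothetical callback returning the hrefs found
    on a page; a real scraper would fetch and parse the HTML there.
    Returns the visited URLs in breadth-first order.
    """
    visited = []
    seen = {site_url}
    queue = deque([(site_url, 0)])
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level >= depth:
            continue  # do not follow links beyond the requested depth
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and same_origin(site_url, link):
                seen.add(link)
                queue.append((link, level + 1))
    return visited
```

Passing the link extractor in as a callback keeps the traversal logic separate from the network and HTML parsing, so it can be tried out against an in-memory dictionary of pages.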

Data embeddings

To generate and persist the embeddings and create a vector store, run

python embed.py

A new persisted vector store will be created in the chroma/ directory.
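To give a rough idea of what "embed, persist, then query" means, here is a toy sketch that uses word counts in place of real embeddings and a JSON file in place of ChromaDB. It is an illustration of the concept only, not how embed.py works; all function names are hypothetical.

```python
import json
import math
from collections import Counter
from pathlib import Path


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def build_store(docs, path):
    """Embed each document and persist the result as JSON."""
    records = [{"text": d, "vector": dict(embed(d))} for d in docs]
    Path(path).write_text(json.dumps(records))
    return records


def query(path, question, k=1):
    """Load the persisted store and return the k most similar texts."""
    records = json.loads(Path(path).read_text())
    q = embed(question)
    ranked = sorted(
        records,
        key=lambda r: cosine(q, Counter(r["vector"])),
        reverse=True,
    )
    return [r["text"] for r in ranked[:k]]
```

A real vector store follows the same outline, with dense vectors from an embedding model and an index built for fast nearest-neighbour search.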

Launching the chatbot

To launch the chatbot, you can run

python main.py

This will start a Gradio server at http://127.0.0.1:7860, allowing you to chat to the scraped website and data store.

NOTE: you must first scrape a site and then generate and persist the vector store for this to work.
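One way to catch a skipped step early is to check for the scrape/ and chroma/ directories before launching. The sketch below is not part of the project as far as this README shows; the function name missing_prerequisites is hypothetical, and it assumes both directories live in the current working directory.

```python
from pathlib import Path


def missing_prerequisites(base_dir="."):
    """Return the setup commands that still need to run before launching.

    Hypothetical helper: a step counts as done when its output directory
    (scrape/ or chroma/) exists under base_dir and is not empty.
    """
    steps = {
        "scrape": "python scrape.py --site <site_url> --depth <int>",
        "chroma": "python embed.py",
    }
    missing = []
    for folder, command in steps.items():
        path = Path(base_dir) / folder
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(command)
    return missing
```

An empty list means both steps have produced output and python main.py should be able to find its data.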
