



EE16B AI Chatbot

EE16B AI Chatbot ~ trained on official course website


Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

📝 About

A more natural way for students to study for exams, review weekly content, and generate practice problems similar to those from class, customized to their preferences. Trained on all Spring 2023 lectures. EE16B students, staff, and anyone else can use this repo and adjust it to their liking.

UC Berkeley 🐻🔵🟡 • EE16B: Designing Information Devices and Systems II ⚙️ • Spring 2023


💻 How to Build

Note: these instructions are for macOS; adjust accordingly for Windows / Linux.

Initial setup

Clone the repo and install dependencies.

git clone https://github.com/vdutts7/ee16b-ai-chat
cd ee16b-ai-chat
pnpm install

Create a .env file and add your API keys (refer to .env.local.example for the template):

OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""
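Before running anything, it can help to confirm that every key is actually filled in. The sketch below is a hypothetical helper, not part of the repo: it parses simple KEY="value" lines with only the Python standard library and reports which of the four keys above are missing or empty. It assumes a plain .env layout and does not handle multi-line values or `export` prefixes.

```python
from pathlib import Path

# Keys listed in the .env template above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value / KEY="value" lines; skip blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or set to an empty string."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("missing:", missing_keys(env_path.read_text()) or "none")
    else:
        print(".env not found")
```

Run it from the repo root; an empty list means all four keys have values (it does not check that the keys are valid).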

Get API keys from your OpenAI account and your Supabase project settings.

IMPORTANT: Verify that .gitignore includes .env so your keys are never committed.
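A quick way to script that check is sketched below. It is a hypothetical helper that approximates gitignore globbing with Python's `fnmatch` (real gitignore rules have more cases, such as directory-only patterns and `**`). As in git, the last matching pattern wins, so a later `!.env` un-ignores the file.

```python
import fnmatch
from pathlib import Path


def ignores_env(gitignore_text: str) -> bool:
    """Rough check: does the last matching .gitignore pattern ignore ".env"?"""
    ignored = False
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        # A leading "/" anchors to the repo root, which is where .env lives.
        pattern = pattern.lstrip("/")
        if fnmatch.fnmatch(".env", pattern):
            ignored = not negate
    return ignored


if __name__ == "__main__":
    path = Path(".gitignore")
    ok = path.exists() and ignores_env(path.read_text())
    print(".env is ignored" if ok else "WARNING: add .env to .gitignore")
```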

Prepare Supabase environment

I used Supabase as my vector store. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc.

You should have already created a Supabase project to get your API keys. In the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.

Embed and upsert

The config folder contains a transcripts folder with every lecture as a .txt file, plus the corresponding JSON files holding each lecture's metadata. The .txt files were generated from the lecture recordings separately ahead of time (OpenAI's Whisper is a good option for speech-to-text transcription). Change these according to your preferences. By default, pageContent and metadata are stored in Supabase, along with an int8 'id' column.
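To show how a transcript and its metadata might be paired into a pageContent/metadata record, here is a minimal sketch. It assumes, hypothetically, that each lecture's JSON file shares its .txt file's name (e.g. lec01.txt and lec01.json); the repo's real file layout and metadata fields may differ, and the added "source" field is illustrative.

```python
import json
import tempfile
from pathlib import Path


def load_lectures(folder: Path) -> list[dict]:
    """Pair each lecture .txt with a same-named .json metadata file.

    ASSUMPTION: files are named like lec01.txt / lec01.json.
    A lecture without a JSON file gets only the "source" metadata field.
    """
    docs = []
    for txt_path in sorted(folder.glob("*.txt")):
        meta_path = txt_path.with_suffix(".json")
        metadata = json.loads(meta_path.read_text()) if meta_path.exists() else {}
        docs.append({
            "pageContent": txt_path.read_text(),
            "metadata": {"source": txt_path.name, **metadata},
        })
    return docs


if __name__ == "__main__":
    # Small demo with throwaway files instead of the real transcripts folder.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "lec01.txt").write_text("Today we cover eigenvalues...")
        (folder / "lec01.json").write_text('{"title": "Lecture 1"}')
        for doc in load_lectures(folder):
            print(doc["metadata"], len(doc["pageContent"]))
```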

Manually run the embed-script.ipynb notebook in the scripts folder OR run the package script from the terminal:

npm run embed

This is a one-time process; depending on the size of the data you upsert, it can take a few minutes. Check your Supabase database to see the new rows in your table.

Technical explanation

The embed script performs the following steps:

  • Installs the supabase Python library using pip. This allows interaction with a Supabase database.

  • Loads various libraries:

    supabase - For interacting with Supabase

    langchain - For text processing and vectorization

    json - For loading JSON metadata files

  • Loads the Supabase URL and API key from .env. This is used to create a supabase_client to connect to the Supabase database.

  • Loads text data from .txt lecture transcripts and JSON metadata files.

  • Uses a RecursiveCharacterTextSplitter to split the lecture text into chunks, breaking it into manageable pieces for processing. Chunk size and chunk overlap can be changed to taste; together they control how specific each chunk is. A larger chunk size and smaller overlap produce fewer, broader chunks, while a smaller chunk size and larger overlap produce more, narrower chunks.

  • Creates embeddings with OpenAI's text-embedding-ada-002 model, producing one 1536-dimensional vector per chunk, suited to cosine similarity search. These vectors are combined with the metadata from the JSON files, along with other lecture-specific info, and upserted to the database as rows in a table, i.e. a SupabaseVectorStore.
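The sketch below illustrates two ideas from the steps above in plain Python: how chunk size and overlap trade off chunk count against breadth, and how rows are ranked by cosine similarity. It is a simplified stand-in, not the script's code. `split_with_overlap` uses fixed-size character windows, whereas LangChain's RecursiveCharacterTextSplitter also prefers to break on paragraphs, lines, and spaces. `top_k` is a hypothetical in-memory ranking; in the app, the similarity search runs against Supabase. Since OpenAI documents that its embeddings are normalized to length 1, cosine similarity on them reduces to a dot product, but the general formula is used here.

```python
import math


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character chunks; consecutive chunks share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: list[dict], k: int = 3) -> list[dict]:
    """Return the k rows whose "embedding" is most similar to the query vector."""
    return sorted(
        rows,
        key=lambda row: cosine_similarity(query, row["embedding"]),
        reverse=True,
    )[:k]
```

For a 100-character text, `chunk_size=50, chunk_overlap=0` yields 2 chunks, while `chunk_size=20, chunk_overlap=10` yields 9, matching the "fewer, broader" versus "more, narrower" trade-off described above.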

[Figure: visualized flow chart]

Run app

Run the app and verify everything went smoothly:

npm run dev

Go to http://localhost:3000. You should now be able to type and ask questions. Done ✅

🚀 Next steps

Deploy

I used Vercel as this was a small project.

Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.

Customizations

UI/UX: change to your liking.

Bot behavior: edit the prompt template in /utils/makechain.ts to tune the bot's outputs and gain greater control over them.
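For a sense of what such a template looks like, here is a hypothetical Python sketch. The real template is TypeScript in /utils/makechain.ts; the wording and the `{context}` / `{question}` placeholder names below are assumptions for illustration, not the repo's actual prompt. Retrieved lecture chunks fill `{context}` and the student's question fills `{question}`.

```python
# Hypothetical QA prompt; the repo's real template lives in /utils/makechain.ts.
QA_TEMPLATE = """You are a helpful teaching assistant for EE16B at UC Berkeley.
Use only the lecture excerpts below to answer. If the answer is not in the
excerpts, say you don't know instead of guessing.

Lecture excerpts:
{context}

Question: {question}
Helpful answer:"""


def build_prompt(question: str, chunks: list[str]) -> str:
    """Fill the template with retrieved chunks separated by blank lines."""
    # str.format only interprets braces in the template, not in the chunks.
    return QA_TEMPLATE.format(context="\n\n".join(chunks), question=question)


if __name__ == "__main__":
    print(build_prompt("What is controllability?", ["chunk one", "chunk two"]))
```

Tightening instructions such as "say you don't know instead of guessing" is usually the cheapest way to change the bot's behavior, since it needs no re-embedding.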

Data: modify the .txt files in /config/transcripts and the main script in /scripts/embed-script.ipynb.


๐Ÿ”ง Tools used

Next.js, TypeScript, LangChain, OpenAI, Supabase, Tailwind CSS, Vercel


👤 Contact

[email protected]

๐Ÿ”— Project Link: https://github.com/vdutts7/ee16b-ai-chat

