CrewAI RAG Deep Dive

This repository contains a deep dive into using CrewAI with RAG (Retrieval-Augmented Generation) techniques. The project showcases how to set up and utilize various agents, tools, and tasks within CrewAI to perform specific operations, such as analyzing PDFs and YouTube channels, extracting information, and generating structured outputs.

Project Structure
Setup and Installation
Agents and Tasks
Examples
YouTube API Setup
Goal
Contributing
License

Project Structure

crewai-rag-deep-dive/
├── .vscode/
│ └── settings.json
├── 1_pdf/
│ ├── .env
│ ├── 1_crew.py
│ ├── 2_crew_custom_model_and_embed.py
│ └── example_home_inspection.pdf
├── 2_youtube_and_web/
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── AddVideoToVectorDBTool.py
│ │ └── FetchLatestVideosFromYouTubeChannelTool.py
│ ├── .env
│ ├── crew.py
│ └── main.py
├── .gitignore
├── poetry.lock
├── pyproject.toml
└── README.md

Overview of Key Files and Directories

1_pdf/: Contains code and environment configurations for working with PDF documents.
- 1_crew.py: Basic setup for processing home inspection PDFs.
- 2_crew_custom_model_and_embed.py: Custom configurations for processing PDFs using specific LLMs and embedders.
- example_home_inspection.pdf: Sample PDF document used for testing.
2_youtube_and_web/: Contains code and tools for processing YouTube channels and videos.
- tools/: Directory containing custom tools.
  - AddVideoToVectorDBTool.py: Tool for adding YouTube videos to a vector database.
  - FetchLatestVideosFromYouTubeChannelTool.py: Tool for fetching the latest videos from a YouTube channel.
- crew.py: Main script for setting up agents and tasks related to YouTube processing.
- main.py: Entry point for running YouTube processing tasks.

Setup and Installation

Clone the repository:

git clone https://github.com/bhancockio/crewai-rag-deep-dive.git
cd crewai-rag-deep-dive

Install dependencies: Ensure you have Poetry installed.
```
poetry install --no-root
```
Set up environment variables: Create a .env file in the root directory and in relevant subdirectories (1_pdf, 2_youtube_and_web) with your API keys and other configurations.
```
YOUTUBE_API_KEY=your_youtube_api_key
OPENAI_API_KEY=your_openai_api_key
# Add other necessary environment variables
```
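Before running either crew, it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch, not part of the repository: it assumes the `.env` file has already been loaded into the environment (for example by a dotenv loader or by exporting the variables in your shell), and the helper name `missing_env_keys` is hypothetical.

```python
import os

# Keys taken from the example .env above; extend the tuple if your setup needs more.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def missing_env_keys(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in the given mapping.

    Hypothetical helper for illustration. `env` defaults to os.environ, so this
    only sees variables that were already exported or loaded from .env.
    """
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]
```

Calling `missing_env_keys()` at the top of a script and stopping when the list is non-empty gives a clearer error than a failed API call later in the run.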

Agents and Tasks

PDF Processing Agents and Tasks

Agents

Manager Agent: Manages the workflow and delegates tasks.
Research Agent: Searches through the PDF to find relevant answers.
Professional Writer Agent: Writes professional emails based on the research agent's findings.

Tasks

Answer Customer Question Task: Searches the PDF to find answers to customer questions.
Write Email Task: Generates a professional email to contractors based on the research findings.

YouTube Processing Agents and Tasks

Agents

Scrape Agent: Extracts the latest videos from a YouTube channel so they can be added to the vector database.
Vector DB Processor: Adds YouTube videos to the vector database.
General Research Agent: Gathers all required information from the YouTube channel.
Follow-up Agent: Performs thorough research to find any missing data.
Fallback Agent: Conducts final checks and searches the internet for any remaining information.

Tasks

Scrape YouTube Channel Task: Extracts information from the latest five videos of a specified YouTube channel.
Process Videos Task: Adds the extracted video URLs to the vector database.
Find Initial Information Task: Fills out the ContentCreatorInfo model with as much information as possible.
Follow-up Task: Searches for any missing data in the ContentCreatorInfo model.
Fallback Task: Performs final checks to ensure the ContentCreatorInfo model is fully populated.
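The three information tasks above progressively fill one structured record. The sketch below illustrates that idea with a plain dataclass; it is an assumption-laden stand-in, not the repository's actual ContentCreatorInfo model. The field names and the `missing_fields` and `merge` helpers are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ContentCreatorInfo:
    # Hypothetical fields for illustration; the real model may define others.
    first_name: Optional[str] = None
    email_address: Optional[str] = None
    channel_url: Optional[str] = None


def missing_fields(info: ContentCreatorInfo) -> List[str]:
    """Names of fields that are still unset, i.e. what a follow-up pass would target."""
    return [f.name for f in fields(info) if getattr(info, f.name) is None]


def merge(base: ContentCreatorInfo, update: ContentCreatorInfo) -> ContentCreatorInfo:
    """Fill gaps in `base` from `update` without overwriting values already known."""
    merged = {}
    for f in fields(base):
        current = getattr(base, f.name)
        merged[f.name] = current if current is not None else getattr(update, f.name)
    return ContentCreatorInfo(**merged)
```

In this sketch, the initial task would produce a partly filled record, the follow-up task would look only for what `missing_fields` reports, and the fallback task would repeat the check after a wider search.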

Examples

Running PDF Processing

To run the PDF processing crew, navigate to the 1_pdf directory and execute the script:

cd 1_pdf
python 1_crew.py

Running YouTube Processing

To run the YouTube processing crew, navigate to the 2_youtube_and_web directory and execute the script:

cd 2_youtube_and_web
python crew.py

YouTube API Setup

To use the YouTube Data API v3 for this project, follow these steps:

Enable the YouTube Data API v3:
- Go to the YouTube Data API v3 page on Google Cloud Console.
- Click on Enable.
Create API Credentials:
- Go to the API Credentials page on Google Cloud Console.
- Click on Create Credentials and select API Key.
- Copy the generated API key and add it to your .env file as YOUTUBE_API_KEY.
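Once the key is in place, one way to list a channel's most recent uploads is the `search` endpoint of the YouTube Data API v3. The sketch below only builds the request URL, so it runs without network access. It is not necessarily how FetchLatestVideosFromYouTubeChannelTool.py works, and the function name `latest_videos_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def latest_videos_url(channel_id, api_key, max_results=5):
    """Build a search URL for the newest videos of one channel.

    Hypothetical helper: `channel_id` is the channel's ID and `api_key`
    is the value stored as YOUTUBE_API_KEY.
    """
    params = {
        "part": "snippet",          # return title, description, publish date
        "channelId": channel_id,
        "order": "date",            # newest first
        "type": "video",            # skip playlists and channels in the results
        "maxResults": max_results,
        "key": api_key,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

Sending a GET request to the returned URL yields a JSON response whose `items` carry the video IDs and snippets.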

Goal

The primary goal of this project is to help people get comfortable with using RAG (Retrieval-Augmented Generation) techniques. This includes:

Scraping: Extracting content from various sources.
Embedding: Adding content to a vector database.
Querying: Searching for information within the vector database.
Making and Using Tools: Creating custom tools and using existing tools effectively.
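To make the embedding and querying steps concrete, here is a toy, dependency-free sketch. It swaps in word counts and cosine similarity for a real embedding model and vector database, so it shows the shape of the workflow rather than what the CrewAI tools do internally. All names (`embed`, `cosine`, `ToyVectorStore`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts, standing in for a real embedder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        """Embedding step: store the text together with its vector."""
        self._items.append((text, embed(text)))

    def query(self, question, top_k=1):
        """Querying step: return the top_k (score, text) pairs, best first."""
        q = embed(question)
        scored = sorted(((cosine(q, vec), text) for text, vec in self._items), reverse=True)
        return scored[:top_k]
```

A scraped chunk goes in through `add`, and a question comes back from `query` with the closest stored chunk and its similarity score.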

Use Cases

Searching for Information in a Vector Store: Query the vector store first; if the information is not found there, fall back to another source such as the web.
- Example: Evaluating a job candidate by searching their resume.
- Example: Researching potential customers for a sales role.
- Example: Searching a company's internal docs to answer a question before falling back to the web.
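The "look in the vector store first, then elsewhere" pattern can be sketched as a small routing function. This is an illustrative assumption rather than code from the repository: `search_store` and `search_web` are hypothetical callables you would supply, and the `min_score` threshold is an arbitrary example value.

```python
def answer_with_fallback(question, search_store, search_web, min_score=0.3):
    """Try the vector store first; use the fallback when the hit is weak or absent.

    search_store(question) is expected to return (score, text) or None.
    search_web(question) is expected to return a text answer.
    """
    hit = search_store(question)
    if hit is not None and hit[0] >= min_score:
        return {"source": "vector_store", "answer": hit[1]}
    return {"source": "fallback", "answer": search_web(question)}
```

Keeping the source label in the result makes it easy to tell afterwards whether an answer came from the indexed documents or from the wider search.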

Contributing

We welcome contributions to enhance the functionality and features of this project. Please follow these steps to contribute:

Fork the repository.
Create a new branch (git checkout -b feature/your-feature-name).
Make your changes.
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/your-feature-name).
Create a new Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
