The Adaptive Code Assistant (ACA) lets users query large codebases in natural language. It combines machine learning models with indexing technologies to parse, index, and retrieve code snippets efficiently, and it answers user queries through a conversational interface.
- File Reading and Indexing: The system scans the specified directories for source files (`.cpp`, `.c`, `.h`, `.hpp`, `.pdf`, and `.md`; other file formats are also supported), then reads and indexes them to create a searchable dataset.
- Embedding Generation: Using the `SentenceTransformer` model, the script generates an embedding for each line of code. These embeddings capture the semantic meaning of the code, which allows efficient similarity-based retrieval.
- FAISS Index Creation: The embeddings are then used to build a FAISS index, which supports fast similarity searches within the dataset.
- Conversational Interface: The application uses Streamlit to provide a chat-style interface where users enter queries and receive responses. The responses are generated by a language model that uses the indexed data to give relevant, context-aware answers.
- Language Model Integration: Response generation is handled by a language model loaded through the `CTransformers` class and configured for conversational queries about the code. The model is integrated into a LangChain workflow that combines retrieval and generation.
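The first three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `SOURCE_EXTENSIONS`, `LineRecord`, `collect_lines`, and `build_index` are hypothetical, the model name `all-MiniLM-L6-v2` is an assumption (this README does not say which SentenceTransformer model is used), and `.pdf` files are left out because they need a separate text-extraction step.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple

# Text-based extensions from the list above; .pdf is omitted here because
# it needs a dedicated text extractor (not shown).
SOURCE_EXTENSIONS: Tuple[str, ...] = (".cpp", ".c", ".h", ".hpp", ".md")


@dataclass
class LineRecord:
    """One non-empty line of a source file (hypothetical helper type)."""
    path: str
    line_no: int
    text: str


def collect_lines(root: str,
                  extensions: Tuple[str, ...] = SOURCE_EXTENSIONS) -> List[LineRecord]:
    """Walk `root` recursively and return one record per non-empty line."""
    records: List[LineRecord] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                for line_no, line in enumerate(handle, start=1):
                    text = line.strip()
                    if text:
                        records.append(LineRecord(path, line_no, text))
    return records


def build_index(records: List[LineRecord], model_name: str = "all-MiniLM-L6-v2"):
    """Embed each line and add the vectors to a flat FAISS index.

    The imports are local so that the file-scanning part above works
    without the heavier dependencies installed.
    """
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings = model.encode(
        [record.text for record in records],
        convert_to_numpy=True,
        normalize_embeddings=True,  # unit vectors: inner product equals cosine similarity
    ).astype("float32")
    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product search
    index.add(embeddings)
    return model, index
```

A query can then be embedded with the same model and passed to `index.search(query_vectors, k)`, which returns the scores and row positions of the `k` nearest lines; those positions map back into `records`.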
Access all necessary files for the ACA project via the following Google Drive link:
ACA Project Files
Modify `target_path` in the file `/AdaptiveCodeAssistant/src/ui/adaptive_code_assistant.py` at line 128. Change it from:
target_path = os.path.join(root_directory, 'dataset')
to:
target_path = "C:/x/x"
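If editing the source at a fixed line number is inconvenient, one possible alternative is to read the dataset location from an environment variable and fall back to the original default. This is only a sketch: the variable name `ACA_TARGET_PATH` and the helper `resolve_target_path` are hypothetical and are not part of the project.

```python
import os
from typing import Optional


def resolve_target_path(root_directory: str, override: Optional[str] = None) -> str:
    """Return the directory to index.

    Priority: explicit `override`, then the (hypothetical) ACA_TARGET_PATH
    environment variable, then the project's default `<root>/dataset`.
    """
    candidate = override or os.environ.get("ACA_TARGET_PATH")
    if candidate:
        # Expand "~" and make the path absolute so later code sees one form.
        return os.path.abspath(os.path.expanduser(candidate))
    return os.path.join(root_directory, "dataset")
```

With this in place, the assignment at line 128 could become `target_path = resolve_target_path(root_directory)`, and the path could be changed per machine without touching the source.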
To activate the virtual environment, either double-click or run the following PowerShell script from the command line:
C:/XX/XX/AdaptiveCodeAssistant/Scripts/Activate.ps1
Start the project by running the following command:
streamlit run C:/XX/XX/AdaptiveCodeAssistant/src/ui/adaptive_code_assistant.py
Note: If you download the project from the provided Drive link, no additional steps are necessary.
- Python 3.8+
- Libraries: `streamlit`, `transformers`, `faiss`, `numpy`, `torch`, `sentence-transformers`, `langchain`
- Adequate storage for embeddings and index files, depending on the size of the codebase.
Install the required Python packages using pip:
pip install streamlit transformers faiss-cpu numpy torch sentence-transformers langchain
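After installing, a quick check like the one below can confirm that the environment is usable before launching Streamlit. It is a small sketch, not part of the project: `REQUIRED_MODULES`, `missing_packages`, and `python_version_ok` are hypothetical names. Note that some pip package names differ from their import names (`faiss-cpu` is imported as `faiss`, and `sentence-transformers` as `sentence_transformers`).

```python
import importlib.util
import sys
from typing import Dict, List, Tuple

# pip package name -> importable module name (they differ for some packages).
REQUIRED_MODULES: Dict[str, str] = {
    "streamlit": "streamlit",
    "transformers": "transformers",
    "faiss-cpu": "faiss",
    "numpy": "numpy",
    "torch": "torch",
    "sentence-transformers": "sentence_transformers",
    "langchain": "langchain",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_version_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Check the running interpreter against the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_version_ok():
        print("Python 3.8 or newer is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages were found.")
```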
To run the application, execute the following command from the terminal:
streamlit run adaptive_code_assistant.py