Hanan Gani, Rohit K Bharadwaj, Muhammad Huzaifa
Official code for our NLP701 Project "Analyzing the Robustness and the Reliability of Large Language Models"
Large Language Models (LLMs) are rapidly gaining traction in a variety of applications, performing impressively across numerous tasks. Despite their capabilities, there are rising concerns about the safety and reliability of these systems, particularly when they are exploited by malicious users. This study aims to assess LLMs on two critical dimensions: Robustness and Reliability. For the Robustness component, we evaluate the robustness of LLMs against in-context attacks and adversarial suffix attacks. We further extend our analysis to Large Multimodal Models (LMMs) and examine the effect of visual perturbations on language output. Regarding Reliability, we examine the performance of well-known LLMs by generating passages about individuals from the WikiBio dataset and assessing the incidence of hallucinated responses. Our evaluation employs a black-box protocol conducted in a zero-resource setting. Despite the security protocols embedded in these models, our experiments demonstrate that they remain vulnerable to a range of attacks.
- We observe that GPT-3.5 and GPT-4 perform much better than existing open-source LLMs in terms of hallucination, achieving lower hallucination scores.
- The performance difference between GPT-3.5 and GPT-4 is almost negligible when using BERTScore to evaluate hallucinations.
To replicate the environment for the project and run the code, follow these steps:
- Clone the Repository:
git clone [email protected]:rohit901/LLM-Robustness-Reliability.git
cd LLM-Robustness-Reliability
- Create and Activate a Conda Environment:
conda create --name nlp_project python=3.11
conda activate nlp_project
- Install Pip Packages:
pip install -r requirements.txt
- To Generate Data From GPT-3.5/GPT-4:
Make sure that you have your OpenAI API key saved in a `.env` file, and place that file in `reliability/.env`. The content of the file should be: `OPENAI_API_KEY=sk-<your_api_key>`
python reliability/generate_gpt_data.py
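The generation script reads the API key from the `.env` file before querying the OpenAI API. As a rough illustration of that setup step, here is a minimal stdlib-only sketch of loading `KEY=VALUE` pairs into the environment (the actual script may use a library such as python-dotenv instead; `load_env` is a hypothetical helper, not part of the repo):

```python
import os

def load_env(path="reliability/.env"):
    """Minimal .env loader: reads KEY=VALUE lines and exports them
    into os.environ, skipping blanks and '#' comments. Existing
    environment variables are not overwritten."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env()`, the key is available as `os.environ["OPENAI_API_KEY"]` for the OpenAI client.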
- To Evaluate Hallucination using SelfCheckGPT-Prompt:
python reliability/evaluate_selfcheckgpt_prompt.py
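At a high level, SelfCheckGPT-Prompt asks an LLM judge whether each generated sentence is supported by independently sampled passages, and scores hallucination as the fraction of unsupported votes. The sketch below is an illustrative simplification of that idea, not the repo's implementation; `ask_llm` stands in for whatever LLM call the evaluation script actually makes:

```python
def selfcheck_prompt_score(sentence, samples, ask_llm):
    """SelfCheckGPT-Prompt style score for one sentence: query an LLM
    judge once per sampled passage and return the fraction of passages
    that do NOT support the sentence (1.0 = likely hallucinated,
    0.0 = consistently supported)."""
    prompt_template = (
        "Context: {context}\n"
        "Sentence: {sentence}\n"
        "Is the sentence supported by the context above? Answer Yes or No."
    )
    votes = []
    for context in samples:
        answer = ask_llm(prompt_template.format(context=context,
                                                sentence=sentence))
        # Any answer other than "Yes" counts as an unsupported vote.
        votes.append(0.0 if answer.strip().lower().startswith("yes") else 1.0)
    return sum(votes) / len(votes)
```

Passage-level scores are then typically obtained by averaging the per-sentence scores.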
- To Evaluate Hallucination using BERTScore:
python reliability/evaluate_bertscore.py
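BERTScore greedily matches each token in the candidate passage to its most similar token in the reference (and vice versa) using contextual-embedding cosine similarity, then combines precision and recall into an F1. The toy sketch below substitutes exact token identity for embedding similarity purely to illustrate the precision/recall/F1 structure; the repo's evaluation uses the real bert-score package, not this proxy:

```python
def greedy_match_f1(candidate, reference):
    """BERTScore-shaped score with exact token identity standing in for
    contextual-embedding cosine similarity (illustrative proxy only).
    Precision: fraction of candidate tokens found in the reference;
    recall: fraction of reference tokens found in the candidate."""
    cand, ref = candidate.split(), reference.split()
    precision = sum(1 for t in cand if t in ref) / len(cand)
    recall = sum(1 for t in ref if t in cand) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Lower similarity to reference passages (here, a lower F1) is what flags a generated passage as potentially hallucinated.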
The data for the reliability experiments can be downloaded from the following link:
Download Reliability Experiments Data
- For queries regarding the Reliability part, contact [email protected]
- For queries regarding the Robustness part, contact [email protected] or [email protected]