qpxdesign / trec-ikat

An attempt at the iKAT (Interactive Knowledge Assistance Track) track of the 2023 run of NIST's TREC (Text REtrieval Conference).

Python 99.03% Shell 0.97%

trec-ikat ai bert llama2 llm information-retrieveal nist fastchat-t5 sentance-transformers

trec-ikat's Introduction

TREC-iKAT

Official Page

Resources

FastChat-T5 - a large language model that's much better at summarization than LLaMA2
Sentence-Transformers - a Python framework for state-of-the-art sentence, text and image embeddings, based on BERT
Pyserini - a Python toolkit for reproducible information retrieval research with sparse and dense representations (currently used to get passages for a query with a BM25-based searcher)
LLaMA2 - a large language model with a built-in chat model, on par with ChatGPT (currently using the 7B-parameter chat model)
WikiText - The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. It was produced by Salesforce's Einstein AI research lab; we're using it to train a text classifier on what kind of passages to look for
scikit-learn - a toolkit for machine-learning classification, analysis, and more; we're currently using it to determine the reliability of passages
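The passage reliability check can be pictured as a scikit-learn TF-IDF pipeline. This is a minimal sketch, not the repo's actual model: the six training texts and their labels are made up, the repo trains on WikiText-103 (and a news corpus for the 'less strict' variant), and the choice of LogisticRegression as well as the names reliability_score and is_reliable are assumptions.

```python
# Sketch of a TF-IDF "passage reliability" classifier with scikit-learn.
# Training data and labels are hypothetical toy examples, not the repo's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1 = reads like a well-edited reference passage, 0 = does not (assumed labels)
train_texts = [
    "The Mediterranean diet emphasizes vegetables, legumes, whole grains and olive oil.",
    "Vitamin C is a water-soluble vitamin found in citrus fruits and leafy greens.",
    "The city council approved the new budget after a lengthy public hearing.",
    "CLICK HERE to lose 30 pounds in 3 days with this one weird trick!!!",
    "Buy now, limited offer, best price guaranteed, free shipping today only",
    "lol idk just try it and see what happens",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features (unigrams + bigrams) feeding a linear classifier
reliability_model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
reliability_model.fit(train_texts, train_labels)


def reliability_score(passage: str) -> float:
    """Probability (0-1) that the passage belongs to the 'reliable' class (label 1)."""
    # predict_proba columns follow classes_ == [0, 1], so index 1 is "reliable"
    return float(reliability_model.predict_proba([passage])[0][1])


def is_reliable(passage: str, threshold: float = 0.5) -> bool:
    return reliability_score(passage) >= threshold
```

Returning a probability rather than only a yes/no label also makes it easy to report the reliability score alongside each passage.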

Notes:

install pip dependencies: pip install -r requirements.txt (on macOS, see the llama-cpp-python macOS instructions)

run bash scripts/install-data.sh to download the LLaMA model (13B-Chat) from my server (~8 GB)

run python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0 to download the fastchat-t5 model (~7 GB); this also starts an interactive chat session in the terminal

running ptkb_similarity or rank_passage_sentences (both run as part of full-run) will download several BERT models (a few GB total)

set the PYTHONPATH variable: export PYTHONPATH=$PWD:$PYTHONPATH

download wikitext-103 from Salesforce Einstein (~600 MB uncompressed), unzip it, and move it into data/text-classification (used to train the passage classifier)

download the news articles corpus from Kaggle (~2 GB) and place it in data/news-articles (used to train the 'less strict' passage classifier)

if you're using ChatGPT instead of LLaMA, generate an OpenAI API key, create a .env file in the main directory, and put your key in it: OPENAI_API_KEY=<KEY GOES HERE>. Note that a full run using ChatGPT may use ~$1-3 of credit.

System Requirements

This was developed and tested on a computer running Ubuntu 22.04 with an RTX 3080 (10GB version) and an Intel i7-11700K (16 CPU threads). Runs took between 3 and 28 hours, depending mostly on which LLM was used to generate the final responses. If you have different hardware, you may need to make some tweaks:

Change n_threads in utils/llama2.py to the number of CPU threads you have (if you're running on CPU)
- remove n_gpu_layers=30 if you're not running on GPU
change references to "device=cuda" in utils/ptkb_similarity.py and utils/rank_passage_sentences.py if you don't have a CUDA-capable GPU
switch which LLaMA model you run depending on your system's capabilities:
- Use quantized versions of LLaMA2, and make sure you have more RAM (or VRAM, if running on GPU) than the model's listed RAM usage. If you have too little VRAM to run a model but enough RAM, uninstall llama-cpp-python and reinstall it without GPU support: pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
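The n_threads and n_gpu_layers tweaks above can be gathered in one place. A minimal sketch, assuming a hypothetical helper llama_kwargs (not a function from the repo) that builds keyword arguments for llama-cpp-python's Llama constructor; the model path, the n_ctx of 4096 (LLaMA2's context length), and the default of 30 GPU layers (taken from the list above) are assumptions.

```python
import os


def llama_kwargs(model_path: str, use_gpu: bool, gpu_layers: int = 30, n_ctx: int = 4096) -> dict:
    """Hypothetical helper: build llama_cpp.Llama(...) arguments for this machine."""
    kwargs = {
        "model_path": model_path,
        "n_ctx": n_ctx,
        # use every logical CPU thread; os.cpu_count() may return None
        "n_threads": os.cpu_count() or 1,
    }
    if use_gpu:
        # only has an effect if llama-cpp-python was built with GPU support
        kwargs["n_gpu_layers"] = gpu_layers
    return kwargs


# Usage (requires llama-cpp-python and a downloaded model; the path is hypothetical):
# from llama_cpp import Llama
# llm = Llama(**llama_kwargs("models/llama-2-13b-chat.gguf", use_gpu=False))
```

Leaving n_gpu_layers out entirely on CPU-only machines mirrors the "remove n_gpu_layers=30" instruction rather than setting it to some other value.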

Running Pyserini with clueweb22

Place the iKAT collections (named ikat_collection_2023_0n.json) into /data/clueweb/
Format them by running bash scripts/format_ikat_collection.sh (this will take a long time)
Generate the index: python -m pyserini.index.lucene --collection JsonCollection --input ~/TREC-iKAT/data/clueweb --index indexes/ikat_collection_2023 --generator DefaultLuceneDocumentGenerator --threads 1 --storePositions --storeDocvectors --storeRaw
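Pyserini's JsonCollection reads each document as a JSON object with an "id" and a "contents" field, so the formatting step has to produce records of that shape. A minimal sketch in Python; the input field names doc_id and passage are placeholders (check the actual schema of the iKAT collection files), and convert_file is a hypothetical helper, not what scripts/format_ikat_collection.sh does internally.

```python
import json
from typing import Iterable, Iterator, TextIO


def to_jsoncollection(records: Iterable[dict], id_field: str = "doc_id",
                      text_field: str = "passage") -> Iterator[str]:
    """Yield one Pyserini JsonCollection line ({"id", "contents"}) per record."""
    for rec in records:
        yield json.dumps({"id": str(rec[id_field]), "contents": rec[text_field]})


def convert_file(src: TextIO, dst: TextIO, **fields) -> int:
    """Read JSON Lines from src, write Pyserini-ready JSON Lines to dst; return the count."""
    records = (json.loads(line) for line in src if line.strip())  # skip blank lines
    count = 0
    for line in to_jsoncollection(records, **fields):
        dst.write(line + "\n")
        count += 1
    return count
```

Streaming line by line keeps memory flat, which matters for a collection large enough that formatting takes hours.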

Helpful Commands

Format large output JSON files with cat output/SEP1_RUN_1-2.json | python -m json.tool > output/STAR3.json

Runs We're Submitting (SUBJECT TO CHANGE) (IN ORDER)

(Automatic) 2 shot approach
(Automatic) 1 shot approach
(Automatic) 1 shot approach without TF-IDF Reliability Model
(Manual, using provided PTKBs and resolved_utterance) using 1 shot approach, same as (1)

trec-ikat's People

Forkers: etanger

trec-ikat's Issues

unsure of proper JSON structure for submission

should all turns be in one array, or should each topic's turns be in a nested array?

trailing off after colon

Example: Here are some good ways to loose weight: (end of output)

Incorrect trimming of 'listing' response in prevent_trail_off

EXAMPLE: “Sure, I'd be happy to explain how to properly saute food!Sauteing is a cooking technique that involves quickly cooking ingredients in a small amount of oil or fat over high heat. This method helps to preserve the nutrients in the food, especially heat-sensitive vitamins and minerals like vitamin C and B vitamins. To properly saute food, follow these steps:1. Choose the right pan: A stainless steel or cast iron pan is ideal for sauteing, as they retain heat well and distribute it evenly.2. Add oil or fat: Use a small amount of oil or fat with a high smoke point, such as olive oil, avocado oil, or ghee. This will help to prevent the oil from burning or smoking during cooking.3. Heat the pan: Preheat the pan over medium-high heat for about 2-3 minutes before adding the ingredients. This will ensure that the pan is hot enough to quickly cook the food.4."
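One possible approach to the trail-off issues above, sketched as a hypothetical trim_trail_off function (not the repo's prevent_trail_off): drop a dangling list number glued to the end of the text, then, if the response ends with a colon, cut back to the last complete sentence. Falling back to the untrimmed text when nothing would survive avoids producing a blank answer.

```python
import re

# A list number such as "4." or "4)" that directly follows sentence punctuation
# at the very end of the text, e.g. "...distribute it evenly.4."
DANGLING_LIST_MARKER = re.compile(r"(?<=[.!?:])\s*\d+[.)]\s*$")
# Sentence-ending punctuation followed by whitespace or the end of the string
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_trail_off(text: str) -> str:
    """Hypothetical sketch: remove an unfinished tail from an LLM response."""
    trimmed = DANGLING_LIST_MARKER.sub("", text.rstrip())
    if trimmed.endswith(":"):
        # The lead-in ("Here are some ways to ...:") was never followed up,
        # so keep only the text up to the last complete sentence before it.
        ends = [m.end() for m in SENTENCE_END.finditer(trimmed[:-1])]
        trimmed = trimmed[: ends[-1]] if ends else ""
    # Never return an empty answer; fall back to the original text instead.
    return trimmed.strip() or text.strip()
```

Requiring punctuation right before the number keeps ordinary endings such as "The answer is 42." intact.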

if the passage doesn't contain enough (or the right) info, it will refuse to answer the question

implement n-shot support (rather than hardcoding just one or zero shots)
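A minimal sketch of configurable n-shot prompting, so the number of demonstrations becomes a parameter instead of being hardcoded. build_n_shot_prompt and the plain "User:/Assistant:" layout are assumptions for illustration; a real LLaMA2-chat prompt would use that model's own chat template.

```python
from typing import Sequence, Tuple


def build_n_shot_prompt(examples: Sequence[Tuple[str, str]], question: str, n: int,
                        instruction: str = "Answer the user's question.") -> str:
    """Hypothetical n-shot prompt builder: prepend the first n (question, answer) demos."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n > len(examples):
        raise ValueError(f"requested {n} shots but only {len(examples)} examples available")
    parts = [instruction]
    for q, a in examples[:n]:
        parts.append(f"User: {q}\nAssistant: {a}")
    # The real question goes last, with an open "Assistant:" turn for the model to fill
    parts.append(f"User: {question}\nAssistant:")
    return "\n\n".join(parts)
```

With n=0 this reduces to a zero-shot prompt, so one function covers all of the submitted 1-shot and 2-shot variants.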

answer_question_from_passage doesn't take into account conversation history

combined_passage_summaries length exceeds 250 spaCy tokens

straight up didn't respond to the question

final answer was just blank

double-generating passage summaries, drastically diminishing performance

passage summaries are first generated with ChatGPT, where we use them to check passage relevance; if a passage passes that check, we generate its summary again for combined_passage_summaries
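If the same summarizer can serve both steps, memoizing summaries by passage id would remove the second generation. A minimal sketch; SummaryCache is hypothetical, and summarize stands in for whatever LLM call produces a summary. If the relevance check must stay on ChatGPT while combined_passage_summaries uses a different model, this caching does not apply as-is.

```python
from typing import Callable, Dict


class SummaryCache:
    """Hypothetical memoizer: summarize each passage once, then reuse the result
    for both the relevance check and combined_passage_summaries."""

    def __init__(self, summarize: Callable[[str], str]):
        self._summarize = summarize
        self._cache: Dict[str, str] = {}

    def get(self, passage_id: str, passage_text: str) -> str:
        if passage_id not in self._cache:
            # Only call the (expensive) LLM on a cache miss
            self._cache[passage_id] = self._summarize(passage_text)
        return self._cache[passage_id]
```

Keying on the passage id rather than the text avoids hashing long passages and matches how passages are tracked in passage_provenance.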

also report reliability score from passage classifier

dialogue history for LLaMA is based on responses given in the test JSON

ptkb selection could use some work

User Question: I prefer a natural diet, not a pill-based diet. Which of the aforementioned ones is natural?
PTKBs Chosen:
[
  {
    "id": "5",
    "text": "I'm vegetarian.",
    "score": 0.3468347191810608
  },
  {
    "id": "10",
    "text": "I'm an Android user.",
    "score": 0.309503972530365
  },
  {
    "id": "6",
    "text": "I'm lactose intolerant.",
    "score": 0.2509726285934448
  }
]
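One way to make PTKB selection stricter is to require a minimum similarity score in addition to a top-k cap, so weakly related statements such as "I'm an Android user." are dropped. A minimal sketch over already-scored PTKBs (the scores shown above); select_ptkbs is hypothetical, and the 0.33 cutoff is arbitrary, chosen only to illustrate, and would need tuning on real data.

```python
from typing import Dict, List


def select_ptkbs(scored_ptkbs: List[Dict], top_k: int = 3, min_score: float = 0.33) -> List[Dict]:
    """Keep at most top_k PTKB statements whose similarity score is >= min_score."""
    ranked = sorted(scored_ptkbs, key=lambda p: p["score"], reverse=True)
    return [p for p in ranked[:top_k] if p["score"] >= min_score]


candidates = [
    {"id": "5", "text": "I'm vegetarian.", "score": 0.3468347191810608},
    {"id": "10", "text": "I'm an Android user.", "score": 0.309503972530365},
    {"id": "6", "text": "I'm lactose intolerant.", "score": 0.2509726285934448},
]
chosen = select_ptkbs(candidates)
```

An empty result is a valid outcome here: it means no PTKB is relevant enough to include in the prompt.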

put passages we used (ones marked with "used": true) at the top of passage_provenance in the run JSON for easier readability
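Moving used passages to the top of passage_provenance only needs a stable sort on the "used" flag. A minimal sketch; order_provenance is a hypothetical helper, and the "id" field in the test records is a placeholder.

```python
from typing import Dict, List


def order_provenance(passage_provenance: List[Dict]) -> List[Dict]:
    """Put passages with "used": true first. sorted() is stable, so the original
    order (e.g. by retrieval score) is preserved within each group."""
    # key is False for used passages, and False sorts before True
    return sorted(passage_provenance, key=lambda p: not p.get("used", False))
```

Because the sort is stable, no score-based re-sorting is needed afterwards.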

responses mention the fact that they are based on passages; not sure if that's desired behavior

example: "Sure! Based on the passages provided, here are some recommendations..."

switch from the basic LuceneSearcher to a HybridSearcher or a dense searcher (e.g. FaissSearcher) for better results when finding passages

https://github.com/castorini/pyserini/blob/master/docs/usage-search.md

retrieved passages aren't being used by the LLM to generate responses

potential fix: use a two-LLM method

user input & PTKBs → LLaMA2 response → BM25/Pyserini to get passages → FastChat-T5 summarization LLM to summarize passages → generate response from summaries and LLaMA2 response
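The proposed two-LLM flow can be sketched as a function that takes each stage as a callable, which keeps it independent of the specific models. Everything here is an assumption: two_llm_pipeline, the prompt wording, and using the question plus the preliminary answer as the BM25 query are illustrative choices, not the repo's implementation.

```python
from typing import Callable, List


def two_llm_pipeline(
    user_input: str,
    ptkbs: List[str],
    chat_llm: Callable[[str], str],              # e.g. LLaMA2 chat (assumed interface)
    retrieve: Callable[[str, int], List[str]],   # e.g. BM25 via Pyserini (assumed interface)
    summarize: Callable[[str], str],             # e.g. FastChat-T5 (assumed interface)
    k: int = 3,
) -> str:
    # 1. Preliminary answer from the user input plus PTKBs
    background = "\n".join(ptkbs)
    preliminary = chat_llm(f"Background about the user:\n{background}\n\nQuestion: {user_input}")
    # 2. Retrieve passages using the question and the preliminary answer as the query
    passages = retrieve(f"{user_input} {preliminary}", k)
    # 3. Summarize each retrieved passage
    summaries = [summarize(p) for p in passages]
    # 4. Final answer grounded in the summaries and the preliminary response
    final_prompt = (
        "Summaries of retrieved passages:\n"
        + "\n".join(f"- {s}" for s in summaries)
        + f"\n\nDraft answer: {preliminary}\n\nQuestion: {user_input}\n"
        + "Rewrite the draft so it is consistent with the summaries."
    )
    return chat_llm(final_prompt)
```

Passing the stages in as callables also makes it easy to swap LLaMA2 for ChatGPT, or BM25 for a dense searcher, without touching the flow.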

llama-cpp-python not saturating GPU

when llama-cpp-python is generating a response, only ~1200 MB out of 8000 MB of VRAM gets used, potentially increasing generation times

LLM responses cut off due to a hard 250-token limit

EXAMPLE: "Congratulations on getting an interview with Google! That's definitely something to be proud of. I can give you some general tips that might help you prepare for your interview.

First, make sure you understand the basics of the company and the position you are applying for. Research is key in preparing for an interview. Look up information about Google on their website and other reputable sources to get a better understanding of what they do and how they operate. This will also give you an idea of what they might be looking for in a candidate.

Next, think about your own skills and experiences that could relate to the position you are applying for. Consider specific examples of projects or work that demonstrate these skills. Prepare concise and clear explanations of how you applied these skills in real-world situations. It's also helpful to have a few questions prepared in advance to ask your interviewer(s). This shows that you are genuinely interested in the company and the position, as well as any other relevant details you might want to know.

Finally, relax! It's normal to feel nervous before an interview, especially if it's your first one. Remember to (cuts off)"
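llama-cpp-python returns OpenAI-style completion dicts, where a choice's finish_reason is "length" when generation stopped at max_tokens rather than at a natural stop. A minimal sketch that retries with a larger budget up to a cap; complete_without_cutoff is hypothetical, llm_call is assumed to wrap something like llm(prompt, max_tokens=n), and the 1000-token cap is arbitrary (prompt plus output must still fit in n_ctx).

```python
from typing import Callable, Dict


def complete_without_cutoff(llm_call: Callable[[str, int], Dict], prompt: str,
                            max_tokens: int = 250, cap: int = 1000) -> str:
    """Regenerate with a doubled token budget while the output is cut off by the limit."""
    while True:
        out = llm_call(prompt, max_tokens)
        choice = out["choices"][0]
        # Stop when the model ended on its own, or when the budget can't grow further
        if choice.get("finish_reason") != "length" or max_tokens >= cap:
            return choice["text"]
        max_tokens = min(max_tokens * 2, cap)
```

Regenerating from scratch wastes the earlier tokens; the trade-off is simplicity. If the cap is still hit, the returned text can be passed through a trail-off trimming step.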

weird 'meta' response

"generated_prompt": "No, I think that's all for now. Thank you for your help!",
"text": " Sure, here is a response to the question "No, I think that's all for now. Thank you for your help!" using the information provided earlier:"You're welcome! It was my pleasure to assist you with your gourmet dishes and plating techniques. If you have any other questions or need further guidance in the future, don't hesitate to reach out.",
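A small post-processing pass could strip these meta lead-ins (this one, and the "Based on the passages provided" phrasing above). A minimal sketch; strip_meta_preamble is hypothetical and the regular expressions only cover the two phrasings seen in these issues, not an exhaustive list.

```python
import re

# Hypothetical patterns for "meta" lead-ins observed in outputs; extend as new ones appear.
META_PREAMBLES = [
    # 'Sure, here is a response to the question "..." using the information provided earlier:'
    re.compile(
        r"^\s*(?:sure|okay|ok)[,!.]?\s+here(?:\s+is|'s)\s+(?:a|my|the)\s+response\b[^:]*:\s*",
        re.IGNORECASE,
    ),
    # 'Sure! Based on the passages provided, ...'
    re.compile(
        r"^\s*(?:(?:sure|okay|ok)[,!.]?\s+)?based on the (?:passages?|information) provided,\s*",
        re.IGNORECASE,
    ),
]


def strip_meta_preamble(text: str) -> str:
    for pattern in META_PREAMBLES:
        text = pattern.sub("", text, count=1)
    text = text.strip()
    # A preamble ending in ':"' leaves an unmatched opening quote behind
    if text.startswith('"') and text.count('"') % 2 == 1:
        text = text[1:].lstrip()
    # Re-capitalize in case the remaining text started mid-sentence
    return text[:1].upper() + text[1:]
```

Anchoring every pattern at the start of the response avoids touching legitimate uses of these phrases later in an answer.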

can't use the 70B-parameter LLaMA2 - it doesn't have an option for n_ctx

use a two- or n-shot approach

sometimes it takes multiple cycles for LLaMA to elaborate enough on the question to produce relevant keywords that can actually find good passages

Example:
"user_utterance": "Can you eat fish in any of them?"
"preliminary_response": " No, none of these diets allow for fish consumption because all are strictly vegan."
"secondary_response": All of the Vegan Mediterranean, Vegan Keto, Eco-Atkins, and Ornish Diets diets mentioned before are strictly plant-based and do not include any animal products, including fish. So, if you're a vegetarian who excludes fish, these diets may not be suitable for you. The Ornish Diet is the only one of these diets that allows for limited exercise. The others all emphasize whole grains, legumes, and vegetables with minimal or no animal products. It's important to note that while transition diets allow for some animal products, they are designed to help individuals gradually move towards a strictly plant-based eating plan. Pescatarians include fish, eggs, dairy products, and vegetables, while vegans exclude all animal products, including fish. Vegetarian diets offer flexibility in terms of the types of food consumed, while vegan diets are more restrictive but offer a broader range of options. Ultimately, choosing a diet that aligns with your personal preferences and health goals is important.

possible solution - keep retrieving passages until they reach a certain score / use an LLM relevancy check
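The score-threshold idea can be sketched as a loop that asks the LLM to elaborate, re-runs retrieval with the longer query, and stops once the best hit clears a threshold or a cycle limit is reached. retrieve_until_confident, the callable interfaces, and the threshold are assumptions. Raw BM25 scores tend to rise as the query grows, even when relevance does not, so a fixed min_score is only a rough heuristic; the LLM relevancy check is the alternative mentioned above.

```python
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (passage text, retrieval score); assumed shape


def retrieve_until_confident(
    question: str,
    elaborate: Callable[[str, str], str],  # LLM: (question, previous answer) -> fuller answer
    search: Callable[[str], List[Hit]],    # retriever, e.g. BM25 over question + answer
    min_score: float,
    max_cycles: int = 3,
) -> Tuple[List[Hit], int]:
    """Re-query with a progressively elaborated answer until a hit scores >= min_score.

    Returns the best set of hits seen and the number of cycles used.
    """
    def top(hits: List[Hit]) -> float:
        return max((score for _, score in hits), default=float("-inf"))

    answer = ""
    best: List[Hit] = []
    for cycle in range(1, max_cycles + 1):
        answer = elaborate(question, answer)
        hits = search(f"{question} {answer}")
        if top(hits) > top(best):
            best = hits
        if top(best) >= min_score:
            return best, cycle
    return best, max_cycles
```

The cycle cap bounds the extra LLM calls per turn, which matters given how long full runs already take.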

Conduct run without TF-IDF reliability check

just to see how that changes which passages are selected

passage scores are not an indicator of which passages we're using, since some passages that were thrown out by the TF-IDF reliability check or the ChatGPT relevancy check may still have higher Pyserini scores

unclear whether we should use only the first topic in our submission or if we can choose which topic to submit

if passage selection is bad, it will generate some nonsense

User: “Ok, good info. Now, tell me about the next option of weight loss.”

Chatbot: “I would suggest that the author of the hCG Diet Workbook for Phase 2 seek legal advice from a solicitor specializing in consumer law to understand their rights and options for recovering their money. They may also want to consider consulting with a financial advisor or accountant to assess the feasibility of their business plan and to explore potential tax implications. Additionally, the author could reach out to local business organizations or entrepreneurial networks to seek guidance on how to navigate this situation and protect themselves from future financial loss. Finally, it may be beneficial for the author to speak with a mental health professional to address any emotional stress or anxiety they may be experiencing due to this situation.”

Likely Cause: Bad passage selection