The caption_tuning from jasonz777

caption_tuning's Introduction

Problem statement:

In previous experiments, we have obtained the captions of traffic scenario images, which are supposed to reflect whether the image is dangerous or safe. A naive idea is to send the caption to a per-trained model such as llama2, and then generate the response by applying appropriate instruction and few shots. However, it turns out that the pre-trained model may not be very effective in processing the caption description in traffic domain, which thus restrict the performance on our specific task.

Fine-tuning strategy:

To improve the performance on determining whether the scenario is safe or not with limit numbers of datasets, we fine-tuned a Bert model and qwen1.5-1.8b-chat LLM model using PEFT to predict the result, which is compared with baseline models. To some degree, if fined-tuned models could achieve better performance on validation datasets, the caption about images are valuable to be adopted

Recommend Projects

jasonz777 / caption_tuning Goto Github PK

caption_tuning's Introduction

caption_tuning's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent