
llmpapers's Introduction

Resources on ChatGPT and Large Language Models

Collection of papers and related works for Large Language Models (ChatGPT, GPT-3, Codex etc.).

Contributors

This repository is contributed by the following contributors.

The automation script of this repo is powered by Auto-Bibfile. If you'd like to contribute to this repo, please modify bibtex.bib or related_works.json and regenerate README.md by running python scripts/run.py.

Papers

Outline

Hyperlinks

Evaluation

Survey

In-Context Learning

  • A Survey for In-context Learning,
    by Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu et al.
    This paper surveys and summarizes the progress and challenges of ICL, including ICL's formal definition, correlation to related studies, advanced techniques (training strategies, related analysis) and potential directions.

  • Explanation Selection Using Unlabeled Data for In-Context Learning,
    by Xi Ye and Greg Durrett

  • In-Context Learning with Many Demonstration Examples,
    by Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu and Lingpeng Kong
    This paper proposes an LM named EvaLM that scales up the sequence length (trained with 8k tokens per batch line). Experiments with EvaLM show that in-context learning achieves higher performance with more demonstrations under many-shot instruction tuning (8k), and that extending the instruction length to 16k further raises the upper bound of scaling in-context learning.

  • Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning,
    by Xinyi Wang, Wanrong Zhu and William Yang Wang

  • Finding Supporting Examples for In-Context Learning,
    by Xiaonan Li and Xipeng Qiu

  • The Learnability of In-Context Learning,
    by Noam Wies, Yoav Levine and Amnon Shashua

  • Meta-learning via Language Model In-context Tuning,
    by Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis and He He
    This paper proposes in-context tuning, which recasts task adaptation and prediction as a simple sequence prediction problem. The input sequence is formed by concatenating the task instruction, the labeled in-context examples, and the target input to predict. To meta-train the model to learn from in-context examples, a PLM is fine-tuned on a collection of tasks to predict the target label given this input sequence (very similar to MetaICL). On LAMA and BinaryClfs, the proposed method outperforms MAML.
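
The input-sequence construction described in this entry can be sketched as below. This is a minimal illustration, not the authors' code: the `Example` class, the `build_input_sequence` helper, and the "Input:/Label:" template are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """A labeled in-context example (hypothetical container)."""
    text: str
    label: str


def build_input_sequence(instruction: str, demos: List[Example], target_input: str) -> str:
    """Concatenate the instruction, the labeled examples, and the unlabeled target input."""
    parts = [instruction]
    for demo in demos:
        parts.append(f"Input: {demo.text}\nLabel: {demo.label}")
    # The target input carries no label; the model is fine-tuned to predict it.
    parts.append(f"Input: {target_input}\nLabel:")
    return "\n\n".join(parts)


demos = [Example("great movie", "positive"), Example("dull plot", "negative")]
prompt = build_input_sequence("Classify the sentiment of the review.", demos, "loved it")
print(prompt)
```

During meta-training, a sequence like `prompt` would be paired with the gold label of the target input, and the loss would be computed on that label only.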

  • MetaICL: Learning to Learn In Context,
    by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi
    MetaICL proposes a supervised meta-training framework to enable LMs to more effectively learn a new task in context. In MetaICL, each meta-training example includes several training examples from one task that will be presented together as a single sequence to the LM, and the prediction of the final example is used to calculate the loss.

  • Selective Annotation Makes Language Models Better Few-Shot Learners,
    by Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf et al.
    This paper proposes a graph-based selective annotation method named vote-k to
    (1) select a pool of examples to annotate from unlabeled data,
    (2) retrieve prompts (contexts) from the annotated data pool for in-context learning.
    Specifically, the selection method first iteratively selects a small set of unlabeled examples and labels them to serve as contexts for the LLM to predict the labels of the remaining unlabeled data. The method then selects the predictions with the highest confidence (log probability of the generated output) to fill up the selective annotation pool.

  • Improving In-Context Few-Shot Learning via Self-Supervised Training,
    by Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov and Zornitsa Kozareva
    This paper proposes to use self-supervision (MLM, NSP, CL, etc.) between pre-training and downstream usage to teach the LM to perform in-context learning. Analysis reveals that:
    (1) the benefit of self-supervised training depends on the amount of training data,
    (2) semantic similarity between training and evaluation tasks matters,
    (3) adding training objectives without diversity does not help,
    (4) model performance improves when similar templates are chosen for both self-supervised and downstream tasks,
    (5) self-supervised tasks and human-annotated datasets are complementary,
    (6) models trained with self-supervision are better at following task instructions.

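  • Instruction Induction: From Few Examples to Natural Language Task Descriptions,
    by Or Honovich, Uri Shaham, Samuel R. Bowman and Omer Levy
    (1) Explores the ability of LLMs to induce a task instruction from only a few examples;
    (2) Measures two metrics: 1. how the model-induced instructions compare with human-induced instructions, 2. the execution accuracy obtained when the model-induced instruction is used as the prompt for prediction;
    (3) InstructGPT performs better than GPT-3, as expected.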

  • Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity,
    by Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel and Pontus Stenetorp
    (1) This work demonstrates that few-shot prompts suffer from order sensitivity: for the same set of samples, the order in which they are provided can change model performance.
    (2) This work introduces a probing method that uses the language model itself to generate an artificial development set, which alleviates the order sensitivity problem.
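
A rough sketch of probing over orderings follows. It enumerates every permutation of the demonstrations and ranks them by the entropy of the labels predicted on a probing set. The entropy criterion is one plausible selection statistic chosen for illustration, `toy_predict` is a hypothetical stand-in for a real language model, and the probing set is hard-coded here rather than generated by the model.

```python
import math
from collections import Counter
from itertools import permutations


def label_entropy(predictions):
    """Entropy of the predicted-label distribution over the probing set."""
    counts = Counter(predictions)
    total = len(predictions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_orderings(demos, probes, predict):
    """Score every demonstration ordering; higher entropy means less label-skewed predictions."""
    scored = []
    for ordering in permutations(demos):
        preds = [predict(ordering, probe) for probe in probes]
        scored.append((label_entropy(preds), ordering))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_predict(ordering, probe):
    """Hypothetical model: uses keywords, otherwise copies the label of the last demonstration."""
    if "good" in probe:
        return "pos"
    if "bad" in probe:
        return "neg"
    return ordering[-1][1]


demos = [("good", "pos"), ("bad", "neg"), ("fine", "pos")]
probes = ["good day", "good night", "bad day", "day"]
ranked = rank_orderings(demos, probes, toy_predict)
print(ranked[0])
```

With this toy model, orderings that end in the negative demonstration yield a balanced label distribution and rank first, which illustrates how the same samples in a different order can behave differently.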

  • Learning To Retrieve Prompts for In-Context Learning,
    by Ohad Rubin, Jonathan Herzig and Jonathan Berant
    This paper proposes a method to retrieve good contexts for in-context learning. Specifically, the method
    (1) uses an unsupervised retriever (BM25/SBERT) to obtain a set of context candidates,
    (2) passes the candidates to a scoring model (GPT-Neo/GPT-J/GPT-3/Codex) and selects the top/bottom k as positive/negative examples,
    (3) uses the examples to train a dense retriever (BERT-based).
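
Steps (1) and (2) above can be sketched as follows. To stay self-contained, the sketch swaps BM25/SBERT for a plain token-overlap retriever and replaces the scoring LM with a lookup table of made-up scores (`toy_scores`); both are assumptions for illustration, and step (3), training the dense retriever, is not shown.

```python
from typing import Callable, List, Tuple


def overlap_retrieve(query: str, pool: List[str], n: int) -> List[str]:
    """Stand-in for the unsupervised retriever: rank the pool by shared tokens with the query."""
    q_tokens = set(query.lower().split())
    ranked = sorted(pool, key=lambda c: len(q_tokens & set(c.lower().split())), reverse=True)
    return ranked[:n]


def label_candidates(
    query: str, candidates: List[str], score: Callable[[str, str], float], k: int
) -> Tuple[List[str], List[str]]:
    """Score each candidate and return the top-k as positives and the bottom-k as negatives."""
    ranked = sorted(candidates, key=lambda c: score(c, query), reverse=True)
    return ranked[:k], ranked[-k:]


pool = [
    "translate cat to French",
    "translate dog to French",
    "sum two numbers",
    "translate the cat to German",
    "reverse a string",
]
query = "translate bird to French"

# Hypothetical scores; a real setup would use the scoring LM's log-probability
# of the gold output given the candidate as context.
toy_scores = {
    "translate cat to French": -0.4,
    "translate dog to French": -0.9,
    "translate the cat to German": -2.1,
    "sum two numbers": -3.5,
}

candidates = overlap_retrieve(query, pool, n=4)
positives, negatives = label_candidates(query, candidates, lambda c, q: toy_scores[c], k=1)
print(positives, negatives)
```

The resulting (query, positive, negative) triples are the kind of supervision a contrastively trained dense retriever would consume.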

  • Active Example Selection for In-Context Learning,
    by Yiming Zhang, Shi Feng and Chenhao Tan
    (1) This paper revisits the effect of example selection (re-ordering & calibration) for ICL, observing that a large variance across sets of demonstration examples still exists.
    (2) This paper applies reinforcement learning (Q-Learning) to optimize example selection by formulating the task as a sequential decision-making problem, which suits example selection from unlabeled datasets.

  • Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator,
    by Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo and Sang-goo Lee

  • Measuring Convergence Inertia: Online Learning in Self-adaptive Systems with Context Shifts,
    by Elvin Alberts and Ilias Gerostathopoulos

  • An Explanation of In-context Learning as Implicit Bayesian Inference,
    by Sang Michael Xie, Aditi Raghunathan, Percy Liang and Tengyu Ma

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?,
    by Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi and Luke Zettlemoyer

  • The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning,
    by Hanlin Zhang, Yi-Fan Zhang, Li Erran Li and Eric P. Xing

  • What Makes Good In-Context Examples for GPT-3?,
    by Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin and Weizhu Chen
    (1) Explores which demonstration examples help GPT-3 perform better in in-context learning;
    (2) Encodes the samples with RoBERTa and computes the vector distance (Euclidean distance) between each demonstration and the test example, finding that demonstrations closer to the test example yield better results.
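
The nearest-neighbour selection idea in this entry can be sketched as below. The 2-D vectors are hypothetical toy embeddings standing in for RoBERTa sentence encodings, and `select_nearest_demos` is an illustrative helper name.

```python
import math
from typing import List, Sequence


def select_nearest_demos(
    test_vec: Sequence[float], demo_vecs: List[Sequence[float]], k: int
) -> List[int]:
    """Return the indices of the k demonstrations closest to the test example (Euclidean distance)."""
    order = sorted(range(len(demo_vecs)), key=lambda i: math.dist(test_vec, demo_vecs[i]))
    return order[:k]


# Toy embeddings; in practice these would come from a sentence encoder.
demo_vecs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
test_vec = (1.0, 1.0)

nearest = select_nearest_demos(test_vec, demo_vecs, k=2)
print(nearest)
```

The selected indices would then be used to pick the demonstrations placed in the prompt for that test example.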

  • Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again,
    by Bernal Jimenez Gutierrez, Nikolas McNeal, Clayton Washington, You Chen, Lang Li, Huan Sun and Yu Su

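  • Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers,
    by Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Zhifang Sui and Furu Wei
    (1) Read together with The Dual Form of Neural Networks Revisited, this paper gives a deeper understanding of in-context learning. By analogy with the dual form of a linear NN layer, the ICL process can be described as: 1. a Transformer-based pre-trained language model acts as a meta-optimizer; 2. the forward pass over the demonstration examples produces meta-gradients; 3. through attention, the meta-gradients are applied to the original language model to build an ICL model;
    (2) Like fine-tuning, ICL provides an update on top of the zero-shot learning parameters.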

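  • The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention,
    by Kazuki Irie, Róbert Csordás and Jürgen Schmidhuber
    (1) An interesting paper that revisits the primal form and the dual form of a linear neural network (NN) layer Y=WX (bias b omitted); the two forms are exactly equivalent;
    (2) The dual form shows that the output of a linear NN layer trained by backpropagation is mainly a linear combination of the error signals e_t that the layer received during training, with weights computed by comparing the test query x with each training input. It follows that if a test input x is orthogonal to the training inputs, the parameter updates obtained by gradient descent have no effect at all on that input x.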
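
The equivalence of the two forms can be checked numerically with a small sketch. It assumes each training step adds the outer product of an error signal e_t and the input x_t to W (which is how a gradient step on a linear layer looks); the concrete matrices and the values of e_t are made up for illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def add_outer(W, e, x):
    """Return W + e x^T, i.e. one rank-one weight update."""
    return [[W[i][j] + e[i] * x[j] for j in range(len(x))] for i in range(len(e))]


W0 = [[0.5, -1.0], [2.0, 0.0]]               # initial weights
train_inputs = [[1.0, 0.0], [1.0, 2.0]]      # x_t
error_signals = [[0.1, -0.2], [0.3, 0.4]]    # hypothetical e_t values

# Primal form: accumulate the updates into a single weight matrix.
W = W0
for x_t, e_t in zip(train_inputs, error_signals):
    W = add_outer(W, e_t, x_t)


def dual_forward(query):
    """Dual form: W0 x plus the error signals weighted by (x_t . x)."""
    y = matvec(W0, query)
    for x_t, e_t in zip(train_inputs, error_signals):
        weight = dot(x_t, query)
        y = [yi + weight * ei for yi, ei in zip(y, e_t)]
    return y


query = [2.0, -1.0]
primal = matvec(W, query)
dual = dual_forward(query)
print(primal, dual)
```

Here the query is orthogonal to the second training input, so that input's error signal receives weight zero and contributes nothing to the output, which is the orthogonality observation in point (2).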

  • Self-adaptive In-context Learning,
    by Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye and Lingpeng Kong

  • Careful Data Curation Stabilizes In-context Learning,
    by Ting-Yun Chang and Robin Jia

  • Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale,
    by Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff and Dan Roth

Instruction Tuning

RLHF

Pre-Training Techniques

Mixtures of Experts

Knowledge Enhanced

Knowledge Distillation

Knowledge Generation

Knowledge Editing

Reasoning

Chain of Thought

Multi-Step Reasoning

Arithmetic Reasoning

Symbolic Reasoning

Federated Learning

Distributed AI

Selective Annotation

  • Selective Annotation Makes Language Models Better Few-Shot Learners,
    by Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf et al.
    This paper proposes a graph-based selective annotation method named vote-k to
    (1) select a pool of examples to annotate from unlabeled data,
    (2) retrieve prompts (contexts) from the annotated data pool for in-context learning.
    Specifically, the selection method first iteratively selects a small set of unlabeled examples and labels them to serve as contexts for the LLM to predict the labels of the remaining unlabeled data. The method then selects the predictions with the highest confidence (log probability of the generated output) to fill up the selective annotation pool.

  • Selective Data Acquisition in the Wild for Model Charging,
    by Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li and Yuyu Luo

Program&Code Generation

Code Representation

Code Fixing

Code Review

Program Generation

Software Engineering

AIGC

Controllable Text Generation

Continual Learning

Prompt Engineering

Natural Language Understanding

Multimodal

Multilingual

Reliability

Robustness

Dialogue System

Recommender System

Event Extraction

Event Relation Extraction

Data Augmentation

Data Annotation

Information Extraction

Domain Adaptive

Question Answering

Application

Meta Learning

  • Meta-learning via Language Model In-context Tuning,
    by Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis and He He
    This paper proposes in-context tuning, which recasts task adaptation and prediction as a simple sequence prediction problem. The input sequence is formed by concatenating the task instruction, the labeled in-context examples, and the target input to predict. To meta-train the model to learn from in-context examples, a PLM is fine-tuned on a collection of tasks to predict the target label given this input sequence (very similar to MetaICL). On LAMA and BinaryClfs, the proposed method outperforms MAML.

  • MetaICL: Learning to Learn In Context,
    by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi
    MetaICL proposes a supervised meta-training framework to enable LMs to more effectively learn a new task in context. In MetaICL, each meta-training example includes several training examples from one task that will be presented together as a single sequence to the LM, and the prediction of the final example is used to calculate the loss.

Generalizability

Language Model as Knowledge Base

Retrieval-Augmented Language Model

Quality

Interpretability/Explainability

Data Generation

Others

Related Works

Git Repos

  • Awesome-ChatGPT,
    A continuously updated collection of ChatGPT learning resources.

  • Awesome ChatGPT Prompts,
    In this repository, you will find a variety of prompts that can be used with ChatGPT.

  • ChatRWKV,
    ChatRWKV is like ChatGPT but powered by my RWKV (100% RNN) language model, which is the only RNN (as of now) that can match transformers in quality and scaling, while being faster and saving VRAM. Training sponsored by Stability EleutherAI.

  • ChatGPT-Hub,
    A collection of ChatGPT resources.

  • PaLM-rlhf-pytorch,
    Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture.

  • BAAI-WuDao/Data,
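    The WuDao project has built high-quality datasets to support the training and evaluation of large models; this repository provides links to all of its open-source datasets.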

  • Colossal-AI,
    Colossal-AI provides a collection of parallel components. It aims to let you write distributed deep learning models the same way you write a model on your laptop, and provides user-friendly tools to kickstart distributed training and inference in a few lines.

Articles

Blogs

Demos

  • CPM-Bee,
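    CPM-Bee is an open-source bilingual pre-trained language model with 10B parameters. It has more than ten native capabilities and strong general language abilities, and supports structured input and output.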

Reports

Lectures

Researcher Recruitment

Knowledge Science and Engineering Lab is recruiting researchers! You are welcome to apply for the following positions:

  • Research Assistant: Bachelor's degree or above, proficient in Python/Java, familiar with machine learning, especially deep learning models.
  • Postdoctoral Fellow: Doctoral research in Artificial Intelligence, with at least 3 high-quality published papers.
  • Lecturer, Associate Professor and Professor

If you are interested in our research and meet the above requirements, feel free to contact Prof. Guilin Qi.
