
🌟 AGI-Papers 🌟

LLM · NLP
Text2All · All2All
Multi-modal · Multi-task


Let's explore the latest LLM-related papers together. 🙇‍♂️🙇‍♀️

AGI

Large language models (LLMs) have achieved remarkable progress in various natural language processing tasks with emergent abilities. However, they face inherent limitations, such as an inability to access up-to-date information, utilize external tools, or perform precise mathematical reasoning. In this paper, we introduce Chameleon, a plug-and-play compositional reasoning framework that augments LLMs to help address these challenges. Chameleon synthesizes programs to compose various tools, including LLM models, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM as a natural language planner, Chameleon infers the appropriate sequence of tools to compose and execute in order to generate a final response. We showcase the adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP. Notably, Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%; using GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the state-of-the-art model, leading to a 98.78% overall accuracy on TabMWP. Further studies suggest that using GPT-4 as a planner exhibits more consistent and rational tool selection and is able to infer potential constraints given the instructions, compared to other LLMs like ChatGPT.
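
The plan-then-execute loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `llm` function is a hypothetical stand-in for any text-completion backend, and the tool registry holds stubbed modules.

```python
# Minimal sketch of a Chameleon-style planner composing tools.
from typing import Callable, Dict, List

def llm(prompt: str) -> str:
    # Hypothetical stand-in: wire this to any text-completion backend.
    raise NotImplementedError("plug in an LLM backend")

def web_search(ctx: dict) -> dict:
    return {**ctx, "evidence": f"search results for: {ctx['query']}"}  # stub

def calculator(ctx: dict) -> dict:
    return {**ctx, "result": sum(ctx.get("numbers", []))}              # stub

def answer_generator(ctx: dict) -> dict:
    return {**ctx, "answer": llm(f"Answer the query using this context: {ctx}")}

# Tool registry: each module maps the running context to an updated context.
TOOLS: Dict[str, Callable[[dict], dict]] = {
    "web_search": web_search,
    "calculator": calculator,
    "answer_generator": answer_generator,
}

def plan(query: str) -> List[str]:
    """Ask the planner LLM for an ordered, comma-separated tool sequence."""
    reply = llm(f"Available tools: {sorted(TOOLS)}. Query: {query}. "
                "List the tools to run, in order, comma-separated.")
    return [name.strip() for name in reply.split(",") if name.strip() in TOOLS]

def run(query: str) -> dict:
    ctx = {"query": query}
    for tool in plan(query):   # execute the synthesized program step by step
        ctx = TOOLS[tool](ctx)
    return ctx
```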

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
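
A minimal sketch of that memory architecture: a chronological stream of natural-language records, a retrieval score combining recency, importance, and relevance, and a periodic reflection step. The equal score weights, keyword-overlap relevance, and caller-supplied importance are simplifying assumptions; the paper uses embedding similarity and LLM-rated importance.

```python
# Sketch of a generative-agent memory stream (simplified, see note above).
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                       # 1-10, higher = more poignant
    created: float = field(default_factory=time.time)

class MemoryStream:
    def __init__(self, decay: float = 0.995):
        self.records: list[Memory] = []
        self.decay = decay                  # exponential recency decay per hour

    def observe(self, text: str, importance: float) -> None:
        self.records.append(Memory(text, importance))

    def score(self, m: Memory, query: str, now: float) -> float:
        recency = self.decay ** ((now - m.created) / 3600)
        overlap = len(set(query.lower().split()) & set(m.text.lower().split()))
        return recency + m.importance / 10 + overlap   # equal weights (assumption)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        now = time.time()
        ranked = sorted(self.records, key=lambda m: self.score(m, query, now),
                        reverse=True)
        return [m.text for m in ranked[:k]]

    def reflect(self, summarize) -> None:
        """Compress the most recent records into a higher-level reflection."""
        recent = [m.text for m in self.records[-20:]]
        self.records.append(Memory(summarize(recent), importance=8.0))
```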

Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
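
The trial-and-error loop can be sketched as follows. Here `llm` and the `check` success signal are hypothetical placeholders; the paper derives the signal from environment feedback plus heuristics for detecting hallucination and repeated actions.

```python
# Sketch of a Reflexion-style loop: act, evaluate, verbalize the failure,
# and retry with the accumulated reflections prepended to the prompt.
def llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical stand-in for the agent's LLM")

def reflexion(task: str, check, max_trials: int = 3) -> str:
    reflections: list[str] = []            # the agent's dynamic memory
    attempt = ""
    for _ in range(max_trials):
        prompt = task
        if reflections:
            prompt = ("Lessons from past attempts:\n"
                      + "\n".join(reflections) + "\n\n" + task)
        attempt = llm(prompt)
        if check(attempt):                 # success signal from the environment
            return attempt
        reflections.append(llm(f"This attempt failed:\n{attempt}\n"
                               "In one sentence, what should be done differently?"))
    return attempt
```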

Like people, LLMs do not always generate the best text for a given generation problem on their first try (e.g., summaries, answers, explanations). Just as people then refine their text, we introduce SELF-REFINE, a framework for similarly improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an output using an LLM, then allow the same model to provide multi-aspect feedback for its own output; finally, the same model refines its previously generated output given its own feedback. Unlike earlier work, our iterative refinement framework does not require supervised training data or reinforcement learning, and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, demonstrating that our approach outperforms direct generation. In all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving on average by absolute 20% across tasks.
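
The generate-feedback-refine loop uses one model in all three roles. A minimal sketch, with `llm` as a placeholder for that single model and a string-match stop check standing in for the paper's stopping conditions:

```python
# Sketch of the SELF-REFINE loop: one model generates an output, critiques
# it, then rewrites it using its own feedback.
def self_refine(task: str, llm, max_rounds: int = 4) -> str:
    output = llm(task)
    for _ in range(max_rounds):
        feedback = llm(f"Task: {task}\nOutput: {output}\n"
                       "Give concrete, actionable feedback on the output.")
        if "no further changes" in feedback.lower():   # assumed stop criterion
            break
        output = llm(f"Task: {task}\nOutput: {output}\nFeedback: {feedback}\n"
                     "Rewrite the output, applying the feedback.")
    return output
```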

Solving complicated AI tasks with different domains and modalities is a key step toward advanced artificial intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards advanced artificial intelligence.
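
The four-stage pipeline can be sketched as below. The `llm`, `catalog` (model id mapped to its function description), and `execute` parameters are illustrative assumptions; the real system prompts ChatGPT with structured templates and runs models hosted on Hugging Face.

```python
# Sketch of HuggingGPT's four stages: task planning, model selection,
# task execution, and response generation.
def hugginggpt(request: str, llm, catalog: dict, execute) -> str:
    # 1. Task planning: split the user request into subtasks.
    plan = llm(f"Split into subtasks, one per line: {request}")
    subtasks = [s for s in plan.splitlines() if s.strip()]
    results = []
    for task in subtasks:
        # 2. Model selection: pick a model by its function description.
        model_id = llm(f"Task: {task}\nModels: {catalog}\n"
                       "Reply with the id of the best model.").strip()
        # 3. Task execution: run the selected model on the subtask.
        results.append(execute(model_id, task))
    # 4. Response generation: summarize the execution results.
    return llm(f"User asked: {request}\nResults: {results}\n"
               "Compose the final answer.")
```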

Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts" to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.
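
A rough sketch of that autonomy loop, assuming a hypothetical `llm` backend and an `act` function that executes a command and returns an observation; the actual application adds tool schemas, long-term memory, and safety checks.

```python
# Sketch of an Auto-GPT-style loop: think, act, observe, repeat until the
# model declares the goal met. All names here are illustrative stand-ins.
def autonomous_loop(goal: str, llm, act, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        thought = llm(f"Goal: {goal}\nHistory: {history}\n"
                      "Next command, or DONE if the goal is achieved:")
        if thought.strip() == "DONE":
            break
        history.append(f"{thought} -> {act(thought)}")   # execute and record
    return history
```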

Information such as internal product data, logistics data, HR regulations, and accounting standards must remain inside the company, and questions and answers about them must also stay confidential. With language models served from external clouds, there is no technical means of controlling the risk that internal information leaks, so installing and running a language model on-premises is the only option. LiOn is a lightweight large language model that can be deployed on-premises, offering an alternative that keeps internal information secure while letting employees use it safely. Below is one example, showing LiOn counseling an employee about a workplace conflict. Beyond this, LiOn can support employees 24/7 by offering a range of solutions to the many situations that arise inside a company.

Read in 2023

Papers to read

Before 2023

MLLMArxivTalk

A study group on the latest MLLM research, generally held in the afternoon. We learn from a wide range of materials: papers, lectures, code, news, blogs, and more.

MLLM, LLM, NLG, Dialogue, Reinforcement learning, Distillation, Efficiency, Sentence similarity, multiple tasks, multimodal, Stable diffusion, TTS, Text-To-Video, All-To-All, space, life, intelligence, ethics, regulation, law, aging, medicine, investment, development, infrastructure, design, management, etc.

Top-caliber participants, including C-level executives at promising startups, top-tier researchers from Korea and abroad, current students and graduates of leading universities and graduate schools, distinguished scholars, and professors, study the latest papers and lectures and run projects together.

Meets by default every Wednesday at 7:30 PM. Papers are read on the spot without prior preparation: up to 20 minutes of reading, then up to 40 minutes of discussion. Each session covers 1 to 10 papers or lectures; so far it has always been 3. Topic and paper selection is open. Top-tier conference papers and projects are planned.

Additional study sessions run almost every day, including weekends. It is fine to join mid-session and leave mid-session, whether the topic interests you or you simply have time that day. All rules are negotiable. Offline meetups are also planned. Participation is voluntary.

Study rules

  1. English-only is not allowed; Korean is the primary language, with English for technical terms.
  2. Study at least two papers per week; ten or more if you can manage it.
  3. Read the paper on the spot for 3 to 20 minutes, then discuss for 5 to 30 minutes.
  4. In a one-hour session, you may leave at any point; joining for 10 minutes or less is also fine. Run things freely; two hours daily is possible too.
  5. Recognize that everyone is better at something. Everyone here is remarkable, so ask plenty of questions and share information often.
  6. At minimum, do what you committed to; saying you will do something and then not doing it burdens the team.
  7. Sessions are recorded by default and shared internally.
  8. Don't keep what you learn to yourself; share it so everyone knows.
  9. If you leave the study group for personal reasons, post a farewell in the introductions channel.
  10. Adopt good rules from other organizations.
  11. If you judge it will help the team, ignore all of the rules above and act.
  12. More to be added.

Basic knowledge

Mathematics · Machine learning · Transformer · Hugging Face

Recommended books:

  • Mathematics for Machine Learning
  • Pattern Recognition and Machine Learning
  • Getting Started with Google BERT
  • Natural Language Processing with Transformers
