
List of LLMs

| Category | Model | Release Time | Size (B) | Link |
|----------|-------|--------------|----------|------|
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |
| | Pythia | 2023/01 | 12 | Paper |
| | LLaMA | 2023/02 | 65 | Paper |
| | Vicuna | 2023/03 | 13 | Blog |
| | ChatGLM | 2023/03 | 6 | GitHub |
| | CodeGeeX | 2023/03 | 13 | Paper |
| | Koala | 2023/04 | 13 | Blog |
| Closed Source | GShard | 2020/01 | 600 | Paper |
| | GPT-3 | 2020/05 | 175 | Paper |
| | LaMDA | 2021/05 | 137 | Paper |
| | HyperCLOVA | 2021/06 | 82 | Paper |
| | Codex | 2021/07 | 12 | Paper |
| | ERNIE 3.0 | 2021/07 | 10 | Paper |
| | Jurassic-1 | 2021/08 | 178 | Paper |
| | FLAN | 2021/10 | 137 | Paper |
| | MT-NLG | 2021/10 | 530 | Paper |
| | Yuan 1.0 | 2021/10 | 245 | Paper |
| | Anthropic | 2021/12 | 52 | Paper |
| | WebGPT | 2021/12 | 175 | Paper |
| | Gopher | 2021/12 | 280 | Paper |
| | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
| | GLaM | 2021/12 | 1200 | Paper |
| | InstructGPT | 2022/01 | 175 | Paper |
| | AlphaCode | 2022/02 | 41 | Paper |
| | Chinchilla | 2022/03 | 70 | Paper |
| | PaLM | 2022/04 | 540 | Paper |
| | Cohere | 2022/06 | 54 | Homepage |
| | AlexaTM | 2022/08 | 20 | Paper |
| | Luminous | 2022/09 | 70 | Docs |
| | Sparrow | 2022/09 | 70 | Paper |
| | WeLM | 2022/09 | 10 | Paper |
| | U-PaLM | 2022/10 | 540 | Paper |
| | Flan-PaLM | 2022/10 | 540 | Paper |
| | Flan-U-PaLM | 2022/10 | 540 | Paper |
| | Alpaca | 2023/03 | 7 | Blog |
| | GPT-4 | 2023/03 | - | Paper |
| | PanGu-Σ | 2023/03 | 1085 | Paper |

Resources of LLMs

Publicly Available Models

  1. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2019. [Paper] [Checkpoint]
  2. mT5: "mT5: A massively multilingual pre-trained text-to-text transformer"Linting Xue et al. NAACL 2021. [Paper] [Checkpoint]
  3. PanGu-α: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation"Wei Zeng et al. arXiv 2021. [Paper] [Checkpoint]
  4. CPM-2: "CPM-2: Large-scale Cost-effective Pre-trained Language Models"Zhengyan Zhang et al. arXiv 2021. [Paper] [Checkpoint]
  5. T0: "Multitask Prompted Training Enables Zero-Shot Task Generalization"Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  6. GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model"Sid Black et al. arXiv 2022. [Paper] [Checkpoint]
  7. CodeGen: "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis"Erik Nijkamp et al. arXiv 2022. [Paper] [Checkpoint]
  8. Tk-Instruct: "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks"Yizhong Wang et al. EMNLP 2022. [Paper] [Checkpoint]
  9. UL2: "UL2: Unifying Language Learning Paradigms"Yi Tay et al. arXiv 2022. [Paper] [Checkpoint]
  10. OPT: "OPT: Open Pre-trained Transformer Language Models"Susan Zhang et al. arXiv 2022. [Paper] [Checkpoint]
  11. NLLB: "No Language Left Behind: Scaling Human-Centered Machine Translation"NLLB Team. arXiv 2022. [Paper] [Checkpoint]
  12. BLOOM: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"BigScience Workshop. arXiv 2022. [Paper] [Checkpoint]
  13. GLM: "GLM-130B: An Open Bilingual Pre-trained Model"Aohan Zeng et al. arXiv 2022. [Paper] [Checkpoint]
  14. Flan-T5: "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper] [Checkpoint]
  15. mT0 && BLOOMZ: "Crosslingual Generalization through Multitask Finetuning"Niklas Muennighoff et al. arXiv 2022. [Paper] [Checkpoint]
  16. Galactica: "Galactica: A Large Language Model for Science"Ross Taylor et al. arXiv 2022. [Paper] [Checkpoint]
  17. OPT-IML: "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  18. CodeGeeX: "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X"Qinkai Zheng et al. arXiv 2023. [Paper] [Checkpoint]
  19. Pythia: "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"Stella Biderman et al. arXiv 2023. [Paper] [Checkpoint]
  20. LLaMA: "LLaMA: Open and Efficient Foundation Language Models"Hugo Touvron et al. arXiv 2023. [Paper] [Checkpoint]

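Most of the publicly released checkpoints above can be tried directly. A minimal sketch, assuming the Hugging Face transformers library and the google/flan-t5-xl checkpoint (any other released seq2seq checkpoint from the list would work similarly):

```python
# Minimal sketch: load one of the publicly available checkpoints listed above and
# run a single prompt. The checkpoint id and generation settings are illustrative
# assumptions, not recommendations from the papers themselves.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
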
Closed-source Models

  1. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding"Dmitry Lepikhin et al. ICLR 2021. [Paper]
  2. GPT-3: "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [Paper]
  3. LaMDA: "LaMDA: Language Models for Dialog Applications"Romal Thoppilan et al. arXiv 2021. [Paper]
  4. HyperCLOVA: "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers"Boseop Kim et al. EMNLP 2021. [Paper]
  5. Codex: "Evaluating Large Language Models Trained on Code"Mark Chen et al. arXiv 2021. [Paper]
  6. ERNIE 3.0: "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"Yu Sun et al. arXiv 2021. [Paper]
  7. Jurassic-1: "Jurassic-1: Technical details and evaluation"Opher Lieber et al. 2021. [Paper]
  8. FLAN: "Finetuned Language Models Are Zero-Shot Learners"Jason Wei et al. ICLR 2022. [Paper]
  9. MT-NLG: "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"Shaden Smith et al. arXiv 2021. [Paper]
  10. Yuan 1.0: "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning"Shaohua Wu et al. arXiv 2021. [Paper]
  11. Anthropic: "A General Language Assistant as a Laboratory for Alignment"Amanda Askell et al. arXiv 2021. [Paper]
  12. WebGPT: "WebGPT: Browser-assisted question-answering with human feedback"Reiichiro Nakano et al. arXiv 2021. [Paper]
  13. Gopher: "Scaling Language Models: Methods, Analysis & Insights from Training Gopher"Jack W. Rae et al. arXiv 2021. [Paper]
  14. ERNIE 3.0 Titan: "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"Shuohuan Wang et al. arXiv 2021. [Paper]
  15. GLaM: "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"Nan Du et al. ICML 2022. [Paper]
  16. InstructGPT: "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  17. AlphaCode: "Competition-Level Code Generation with AlphaCode"Yujia Li et al. arXiv 2022. [Paper]
  18. Chinchilla: "Training Compute-Optimal Large Language Models"Jordan Hoffmann et al. arXiv 2022. [Paper]
  19. PaLM: "PaLM: Scaling Language Modeling with Pathways"Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  20. AlexaTM: "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model"Saleh Soltan et al. arXiv 2022. [Paper]
  21. Sparrow: "Improving alignment of dialogue agents via targeted human judgements"Amelia Glaese et al. arXiv 2022. [Paper]
  22. WeLM: "WeLM: A Well-Read Pre-trained Language Model for Chinese"Hui Su et al. arXiv 2022. [Paper]
  23. U-PaLM: "Transcending Scaling Laws with 0.1% Extra Compute"Yi Tay et al. arXiv 2022. [Paper]
  24. Flan-PaLM && Flan-U-PaLM: "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper]
  25. GPT-4: "GPT-4 Technical Report"OpenAI. arXiv 2023. [Paper]
  26. PanGu-Σ: "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing"Xiaozhe Ren et al. arXiv 2023. [Paper]

Commonly Used Corpora

  1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books"Yukun Zhu et al. ICCV 2015. [Paper] [Source]
  2. Gutenberg: [Source]
  3. CommonCrawl: [Source]
  4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2019. [Paper] [Source]
  5. CC-stories-R: "A Simple Method for Commonsense Reasoning"Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
  6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach"Yinhan Liu et al. arXiv 2019. [Paper] [Source]
  7. RealNews: "Defending Against Neural Fake News"Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
  8. OpenWebText: [Source]
  9. Pushshift.io: "The Pushshift Reddit Dataset"Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
  10. Wikipedia: [Source]
  11. BigQuery: [Source]
  12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling"Leo Gao et al. arxiv 2021. [Paper] [Source]
  13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset"Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]

Library Resource

  1. Transformers: "Transformers: State-of-the-Art Natural Language Processing"Thomas Wolf et al. EMNLP 2020. [Paper] [Source]
  2. DeepSpeed: "Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters"Jeff Rasley et al. KDD 2020. [Paper] [Source]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"Mohammad Shoeybi et al. arXiv 2019. [Paper] [Source]
  4. JAX: [Source]
  5. Colossal-AI: "Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training"Zhengda Bian et al. arXiv 2021. [Paper] [Source]
  6. BMTrain: [Source]
  7. FastMoE: "FastMoE: A Fast Mixture-of-Expert Training System"Jiaao He et al. arXiv 2021. [Paper] [Source]

Deep Learning Frameworks

  1. PyTorch: "PyTorch: An Imperative Style, High-Performance Deep Learning Library"Adam Paszke et al. NeurIPS 2019. [Paper] [Source]
  2. TensorFlow: "TensorFlow: A system for large-scale machine learning"Martín Abadi et al. OSDI 2016. [Paper] [Source]
  3. MXNet: "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems"Tianqi Chen et al. arXiv 2015. [Paper] [Source]
  4. PaddlePaddle: "PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice"Yanjun Ma et al. Frontiers of Data and Computing 2019. [Paper] [Source]
  5. MindSpore: "Huawei MindSpore AI Development Framework"Huawei Technologies Co., Ltd. Artificial Intelligence Technology 2022. [Paper] [Source]
  6. OneFlow: "OneFlow: Redesign the Distributed Deep Learning Framework from Scratch"Jinhui Yuan et al. arXiv 2021. [Paper] [Source]

Pre-training

Data Collection

  1. "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset"Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]
  2. "Deduplicating Training Data Makes Language Models Better"Katherine Lee et al. ACL 2022. [paper]
  3. "Deduplicating Training Data Mitigates Privacy Risks in Language Models"Nikhil Kandpal et al. ICML 2022. [paper]
  4. "Scaling Laws and Interpretability of Learning from Repeated Data"Danny Hernandez et al. arXiv 2022. [paper]

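Several of the papers above focus on deduplicating pre-training corpora. As a toy illustration only, the sketch below removes exact duplicates by hashing normalized text; the substring- and n-gram-level methods studied in Lee et al. are considerably more involved.

```python
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace before hashing, so trivially different
    # copies of the same document are treated as duplicates.
    return " ".join(text.lower().split())

def exact_dedup(documents):
    # Keep the first occurrence of each distinct (normalized) document.
    seen, kept = set(), []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

docs = ["The cat sat on the mat.", "the  cat sat on the mat.", "A different document."]
print(exact_dedup(docs))  # keeps two of the three documents
```
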
Architecture

Mainstream Architectures

Causal Decoder

  1. "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [paper]
  2. "OPT: Open Pre-trained Transformer Language Models"Susan Zhang et al. arXiv 2022. [paper]
  3. "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"Teven Le Scao et al. arXiv 2022. [paper]
  4. "Training Compute-Optimal Large Language Models"Jordan Hoffmann et al. arXiv 2022. [paper]
  5. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher"Jack W. Rae et al. arXiv 2021. [paper]
  6. "Galactica: A Large Language Model for Science"Ross Taylor et al. arXiv 2022. [paper]
  7. "PaLM: Scaling Language Modeling with Pathways"Aakanksha Chowdhery et al. arXiv 2022. [paper]
  8. "Jurassic-1: Technical Details and Evaluation"Opher Lieber et al. AI21 Labs. [paper]
  9. "LaMDA: Language Models for Dialog Applications"Romal Thoppilan et al. arXiv 2022. [paper]

Prefix Decoder

  1. "GLM-130B: An Open Bilingual Pre-trained Model"Aohan Zeng et al. arXiv 2022. [paper]
  2. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling"Zhengxiao Du et al. ACL 2022. [paper]
  3. "Transcending Scaling Laws with 0.1% Extra Compute"Yi Tay et al. arXiv 2022. [paper]

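The causal and prefix decoders above differ mainly in their self-attention mask: a prefix decoder attends bidirectionally over a designated prefix and causally over the rest. A minimal numpy sketch of the two masks (rows are query positions, columns are key positions):

```python
import numpy as np

def causal_mask(seq_len):
    # Token i may attend to tokens 0..i (lower-triangular mask of ones).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_lm_mask(seq_len, prefix_len):
    # Tokens inside the prefix attend to the whole prefix bidirectionally;
    # tokens after the prefix attend causally, in the spirit of the
    # prefix-decoder papers listed above.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(5).astype(int))
print(prefix_lm_mask(5, prefix_len=3).astype(int))
```
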
MoE

  1. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"William Fedus et al. JMLR. [paper]
  2. "Unified Scaling Laws for Routed Language Models"Aidan Clark et al. ICML 2022. [paper]

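As a rough illustration of the routing step in the mixture-of-experts papers above, the sketch below computes token-level top-k gating over a set of experts (Switch Transformers uses k=1); the expert networks themselves and the load-balancing loss are omitted.

```python
import numpy as np

def topk_route(hidden, router_weights, k=2):
    # hidden: (tokens, d_model); router_weights: (d_model, n_experts).
    # Each token is sent to its k highest-scoring experts, weighted by the
    # renormalized softmax of the router logits.
    logits = hidden @ router_weights
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    experts = np.argsort(-probs, axis=-1)[:, :k]           # (tokens, k) expert indices
    gates = np.take_along_axis(probs, experts, axis=-1)    # (tokens, k) gate values
    gates /= gates.sum(axis=-1, keepdims=True)
    return experts, gates

rng = np.random.default_rng(0)
experts, gates = topk_route(rng.standard_normal((4, 16)), rng.standard_normal((16, 8)))
print(experts)  # which experts each token is routed to
print(gates)    # the mixing weights for each selected expert
```
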
SSM

  1. "Pretraining Without Attention"Junxiong Wang et al. arXiv 2022. [paper]
  2. "Efficiently Modeling Long Sequences with Structured State Spaces"Albert Gu et al. ICLR 2022. [paper]
  3. "Long Range Language Modeling via Gated State Spaces"Harsh Mehta et al. arXiv 2022. [paper]

Detailed Configuration

Layer Normalization

  1. "DeepNet: Scaling Transformers to 1,000 Layers"Hongyu Wang et al. arXiv 2022. [paper]
  2. "Root Mean Square Layer Normalization"Biao Zhang et al. NeurIPS 2019. [paper]

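A minimal sketch of RMSNorm as described by Zhang and Sennrich above: it rescales activations by their root mean square and a learned gain, with no mean subtraction and no bias.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # x: (..., hidden); gain: learned per-dimension scale of shape (hidden,).
    # Unlike LayerNorm, there is no mean-centering and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.random.default_rng(0).standard_normal((2, 8))
print(rms_norm(x, gain=np.ones(8)))
```
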
Position Encoding

  1. "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation"Ofir Press et al. ICLR 2022. [paper]
  2. "RoFormer: Enhanced Transformer with Rotary Position Embedding"Jianlin Su et al. arXiv 2021. [paper]

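A minimal sketch of rotary position embedding (RoFormer, above): each pair of query/key dimensions is rotated by an angle proportional to the token position, so relative offsets appear in the attention dot products. The dimension sizes below are illustrative.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    # x: (seq_len, head_dim) query or key vectors; head_dim must be even.
    # Dimensions are grouped in pairs (2i, 2i+1), and each pair is rotated by
    # angle = position * base**(-2i / head_dim).
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = np.outer(positions, inv_freq)                    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.default_rng(0).standard_normal((6, 8))
print(apply_rope(q, positions=np.arange(6)).shape)
```
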
Analysis

  1. "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?"Thomas Wang et al. ICML 2022. [paper]
  2. "What Language Model to Train if You Have One Million GPU Hours?"Teven Le Scao et al. Findings of EMNLP 2022. [paper]
  3. "Examining Scaling and Transfer of Language Model Architectures for Machine Translation"Biao Zhang et al. ICML 2022. [paper]
  4. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?"Yi Tay et al. arXiv 2022. [paper]
  5. "Do Transformer Modifications Transfer Across Implementations and Applications?"Sharan Narang et al. EMNLP 2021. [paper]

Training Algorithms

  1. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"Mohammad Shoeybi et al. arXiv 2019. [paper]
  2. "An Efficient 2D Method for Training Super-Large Deep Learning Models"Qifan Xu et al. arXiv 2021. [paper]
  3. "Tesseract: Parallelize the Tensor Parallelism Efficiently"Boxiang Wang et al. ICPP 2022. [paper]
  4. "Maximizing Parallelism in Distributed Training for Huge Neural Networks"Zhengda Bian et al. arXiv 2021. [paper]
  5. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism"Yanping Huang et al. NeurIPS 2019. [paper]
  6. "PipeDream: Fast and Efficient Pipeline Parallel DNN Training"Aaron Harlap et al. arXiv 2018. [paper]
  7. "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models"Samyam Rajbhandari et al. SC 2020. [paper]
  8. "ZeRO-Offload: Democratizing Billion-Scale Model Training"Jie Ren et al. USENIX 2021. [paper]

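As a small illustration of the tensor (model) parallelism introduced by Megatron-LM above, the numpy sketch below splits the first weight matrix of an MLP by columns and the second by rows across two hypothetical devices; summing the partial outputs (the all-reduce step) recovers the unpartitioned result. This is a conceptual sketch, not a distributed implementation.

```python
import numpy as np

def gelu(t):
    # Tanh approximation of GELU, applied elementwise.
    return 0.5 * t * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (t + 0.044715 * t**3)))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # (batch, hidden)
W1 = rng.standard_normal((8, 16))    # hidden -> ffn
W2 = rng.standard_normal((16, 8))    # ffn -> hidden

# Reference: no parallelism.
ref = gelu(x @ W1) @ W2

# "Two devices": column-split W1, row-split W2; each device holds one shard of each.
W1_shards = np.split(W1, 2, axis=1)
W2_shards = np.split(W2, 2, axis=0)
partial_outputs = [gelu(x @ w1) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]
out = sum(partial_outputs)           # the all-reduce step

assert np.allclose(ref, out)
```
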
Pre-training on Code

LLMs for Program Synthesis

  1. "Evaluating Large Language Models Trained on Code"Mark Chen et al. arXiv 2021. [paper]
  2. "Program Synthesis with Large Language Models"Jacob Austin et al. arXiv 2021. [paper]
  3. "Show Your Work: Scratchpads for Intermediate Computation with Language Models"Maxwell Nye et al. arXiv 2021. [paper]
  4. "A Systematic Evaluation of Large Language Models of Code"Frank F. Xu et al. arXiv 2022. [paper]
  5. "Competition-Level Code Generation with AlphaCode"Yujia Li et al. Science. [paper]
  6. "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis"Erik Nijkamp et al. ICLR 2023. [paper]
  7. "InCoder: A Generative Model for Code Infilling and Synthesis"Daniel Fried et al. ICLR 2023. [paper]
  8. "CodeT: Code Generation with Generated Tests"Bei Chen et al. ICLR 2023. [paper]

NLP Tasks Formatted as Code

  1. "Language Models of Code are Few-Shot Commonsense Learners"Aman Madaan et al. EMNLP 2022. [paper]
  2. "Autoformalization with Large Language Models"Yuhuai Wu et al. NeurIPS 2022. [paper]

Adaptation Tuning

Instruction Tuning

  1. "Multi-Task Deep Neural Networks for Natural Language Understanding"Xiaodong Liu et al. ACL 2019. [Paper] [Homepage]
  2. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  3. "Muppet: Massive Multi-task Representations with Pre-Finetuning"Armen Aghajanyan et al. EMNLP 2021. [Paper] [Checkpoint]
  4. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions"Swaroop Mishra et al. ACL 2022. [Paper] [Collection]
  5. "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP"Qinyuan Ye et al. EMNLP 2021. [Paper] [Collection]
  6. "Finetuned Language Models Are Zero-Shot Learners"Jason Wei et al. ICLR 2022. [Paper] [Homepage]
  7. "Multitask Prompted Training Enables Zero-Shot Task Generalization"Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  8. "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning"Vamsi Aribandi et al. ICLR 2022. [Paper]
  9. "UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models"Tianbao Xie et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  10. "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts"Stephen H. Bach et al. ACL 2022. [Paper] [Collection]
  11. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  12. "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks"Yizhong Wang et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  13. "MVP: Multi-task Supervised Pre-training for Natural Language Generation"Tianyi Tang et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  14. "Crosslingual Generalization through Multitask Finetuning"Niklas Muennighoff et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  15. "Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization"Yuxian Gu et al. EMNLP 2022. [Paper] [Homepage]
  16. "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper] [Homepage]
  17. "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor"Or Honovich et al. arXiv 2022. [Paper] [Homepage]
  18. "Self-Instruct: Aligning Language Model with Self Generated Instructions"Yizhong Wang et al. arXiv 2022. [Paper] [Homepage]
  19. "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  20. "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning"Shayne Longpre et al. arXiv 2023. [Paper] [Homepage]
  21. "Is Prompt All You Need No. A Comprehensive and Broader View of Instruction Learning"Renze Lou et al. arXiv 2023. [Paper]

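The instruction-tuning work above trains on examples serialized as instruction-input-output prompts. Below is a minimal sketch of such serialization; the template is an illustrative assumption rather than the exact format used by FLAN, T0, or any other collection listed above.

```python
# Minimal sketch: turning (instruction, optional input, output) examples into
# prompt/target pairs for supervised instruction tuning.
def format_example(example):
    if example.get("input"):
        prompt = (f"Instruction: {example['instruction']}\n"
                  f"Input: {example['input']}\n"
                  f"Response:")
    else:
        prompt = f"Instruction: {example['instruction']}\nResponse:"
    return {"prompt": prompt, "target": example["output"]}

example = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "The movie was a pleasant surprise.",
    "output": "positive",
}
print(format_example(example))
```
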
Alignment Tuning

  1. "TAMER: Training an Agent Manually via Evaluative Reinforcement"W. Bradley Knox et al. ICDL 2008. [Paper]
  2. "Interactive Learning from Policy-Dependent Human Feedback"James MacGlashan et al. ICML 2017. [Paper]
  3. "Deep Reinforcement Learning from Human Preferences"Paul Christiano et al. NIPS 2017. [Paper]
  4. "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces"Garrett Warnell et al. AAAI 2018. [Paper]
  5. "Fine-Tuning Language Models from Human Preferences"Daniel M. Ziegler et al. arXiv 2019. [Paper]
  6. "Learning to summarize from human feedback"Nisan Stiennon et al. NeurIPS 2020. [Paper]
  7. "Alignment of Language Agents"Zachary Kenton et al. arXiv 2021. [Paper]
  8. "Recursively Summarizing Books with Human Feedback"Jeff Wu et al. arXiv 2021. [Paper]
  9. "A General Language Assistant as a Laboratory for Alignment"Amanda Askell et al. arXiv 2021. [Paper]
  10. "WebGPT: Browser-assisted question-answering with human feedback"Reiichiro Nakano et al. arXiv 2021. [Paper]
  11. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  12. "Teaching language models to support answers with verified quotes"Jacob Menick et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning"Deborah Cohen et al. arXiv 2022. [Paper]
  15. "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned"Deep Ganguli et al. arXiv 2022. [Paper]
  16. "Improving alignment of dialogue agents via targeted human judgements"Amelia Glaese et al. arXiv 2022. [Paper]
  17. "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization"Rajkumar Ramamurthy et al. arXiv 2022. [Paper]
  18. "Scaling Laws for Reward Model Overoptimization"Leo Gao et al. arXiv 2022. [Paper]
  19. "The Wisdom of Hindsight Makes Language Models Better Instruction Followers"Tianjun Zhang et al. arXiv 2023. [Paper]

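Several of the RLHF papers above (e.g., Stiennon et al. and InstructGPT) fit a reward model on pairwise human preferences. A minimal sketch of the pairwise ranking loss, with the reward model itself stubbed out as given scalar scores:

```python
import numpy as np

def pairwise_reward_loss(reward_chosen, reward_rejected):
    # Pairwise ranking loss used for reward-model training in RLHF:
    # -log sigmoid(r_chosen - r_rejected), averaged over comparison pairs.
    margin = np.asarray(reward_chosen) - np.asarray(reward_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))  # numerically stable -log(sigmoid)

# Scalar rewards a reward model might assign to preferred / dispreferred responses.
print(pairwise_reward_loss([1.8, 0.3], [0.2, 0.9]))
```
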
Utilization

  1. "An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels"Taylor Sorensen et al. ACL 2022. [Paper]
  2. "What Makes Good In-Context Examples for GPT-3?"Jiachang Liu et al. ACL 2022. [Paper]
  3. "Learning to retrieve prompts for in-context learning"Ohad Rubin et al. NAACL 2022. [Paper]
  4. "Diverse demonstrations improve in-context compositional generalization"Itay Levy et al. arxiv 2022. [Paper]
  5. "Automatic Chain of Thought Prompting in Large Language Models"Zhuosheng Zhang et al. arxiv 2022. [Paper]
  6. "Demystifying Prompts in Language Models via Perplexity Estimation"Hila Gonen et al. arxiv 2022. [Paper]
  7. "Active Example Selection for In-Context Learning"Yiming Zhang et al. EMNLP 2022. [Paper]
  8. "Self-adaptive In-context Learning"Zhiyong Wu et al. arxiv 2022. [Paper]
  9. "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity"Yao Lu et al. ACL 2022. [Paper]
  10. "Structured Prompting: Scaling In-Context Learning to 1,000 Examples"Hao, Yaru et al. arxiv 2022. [Paper]
  11. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning"Ye, Xi et al. arxiv 2022. [Paper]
  12. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions"Swaroop Mishra et al. ACL 2022. [Paper]
  13. "Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner"Hyunsoo Cho et al. arxiv 2022. [Paper]
  14. "Self-instruct: Aligning language model with self generated instructions"Yizhong Wang et al. arxiv 2022. [Paper]
  15. "An Explanation of In-context Learning as Implicit Bayesian Inference". Sang Michael Xie et al. ICLR 2022. [Paper]
  16. "Calibrate Before Use: Improving Few-Shot Performance of Language Models"Zihao Zhao et al. ICML 2021. [Paper]
  17. "Data distributional properties drive emergent in-context learning in transformers"Stephanie C. Y. Chan et al. arxiv 2022. [Paper]
  18. "Emergent Abilities of Large Language Models"Jason Wei et al. arxiv 2022. [Paper]
  19. "In-context Learning and Induction Heads"Catherine Olsson et al. arxiv 2022. [Paper]
  20. "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [Paper]
  21. "On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model"Seongjin Shin et al. NAACL 2022. [Paper]
  22. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?"Sewon Min et al. EMNLP 2022. [Paper]
  23. "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale"Hritik Bansal et al. arxiv 2022. [Paper]
  24. "Transformers as algorithms: Generalization and implicit model selection in in-context learning"Yingcong Li et al. arxiv 2023. [Paper]
  25. "Transformers learn in-context by gradient descent"Johannes von Oswald et al. arxiv 2022. [Paper]
  26. "What learning algorithm is in-context learning? investigations with linear models"Ekin Aky{"{u}}rek et al. arxiv 2022. [Paper]
  27. "Chain of Thought Prompting Elicits Reasoning in Large Language Models"Jason Wei et al. arxiv 2022. [Paper]
  28. "STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning"Zelikman et al. arxiv 2022. [Paper]
  29. "Large language models are zero-shot reasoners"Takeshi Kojima et al. arxiv 2022. [Paper]
  30. "Automatic Chain of Thought Prompting in Large Language Models"Zhuosheng Zhang et al. arxiv. [Paper]
  31. "Complexity-Based Prompting for Multi-Step Reasoning"Yao Fu et al. arxiv 2022. [Paper]
  32. "Language Models are Multilingual Chain-of-Thought Reasoners"Freda Shi et al. arxiv 2022. [Paper]
  33. "Rationale-Augmented Ensembles in Language Models"Xuezhi Wang et al. arxiv 2022. [Paper]
  34. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models"Denny Zhou et al. arxiv 2022. [Paper]
  35. "Multimodal Chain-of-Thought Reasoning in Language Models"Zhuosheng Zhang et al. arxiv 2023. [Paper]
  36. "Self-Consistency Improves Chain of Thought Reasoning in Language Models"Xuezhi Wang et al. arxiv 2022. [Paper]
  37. "Large Language Models Can Self-Improve"Jiaxin Huang et al. arxiv 2022. [Paper]
  38. "Training Verifiers to Solve Math Word Problems"Karl Cobbe et al. arxiv 2021. [Paper]
  39. "On the Advance of Making Language Models Better Reasoners"Yifei Li et al. arxiv 2022. [Paper]
  40. "Large Language Models are reasoners with Self-Verification"Yixuan Weng et al. arxiv 2022. [Paper]
  41. "Teaching small language models to reason"Lucie Charlotte Magister et al. arxiv 2022. [Paper]
  42. "Large language models are reasoning teachers"Namgyu Ho et al. arxiv 2022. [Paper]
  43. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning"Ye, Xi et al. arxiv 2022. [Paper]
  44. "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arxiv 2022. [Paper]
  45. "Solving Quantitative Reasoning Problems with Language Models"Aitor Lewkowycz et al. arxiv 2022. [Paper]
  46. "Text and patterns: For effective chain of thought, it takes two to tango"Aman Madaan et al. arxiv 2022. [Paper]
  47. "Challenging BIG-Bench tasks and whether chain-of-thought can solve them"Mirac Suzgun et al. arxiv 2022. [Paper]
  48. "A Survey for In-context Learning"Qingxiu Dong et al. arxiv 2023. [Paper]
  49. "Reasoning with Language Model Prompting: A Survey"Shuofei Qiao et al. arxiv 2022. [Paper]
  50. "Towards Reasoning in Large Language Models: A Survey"Jie Huang et al. arxiv 2022. [Paper]
  51. "Reward Design with Language Models"Minae Kwon et al. arxiv 2023. [Paper]
  52. "Promptagator: Few-shot Dense Retrieval From 8 Examples"Zhuyun Dai et al. arxiv 2022. [Paper]
  53. "On the Feasibility of Specialized Ability Stealing for Large Language Code Models"Zongjie Li et al. arxiv 2023. [Paper]
  54. "MathPrompter: Mathematical Reasoning using Large Language Models"Imani, Shima et al. arxiv 2023. [Paper]
  55. "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction"Jiabang He et al. arxiv 2023. [Paper]
  56. "Selective Annotation Makes Language Models Better Few-Shot Learners"Hongjin Su et al. arxiv 2022. [Paper]

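Two of the prompting techniques covered above lend themselves to a short sketch: assembling a few-shot chain-of-thought prompt, and aggregating sampled answers by majority vote as in self-consistency. The demonstrations and sampled answers below are illustrative; the LLM call itself is not shown.

```python
from collections import Counter

def build_cot_prompt(demonstrations, question):
    # Few-shot chain-of-thought: each demonstration carries a worked rationale
    # before its answer, and the test question is appended at the end.
    parts = [f"Q: {q}\nA: {rationale} The answer is {answer}."
             for q, rationale, answer in demonstrations]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def self_consistency(sampled_answers):
    # Self-consistency: sample several reasoning paths from the model and
    # return the most frequent final answer.
    return Counter(sampled_answers).most_common(1)[0][0]

demos = [("If there are 3 cars and 2 more arrive, how many cars are there?",
          "3 cars plus 2 cars is 5 cars.", "5")]
print(build_cot_prompt(demos, "I have 4 apples and eat 1. How many are left?"))
print(self_consistency(["3", "3", "2", "3"]))  # answers sampled from an LLM (stubbed here)
```
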
Capacity Evaluation

  1. "Measuring Massive Multitask Language Understanding"Dan Hendrycks et al. ICLR 2021. [Paper]
  2. "Persistent Anti-Muslim Bias in Large Language Models"Abubakar Abid et al. AIES 2021. [Paper]
  3. "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models"Alex Tamkin et al. arXiv 2021. [Paper]
  4. "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments"Sanjana Srivastava et al. CoRL 2021. [Paper]
  5. "Program Synthesis with Large Language Models"Jacob Austin et al. arXiv 2021. [Paper]
  6. "Training Verifiers to Solve Math Word Problems"Karl Cobbe et al. arXiv 2021. [Paper]
  7. "Show Your Work: Scratchpads for Intermediate Computation with Language Models"Maxwell I. Nye et al. arXiv 2021. [Paper]
  8. "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"Wenlong Huang et al. ICML 2022. [Paper]
  9. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"Jason Wei et al. NeurIPS 2022. [Paper]
  10. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  11. "Competition-Level Code Generation with AlphaCode"Yujia Li et al. Science 2022. [Paper]
  12. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances"Michael Ahn et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Autoformalization with Large Language Models"Yuhuai Wu et al. NeurIPS 2022. [Paper]
  15. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models"Aarohi Srivastava et al. arXiv 2022. [Paper]
  16. "Exploring Length Generalization in Large Language Models"Cem Anil et al. NeurIPS 2022. [Paper]
  17. "Few-shot Learning with Retrieval Augmented Language Models"Gautier Izacard et al. arXiv 2022. [Paper]
  18. "Limitations of Language Models in Arithmetic and Symbolic Induction"Jing Qian et al. arXiv 2022. [Paper]
  19. "Code as Policies: Language Model Programs for Embodied Control"Jacky Liang et al. arXiv 2022. [Paper]
  20. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models"Ishika Singh et al. arXiv 2022. [Paper]
  21. "Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans"John J. Nay et al. arXiv 2022. [Paper]
  22. "Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought"Abulhair Saparov et al. ICLR 2023. [Paper]
  23. "Language Models are Multilingual Chain-of-Thought Reasoners"Freda Shi et al. ICLR 2023. [Paper]
  24. "Re3: Generating Longer Stories With Recursive Reprompting and Revision"Kevin Yang et al. EMNLP 2022. [Paper]
  25. "Language Models of Code are Few-Shot Commonsense Learners"Aman Madaan et al. EMNLP 2022. [Paper]
  26. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"Mirac Suzgun et al. arXiv 2022. [Paper]
  27. "Large Language Models Can Self-Improve"Jiaxin Huang et al. arXiv 2022. [Paper]
  28. "Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs"Albert Q. Jiang et al. ICLR 2023. [Paper]
  29. "Holistic Evaluation of Language Models"Percy Liang et al. arXiv 2022. [Paper]
  30. "PAL: Program-aided Language Models"Luyu Gao et al. arXiv 2022. [Paper]
  31. "Legal Prompt Engineering for Multilingual Legal Judgement Prediction"Dietrich Trautmann et al. arXiv 2022. [Paper]
  32. "How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment"Aidan Gilson et al. medRxiv 2022. [Paper]
  33. "ChatGPT: The End of Online Exam Integrity?"Teo Susnjak et al. arXiv 2022. [Paper]
  34. "Large Language Models are reasoners with Self-Verification"Yixuan Weng et al. arXiv 2022. [Paper]
  35. "Self-Instruct: Aligning Language Model with Self Generated Instructions"Yizhong Wang et al. arXiv 2022. [Paper]
  36. "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports"Katharina Jeblick et al. arXiv 2022. [Paper]
  37. "The End of Programming"Matt Welsh et al. ACM 2023. [Paper]
  38. "Chatgpt goes to law school"Choi Jonathan H et al. SSRN 2023. [Paper]
  39. "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection"Biyang Guo et al. arXiv 2023. [Paper]
  40. "Is ChatGPT A Good Translator? A Preliminary Study"Wenxiang Jiao et al. arXiv 2023. [Paper]
  41. "Could an Artificial-Intelligence agent pass an introductory physics course?"Gerd Kortemeyer et al. arXiv 2023. [Paper]
  42. "Mathematical Capabilities of ChatGPT"Simon Frieder et al. arXiv 2023. [Paper]
  43. "Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models"Zhihong Shao et al. arXiv 2023. [Paper]
  44. "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning"Thomas Carta et al. arXiv 2023. [Paper]
  45. "Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making"Arya Yao et al. medRxiv 2023. [Paper]
  46. "Theory of Mind May Have Spontaneously Emerged in Large Language Models"Michal Kosinski et al. arXiv 2023. [Paper]
  47. "A Categorical Archive of ChatGPT Failures"Ali Borji et al. arXiv 2023. [Paper]
  48. "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"Yejin Bang et al. arXiv 2023. [Paper]
  49. "Toolformer: Language Models Can Teach Themselves to Use Tools"Timo Schick et al. arXiv 2023. [Paper]
  50. "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?"Chengwei Qin et al. arXiv 2023. [Paper]
  51. "How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation"Hendy Amr et al. arXiv 2023. [Paper]
  52. "Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT"Qihuang Zhong et al. arXiv 2023. [Paper]
  53. "Zero-Shot Information Extraction via Chatting with ChatGPT"Xiang Wei et al. arXiv 2023. [Paper]
  54. "ChatGPT: Jack of all trades, master of none"Jan Kocon et al. arXiv 2023. [Paper]
  55. "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective"Jindong Wang et al. arXiv 2023. [Paper]
  56. "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback"Baolin Peng et al. arXiv 2023. [Paper]
  57. "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)"Paulo Shakarian et al. arXiv 2023. [Paper]
  58. "How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks"Chen Xuanting et al. arXiv 2023. [Paper]
  59. "The utility of ChatGPT for cancer treatment information"Shen Chen et al. medRxiv 2023. [Paper]
  60. "Can ChatGPT Assess Human Personalities? A General Evaluation Framework"Haocong Rao et al. arXiv 2023. [Paper]
  61. "Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT."Mostafa M. Amin et al. arXiv 2023. [Paper]
  62. "Exploring the Feasibility of ChatGPT for Event Extraction."Jun Gao et al. arXiv 2023. [Paper]
  63. "Does Synthetic Data Generation of LLMs Help Clinical Text Mining?"Tang Ruixiang et al. arXiv 2023. [Paper]
  64. "Consistency Analysis of ChatGPT"Myeongjun Jang et al. arXiv 2023. [Paper]
  65. "Self-planning Code Generation with Large Language Model"Shun Zhang et al. ICLR 2023. [Paper]
  66. "Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions"Yiming Tan et al. arXiv 2023. [Paper]
  67. "GPT-4 Technical Report"OpenAI et al. OpenAI 2023. [Paper]
  68. "A Short Survey of Viewing Large Language Models in Legal Aspect"Zhongxiang Sun et al. arXiv 2023. [Paper]
  69. "ChatGPT Participates in a Computer Science Exam"Sebastian Bordt et al. arXiv 2023. [Paper]
  70. "A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models"Junjie Ye et al. arXiv 2023. [Paper]
  71. "On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree?"Kamil Malinka et al. arXiv 2023. [Paper]
  72. "Sparks of Artificial General Intelligence: Early experiments with GPT-4"S'ebastien Bubeck et al. arXiv 2023. [Paper]
  73. "Is ChatGPT A Good Keyphrase Generator? A Preliminary Study"Mingyang Song et al. arXiv 2023. [Paper]
  74. "Capabilities of GPT-4 on Medical Challenge Problems"Harsha Nori et al. arXiv 2023. [Paper]
  75. "Can we trust the evaluation on ChatGPT?"Rachith Aiyappa et al. arXiv 2023. [Paper]
  76. "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks"Fabrizio Gilardi et al. arXiv 2023. [Paper]
  77. "Evaluation of ChatGPT for NLP-based Mental Health Applications"Bishal Lamichhane et al. arXiv 2023. [Paper]
  78. "ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models"Bian Ning et al. arXiv 2023. [Paper]
  79. "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams"Desnes Nunes et al. arXiv 2023. [Paper]
  80. "Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure"Philipp Koralus et al. arXiv 2023. [Paper]
  81. "Yes but.. Can ChatGPT Identify Entities in Historical Documents?"Carlos-Emiliano González-Gallardo et al. arXiv 2023. [Paper]
  82. "Uncovering ChatGPT's Capabilities in Recommender Systems"Sunhao Dai et al. arXiv 2023. [Paper]