
List of LLMs

| Category | Model | Release Time | Size (B) | Link |
|----------|-------|--------------|----------|------|
| Publicly Accessible | T5 | 2019/10 | 11 | Paper |
| | mT5 | 2021/03 | 13 | Paper |
| | PanGu-α | 2021/05 | 13 | Paper |
| | CPM-2 | 2021/05 | 198 | Paper |
| | T0 | 2021/10 | 11 | Paper |
| | GPT-NeoX-20B | 2022/02 | 20 | Paper |
| | CodeGen | 2022/03 | 16 | Paper |
| | Tk-Instruct | 2022/04 | 11 | Paper |
| | UL2 | 2022/02 | 20 | Paper |
| | OPT | 2022/05 | 175 | Paper |
| | YaLM | 2022/06 | 100 | GitHub |
| | NLLB | 2022/07 | 55 | Paper |
| | BLOOM | 2022/07 | 176 | Paper |
| | GLM | 2022/08 | 130 | Paper |
| | Flan-T5 | 2022/10 | 11 | Paper |
| | mT0 | 2022/11 | 13 | Paper |
| | Galactica | 2022/11 | 120 | Paper |
| | BLOOMZ | 2022/11 | 176 | Paper |
| | OPT-IML | 2022/12 | 175 | Paper |
| | Pythia | 2023/01 | 12 | Paper |
| | LLaMA | 2023/02 | 65 | Paper |
| | Vicuna | 2023/03 | 13 | Blog |
| | ChatGLM | 2023/03 | 6 | GitHub |
| | CodeGeeX | 2023/03 | 13 | Paper |
| | Koala | 2023/04 | 13 | Blog |
| Closed Source | GShard | 2020/01 | 600 | Paper |
| | GPT-3 | 2020/05 | 175 | Paper |
| | LaMDA | 2021/05 | 137 | Paper |
| | HyperCLOVA | 2021/06 | 82 | Paper |
| | Codex | 2021/07 | 12 | Paper |
| | ERNIE 3.0 | 2021/07 | 10 | Paper |
| | Jurassic-1 | 2021/08 | 178 | Paper |
| | FLAN | 2021/10 | 137 | Paper |
| | MT-NLG | 2021/10 | 530 | Paper |
| | Yuan 1.0 | 2021/10 | 245 | Paper |
| | Anthropic | 2021/12 | 52 | Paper |
| | WebGPT | 2021/12 | 175 | Paper |
| | Gopher | 2021/12 | 280 | Paper |
| | ERNIE 3.0 Titan | 2021/12 | 260 | Paper |
| | GLaM | 2021/12 | 1200 | Paper |
| | InstructGPT | 2022/01 | 175 | Paper |
| | AlphaCode | 2022/02 | 41 | Paper |
| | Chinchilla | 2022/03 | 70 | Paper |
| | PaLM | 2022/04 | 540 | Paper |
| | Cohere | 2022/06 | 54 | Homepage |
| | AlexaTM | 2022/08 | 20 | Paper |
| | Luminous | 2022/09 | 70 | Docs |
| | Sparrow | 2022/09 | 70 | Paper |
| | WeLM | 2022/09 | 10 | Paper |
| | U-PaLM | 2022/10 | 540 | Paper |
| | Flan-PaLM | 2022/10 | 540 | Paper |
| | Flan-U-PaLM | 2022/10 | 540 | Paper |
| | Alpaca | 2023/03 | 7 | Blog |
| | GPT-4 | 2023/03 | - | Paper |
| | PanGu-Σ | 2023/03 | 1085 | Paper |

Resources of LLMs

Publicly Available Models

  1. T5: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2019. [Paper] [Checkpoint]
  2. mT5: "mT5: A massively multilingual pre-trained text-to-text transformer"Linting Xue et al. NAACL 2021. [Paper] [Checkpoint]
  3. PanGu-α: "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation"Wei Zeng et al. arXiv 2021. [Paper] [Checkpoint]
  4. CPM-2: "CPM-2: Large-scale Cost-effective Pre-trained Language Models"Zhengyan Zhang et al. arXiv 2021. [Paper] [Checkpoint]
  5. T0: "Multitask Prompted Training Enables Zero-Shot Task Generalization"Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  6. GPT-NeoX-20B: "GPT-NeoX-20B: An Open-Source Autoregressive Language Model"Sid Black et al. arXiv 2022. [Paper] [Checkpoint]
  7. CodeGen: "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis"Erik Nijkamp et al. arXiv 2022. [Paper] [Checkpoint]
  8. Tk-Instruct: "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks"Yizhong Wang et al. EMNLP 2022. [Paper] [Checkpoint]
  9. UL2: "UL2: Unifying Language Learning Paradigms"Yi Tay et al. arXiv 2022. [Paper] [Checkpoint]
  10. OPT: "OPT: Open Pre-trained Transformer Language Models"Susan Zhang et al. arXiv 2022. [Paper] [Checkpoint]
  11. NLLB: "No Language Left Behind: Scaling Human-Centered Machine Translation"NLLB Team. arXiv 2022. [Paper] [Checkpoint]
  12. BLOOM: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"BigScience Workshop. arXiv 2022. [Paper] [Checkpoint]
  13. GLM: "GLM-130B: An Open Bilingual Pre-trained Model"Aohan Zeng et al. arXiv 2022. [Paper] [Checkpoint]
  14. Flan-T5: "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper] [Checkpoint]
  15. mT0 && BLOOMZ: "Crosslingual Generalization through Multitask Finetuning"Niklas Muennighoff et al. arXiv 2022. [Paper] [Checkpoint]
  16. Galactica: "Galactica: A Large Language Model for Science"Ross Taylor et al. arXiv 2022. [Paper] [Checkpoint]
  17. OPT-IML: "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  18. CodeGeeX: "CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X"Qinkai Zheng et al. arXiv 2023. [Paper] [Checkpoint]
  19. Pythia: "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling"Stella Biderman et al. arXiv 2023. [Paper] [Checkpoint]
  20. LLaMA: "LLaMA: Open and Efficient Foundation Language Models"Hugo Touvron et al. arXiv 2023. [Paper] [Checkpoint]

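Most of the publicly released checkpoints above can be tried directly. A minimal sketch, assuming the Hugging Face transformers library and the google/flan-t5-xl checkpoint (any other released seq2seq checkpoint from the list would work similarly):

```python
# Minimal sketch: load one of the publicly available checkpoints listed above and
# run a single prompt. The checkpoint id and generation settings are illustrative
# assumptions, not recommendations from the papers themselves.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
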
Closed-source Models

  1. GShard: "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding"Dmitry Lepikhin et al. ICLR 2021. [Paper]
  2. GPT-3: "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [Paper]
  3. LaMDA: "LaMDA: Language Models for Dialog Applications"Romal Thoppilan et al. arXiv 2021. [Paper]
  4. HyperCLOVA: "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers"Boseop Kim et al. EMNLP 2021. [Paper]
  5. Codex: "Evaluating Large Language Models Trained on Code"Mark Chen et al. arXiv 2021. [Paper]
  6. ERNIE 3.0: "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"Yu Sun et al. arXiv 2021. [Paper]
  7. Jurassic-1: "Jurassic-1: Technical details and evaluation"Opher Lieber et al. 2021. [Paper]
  8. FLAN: "Finetuned Language Models Are Zero-Shot Learners"Jason Wei et al. ICLR 2022. [Paper]
  9. MT-NLG: "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"Shaden Smith et al. arXiv 2021. [Paper]
  10. Yuan 1.0: "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning"Shaohua Wu et al. arXiv 2021. [Paper]
  11. Anthropic: "A General Language Assistant as a Laboratory for Alignment"Amanda Askell et al. arXiv 2021. [Paper]
  12. WebGPT: "WebGPT: Browser-assisted question-answering with human feedback"Reiichiro Nakano et al. arXiv 2021. [Paper]
  13. Gopher: "Scaling Language Models: Methods, Analysis & Insights from Training Gopher"Jack W. Rae et al. arXiv 2021. [Paper]
  14. ERNIE 3.0 Titan: "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"Shuohuan Wang et al. arXiv 2021. [Paper]
  15. GLaM: "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"Nan Du et al. ICML 2022. [Paper]
  16. InstructGPT: "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  17. AlphaCode: "Competition-Level Code Generation with AlphaCode"Yujia Li et al. arXiv 2022. [Paper]
  18. Chinchilla: "Training Compute-Optimal Large Language Models"Jordan Hoffmann et al. arXiv 2022. [Paper]
  19. PaLM: "PaLM: Scaling Language Modeling with Pathways"Aakanksha Chowdhery et al. arXiv 2022. [Paper]
  20. AlexaTM: "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model"Saleh Soltan et al. arXiv 2022. [Paper]
  21. Sparrow: "Improving alignment of dialogue agents via targeted human judgements"Amelia Glaese et al. arXiv 2022. [Paper]
  22. WeLM: "WeLM: A Well-Read Pre-trained Language Model for Chinese"Hui Su et al. arXiv 2022. [Paper]
  23. U-PaLM: "Transcending Scaling Laws with 0.1% Extra Compute"Yi Tay et al. arXiv 2022. [Paper]
  24. Flan-PaLM && Flan-U-PaLM: "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper]
  25. GPT-4: "GPT-4 Technical Report"OpenAI. arXiv 2023. [Paper]
  26. PanGu-Σ: "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing"Xiaozhe Ren et al. arXiv 2023. [Paper]

Commonly Used Corpora

  1. BookCorpus: "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books"Yukun Zhu et al. ICCV 2015. [Paper] [Source]
  2. Gutenberg: [Source]
  3. CommonCrawl: [Source]
  4. C4: "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2019. [Paper] [Source]
  5. CC-stories-R: "A Simple Method for Commonsense Reasoning"Trieu H. Trinh et al. arXiv 2018. [Paper] [Source]
  6. CC-NEWS: "RoBERTa: A Robustly Optimized BERT Pretraining Approach"Yinhan Liu et al. arXiv 2019. [Paper] [Source]
  7. RealNews: "Defending Against Neural Fake News"Rowan Zellers et al. NeurIPS 2019. [Paper] [Source]
  8. OpenWebText: [Source]
  9. Pushshift.io: "The Pushshift Reddit Dataset"Jason Baumgartner et al. AAAI 2020. [Paper] [Source]
  10. Wikipedia: [Source]
  11. BigQuery: [Source]
  12. The Pile: "The Pile: An 800GB Dataset of Diverse Text for Language Modeling"Leo Gao et al. arxiv 2021. [Paper] [Source]
  13. ROOTS: "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset"Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]

Library Resource

  1. Transformers: "Transformers: State-of-the-Art Natural Language Processing"Thomas Wolf et al. EMNLP 2020. [Paper] [Source]
  2. DeepSpeed: "Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters"Jeff Rasley et al. KDD 2020. [Paper] [Source]
  3. Megatron-LM: "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"Mohammad Shoeybi et al. arXiv 2019. [Paper] [Source]
  4. JAX: [Source]
  5. Colossal-AI: "Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training"Zhengda Bian et al. arXiv 2021. [Paper] [Source]
  6. BMTrain: [Source]
  7. FastMoE: "FastMoE: A Fast Mixture-of-Expert Training System"Jiaao He et al. arXiv 2021. [Paper] [Source]

Deep Learning Frameworks

  1. PyTorch: "PyTorch: An Imperative Style, High-Performance Deep Learning Library"Adam Paszke et al. NeurIPS 2019. [Paper] [Source]
  2. TensorFlow: "TensorFlow: A system for large-scale machine learning"Martín Abadi et al. OSDI 2016. [Paper] [Source]
  3. MXNet: "MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems"Tianqi Chen et al. arXiv 2015. [Paper] [Source]
  4. PaddlePaddle: "PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice"Yanjun Ma et al. Frontiers of Data and Computing 2019. [Paper] [Source]
  5. MindSpore: "Huawei MindSpore AI Development Framework"Huawei Technologies Co., Ltd. Artificial Intelligence Technology 2022. [Paper] [Source]
  6. OneFlow: "OneFlow: Redesign the Distributed Deep Learning Framework from Scratch"Jinhui Yuan et al. arXiv 2021. [Paper] [Source]

Pre-training

Data Collection

  1. "The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset"Laurençon et al. NeurIPS 2022 Datasets and Benchmarks Track. [paper]
  2. "Deduplicating Training Data Makes Language Models Better"Katherine Lee et al. ACL 2022. [paper]
  3. "Deduplicating Training Data Mitigates Privacy Risks in Language Models"Nikhil Kandpal et al. ICML 2022. [paper]
  4. "Scaling Laws and Interpretability of Learning from Repeated Data"Danny Hernandez et al. arXiv 2022. [paper]

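Several of the papers above focus on deduplicating pre-training corpora. As a toy illustration only, the sketch below removes exact duplicates by hashing normalized text; the substring- and n-gram-level methods studied in Lee et al. are considerably more involved.

```python
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace before hashing, so trivially different
    # copies of the same document are treated as duplicates.
    return " ".join(text.lower().split())

def exact_dedup(documents):
    # Keep the first occurrence of each distinct (normalized) document.
    seen, kept = set(), []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

docs = ["The cat sat on the mat.", "the  cat sat on the mat.", "A different document."]
print(exact_dedup(docs))  # keeps two of the three documents
```
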
Architecture

Mainstream Architectures

Causal Decoder

  1. "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [paper]
  2. "OPT: Open Pre-trained Transformer Language Models"Susan Zhang et al. arXiv 2022. [paper]
  3. "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model"Teven Le Scao et al. arXiv 2022. [paper]
  4. "Training Compute-Optimal Large Language Models"Jordan Hoffmann et al. arXiv 2022. [paper]
  5. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher"Jack W. Rae et al. arXiv 2021. [paper]
  6. "Galactica: A Large Language Model for Science"Ross Taylor et al. arXiv 2022. [paper]
  7. "PaLM: Scaling Language Modeling with Pathways"Aakanksha Chowdhery et al. arXiv 2022. [paper]
  8. "Jurassic-1: Technical Details and Evaluation"Opher Lieber et al. AI21 Labs. [paper]
  9. "LaMDA: Language Models for Dialog Applications"Romal Thoppilan et al. arXiv 2022. [paper]

Prefix Decoder

  1. "GLM-130B: An Open Bilingual Pre-trained Model"Aohan Zeng et al. arXiv 2022. [paper]
  2. "GLM: General Language Model Pretraining with Autoregressive Blank Infilling"Zhengxiao Du et al. ACL 2022. [paper]
  3. "Transcending Scaling Laws with 0.1% Extra Compute"Yi Tay et al. arXiv 2022. [paper]

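The causal and prefix decoders above differ mainly in their self-attention mask: a prefix decoder attends bidirectionally over a designated prefix and causally over the rest. A minimal numpy sketch of the two masks (rows are query positions, columns are key positions):

```python
import numpy as np

def causal_mask(seq_len):
    # Token i may attend to tokens 0..i (lower-triangular mask of ones).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_lm_mask(seq_len, prefix_len):
    # Tokens inside the prefix attend to the whole prefix bidirectionally;
    # tokens after the prefix attend causally, in the spirit of the
    # prefix-decoder papers listed above.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(5).astype(int))
print(prefix_lm_mask(5, prefix_len=3).astype(int))
```
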
MoE

  1. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"William Fedus et al. JMLR. [paper]
  2. "Unified Scaling Laws for Routed Language Models"Aidan Clark et al. ICML 2022. [paper]

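As a rough illustration of the routing step in the mixture-of-experts papers above, the sketch below computes token-level top-k gating over a set of experts (Switch Transformers uses k=1); the expert networks themselves and the load-balancing loss are omitted.

```python
import numpy as np

def topk_route(hidden, router_weights, k=2):
    # hidden: (tokens, d_model); router_weights: (d_model, n_experts).
    # Each token is sent to its k highest-scoring experts, weighted by the
    # renormalized softmax of the router logits.
    logits = hidden @ router_weights
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    experts = np.argsort(-probs, axis=-1)[:, :k]           # (tokens, k) expert indices
    gates = np.take_along_axis(probs, experts, axis=-1)    # (tokens, k) gate values
    gates /= gates.sum(axis=-1, keepdims=True)
    return experts, gates

rng = np.random.default_rng(0)
experts, gates = topk_route(rng.standard_normal((4, 16)), rng.standard_normal((16, 8)))
print(experts)  # which experts each token is routed to
print(gates)    # the mixing weights for each selected expert
```
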
SSM

  1. "Pretraining Without Attention"Junxiong Wang et al. arXiv 2022. [paper]
  2. "Efficiently Modeling Long Sequences with Structured State Spaces"Albert Gu et al. ICLR 2022. [paper]
  3. "Long Range Language Modeling via Gated State Spaces"Harsh Mehta et al. arXiv 2022. [paper]

Detailed Configuration

Layer Normalization

  1. "DeepNet: Scaling Transformers to 1,000 Layers"Hongyu Wang et al. arXiv 2022. [paper]
  2. "Root Mean Square Layer Normalization"Biao Zhang et al. NeurIPS 2019. [paper]

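A minimal sketch of RMSNorm as described by Zhang and Sennrich above: it rescales activations by their root mean square and a learned gain, with no mean subtraction and no bias.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # x: (..., hidden); gain: learned per-dimension scale of shape (hidden,).
    # Unlike LayerNorm, there is no mean-centering and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gain

x = np.random.default_rng(0).standard_normal((2, 8))
print(rms_norm(x, gain=np.ones(8)))
```
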
Position Encoding

  1. "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation"Ofir Press et al. ICLR 2022. [paper]
  2. "RoFormer: Enhanced Transformer with Rotary Position Embedding"Jianlin Su et al. arXiv 2021. [paper]

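A minimal sketch of rotary position embedding (RoFormer, above): each pair of query/key dimensions is rotated by an angle proportional to the token position, so relative offsets appear in the attention dot products. The dimension sizes below are illustrative.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    # x: (seq_len, head_dim) query or key vectors; head_dim must be even.
    # Dimensions are grouped in pairs (2i, 2i+1), and each pair is rotated by
    # angle = position * base**(-2i / head_dim).
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)          # (dim/2,)
    angles = np.outer(positions, inv_freq)                    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.default_rng(0).standard_normal((6, 8))
print(apply_rope(q, positions=np.arange(6)).shape)
```
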
Analysis

  1. "What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?"Thomas Wang et al. ICML 2022. [paper]
  2. "What Language Model to Train if You Have One Million GPU Hours?"Teven Le Scao et al. Findings of EMNLP 2022. [paper]
  3. "Examining Scaling and Transfer of Language Model Architectures for Machine Translation"Biao Zhang et al. ICML 2022. [paper]
  4. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?"Yi Tay et al. arXiv 2022. [paper]
  5. "Do Transformer Modifications Transfer Across Implementations and Applications?"Sharan Narang et al. EMNLP 2021. [paper]

Training Algorithms

  1. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"Mohammad Shoeybi et al. arXiv 2019. [paper]
  2. "An Efficient 2D Method for Training Super-Large Deep Learning Models"Qifan Xu et al. arXiv 2021. [paper]
  3. "Tesseract: Parallelize the Tensor Parallelism Efficiently"Boxiang Wang et al. ICPP 2022. [paper]
  4. "Maximizing Parallelism in Distributed Training for Huge Neural Networks"Zhengda Bian et al. arXiv 2021. [paper]
  5. "GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism"Yanping Huang et al. NeurIPS 2019. [paper]
  6. "PipeDream: Fast and Efficient Pipeline Parallel DNN Training"Aaron Harlap et al. arXiv 2018. [paper]
  7. "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models"Samyam Rajbhandari et al. SC 2020. [paper]
  8. "ZeRO-Offload: Democratizing Billion-Scale Model Training"Jie Ren et al. USENIX 2021. [paper]

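As a small illustration of the tensor (model) parallelism introduced by Megatron-LM above, the numpy sketch below splits the first weight matrix of an MLP by columns and the second by rows across two hypothetical devices; summing the partial outputs (the all-reduce step) recovers the unpartitioned result. This is a conceptual sketch, not a distributed implementation.

```python
import numpy as np

def gelu(t):
    # Tanh approximation of GELU, applied elementwise.
    return 0.5 * t * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (t + 0.044715 * t**3)))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # (batch, hidden)
W1 = rng.standard_normal((8, 16))    # hidden -> ffn
W2 = rng.standard_normal((16, 8))    # ffn -> hidden

# Reference: no parallelism.
ref = gelu(x @ W1) @ W2

# "Two devices": column-split W1, row-split W2; each device holds one shard of each.
W1_shards = np.split(W1, 2, axis=1)
W2_shards = np.split(W2, 2, axis=0)
partial_outputs = [gelu(x @ w1) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]
out = sum(partial_outputs)           # the all-reduce step

assert np.allclose(ref, out)
```
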
Pre-training on Code

LLMs for Program Synthesis

  1. "Evaluating Large Language Models Trained on Code"Mark Chen et al. arXiv 2021. [paper]
  2. "Program Synthesis with Large Language Models"Jacob Austin et al. arXiv 2021. [paper]
  3. "Show Your Work: Scratchpads for Intermediate Computation with Language Models"Maxwell Nye et al. arXiv 2021. [paper]
  4. "A Systematic Evaluation of Large Language Models of Code"Frank F. Xu et al. arXiv 2022. [paper]
  5. "Competition-Level Code Generation with AlphaCode"Yujia Li et al. Science. [paper]
  6. "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis"Erik Nijkamp et al. ICLR 2023. [paper]
  7. "InCoder: A Generative Model for Code Infilling and Synthesis"Daniel Fried et al. ICLR 2023. [paper]
  8. "CodeT: Code Generation with Generated Tests"Bei Chen et al. ICLR 2023. [paper]

NLP Tasks Formatted as Code

  1. "Language Models of Code are Few-Shot Commonsense Learners"Aman Madaan et al. EMNLP 2022. [paper]
  2. "Autoformalization with Large Language Models"Yuhuai Wu et al. NeurIPS 2022. [paper]

Adaptation Tuning

Instruction Tuning

  1. "Multi-Task Deep Neural Networks for Natural Language Understanding"Xiaodong Liu et al. ACL 2019. [Paper] [Homepage]
  2. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"Colin Raffel et al. JMLR 2020. [Paper] [Checkpoint]
  3. "Muppet: Massive Multi-task Representations with Pre-Finetuning"Armen Aghajanyan et al. EMNLP 2021. [Paper] [Checkpoint]
  4. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions"Swaroop Mishra et al. ACL 2022. [Paper] [Collection]
  5. "CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP"Qinyuan Ye et al. EMNLP 2021. [Paper] [Collection]
  6. "Finetuned Language Models Are Zero-Shot Learners"Jason Wei et al. ICLR 2022. [Paper] [Homepage]
  7. "Multitask Prompted Training Enables Zero-Shot Task Generalization"Victor Sanh et al. ICLR 2022. [Paper] [Checkpoint]
  8. "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning"Vamsi Aribandi et al. ICLR 2022. [Paper]
  9. "UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models"Tianbao Xie et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  10. "PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts"Stephen H. Bach et al. ACL 2022. [Paper] [Collection]
  11. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  12. "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks"Yizhong Wang et al. EMNLP 2022. [Paper] [Collection] [Checkpoint]
  13. "MVP: Multi-task Supervised Pre-training for Natural Language Generation"Tianyi Tang et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  14. "Crosslingual Generalization through Multitask Finetuning"Niklas Muennighoff et al. arXiv 2022. [Paper] [Collection] [Checkpoint]
  15. "Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization"Yuxian Gu et al. EMNLP 2022. [Paper] [Homepage]
  16. "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arXiv 2022. [Paper] [Homepage]
  17. "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor"Or Honovich et al. arXiv 2022. [Paper] [Homepage]
  18. "Self-Instruct: Aligning Language Model with Self Generated Instructions"Yizhong Wang et al. arXiv 2022. [Paper] [Homepage]
  19. "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization"Srinivasan Iyer et al. arXiv 2022. [Paper] [Checkpoint]
  20. "The Flan Collection: Designing Data and Methods for Effective Instruction Tuning"Shayne Longpre et al. arXiv 2023. [Paper] [Homepage]
  21. "Is Prompt All You Need No. A Comprehensive and Broader View of Instruction Learning"Renze Lou et al. arXiv 2023. [Paper]

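The instruction-tuning work above trains on examples serialized as instruction-input-output prompts. Below is a minimal sketch of such serialization; the template is an illustrative assumption rather than the exact format used by FLAN, T0, or any other collection listed above.

```python
# Minimal sketch: turning (instruction, optional input, output) examples into
# prompt/target pairs for supervised instruction tuning.
def format_example(example):
    if example.get("input"):
        prompt = (f"Instruction: {example['instruction']}\n"
                  f"Input: {example['input']}\n"
                  f"Response:")
    else:
        prompt = f"Instruction: {example['instruction']}\nResponse:"
    return {"prompt": prompt, "target": example["output"]}

example = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "The movie was a pleasant surprise.",
    "output": "positive",
}
print(format_example(example))
```
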
Alignment Tuning

  1. "TAMER: Training an Agent Manually via Evaluative Reinforcement"W. Bradley Knox et al. ICDL 2008. [Paper]
  2. "Interactive Learning from Policy-Dependent Human Feedback"James MacGlashan et al. ICML 2017. [Paper]
  3. "Deep Reinforcement Learning from Human Preferences"Paul Christiano et al. NIPS 2017. [Paper]
  4. "Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces"Garrett Warnell et al. AAAI 2018. [Paper]
  5. "Fine-Tuning Language Models from Human Preferences"Daniel M. Ziegler et al. arXiv 2019. [Paper]
  6. "Learning to summarize from human feedback"Nisan Stiennon et al. NeurIPS 2020. [Paper]
  7. "Alignment of Language Agents"Zachary Kenton et al. arXiv 2021. [Paper]
  8. "Recursively Summarizing Books with Human Feedback"Jeff Wu et al. arXiv 2021. [Paper]
  9. "A General Language Assistant as a Laboratory for Alignment"Amanda Askell et al. arXiv 2021. [Paper]
  10. "WebGPT: Browser-assisted question-answering with human feedback"Reiichiro Nakano et al. arXiv 2021. [Paper]
  11. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  12. "Teaching language models to support answers with verified quotes"Jacob Menick et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning"Deborah Cohen et al. arXiv 2022. [Paper]
  15. "Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned"Deep Ganguli et al. arXiv 2022. [Paper]
  16. "Improving alignment of dialogue agents via targeted human judgements"Amelia Glaese et al. arXiv 2022. [Paper]
  17. "Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization"Rajkumar Ramamurthy et al. arXiv 2022. [Paper]
  18. "Scaling Laws for Reward Model Overoptimization"Leo Gao et al. arXiv 2022. [Paper]
  19. "The Wisdom of Hindsight Makes Language Models Better Instruction Followers"Tianjun Zhang et al. arXiv 2023. [Paper]

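Several of the RLHF papers above (e.g., Stiennon et al. and InstructGPT) fit a reward model on pairwise human preferences. A minimal sketch of the pairwise ranking loss, with the reward model itself stubbed out as given scalar scores:

```python
import numpy as np

def pairwise_reward_loss(reward_chosen, reward_rejected):
    # Pairwise ranking loss used for reward-model training in RLHF:
    # -log sigmoid(r_chosen - r_rejected), averaged over comparison pairs.
    margin = np.asarray(reward_chosen) - np.asarray(reward_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))  # numerically stable -log(sigmoid)

# Scalar rewards a reward model might assign to preferred / dispreferred responses.
print(pairwise_reward_loss([1.8, 0.3], [0.2, 0.9]))
```
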
Utilization

  1. "An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels"Taylor Sorensen et al. ACL 2022. [Paper]
  2. "What Makes Good In-Context Examples for GPT-3?"Jiachang Liu et al. ACL 2022. [Paper]
  3. "Learning to retrieve prompts for in-context learning"Ohad Rubin et al. NAACL 2022. [Paper]
  4. "Diverse demonstrations improve in-context compositional generalization"Itay Levy et al. arxiv 2022. [Paper]
  5. "Automatic Chain of Thought Prompting in Large Language Models"Zhuosheng Zhang et al. arxiv 2022. [Paper]
  6. "Demystifying Prompts in Language Models via Perplexity Estimation"Hila Gonen et al. arxiv 2022. [Paper]
  7. "Active Example Selection for In-Context Learning"Yiming Zhang et al. EMNLP 2022. [Paper]
  8. "Self-adaptive In-context Learning"Zhiyong Wu et al. arxiv 2022. [Paper]
  9. "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity"Yao Lu et al. ACL 2022. [Paper]
  10. "Structured Prompting: Scaling In-Context Learning to 1,000 Examples"Hao, Yaru et al. arxiv 2022. [Paper]
  11. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning"Ye, Xi et al. arxiv 2022. [Paper]
  12. "Cross-Task Generalization via Natural Language Crowdsourcing Instructions"Swaroop Mishra et al. ACL 2022. [Paper]
  13. "Prompt-Augmented Linear Probing: Scaling Beyond the Limit of Few-shot In-Context Learner"Hyunsoo Cho et al. arxiv 2022. [Paper]
  14. "Self-instruct: Aligning language model with self generated instructions"Yizhong Wang et al. arxiv 2022. [Paper]
  15. "An Explanation of In-context Learning as Implicit Bayesian Inference". Sang Michael Xie et al. ICLR 2022. [Paper]
  16. "Calibrate Before Use: Improving Few-Shot Performance of Language Models"Zihao Zhao et al. ICML 2021. [Paper]
  17. "Data distributional properties drive emergent in-context learning in transformers"Stephanie C. Y. Chan et al. arxiv 2022. [Paper]
  18. "Emergent Abilities of Large Language Models"Jason Wei et al. arxiv 2022. [Paper]
  19. "In-context Learning and Induction Heads"Catherine Olsson et al. arxiv 2022. [Paper]
  20. "Language Models are Few-Shot Learners"Tom B. Brown et al. NeurIPS 2020. [Paper]
  21. "On the Effect of Pretraining Corpora on In-context Learning by a Large-scale Language Model"Seongjin Shin et al. NAACL 2022. [Paper]
  22. "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?"Sewon Min et al. EMNLP 2022. [Paper]
  23. "Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale"Hritik Bansal et al. arxiv 2022. [Paper]
  24. "Transformers as algorithms: Generalization and implicit model selection in in-context learning"Yingcong Li et al. arxiv 2023. [Paper]
  25. "Transformers learn in-context by gradient descent"Johannes von Oswald et al. arxiv 2022. [Paper]
  26. "What learning algorithm is in-context learning? investigations with linear models"Ekin Aky{"{u}}rek et al. arxiv 2022. [Paper]
  27. "Chain of Thought Prompting Elicits Reasoning in Large Language Models"Jason Wei et al. arxiv 2022. [Paper]
  28. "STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning"Zelikman et al. arxiv 2022. [Paper]
  29. "Large language models are zero-shot reasoners"Takeshi Kojima et al. arxiv 2022. [Paper]
  30. "Automatic Chain of Thought Prompting in Large Language Models"Zhuosheng Zhang et al. arxiv. [Paper]
  31. "Complexity-Based Prompting for Multi-Step Reasoning"Yao Fu et al. arxiv 2022. [Paper]
  32. "Language Models are Multilingual Chain-of-Thought Reasoners"Freda Shi et al. arxiv 2022. [Paper]
  33. "Rationale-Augmented Ensembles in Language Models"Xuezhi Wang et al. arxiv 2022. [Paper]
  34. "Least-to-Most Prompting Enables Complex Reasoning in Large Language Models"Denny Zhou et al. arxiv 2022. [Paper]
  35. "Multimodal Chain-of-Thought Reasoning in Language Models"Zhuosheng Zhang et al. arxiv 2023. [Paper]
  36. "Self-Consistency Improves Chain of Thought Reasoning in Language Models"Xuezhi Wang et al. arxiv 2022. [Paper]
  37. "Large Language Models Can Self-Improve"Jiaxin Huang et al. arxiv 2022. [Paper]
  38. "Training Verifiers to Solve Math Word Problems"Karl Cobbe et al. arxiv 2021. [Paper]
  39. "On the Advance of Making Language Models Better Reasoners"Yifei Li et al. arxiv 2022. [Paper]
  40. "Large Language Models are reasoners with Self-Verification"Yixuan Weng et al. arxiv 2022. [Paper]
  41. "Teaching small language models to reason"Lucie Charlotte Magister et al. arxiv 2022. [Paper]
  42. "Large language models are reasoning teachers"Namgyu Ho et al. arxiv 2022. [Paper]
  43. "The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning"Ye, Xi et al. arxiv 2022. [Paper]
  44. "Scaling Instruction-Finetuned Language Models"Hyung Won Chung et al. arxiv 2022. [Paper]
  45. "Solving Quantitative Reasoning Problems with Language Models"Aitor Lewkowycz et al. arxiv 2022. [Paper]
  46. "Text and patterns: For effective chain of thought, it takes two to tango"Aman Madaan et al. arxiv 2022. [Paper]
  47. "Challenging BIG-Bench tasks and whether chain-of-thought can solve them"Mirac Suzgun et al. arxiv 2022. [Paper]
  48. "A Survey for In-context Learning"Qingxiu Dong et al. arxiv 2023. [Paper]
  49. "Reasoning with Language Model Prompting: A Survey"Shuofei Qiao et al. arxiv 2022. [Paper]
  50. "Towards Reasoning in Large Language Models: A Survey"Jie Huang et al. arxiv 2022. [Paper]
  51. "Reward Design with Language Models"Minae Kwon et al. arxiv 2023. [Paper]
  52. "Promptagator: Few-shot Dense Retrieval From 8 Examples"Zhuyun Dai et al. arxiv 2022. [Paper]
  53. "On the Feasibility of Specialized Ability Stealing for Large Language Code Models"Zongjie Li et al. arxiv 2023. [Paper]
  54. "MathPrompter: Mathematical Reasoning using Large Language Models"Imani, Shima et al. arxiv 2023. [Paper]
  55. "ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction"Jiabang He et al. arxiv 2023. [Paper]
  56. "Selective Annotation Makes Language Models Better Few-Shot Learners"Hongjin Su et al. arxiv 2022. [Paper]

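Two of the prompting techniques covered above lend themselves to a short sketch: assembling a few-shot chain-of-thought prompt, and aggregating sampled answers by majority vote as in self-consistency. The demonstrations and sampled answers below are illustrative; the LLM call itself is not shown.

```python
from collections import Counter

def build_cot_prompt(demonstrations, question):
    # Few-shot chain-of-thought: each demonstration carries a worked rationale
    # before its answer, and the test question is appended at the end.
    parts = [f"Q: {q}\nA: {rationale} The answer is {answer}."
             for q, rationale, answer in demonstrations]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def self_consistency(sampled_answers):
    # Self-consistency: sample several reasoning paths from the model and
    # return the most frequent final answer.
    return Counter(sampled_answers).most_common(1)[0][0]

demos = [("If there are 3 cars and 2 more arrive, how many cars are there?",
          "3 cars plus 2 cars is 5 cars.", "5")]
print(build_cot_prompt(demos, "I have 4 apples and eat 1. How many are left?"))
print(self_consistency(["3", "3", "2", "3"]))  # answers sampled from an LLM (stubbed here)
```
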
Capacity Evaluation

  1. "Measuring Massive Multitask Language Understanding"Dan Hendrycks et al. ICLR 2021. [Paper]
  2. "Persistent Anti-Muslim Bias in Large Language Models"Abubakar Abid et al. AIES 2021. [Paper]
  3. "Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models"Alex Tamkin et al. arXiv 2021. [Paper]
  4. "BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments"Sanjana Srivastava et al. CoRL 2021. [Paper]
  5. "Program Synthesis with Large Language Models"Jacob Austin et al. arXiv 2021. [Paper]
  6. "Training Verifiers to Solve Math Word Problems"Karl Cobbe et al. arXiv 2021. [Paper]
  7. "Show Your Work: Scratchpads for Intermediate Computation with Language Models"Maxwell I. Nye et al. arXiv 2021. [Paper]
  8. "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"Wenlong Huang et al. ICML 2022. [Paper]
  9. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"Jason Wei et al. NeurIPS 2022. [Paper]
  10. "Training language models to follow instructions with human feedback"Long Ouyang et al. arXiv 2022. [Paper]
  11. "Competition-Level Code Generation with AlphaCode"Yujia Li et al. Science 2022. [Paper]
  12. "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances"Michael Ahn et al. arXiv 2022. [Paper]
  13. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"Yuntao Bai et al. arXiv 2022. [Paper]
  14. "Autoformalization with Large Language Models"Yuhuai Wu et al. NeurIPS 2022. [Paper]
  15. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models"Aarohi Srivastava et al. arXiv 2022. [Paper]
  16. "Exploring Length Generalization in Large Language Models"Cem Anil et al. NeurIPS 2022. [Paper]
  17. "Few-shot Learning with Retrieval Augmented Language Models"Gautier Izacard et al. arXiv 2022. [Paper]
  18. "Limitations of Language Models in Arithmetic and Symbolic Induction"Jing Qian et al. arXiv 2022. [Paper]
  19. "Code as Policies: Language Model Programs for Embodied Control"Jacky Liang et al. arXiv 2022. [Paper]
  20. "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models"Ishika Singh et al. arXiv 2022. [Paper]
  21. "Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans"John J. Nay et al. arXiv 2022. [Paper]
  22. "Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought"Abulhair Saparov et al. ICLR 2023. [Paper]
  23. "Language Models are Multilingual Chain-of-Thought Reasoners"Freda Shi et al. ICLR 2023. [Paper]
  24. "Re3: Generating Longer Stories With Recursive Reprompting and Revision"Kevin Yang et al. EMNLP 2022. [Paper]
  25. "Language Models of Code are Few-Shot Commonsense Learners"Aman Madaan et al. EMNLP 2022. [Paper]
  26. "Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them"Mirac Suzgun et al. arXiv 2022. [Paper]
  27. "Large Language Models Can Self-Improve"Jiaxin Huang et al. arXiv 2022. [Paper]
  28. "Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs"Albert Q. Jiang et al. ICLR 2023. [Paper]
  29. "Holistic Evaluation of Language Models"Percy Liang et al. arXiv 2022. [Paper]
  30. "PAL: Program-aided Language Models"Luyu Gao et al. arXiv 2022. [Paper]
  31. "Legal Prompt Engineering for Multilingual Legal Judgement Prediction"Dietrich Trautmann et al. arXiv 2022. [Paper]
  32. "How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment"Aidan Gilson et al. medRxiv 2022. [Paper]
  33. "ChatGPT: The End of Online Exam Integrity?"Teo Susnjak et al. arXiv 2022. [Paper]
  34. "Large Language Models are reasoners with Self-Verification"Yixuan Weng et al. arXiv 2022. [Paper]
  35. "Self-Instruct: Aligning Language Model with Self Generated Instructions"Yizhong Wang et al. arXiv 2022. [Paper]
  36. "ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports"Katharina Jeblick et al. arXiv 2022. [Paper]
  37. "The End of Programming"Matt Welsh et al. ACM 2023. [Paper]
  38. "Chatgpt goes to law school"Choi Jonathan H et al. SSRN 2023. [Paper]
  39. "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection"Biyang Guo et al. arXiv 2023. [Paper]
  40. "Is ChatGPT A Good Translator? A Preliminary Study"Wenxiang Jiao et al. arXiv 2023. [Paper]
  41. "Could an Artificial-Intelligence agent pass an introductory physics course?"Gerd Kortemeyer et al. arXiv 2023. [Paper]
  42. "Mathematical Capabilities of ChatGPT"Simon Frieder et al. arXiv 2023. [Paper]
  43. "Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models"Zhihong Shao et al. arXiv 2023. [Paper]
  44. "Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning"Thomas Carta et al. arXiv 2023. [Paper]
  45. "Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making"Arya Yao et al. medRxiv 2023. [Paper]
  46. "Theory of Mind May Have Spontaneously Emerged in Large Language Models"Michal Kosinski et al. arXiv 2023. [Paper]
  47. "A Categorical Archive of ChatGPT Failures"Ali Borji et al. arXiv 2023. [Paper]
  48. "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"Yejin Bang et al. arXiv 2023. [Paper]
  49. "Toolformer: Language Models Can Teach Themselves to Use Tools"Timo Schick et al. arXiv 2023. [Paper]
  50. "Is ChatGPT a General-Purpose Natural Language Processing Task Solver?"Chengwei Qin et al. arXiv 2023. [Paper]
  51. "How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation"Hendy Amr et al. arXiv 2023. [Paper]
  52. "Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT"Qihuang Zhong et al. arXiv 2023. [Paper]
  53. "Zero-Shot Information Extraction via Chatting with ChatGPT"Xiang Wei et al. arXiv 2023. [Paper]
  54. "ChatGPT: Jack of all trades, master of none"Jan Kocon et al. arXiv 2023. [Paper]
  55. "On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective"Jindong Wang et al. arXiv 2023. [Paper]
  56. "Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback"Baolin Peng et al. arXiv 2023. [Paper]
  57. "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)"Paulo Shakarian et al. arXiv 2023. [Paper]
  58. "How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks"Chen Xuanting et al. arXiv 2023. [Paper]
  59. "The utility of ChatGPT for cancer treatment information"Shen Chen et al. medRxiv 2023. [Paper]
  60. "Can ChatGPT Assess Human Personalities? A General Evaluation Framework"Haocong Rao et al. arXiv 2023. [Paper]
  61. "Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT."Mostafa M. Amin et al. arXiv 2023. [Paper]
  62. "Exploring the Feasibility of ChatGPT for Event Extraction."Jun Gao et al. arXiv 2023. [Paper]
  63. "Does Synthetic Data Generation of LLMs Help Clinical Text Mining?"Tang Ruixiang et al. arXiv 2023. [Paper]
  64. "Consistency Analysis of ChatGPT"Myeongjun Jang et al. arXiv 2023. [Paper]
  65. "Self-planning Code Generation with Large Language Model"Shun Zhang et al. ICLR 2023. [Paper]
  66. "Evaluation of ChatGPT as a Question Answering System for Answering Complex Questions"Yiming Tan et al. arXiv 2023. [Paper]
  67. "GPT-4 Technical Report"OpenAI et al. OpenAI 2023. [Paper]
  68. "A Short Survey of Viewing Large Language Models in Legal Aspect"Zhongxiang Sun et al. arXiv 2023. [Paper]
  69. "ChatGPT Participates in a Computer Science Exam"Sebastian Bordt et al. arXiv 2023. [Paper]
  70. "A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models"Junjie Ye et al. arXiv 2023. [Paper]
  71. "On the Educational Impact of ChatGPT: Is Artificial Intelligence Ready to Obtain a University Degree?"Kamil Malinka et al. arXiv 2023. [Paper]
  72. "Sparks of Artificial General Intelligence: Early experiments with GPT-4"S'ebastien Bubeck et al. arXiv 2023. [Paper]
  73. "Is ChatGPT A Good Keyphrase Generator? A Preliminary Study"Mingyang Song et al. arXiv 2023. [Paper]
  74. "Capabilities of GPT-4 on Medical Challenge Problems"Harsha Nori et al. arXiv 2023. [Paper]
  75. "Can we trust the evaluation on ChatGPT?"Rachith Aiyappa et al. arXiv 2023. [Paper]
  76. "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks"Fabrizio Gilardi et al. arXiv 2023. [Paper]
  77. "Evaluation of ChatGPT for NLP-based Mental Health Applications"Bishal Lamichhane et al. arXiv 2023. [Paper]
  78. "ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models"Bian Ning et al. arXiv 2023. [Paper]
  79. "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams"Desnes Nunes et al. arXiv 2023. [Paper]
  80. "Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure"Philipp Koralus et al. arXiv 2023. [Paper]
  81. "Yes but.. Can ChatGPT Identify Entities in Historical Documents?"Carlos-Emiliano González-Gallardo et al. arXiv 2023. [Paper]
  82. "Uncovering ChatGPT's Capabilities in Recommender Systems"Sunhao Dai et al. arXiv 2023. [Paper]