The New AI World's Fundamentals: Model, Data and Eval.
Mar 1, 2023 00:00 · 514 words · 3 minute read
In this blog/repo, we collect papers on foundational models' training methods, architectures, evaluation, and training-data processing, to find the key points for reproducing more intelligent and more open large language models!
(This repo is constantly updated.)
Pre-training large language models
This part collects papers on how to train foundational models.
-
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah et al. from OpenAI 2020.05
[pdf] GPT-3
-
A General Language Assistant as a Laboratory for Alignment
Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli et al. from Anthropic 2021.12
[pdf] Anthropic-LM
-
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann et al. from DeepMind 2021.12
[pdf] RETRO
-
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
DeepMind 2021.12
[pdf] Gopher
-
Jurassic-1: Technical Details and Evaluation
Opher Lieber, Or Sharir, Barak Lenz, Yoav Shoham from AI21 Labs 2021
[pdf] J1-Jumbo
-
LaMDA: Language Models for Dialog Applications
Google 2022.1
[pdf] LaMDA
-
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith, Mostofa Patwary et al. from Microsoft and NVIDIA 2022.1
[pdf] MT-NLG
-
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch et al. from DeepMind 2022.3
[pdf] Chinchilla (see the compute-optimal sketch after this list)
-
PaLM: Scaling Language Modeling with Pathways
Google 2022.4
[pdf] PaLM
-
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan et al. from EleutherAI 2022.4
[pdf] GPT-NeoX-20B
-
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang, Swaroop Mishra et al. from the Allen Institute for AI and the University of Washington 2022.4
[pdf] Tk-INSTRUCT
-
OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal et al. from Meta AI 2022.5
[pdf] OPT
-
Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre et al. from Google 2022.10
[pdf] Flan-PaLM
-
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard et al. from Meta AI 2023.2
[pdf] LLaMA
-
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru et al. from Meta AI 2022.12
[pdf] OPT-IML
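As referenced in the Chinchilla entry above, here is a minimal sketch of the compute-optimal trade-off that paper studies. It assumes the common approximation that training FLOPs C ≈ 6·N·D for N parameters and D tokens, plus the paper's rough finding of about 20 training tokens per parameter; the function name and the sample budgets are illustrative, not from the paper.

```python
import math

def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Split a FLOPs budget into roughly compute-optimal (params, tokens)."""
    # Assumption: C ~= 6 * N * D, and D ~= 20 * N (the paper's ~20
    # tokens-per-parameter rule of thumb), so C ~= 120 * N^2.
    n_params = math.sqrt(flops_budget / 120.0)
    n_tokens = 20.0 * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for c in (1e21, 1e22, 5.76e23):  # last budget is roughly Chinchilla's
        n, d = compute_optimal(c)
        print(f"C={c:.2e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

For a budget around 5.76e23 FLOPs this heuristic recovers roughly 70B parameters and 1.4T tokens, close to Chinchilla's actual configuration.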
Pre-training Datasets
-
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
EleutherAI
[pdf]
-
WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models
Beijing Academy of Artificial Intelligence, China
[pdf]
-
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Facebook AI
[pdf] (see the deduplication sketch after this list)
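As referenced in the CCNet entry, a minimal sketch of the paragraph-level deduplication idea behind CCNet-style pipelines. CCNet also does fastText language identification and LM perplexity filtering, which are omitted here; the normalization and function names are assumptions for illustration, not CCNet's actual code.

```python
import hashlib

def dedup_paragraphs(documents):
    """Keep only the first occurrence of each normalized paragraph."""
    seen = set()
    for doc in documents:
        kept = []
        for para in doc.split("\n"):
            text = para.strip()
            if not text:
                continue
            # Hash the normalized paragraph so exact repeats are cheap to spot.
            key = hashlib.sha1(text.lower().encode("utf-8")).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append(para)
        yield "\n".join(kept)

docs = ["Hello world.\nShare this article!", "Share this article!\nSome fresh text."]
print(list(dedup_paragraphs(docs)))
# ['Hello world.\nShare this article!', 'Some fresh text.']
```

Boilerplate paragraphs repeated across web pages ("Share this article!") survive only once, which is the main reason dedup matters for crawl-derived corpora.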
Towards AIGC
This part collects papers on RLHF and other possible ways to achieve AIGC.
-
Training language models to follow instructions with human feedback
OpenAI 2022.3
[pdf] InstructGPT (see the reward-model loss sketch after this list)
-
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Anthropic 2022.4
[pdf]
-
Training Language Models with Language Feedback
New York University, University of the Basque Country, Genentech, and CIFAR LMB 2022.4
[pdf]
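As referenced in the InstructGPT entry, a minimal sketch of the pairwise reward-model loss used in RLHF: given reward scores for a human-preferred ("chosen") and a dispreferred ("rejected") response, minimize -log σ(r_chosen − r_rejected). This scalar toy omits the learned reward model and batching; the function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): lower when the preferred
    response is scored higher by the reward model."""
    return -math.log(sigmoid(r_chosen - r_rejected))

print(pairwise_reward_loss(2.0, 0.5))  # ~0.20: preference respected
print(pairwise_reward_loss(0.5, 2.0))  # ~1.70: preference violated
```

Training the reward model to drive this loss down is the middle step of the RLHF pipeline, between supervised fine-tuning and the final RL (PPO) stage.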
Evaluation
-
Holistic Evaluation of Language Models
Stanford University
[pdf] HELM (see the exact-match sketch after this list)
-
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Centre for Artificial Intelligence Research
[pdf]
-
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Microsoft Research, City University of Hong Kong, et al.
[pdf]
-
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)
Arizona State University, Tempe.
[pdf]
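Finally, as referenced in the HELM entry, a minimal sketch of the exact-match accuracy metric that evaluation suites commonly report among other metrics. The model_fn, normalization, and toy examples are hypothetical stand-ins, not HELM's actual API.

```python
def exact_match_accuracy(model_fn, examples):
    """examples: (prompt, reference) pairs; model_fn maps a prompt to a string."""
    def normalize(s: str) -> str:
        return " ".join(s.strip().lower().split())
    correct = sum(
        normalize(model_fn(prompt)) == normalize(reference)
        for prompt, reference in examples
    )
    return correct / len(examples)

# Toy usage with a fake "model" standing in for an LLM API call.
examples = [("Q: 2+2=? A:", "4"), ("Q: capital of France? A:", "Paris")]
print(exact_match_accuracy(lambda p: "4" if "2+2" in p else "Rome", examples))  # 0.5
```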