New AI World's Fundamentals: Model, Data, and Eval

Mar 1, 2023 · LLM

25 papers

In this blog/repo, we collect papers on foundation models' training methods, architectures, evaluation, and training-data processing, in order to find the key points for reproducing more capable and more open large language models!

(This repo is constantly updated.)

Pre-training large language models

This part collects papers on how to train a foundation model.

  1. Language Models are Few-Shot Learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. from OpenAI, 2020.5

    [pdf] GPT-3

  2. A General Language Assistant as a Laboratory for Alignment

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, et al. from Anthropic, 2021.12

    [pdf] Anthropic-LM

  3. Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, et al. from DeepMind, 2021.12

    [pdf] RETRO

  4. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    DeepMind, 2021.12

    [pdf] Gopher

  5. Jurassic-1: Technical Details and Evaluation

    Opher Lieber, Or Sharir, Barak Lenz, and Yoav Shoham from AI21 Labs, 2021

    [pdf] J1-Jumbo

  6. LaMDA: Language Models for Dialog Applications

    Google, 2022.1

    [pdf] LaMDA

  7. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    Shaden Smith, Mostofa Patwary, et al. from Microsoft and NVIDIA, 2022.1

    [pdf] MT-NLG

  8. Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. from DeepMind, 2022.3 (see the compute-optimal sketch after this list)

    [pdf] Chinchilla

  9. PaLM: Scaling Language Modeling with Pathways

    Google, 2022.4

    [pdf] PaLM

  10. GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    Sid Black, Stella Biderman, Eric Hallahan, et al. from EleutherAI, 2022.4

    [pdf] GPT-NeoX-20B

  11. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Yizhong Wang, Swaroop Mishra, et al. from the Allen Institute for AI and the University of Washington, 2022.4

    [pdf] Tk-INSTRUCT

  12. OPT: Open Pre-trained Transformer Language Models

    Susan Zhang, Stephen Roller, Naman Goyal, et al. from Meta AI, 2022.5

    [pdf] OPT

  13. Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, et al. from Google, 2022.10

    [pdf] Flan-PaLM

  14. LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. from Meta AI, 2023.2

    [pdf] LLaMA

  15. OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

    Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, et al. from Meta AI, 2022.12

    [pdf] OPT-IML
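
As a quick companion to paper 8 (Chinchilla), here is a minimal sketch of its compute-optimal rule of thumb. It assumes the common C ≈ 6·N·D estimate of training FLOPs and the paper's finding that model size and data should grow in roughly equal proportion (about 20 training tokens per parameter); the function name `chinchilla_optimal` is our own illustration, not code from the paper.

```python
# Minimal sketch of the Chinchilla compute-optimal heuristic (paper 8).
# Assumptions: training compute C ~= 6 * N * D FLOPs, and the paper's
# rule of thumb that D ~= 20 * N (about 20 tokens per parameter).

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return a rough compute-optimal (parameters N, training tokens D)
    for a given FLOP budget, from C = 6*N*D and D = tokens_per_param*N."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) recovers its reported
# configuration: about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```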

Pre-training Dataset

  1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    EleutherAI

    [pdf]

  2. WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models

    Beijing Academy of Artificial Intelligence, China

    [pdf]

  3. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

    Facebook AI (see the pipeline sketch after this list)

    [pdf] [project]
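
As a companion to paper 3 (CCNet), here is a toy sketch of its three-stage filtering idea: paragraph-level deduplication, language identification, and language-model perplexity filtering. `detect_lang` and `lm_perplexity` are hypothetical stand-ins for the fastText classifier and Wikipedia-trained KenLM model the paper uses, and the real pipeline buckets documents into head/middle/tail quality tiers by perplexity rather than applying one hard threshold.

```python
# Toy sketch of a CCNet-style cleaning pipeline (paper 3), not the
# released implementation. `detect_lang` and `lm_perplexity` are
# hypothetical stand-ins for fastText language ID and a KenLM model.

import hashlib

def ccnet_like_filter(paragraphs, detect_lang, lm_perplexity,
                      target_lang="en", ppl_threshold=500.0):
    seen, kept = set(), []
    for p in paragraphs:
        # 1) Drop exact duplicate paragraphs via a normalized content hash.
        h = hashlib.sha1(p.strip().lower().encode("utf-8")).hexdigest()
        if h in seen:
            continue
        seen.add(h)
        # 2) Keep only paragraphs in the target language.
        if detect_lang(p) != target_lang:
            continue
        # 3) Keep text that a Wikipedia-trained LM scores as fluent
        #    (low perplexity); CCNet itself buckets head/middle/tail.
        if lm_perplexity(p) <= ppl_threshold:
            kept.append(p)
    return kept
```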

Towards AIGC

This part collects papers on RLHF and other possible ways to achieve AIGC.

  1. Training language models to follow instructions with human feedback

    OpenAI, 2022.3 (see the objective sketch after this list)

    [pdf] InstructGPT

  2. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Anthropic, 2022.4

    [pdf]

  3. Training Language Models with Language Feedback

    New York University, University of the Basque Country, Genentech, and CIFAR LMB, 2022.4

    [pdf]
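
As a companion to paper 1 (InstructGPT), here is the core RLHF objective in simplified form: the policy \(\pi_{\phi}\) is tuned with PPO to maximize a learned reward model \(r_{\theta}\), while a KL penalty with weight \(\beta\) keeps it close to the supervised fine-tuned policy \(\pi^{\mathrm{SFT}}\). The paper's extra pretraining-mix term (PPO-ptx) is omitted here.

```latex
\max_{\phi}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\phi}(\cdot \mid x)}
\left[ r_{\theta}(x, y) \;-\; \beta \log \frac{\pi_{\phi}(y \mid x)}{\pi^{\mathrm{SFT}}(y \mid x)} \right]
```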

Evaluation

  1. Holistic Evaluation of Language Models

    Stanford University

    [pdf] HELM

  2. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

    Centre for Artificial Intelligence Research (CAiRE), HKUST

    [pdf]

  3. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

    Microsoft Research, City University of Hong Kong, et al.

    [pdf]

  4. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

    Arizona State University, Tempe.

    [pdf]