New AI World's Fundamentals: Model, Data, and Eval

Mar 1, 2023 · LLM

25 papers

In this blog/repo, we collect papers on foundation models' training methods, architectures, evaluation, and training-data processing, in order to find the key points for reproducing more capable and more open large language models!

(This repo is constantly updated.)

Pre-training large language models

This part collects papers on how to train a foundation model.

  1. Language Models are Few-Shot Learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, et al. from OpenAI, 2020.5

    [pdf] GPT-3

  2. A General Language Assistant as a Laboratory for Alignment

    Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, et al. from Anthropic, 2021.12

    [pdf] Anthropic-LM

  3. Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, et al. from DeepMind, 2021.12

    [pdf] RETRO

  4. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    DeepMind, 2021.12

    [pdf] Gopher

  5. Jurassic-1: Technical Details and Evaluation

    Opher Lieber, Or Sharir, Barak Lenz, and Yoav Shoham from AI21 Labs, 2021

    [pdf] J1-Jumbo

  6. LaMDA: Language Models for Dialog Applications

    Google, 2022.1

    [pdf] LaMDA

  7. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

    Shaden Smith, Mostofa Patwary, et al. from Microsoft and NVIDIA, 2022.1

    [pdf] MT-NLG

  8. Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. from DeepMind, 2022.3 (see the compute-optimal sketch after this list)

    [pdf] Chinchilla

  9. PaLM: Scaling Language Modeling with Pathways

    Google, 2022.4

    [pdf] PaLM

  10. GPT-NeoX-20B: An Open-Source Autoregressive Language Model

    Sid Black, Stella Biderman, Eric Hallahan, et al. from EleutherAI, 2022.4

    [pdf] GPT-NeoX-20B

  11. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

    Yizhong Wang, Swaroop Mishra, et al. from the Allen Institute for AI and the University of Washington, 2022.4

    [pdf] Tk-INSTRUCT

  12. OPT: Open Pre-trained Transformer Language Models

    Susan Zhang, Stephen Roller, Naman Goyal, et al. from Meta AI, 2022.5

    [pdf] OPT

  13. Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, et al. from Google, 2022.10

    [pdf] Flan-PaLM

  14. LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al. from Meta AI, 2023.2

    [pdf] LLaMA

  15. OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

    Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, et al. from Meta AI, 2022.12

    [pdf] OPT-IML
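
As a quick companion to paper 8 (Chinchilla), here is a minimal sketch of its compute-optimal rule of thumb. It assumes the common C ≈ 6·N·D estimate of training FLOPs and the paper's finding that model size and data should grow in roughly equal proportion (about 20 training tokens per parameter); the function name `chinchilla_optimal` is our own illustration, not code from the paper.

```python
# Minimal sketch of the Chinchilla compute-optimal heuristic (paper 8).
# Assumptions: training compute C ~= 6 * N * D FLOPs, and the paper's
# rule of thumb that D ~= 20 * N (about 20 tokens per parameter).

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return a rough compute-optimal (parameters N, training tokens D)
    for a given FLOP budget, from C = 6*N*D and D = tokens_per_param*N."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's own budget (~5.76e23 FLOPs) recovers its reported
# configuration: about 70B parameters trained on about 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```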

Pre-training Dataset

  1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    EleutherAI

    [pdf]

  2. WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models

    Beijing Academy of Artificial Intelligence, China

    [pdf]

  3. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

    Facebook AI (see the pipeline sketch after this list)

    [pdf] [project]
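
As a companion to paper 3 (CCNet), here is a toy sketch of its three-stage filtering idea: paragraph-level deduplication, language identification, and language-model perplexity filtering. `detect_lang` and `lm_perplexity` are hypothetical stand-ins for the fastText classifier and Wikipedia-trained KenLM model the paper uses, and the real pipeline buckets documents into head/middle/tail quality tiers by perplexity rather than applying one hard threshold.

```python
# Toy sketch of a CCNet-style cleaning pipeline (paper 3), not the
# released implementation. `detect_lang` and `lm_perplexity` are
# hypothetical stand-ins for fastText language ID and a KenLM model.

import hashlib

def ccnet_like_filter(paragraphs, detect_lang, lm_perplexity,
                      target_lang="en", ppl_threshold=500.0):
    seen, kept = set(), []
    for p in paragraphs:
        # 1) Drop exact duplicate paragraphs via a normalized content hash.
        h = hashlib.sha1(p.strip().lower().encode("utf-8")).hexdigest()
        if h in seen:
            continue
        seen.add(h)
        # 2) Keep only paragraphs in the target language.
        if detect_lang(p) != target_lang:
            continue
        # 3) Keep text that a Wikipedia-trained LM scores as fluent
        #    (low perplexity); CCNet itself buckets head/middle/tail.
        if lm_perplexity(p) <= ppl_threshold:
            kept.append(p)
    return kept
```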

Towards AIGC

This part collects papers on RLHF and other possible ways to achieve AIGC.

  1. Training language models to follow instructions with human feedback

    OpenAI, 2022.3 (see the objective sketch after this list)

    [pdf] InstructGPT

  2. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Anthropic, 2022.4

    [pdf]

  3. Training Language Models with Language Feedback

    New York University, University of the Basque Country, Genentech, and CIFAR LMB, 2022.4

    [pdf]
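
As a companion to paper 1 (InstructGPT), here is the core RLHF objective in simplified form: the policy \(\pi_{\phi}\) is tuned with PPO to maximize a learned reward model \(r_{\theta}\), while a KL penalty with weight \(\beta\) keeps it close to the supervised fine-tuned policy \(\pi^{\mathrm{SFT}}\). The paper's extra pretraining-mix term (PPO-ptx) is omitted here.

```latex
\max_{\phi}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\phi}(\cdot \mid x)}
\left[ r_{\theta}(x, y) \;-\; \beta \log \frac{\pi_{\phi}(y \mid x)}{\pi^{\mathrm{SFT}}(y \mid x)} \right]
```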

Evaluation

  1. Holistic Evaluation of Language Models

    Stanford University

    [pdf] HELM

  2. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

    Centre for Artificial Intelligence Research (CAiRE), HKUST

    [pdf]

  3. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

    Microsoft Research, City University of Hong Kong, et al.

    [pdf]

  4. On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

    Arizona State University, Tempe.

    [pdf]