LLaMA: Open and Efficient Foundation Language Models Feb 2023 Hugo Touvron et al. https://arxiv.org/abs/2302.13971
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? Jul 2022 https://arxiv.org/abs/2207.10551
UL2: Unifying Language Learning Paradigms May 2022 https://arxiv.org/abs/2205.05131
Transcending Scaling Laws with 0.1% Extra Compute Oct 2022 https://arxiv.org/abs/2210.11399
Emergent Abilities of Large Language Models Jun 2022 https://arxiv.org/abs/2206.07682
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity May 2023 https://arxiv.org/abs/2305.13169
Scaling Laws for Autoregressive Generative Modeling Oct 2020 https://arxiv.org/abs/2010.14701
Scaling Laws for Neural Language Models Jan 2020 https://arxiv.org/abs/2001.08361
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining May 2023 https://arxiv.org/abs/2305.10429
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) strongly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), ...
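In practice, the minimax objective behind this kind of domain reweighting is solved with a Group-DRO-style exponentiated-gradient update: domains where a small proxy model's loss most exceeds a fixed reference model's loss get upweighted. Below is a minimal sketch of one such update step, assuming per-domain losses have already been measured; the function and parameter names (`update_domain_weights`, `step_size`, `smoothing`) are illustrative, not taken from the paper's released code.

```python
import numpy as np

def update_domain_weights(weights, proxy_losses, ref_losses,
                          step_size=1.0, smoothing=1e-3):
    """One exponentiated-gradient step on domain mixture weights.

    weights:      current mixture weights over k domains (positive, sums to 1)
    proxy_losses: per-domain loss of the small proxy model
    ref_losses:   per-domain loss of a fixed reference model
    Domains where the proxy lags the reference most are upweighted.
    """
    # Excess loss: how much worse the proxy is than the reference per domain.
    excess = np.maximum(proxy_losses - ref_losses, 0.0)
    # Multiplicative (exponentiated-gradient) update in log space,
    # shifted by the max for numerical stability, then renormalized.
    logits = np.log(weights) + step_size * excess
    new_weights = np.exp(logits - logits.max())
    new_weights /= new_weights.sum()
    # Smooth toward uniform so no domain's weight collapses to zero.
    k = len(weights)
    return (1 - smoothing) * new_weights + smoothing * np.ones(k) / k

# Example: three domains (e.g., Wikipedia, books, web text), uniform start.
w = np.array([1/3, 1/3, 1/3])
w = update_domain_weights(w,
                          proxy_losses=np.array([2.90, 3.40, 3.10]),
                          ref_losses=np.array([2.80, 3.00, 3.05]))
print(w)  # the second domain, with the largest excess loss, gains weight
```

The smoothing mix with the uniform distribution keeps every domain's weight bounded away from zero, so no domain is ever dropped from the mixture entirely.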