Mike Lewis (Meta) – “Science and Scaling: How (really) to Pre-train a Llama”
Abstract
Pre-trained language models form the basis for much of modern NLP, and while the basic pre-training recipe is well known, many details of model development are hidden in secretive research labs. Based on experience training Meta’s Llama models, I will shed light on the evolving science of pre-training. The central challenge of pre-training research is designing small-scale experiments that can confidently be extrapolated to main training runs at orders of magnitude greater scale, which has led to an approach that differs from much of academic research. While careful experiments can disentangle many of the factors that do and don’t matter for large-scale training, I will also discuss gaps in the science that mean researcher judgements and intuitions remain critical to decision making.
Bio
Mike Lewis is the Pre-training Research Lead on the Llama team at Meta, and led pre-training for Llama 3. He has worked in many areas of NLP, including the Cicero agent for the game of Diplomacy and early pre-trained language models such as BART and RoBERTa. Prior to Meta, he was a postdoc at the University of Washington (working with Luke Zettlemoyer), and he holds a PhD from the University of Edinburgh (advised by Mark Steedman). He has received several paper awards at ACL and EMNLP. His work has been extensively covered in the media, with varying levels of accuracy.