Mike Lewis (Meta) – “Science and Scaling: How (really) to Pre-train a Llama”

When:
April 18, 2025 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N. Charles St.
Cost:
Free

Abstract

Pre-trained language models form the basis of much of modern NLP, and while the basic pre-training recipe is well known, many details of model development are hidden inside secretive research labs. Drawing on experience training Meta's Llama models, I will shed light on the evolving science of pre-training. The central challenge of pre-training research is designing small-scale experiments that can be confidently extrapolated to main training runs at orders of magnitude greater scale, which has led to an approach that differs from much of academic research. While careful experiments can disentangle many of the factors that do and do not matter for large-scale training, I will also discuss gaps in the science that mean researchers' judgment and intuition remain critical to decision making.
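As a rough illustration of the kind of extrapolation the abstract describes (not a method presented in the talk), one common approach is to fit a saturating power law to the final losses of small pilot runs and project it out to the target compute budget. The sketch below uses SciPy's curve_fit; the functional form, the pilot-run numbers, and the target compute are all invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical pilot runs: training compute (arbitrary units) and final loss.
compute = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
loss = np.array([3.10, 2.91, 2.76, 2.64, 2.54, 2.45])

def power_law(c, a, b, irreducible):
    # Saturating power law L(C) = a * C^(-b) + irreducible,
    # where `irreducible` models the loss floor at infinite compute.
    return a * c ** (-b) + irreducible

# Fit the three parameters to the small-scale runs.
params, _ = curve_fit(power_law, compute, loss, p0=(1.0, 0.3, 2.0))
a, b, irreducible = params
print(f"fit: L(C) = {a:.3f} * C^(-{b:.3f}) + {irreducible:.3f}")

# Extrapolate three orders of magnitude beyond the largest pilot run.
target_compute = 32_000.0
print(f"predicted loss at C={target_compute:g}: "
      f"{power_law(target_compute, *params):.3f}")
```

Whether such a fit actually transfers across orders of magnitude, and which design choices it is sensitive to, is precisely the kind of question the talk addresses.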

Bio

Mike Lewis is the Pre-training Research Lead on the Llama team at Meta, and he led pre-training for Llama 3. He has worked in many areas of NLP, including developing the Cicero bot for the game of Diplomacy and early pre-trained language models such as BART and RoBERTa. Prior to Meta, he was a postdoc at the University of Washington (working with Luke Zettlemoyer), and he holds a PhD from the University of Edinburgh (advised by Mark Steedman). He has received several paper awards at ACL and EMNLP. His work has been extensively covered in the media, with varying levels of accuracy.

Center for Language and Speech Processing