LinkedIn’s Liger: The GPU Kernel Suite that Andrej Karpathy, Jeremy Howard, and Thomas Wolf Use for Efficient LLM Training
LinkedIn just open-sourced a collection of Triton-based kernels that are specifically designed for Large Language Model (LLM) training.
Honestly, it’s kind of a big deal.
I'll explain shortly why companies bother with custom kernels. I didn't see this one coming in 2024, and it's going to save us a lot of time!
By adding a single line of code, you can boost throughput by over 20% and cut memory usage by 60%.
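That "single line" is the model-loading call. A minimal sketch of what the integration looks like, assuming the `liger-kernel` package is installed (class name per the Liger Kernel README; the model checkpoint is just an illustrative example):

```python
# Drop-in replacement for transformers' AutoModelForCausalLM: before returning
# the model, it patches supported layers (RMSNorm, RoPE, SwiGLU, cross-entropy,
# ...) with Liger's fused Triton kernels.
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Example checkpoint; any supported architecture works the same way.
model = AutoLigerKernelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
```

The rest of your training loop stays untouched, which is exactly why the "one line of code" claim holds: the kernels are swapped in at load time, not wired through your code.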
Benchmark setup: precision = bf16, optimizer = AdamW, gradient checkpointing = True, distributed strategy = FSDP1 on 8 A100s. As you can see, Hugging Face models run out of memory at a 4K context length, whereas Hugging Face + Liger Kernel scales up to 16K.
This will effectively allow for extended context lengths, larger batch sizes, and support for extensive vocabularies.
Let’s dive deeper!
What’s the Buzz About Liger Kernels?
Liger Kernel, officially the LinkedIn GPU Efficient Runtime, is a set of custom Triton kernels tuned specifically for LLM training.
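To make "Triton kernel" concrete: Triton lets you write GPU kernels directly in Python, which the compiler lowers to efficient device code. Below is a deliberately tiny, toy example (elementwise addition, nothing like Liger's fused ops, and it needs a CUDA-capable GPU to run), just to show the shape of the machinery Liger builds on:

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```

Liger's kernels follow the same pattern but fuse whole LLM layers (e.g. RMSNorm or SwiGLU) into single launches, which is where the memory and throughput wins come from.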