ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
Yannic Kilcher