Optimizing vLLM Performance through Quantization | Ray Summit 2024
Anyscale