Optimizing vLLM Performance through Quantization | Ray Summit 2024
Anyscale