Developer Advocate at Red Hat • Organizer KCD New York • Previously at MongoDB • containers, k8s, & everything in between
Cedric Clyburn
LLM inference workloads are becoming monolithic, heavy, & hard to scale. That's where platform engineers can embrace llm-d, a new open-source effort that's starting to tackle a problem we're seeing more and more in prod ML stacks! Listen to @cedricclyburn.com & Christopher Nuland to learn more
Messy data meets LLMs!
Join Andy Igdal and Cedric Clyburn (@cedricclyburn.bsky.social) at DevBcn 2026.
Learn how Python and AI can tackle residential electrification data in the Big Data & AI track!
June 16-17
buff.ly/dz1tICU
#devbcn26
@cedricclyburn.com "Shows the 5 Podman Features You Should Know: Kubernetes & Containers Simplified" in this YouTube video: www.youtube.com/watch?v=dEy3.... Can you guess what they are? @opensource #podman
Self-hosting LLMs typically comes with a set of requirements for our infrastructure, e.g.:
Hardware accelerators
System & model configuration
Specific libraries/dependencies
That's why @cern.voxxeddays.ch I was honored to demo Ramalama.ai, a project that uses containers to safely run AI models
Had an AWESOME conversation with the @allthingsopen.bsky.social community about local LLMs on small hardware: model compression can quantize a model from 220 GB → 55 GB with <1% accuracy loss, and inference engines like vLLM help run them fast and efficiently.
www.youtube.com/watch?v=xGqV...
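A quick back-of-the-envelope check of the 220 GB → 55 GB figure above. This is only a sketch: the post does not say which precisions were involved, so the 16-bit starting point, the 4-bit target, and the resulting parameter count of about 110 billion are all assumptions. The helper `weight_footprint_gb` is a hypothetical name, and the estimate covers weights only (no quantization scales, KV cache, or activations).

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: 220 GB corresponds to 16-bit weights (2 bytes per parameter),
# which implies roughly 110 billion parameters.
NUM_PARAMS = 110e9

fp16_gb = weight_footprint_gb(NUM_PARAMS, 16)  # assumed original precision
int4_gb = weight_footprint_gb(NUM_PARAMS, 4)   # assumed quantized precision

print(f"16-bit: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB, "
      f"reduction: {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the 4x reduction falls straight out of the bit widths: a quarter of the bits per weight means a quarter of the storage.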
Live from #KubeCon Europe in Amsterdam! 🇳🇱
Big announcement from the Red Hat booth: llm-d, a cloud-native way to run AI at scale with major performance & cost savings.
+ ran a workshop on RAG at scale using Kubeflow & Docling on Kubernetes. Slides below
This year @devoxx.uk we asked a simple question: can we use local models as AI code assistants?
The answer? Yes… and no! But you should check out the recording to see what worked (like MCP, Skills.MD, and more) as well as what to know about local LLMs for devs :)
youtu.be/Lhqp7gKXu2w
AI coding tools are about to get more expensive, and the @anthropic.com news yesterday is a good indicator of what's to come ($20 to $100 for Claude Code). Understandable, because GPU compute isn't cheap
Running models in-house with #vLLM is starting to look quite nice!
Check out the recording below!
RamaLama: Making working with AI Models Boring: www.youtube.com/watch?v=CYxw...
Building Intelligent Apps with RAG on Kubernetes: From Raw Data to Real-Time Insights. Cedric Clyburn, Natale Vinto, Christopher Nuland & Legare Kerrison, Red Hat