Profile

Developer Advocate at Red Hat • Organizer KCD New York • Previously at MongoDB • containers, k8s, & everything in between

Cedric Clyburn

LLM inference workloads are becoming monolithic, heavy, & hard to scale. That's where platform engineers can embrace llm-d, a new open-source effort that’s starting to tackle a problem we’re seeing more and more in prod ML stacks! 🎧 to @cedricclyburn.com & Christopher Nuland 👇 to learn more

2mo

🔌 Messy data meets LLMs! Join Andy Igdal and Cedric Clyburn (@cedricclyburn.bsky.social) at DevBcn 2026. Learn how Python and AI can tackle residential electrification data in the Big Data & AI track! 📊🤖 📅 June 16-17 🔗 buff.ly/dz1tICU #devbcn26

@cedricclyburn.com shows the "5 Podman Features You Should Know: Kubernetes & Containers Simplified" in this YouTube video: www.youtube.com/watch?v=dEy3.... Can you guess what they are? @opensource #podman

2mo

1mo

Self-hosting LLMs typically comes with a set of infrastructure requirements, e.g.: 🏎️ Hardware accelerators ⚙️ System & model configuration 📚 Specific libraries/dependencies. That’s why @cern.voxxeddays.ch I was honored to demo Ramalama.ai, a project that uses containers to safely run AI models 🤖

Had an AWESOME conversation with the @allthingsopen.bsky.social community about local LLMs on small hardware: model compression can quantize a model from 220 GB → 55 GB with <1% accuracy loss, and inference engines like vLLM help run them fast and efficiently. 🎥 www.youtube.com/watch?v=xGqV...

Live from #KubeCon Europe in Amsterdam! 🇳🇱 Big announcement from the Red Hat booth: llm-d, a cloud-native way to run AI at scale with major performance & cost savings. We also ran a workshop on RAG at scale using Kubeflow & Docling on Kubernetes. Slides below 👇

This year @devoxx.uk we asked a simple question: can we use local models as AI code assistants? 🤔 The answer? Yes… and no! Check out the recording to see what worked (like MCP, Skills.MD, and more) as well as what to know about local LLMs for devs :) 🎥 youtu.be/Lhqp7gKXu2w

AI coding tools are about to get more expensive, and the @anthropic.com news yesterday is a good indicator of what’s to come ($20 to $100 for Claude Code). Understandable, because GPU compute isn’t cheap 👀 Running models in-house with #vLLM is starting to look quite nice!

Check out the recording below! 🎥 RamaLama: Making working with AI Models Boring: www.youtube.com/watch?v=CYxw...

💻 Slides: red.ht/rag-slides

Barcelona Developers Conference

YouTube video by Devoxx UK

youtu.be

Local Development in the AI Era by Cedric Clyburn & Kevin Dubois

www.youtube.com

YouTube video by IBM Technology

5 Podman Features You Should Know: Kubernetes & Containers Simplified

Video

YouTube video by All Things Open

www.youtube.com

Local LLMs are about to change everything – here's why quantization matters

www.youtube.com

YouTube video by Devoxx

RamaLama: Making working with AI Models Boring by Cedric Clyburn

red.ht

Building Intelligent Apps with RAG on Kubernetes: From Raw Data to Real-Time Insights. Cedric Clyburn, Natale Vinto, Christopher Nuland & Legare Kerrison, Red Hat

[Public] Building Intelligent Apps with RAG on Kubernetes: From Raw Data to Real-Time Insights