Stop sending your private data to third-party AI. This talk shows you how to self-host powerful language models for complete control and security.
#1 · about 3 minutes
The rise of self-hosted open source AI models
Self-hosting large language models offers developers greater privacy, cost savings, and control compared to third-party cloud AI services.
#2 · about 2 minutes
Key benefits of local LLM deployment for developers
Running models locally improves the development inner loop, provides full data privacy, and allows for greater customization and control over the AI stack.
#3 · about 3 minutes
Comparing open source tools for serving LLMs
Explore different open source tools like Ollama for local development, vLLM for scalable production serving, and Podman AI Lab for containerized AI applications.
#4 · about 3 minutes
How to select the right open source LLM
Navigate the vast landscape of open source models by understanding different model families, their specific use cases, and naming conventions.
#5 · about 3 minutes
Using quantization to run large models locally
Model quantization compresses LLMs to reduce their memory footprint, enabling them to run efficiently on consumer hardware like laptops with CPUs or GPUs.
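As a back-of-the-envelope illustration of why quantization matters, the sketch below estimates the weights-only memory of a model at common precisions. The 8-billion-parameter size and the function name are illustrative assumptions, not figures from the talk, and real memory use is higher once activations and the KV cache are included.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough weights-only memory estimate: parameters * bits per weight / 8 bytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# An 8B-parameter model at common precisions (illustrative):
for label, bits in [("FP16", 16), ("INT8 / Q8", 8), ("4-bit / Q4", 4)]:
    print(f"{label}: ~{weight_memory_gb(8, bits):.0f} GB")
```

This is the arithmetic behind the chapter's claim: dropping from 16-bit to 4-bit weights cuts the footprint roughly fourfold, which is what brings larger models within reach of a laptop.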
#6 · about 1 minute
Strategies for integrating local LLMs with your data
Learn three key methods for connecting local models to your data: Retrieval-Augmented Generation (RAG), local code assistants, and building agentic applications.
#7 · about 6 minutes
Demo: Building a RAG system with local models
Use Podman AI Lab to serve a local LLM and connect it to AnythingLLM to create a question-answering system over your private documents.
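The shape of the RAG pipeline in this demo can be sketched in a few lines: retrieve the most relevant document, then paste it into the prompt sent to the local model. AnythingLLM and production RAG stacks use vector embeddings for retrieval; the word-overlap scoring below is a deliberately naive stand-in, and all names and sample documents are illustrative.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(question: str, doc: str) -> int:
    """Naive relevance: number of words shared between question and document."""
    return len(tokens(question) & tokens(doc))

def build_prompt(question: str, docs: list[str]) -> str:
    """Pick the most relevant document and ground the prompt in it."""
    best = max(docs, key=lambda d: score(question, d))
    return f"Answer using only this context:\n{best}\n\nQuestion: {question}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on public holidays.",
]
print(build_prompt("How many days do I have for a refund?", docs))
```

The resulting prompt is what gets sent to the locally served model, which is how the demo answers questions over private documents without any data leaving the machine.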
#8 · about 5 minutes
Demo: Setting up a local AI code assistant
Integrate a self-hosted LLM with the Continue VS Code extension to create a private, offline-capable AI pair programmer for code generation and analysis.
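A setup along the lines of this demo amounts to pointing Continue at the locally served model. The fragment below is a sketch of a Continue `config.json`; the exact schema varies by Continue version, and the model name and port are placeholders for whatever your local server exposes.

```json
{
  "models": [
    {
      "title": "Local LLM (placeholder)",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "apiBase": "http://localhost:11434"
    }
  ]
}
```

With a configuration like this, completions and chat stay on your machine, which is what makes the assistant private and offline-capable.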
#9 · about 4 minutes
Demo: Building an agentic app with external tools
Create an agentic application that uses a local LLM with external tools via the Model Context Protocol (MCP) to perform complex, multi-step tasks.
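The agent loop that MCP standardizes can be sketched without the protocol itself: the model proposes a tool call, the application executes it, and the result flows back until a final answer emerges. Everything below is hypothetical scaffolding; `fake_llm` stands in for a real local model, and the tool registry is not MCP's actual API.

```python
# Hypothetical tool registry; MCP servers expose real tools over a protocol.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_llm(task: str) -> dict:
    """Stand-in for a local model: decides whether to call a tool."""
    if task == "add 2 and 3":
        return {"tool": "add", "args": (2, 3)}
    return {"answer": task}

def run_agent(task: str):
    step = fake_llm(task)
    while "tool" in step:            # keep executing tools until a final answer
        result = TOOLS[step["tool"]](*step["args"])
        # A real loop would feed `result` back to the model for the next step.
        step = {"answer": result}
    return step["answer"]

print(run_agent("add 2 and 3"))  # → 5
```

The demo's point is that this loop, with a local model and MCP-exposed tools, can chain multiple steps into a complex task while everything runs on your own hardware.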
#10 · about 1 minute
Conclusion and the future of open source AI
Self-hosting provides a powerful, private, and customizable alternative to third-party services, highlighting the growing potential of open source AI for developers.
Matching moments
01:40 MIN
Addressing data privacy and security in AI systems
Graphs and RAGs Everywhere... But What Are They? - Andreas Kollegger - Neo4j
12:42 MIN
Running large language models locally with Web LLM
Generative AI power on the web: making web apps smarter with WebGPU and WebNN
02:58 MIN
Understanding the benefits of self-hosting large language models
Unveiling the Magic: Scaling Large Language Models to Serve Millions
04:01 MIN
Testing Spring AI applications with local LLMs
What's (new) with Spring Boot and Containers?
04:31 MIN
Leveraging private data with local and small AI models
Decoding Trends: Strategies for Success in the Evolving Digital Domain
02:11 MIN
Using local AI models for code assistance
Building APIs in the AI Era
05:39 MIN
The benefits and challenges of running AI models locally
WeAreDevelopers LIVE - Is AI replacing developers?, Stopping bots, AI on device & more
06:44 MIN
The developer's journey for building AI applications
Supercharge your cloud-native applications with Generative AI