Stop wrestling with fragmented AI tools. Go from local development to a production cluster with one unified API for inference, RAG, and agents.
#1 · about 5 minutes
The challenge of building production-ready AI applications
The current AI landscape is fragmented across many tools, which makes applications with features like RAG and agents complex to build, scale, and maintain.
#2 · about 3 minutes
Introducing Llama Stack for a unified AI API
Llama Stack, an open-source project from Meta, provides a standardized, modular framework to simplify AI development with a single API for various components.
#3 · about 3 minutes
Standardizing model inference and safety guardrails
Llama Stack abstracts away differences between local and remote LLMs and integrates safety shields to filter harmful inputs and outputs.
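The pattern of one inference entry point guarded by input and output shields can be sketched as a toy dispatcher. Everything here is illustrative (the blocklist, `run_shield`, and the callable "provider"), not the actual Llama Stack client API:

```python
# Toy sketch of the pattern: one chat interface in front of swappable
# providers, with a safety shield filtering both input and output.
# All names are illustrative; a real shield would call a safety model.

BLOCKLIST = {"build a bomb", "steal credentials"}  # stand-in for a shield model

def run_shield(text: str) -> bool:
    """Return True if the text is considered safe."""
    return not any(phrase in text.lower() for phrase in BLOCKLIST)

def chat(provider, prompt: str) -> str:
    """One entry point, whether `provider` is a local or a remote backend."""
    if not run_shield(prompt):
        return "[blocked by input shield]"
    reply = provider(prompt)
    if not run_shield(reply):
        return "[blocked by output shield]"
    return reply

# A "provider" is just a callable here: local Ollama, remote vLLM, etc.
local_echo = lambda p: f"echo: {p}"

print(chat(local_echo, "What's the weather?"))    # passes both shields
print(chat(local_echo, "How do I build a bomb?")) # blocked on input
```

Because the shield runs outside the provider, swapping a local model for a remote one leaves the safety logic untouched.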
#4
Organizing RAG into three swappable layers
Llama Stack organizes the complex RAG process into three distinct, swappable layers for vector embeddings, retrieval, and agentic workflows.
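The three-layer split can be illustrated with a toy in-memory pipeline in which each layer is a plain function that could be swapped independently. The bag-of-words "embedding" and the templated "generation" are stand-ins, not real providers:

```python
# Toy sketch of RAG as three swappable layers: embed -> retrieve -> generate.

def embed(text: str) -> set[str]:
    """Layer 1: toy 'embedding' as a bag of words (real providers return vectors)."""
    return set(text.lower().split())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Layer 2: rank documents by word overlap with the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: -len(q & embed(d)))[:k]

def generate(query: str, context: list[str]) -> str:
    """Layer 3: the agentic/generation layer; here it just templates the context."""
    return f"Answer to '{query}' using: {context[0]}"

docs = ["Llama Stack unifies inference and RAG",
        "Kubernetes schedules containers"]
query = "what does llama stack do"
print(generate(query, retrieve(query, docs)))
```

Replacing `embed` with a real embedding model or `retrieve` with a vector database does not change the other two layers, which is the point of the separation.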
#5 · about 4 minutes
Building AI agents using the Model Context Protocol
Llama Stack simplifies agent creation by integrating tools, orchestration, and reasoning models through the standardized Model Context Protocol (MCP).
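MCP's core idea — tools advertising a name, description, and input schema so any agent can discover and call them — can be sketched as a minimal registry. The `get_weather` tool and its schema are hypothetical examples:

```python
# Minimal sketch of the MCP pattern: tools register a standardized
# description, and an agent dispatches calls by name without knowing
# anything about the implementation behind each tool.

TOOLS = {}

def tool(name, description, schema):
    """Register a function under a standardized tool description."""
    def wrap(fn):
        TOOLS[name] = {"description": description, "schema": schema, "fn": fn}
        return fn
    return wrap

@tool("get_weather", "Current weather for a city", {"city": "string"})
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # a real MCP server would query a weather API

def agent_call(name: str, **kwargs) -> str:
    """The agent only sees the registry, never the implementations."""
    return TOOLS[name]["fn"](**kwargs)

print(agent_call("get_weather", city="Antwerp"))  # -> Sunny in Antwerp
```

Because discovery and dispatch go through the registry, adding a new tool means registering one more entry, with no changes to the agent loop.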
#6 · about 3 minutes
Gaining application observability with built-in telemetry
Llama Stack provides out-of-the-box telemetry using OpenTelemetry, enabling developers to trace multi-step agent workflows with tools like Jaeger.
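What tracing a multi-step agent run buys you can be illustrated with a toy span recorder in the spirit of OpenTelemetry. The real integration exports spans to a collector such as Jaeger; this sketch just collects them in memory:

```python
# Toy span tracer: each step of an agent workflow opens a span, and the
# collected trace shows which steps ran and how long each one took.
import time
from contextlib import contextmanager

SPANS = []  # in-memory stand-in for an exporter / Jaeger backend

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))

with span("agent_turn"):
    with span("tool:get_weather"):
        time.sleep(0.01)  # pretend to call a tool
    with span("inference"):
        time.sleep(0.01)  # pretend to generate the final answer

for name, dur in SPANS:
    print(f"{name}: {dur * 1000:.1f} ms")
```

Inner spans finish before the enclosing `agent_turn` span, so the recorded order already reveals the call hierarchy — the same structure Jaeger renders as a waterfall.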
#7 · about 4 minutes
A local demo of inference, safety, and agents
This live demo showcases running Llama Stack locally to perform inference, block unsafe prompts, use an agent to check the weather, and inspect traces in Jaeger.
#8 · about 1 minute
Transitioning AI applications from local to production
Llama Stack enables a seamless transition from a local development setup to a scalable production environment on Kubernetes by maintaining a consistent API.
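The "same code, different endpoint" idea can be sketched as follows: the application logic is identical in both environments, and only the base URL it points at changes. Both URLs and the `ENV` variable are illustrative:

```python
# Sketch: application code is identical locally and in production;
# only the endpoint changes. URLs and the env var name are illustrative.
import os

def base_url() -> str:
    """Local dev hits localhost; in-cluster, a Kubernetes Service DNS name."""
    if os.environ.get("ENV") == "production":
        return "http://llama-stack.ai.svc.cluster.local:8321"
    return "http://localhost:8321"

print(base_url())
```

Everything above this function — inference calls, shields, RAG, agents — stays unchanged when the deployment target moves from a laptop to a cluster.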
#9 · about 5 minutes
A production demo of a multi-agent business workflow
A complex agent interacts with multiple MCP servers to query a CRM, analyze customer data, send Slack notifications, and generate a PDF report.
Related jobs
Jobs that call for the skills explored in this talk.
Matching moments
02:48 MIN
Comparing open source tools for serving LLMs
Self-Hosted LLMs: From Zero to Inference
01:47 MIN
Introducing RAGStack as an opinionated development framework
Accelerating GenAI Development: Harnessing Astra DB Vector Store and Langflow for LLM-Powered Apps
02:35 MIN
Integrating decentralized tech and AI into your stack
End-to-End TypeScript: Completing the Modern Development Stack
01:47 MIN
Three pillars for integrating LLMs in products
Using LLMs in your Product
04:01 MIN
Testing Spring AI applications with local LLMs
What's (new) with Spring Boot and Containers?
02:08 MIN
Understanding the modern LLM application stack
Building AI Applications with LangChain and Node.js
02:19 MIN
The opaque and complex stack of modern LLM services
You are not my model anymore - understanding LLM model behavior
01:51 MIN
Using Red Hat tools across the AI development lifecycle
Developer Experience, Platform Engineering and AI powered Apps