Nathaniel Okenwa

Aug 22, 2024 • World Congress 2024

Performant Architecture for a Fast Gen AI User Experience

Stop blaming slow models for your AI app's latency. Your architecture is the real problem.

#1 about 2 minutes

Building a real-time translator inspired by sci-fi

The Babel fish from "Hitchhiker's Guide to the Galaxy" serves as the inspiration for a real-time audio translation project.

#2 about 4 minutes

Analyzing the latency of a basic AI architecture

A demonstration of the initial 2019 architecture, built on Google Cloud, reveals a latency of over ten seconds for a simple translation.
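
One way to see where those seconds go is to time each stage of a sequential pipeline separately. The sketch below is a minimal illustration, not the speaker's code: the three stage functions are hypothetical stand-ins for network calls, and the stage names (speech-to-text, translate, text-to-speech) are an assumption about how an audio translator is typically split.

```python
import time
from typing import Callable

def timed(stage: Callable[[str], str], payload: str) -> tuple[str, float]:
    """Run one pipeline stage and return its output plus elapsed seconds."""
    start = time.perf_counter()
    result = stage(payload)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for the remote services; swap in real clients.
def speech_to_text(audio: str) -> str:
    return f"text({audio})"

def translate(text: str) -> str:
    return f"translated({text})"

def text_to_speech(text: str) -> str:
    return f"audio({text})"

def run_pipeline(audio: str) -> tuple[str, dict[str, float]]:
    """Run the stages one after another, recording how long each takes."""
    timings: dict[str, float] = {}
    payload = audio
    for name, stage in [
        ("speech_to_text", speech_to_text),
        ("translate", translate),
        ("text_to_speech", text_to_speech),
    ]:
        payload, timings[name] = timed(stage, payload)
    # In a fully sequential design the total is simply the sum of the stages.
    timings["total"] = sum(timings.values())
    return payload, timings
```

Because every stage waits for the previous one to finish, the per-stage numbers show directly which call is worth replacing or overlapping first.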

#3 about 2 minutes

Reducing latency by upgrading the AI service stack

Switching to modern, specialized APIs like Deepgram and ElevenLabs cuts the total processing time from twelve seconds to five.

#4 about 2 minutes

Implementing streaming to reduce response wait times

Adopting a streaming approach provides a major performance boost, but a naive implementation results in chaotic and low-quality audio output.
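
The core of streaming is that each piece of output is handed downstream as soon as it exists, instead of after the whole response is complete. A minimal sketch, assuming a hypothetical `generate_tokens` generator in place of a real streaming model response; the shared `log` list exists only to make the interleaving visible.

```python
from typing import Callable, Iterator

def generate_tokens(text: str, log: list[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    for token in text.split():
        log.append(f"produced:{token}")
        yield token

def forward_stream(tokens: Iterator[str], handle: Callable[[str], None]) -> None:
    """Pass each token downstream as soon as it arrives."""
    for token in tokens:
        handle(token)

log: list[str] = []
forward_stream(
    generate_tokens("hola mundo", log),
    lambda token: log.append(f"handled:{token}"),
)
# log now alternates produced/handled entries, one pair per token.
```

Forwarding every token on its own is the naive variant: a downstream text-to-speech step would receive single words with no sentence context, which is consistent with the chaotic audio described above.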

#5 about 2 minutes

Using chunking to balance streaming speed and quality

Chunking data based on sentence punctuation controls the streaming waterfall, improving the quality of generated audio without sacrificing speed.
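
Sentence-based chunking can be sketched as a small buffering generator. This is an illustrative version under stated assumptions, not the talk's implementation: tokens are assumed to carry their own leading whitespace (as many model tokenizers emit them), and the set of sentence-ending characters is a deliberately simple choice.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")  # assumed delimiters; extend per language

def chunk_by_sentence(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and emit one chunk per completed sentence."""
    buffer: list[str] = []
    for token in tokens:
        buffer.append(token)
        if token.rstrip().endswith(SENTENCE_END):
            yield "".join(buffer).strip()
            buffer.clear()
    # Flush whatever is left when the stream ends without punctuation.
    tail = "".join(buffer).strip()
    if tail:
        yield tail
```

For example, the token stream `["Hello", " there", ".", " How", " are", " you", "?"]` yields `"Hello there."` and then `"How are you?"`. A punctuation-only rule will also split on abbreviations such as "Dr.", so a production version would likely need a smarter boundary check.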

#6 about 6 minutes

Eliminating network latency with local and edge models

Running a smaller model such as Whisper locally or at the edge eliminates cross-continental network latency and provides near-instantaneous results.

#7 about 3 minutes

Using caching to serve pre-generated AI responses

Implementing caching, from simple request matching to semantic search with vector databases, avoids redundant generation and speeds up common queries.
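
The two ends of that spectrum can be combined in one lookup: try an exact match on the normalized request first, then fall back to a similarity search. The sketch below is a toy under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the linear scan stands in for a vector database, and the 0.8 threshold is an arbitrary illustrative value.

```python
import math
from collections import Counter
from typing import Callable, Optional

def normalize(request: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(request.lower().split())

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(normalize(text).split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class ResponseCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.exact: dict[str, str] = {}
        self.entries: list[tuple[Counter, str]] = []
        self.threshold = threshold

    def get(self, request: str) -> Optional[str]:
        key = normalize(request)
        if key in self.exact:  # cheap path: identical request seen before
            return self.exact[key]
        query = embed(request)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]  # close enough in meaning to reuse
        return None

    def put(self, request: str, response: str) -> None:
        self.exact[normalize(request)] = response
        self.entries.append((embed(request), response))

def cached_generate(
    cache: ResponseCache, request: str, generate: Callable[[str], str]
) -> str:
    """Return a cached response when possible, otherwise generate and store."""
    hit = cache.get(request)
    if hit is not None:
        return hit
    response = generate(request)
    cache.put(request, response)
    return response
```

The threshold is the main design choice: set too low, users get answers to questions they did not ask; set too high, the semantic layer rarely hits and only the exact-match path pays off.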

#8 about 2 minutes

Optimizing prompts and user experience for speed

Fine-tuning performance involves optimizing prompts to generate fewer tokens and improving perceived speed with clear loading states for the user.
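
Why fewer output tokens help can be shown with a rough latency model: a fixed wait for the first token plus a cost proportional to output length. This is a back-of-the-envelope sketch, not a measurement; the linear model and every number in the usage note are illustrative assumptions.

```python
def estimated_latency(
    output_tokens: int,
    tokens_per_second: float,
    time_to_first_token: float,
) -> float:
    """Rough model: fixed startup cost plus time proportional to output length."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + output_tokens / tokens_per_second
```

With an assumed 0.5 seconds to first token and 50 tokens per second, a 200-token answer takes about 4.5 seconds while a 40-token answer takes about 1.3 seconds, so a prompt that asks for the translation only, with no preamble or explanation, directly shortens the wait.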

#9 about 2 minutes

Summary of key performance optimization techniques

A final recap covers the essential strategies for building fast Gen AI experiences, including streaming, edge computing, caching, and prompt optimization.
