When is retrieval-augmented generation not enough? Learn the multi-stage process for deeply embedding new knowledge into an open-source LLM.
#1 · about 4 minutes
Understanding the LLM training pipeline and knowledge gaps
LLMs are trained through pre-training and alignment, but require new knowledge to stay current, adapt to specific domains, and acquire new skills.
#2 · about 5 minutes
Adding domain knowledge with continued pre-training
Continued pre-training adapts a foundation model to a specific domain by training it further on specialized, unlabeled data using self-supervised learning.
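The self-supervised objective behind continued pre-training needs no labels: each token in the raw domain text serves as the prediction target for the tokens before it. A minimal sketch of that idea, using a naive whitespace "tokenizer" purely for illustration (a real pipeline would use the model's own tokenizer):

```python
def make_causal_lm_pairs(text: str) -> list[tuple[list[str], str]]:
    """Turn raw, unlabeled domain text into (context, next-token) pairs.
    The text supervises itself: the label is simply the next token."""
    tokens = text.split()  # stand-in for real subword tokenization
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Example: one sentence of specialized (here, medical) domain text.
corpus = "myocardial infarction is commonly called a heart attack"
pairs = make_causal_lm_pairs(corpus)
# First pair: given ["myocardial"], the model must predict "infarction".
```

This is why continued pre-training scales so well to niche domains: any corpus of in-domain text becomes training data without annotation effort.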
#3 · about 6 minutes
Developing skills and reasoning with supervised fine-tuning
Supervised fine-tuning uses instruction-based datasets to teach models specific tasks, chat capabilities, and complex reasoning through techniques like chain of thought.
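A supervised fine-tuning record pairs an instruction with a target response; for chain-of-thought data, the response spells out intermediate reasoning before the final answer. The sketch below renders one such record as a training string using Alpaca-style "### Instruction / ### Response" markers, which are an assumed convention here, not the talk's specific template:

```python
def format_sft_example(instruction: str, reasoning: str, answer: str) -> str:
    """Render one instruction-tuning record as a single training string.
    The explicit reasoning span is what chain-of-thought SFT data adds."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\nLet's think step by step. {reasoning} "
        f"The answer is {answer}."
    )

example = format_sft_example(
    instruction="What is 12 * 11?",
    reasoning="12 * 11 = 12 * 10 + 12 = 120 + 12 = 132.",
    answer="132",
)
```

Training on many such strings teaches the model both the task format and the habit of reasoning before answering.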
#4 · about 8 minutes
Aligning models with human preferences using reinforcement learning
Preference alignment refines model behavior using reinforcement learning, evolving from complex RLHF with reward models to simpler methods like DPO.
#5 · about 2 minutes
Using frameworks like NeMo RL to simplify model alignment
Frameworks like the open-source NeMo RL abstract away the complexity of implementing advanced alignment algorithms, such as those based on reinforcement learning.