A node fails. Your pod is rescheduled in seconds, but it never starts. Discover the hidden storage problem in Kubernetes failover.
#1 (about 3 minutes)
A DBA's journey to running SQL Server on Kubernetes
The speaker shares his background as a SQL Server DBA and the efficiency gains that led him to explore running stateful workloads in Kubernetes.
#2 (about 2 minutes)
Why the default five-minute failover is unacceptable
When a node fails, Kubernetes by default waits five minutes (a 300-second toleration for the not-ready and unreachable node taints) before evicting its pods so they can be rescheduled, which is too long for stateful applications like SQL Server.
#3 (about 5 minutes)
Demonstrating the default pod eviction delay in action
A live demo shows an nginx pod taking five minutes to be rescheduled to a healthy node after its original node is shut down in AKS.
#4 (about 2 minutes)
How to configure faster pod eviction with tolerations
Pod eviction time can be cut from five minutes to seconds by adding tolerations for the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints, with a short tolerationSeconds value, to the pod template in the deployment YAML.
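A minimal sketch of that configuration, written in Python so it can be checked programmatically: it builds a Deployment manifest whose pod template tolerates the two NoExecute failure taints for only a few seconds, overriding the 300-second defaults that the DefaultTolerationSeconds admission plugin would otherwise add. The deployment name, image, and the 5-second value are illustrative assumptions, not the values used in the talk's demo.

```python
import json

# Taints the node lifecycle controller places on a failed node.
NOT_READY = "node.kubernetes.io/not-ready"
UNREACHABLE = "node.kubernetes.io/unreachable"


def failover_tolerations(seconds: int) -> list:
    """Tolerate the not-ready/unreachable NoExecute taints for `seconds` only."""
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": seconds,
        }
        for key in (NOT_READY, UNREACHABLE)
    ]


def deployment(name: str, image: str, eviction_seconds: int) -> dict:
    """Build an apps/v1 Deployment manifest with fast-eviction tolerations."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    # Explicit tolerations replace the 300-second defaults.
                    "tolerations": failover_tolerations(eviction_seconds),
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values; pipe the output to `kubectl apply -f -`.
    print(json.dumps(deployment("nginx", "nginx", 5), indent=2))
```

Note that tolerationSeconds only starts counting once the node has been tainted, so the observed failover time also depends on how quickly the control plane marks the node as not ready.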
#5 (about 3 minutes)
Demo of a ten-second failover using tolerations
With tolerations added to the nginx deployment, a new pod starts on a healthy node just ten seconds after the original node fails.
#6 (about 7 minutes)
Why fast pod eviction fails for stateful apps
A demo with SQL Server shows that even with tolerations, the replacement pod gets stuck in the "ContainerCreating" state because its persistent volume is still attached to the failed node, which surfaces as a Multi-Attach error.
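To spot this failure mode, one option is to scan cluster events for it. The sketch below assumes the event shape returned in the "items" list of `kubectl get events -o json` (fields involvedObject, reason, message) and assumes the error is reported with reason FailedAttachVolume and a message containing "Multi-Attach error"; the pod name in the sample is hypothetical.

```python
def multi_attach_pods(events: list) -> set:
    """Return names of pods whose events report a Multi-Attach error.

    Assumes each event is a dict shaped like a core/v1 Event in JSON form.
    """
    stuck = set()
    for ev in events:
        obj = ev.get("involvedObject", {})
        if (
            obj.get("kind") == "Pod"
            and ev.get("reason") == "FailedAttachVolume"
            and "Multi-Attach error" in ev.get("message", "")
        ):
            stuck.add(obj.get("name", "?"))
    return stuck


if __name__ == "__main__":
    # Illustrative event, not captured from the talk's demo.
    sample = [
        {
            "involvedObject": {"kind": "Pod", "name": "mssql-example-pod"},
            "reason": "FailedAttachVolume",
            "message": 'Multi-Attach error for volume "pvc-example"',
        }
    ]
    for name in sorted(multi_attach_pods(sample)):
        print(f"{name}: volume still attached to another node")
```

Matching on the message substring is deliberately loose, since exact event wording can vary between Kubernetes versions.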
#7 (about 7 minutes)
Achieving high availability with Portworx storage
A third-party tool called Portworx provides a storage class that correctly detaches and reattaches storage, enabling a SQL Server pod to fail over successfully in seconds.
#8 (about 1 minute)
Key considerations for stateful app high availability
Achieving high availability for stateful apps in Kubernetes requires adjusting pod tolerations and using a storage solution that can handle volume reattachment across nodes.