What if your Python code could achieve over 90% of a GPU's theoretical max performance? Learn how NVIDIA is making it possible.
#1 about 6 minutes
Understanding the CUDA platform stack for Python developers
The CUDA platform is layered from high-level domain libraries to low-level hardware access, with new tools aiming to combine Python's productivity with GPU performance.
#2 about 3 minutes
Improving performance by fusing GPU operations
The nvmath-python library enables kernel fusion through epilogues, which combine several operations, such as a matrix multiplication followed by a bias addition, into a single GPU kernel launch.
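To make the idea of an epilogue concrete, here is a minimal CPU-only sketch in plain Python. It does not use nvmath-python or a GPU. The function names `matmul_then_bias` and `matmul_with_bias_epilog` are hypothetical, and the bias layout (one value per output row) is an assumption made for illustration. The point is only that the fused version adds the bias while the accumulator is still in hand, so each output element is stored once instead of being written, re-read, and written again.

```python
def matmul_then_bias(a, b, bias):
    """Unfused: one pass stores A @ B, a second pass reloads it to add the bias."""
    m, k, n = len(a), len(b), len(b[0])
    # Pass 1: compute and store the plain product.
    c = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    # Pass 2: read every element back, add the row's bias, store again.
    return [[c[i][j] + bias[i] for j in range(n)] for i in range(m)]


def matmul_with_bias_epilog(a, b, bias):
    """Fused: the bias 'epilogue' runs on each accumulator before its single store."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Epilogue step: applied before the result is written out.
            c[i][j] = acc + bias[i]
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    bias = [10, 20]  # assumed layout: one bias value per output row
    print(matmul_with_bias_epilog(a, b, bias))
```

Both functions return the same values. On a GPU the fused form would save one kernel launch and one round trip through global memory, which is the benefit the talk attributes to epilogues.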
#3 about 5 minutes
Calling device-side functions directly from Python kernels
Python kernels can now directly call pre-compiled, high-performance device-side functions from libraries like cuBLAS, enabled by a just-in-time linker called nvJitLink.
#4 about 2 minutes
Fine-grained parallelism with cooperative groups in Python
The CUB library is exposed to Python, allowing for cooperative operations and reductions at the block or warp level for fine-grained control over GPU parallelism.
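As a rough illustration of what a block-level reduction does, the following plain-Python sketch simulates a tree reduction over one "thread block" on the CPU. It is not CUB's API or its actual algorithm, which is considerably more sophisticated. The function name `block_reduce_sum` is hypothetical, and the list `shared` merely stands in for on-chip shared memory.

```python
def block_reduce_sum(values):
    """Simulate a block-wide tree reduction with one list slot per 'thread'."""
    shared = list(values)  # stand-in for the block's shared memory
    n = len(shared)

    # Pad to the next power of two so every step halves the active range cleanly.
    size = 1
    while size < n:
        size *= 2
    shared += [0] * (size - n)

    stride = size // 2
    while stride > 0:
        # In this step, "threads" 0..stride-1 each add in a partner's value.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # A real kernel would synchronize the whole block at this point.
        stride //= 2

    return shared[0]  # "thread 0" ends up holding the block's total


if __name__ == "__main__":
    print(block_reduce_sum([1, 2, 3, 4, 5]))
```

The reduction finishes in a logarithmic number of steps rather than one step per element. A warp-level variant follows the same pattern over a smaller, fixed group of threads.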
#5 about 3 minutes
Accelerating language support with numba-cuda and Numbast
The numba-cuda module has been split out into its own package so that new features can ship faster, while Numbast automatically generates Python bindings for templated C++ code.
#6 about 4 minutes
A Pythonic object model for host-side GPU control
A new high-level object model allows Python developers to directly manage GPU resources like devices, contexts, streams, and linker objects without boilerplate code.
Matching moments
05:12 MIN
Boosting Python performance with the Nvidia CUDA ecosystem
The weekly developer show: Boosting Python with CUDA, CSS Updates & Navigating New Tech Stacks
02:28 MIN
Navigating the CUDA Python software ecosystem
Accelerating Python on GPUs
01:07 MIN
The evolution of GPU programming with Python
Accelerating Python on GPUs
02:34 MIN
Understanding CUDA as a complete computing platform
Coffee with Developers - Stephen Jones - NVIDIA
06:37 MIN
Introducing the CUDA parallel computing platform
Accelerating Python on GPUs
10:18 MIN
A progressive approach to programming GPUs in Python
Accelerating Python on GPUs
01:33 MIN
A look at upcoming Python GPU programming tools
Accelerating Python on GPUs
04:05 MIN
Using NVIDIA libraries to easily accelerate applications
WWC24 - Ankit Patel - Unlocking the Future Breakthrough Application Performance and Capabilities with NVIDIA