PyCon Canada 2019

RAPIDS and cuDF: accelerating DataFrames on GPUs

by Ashwin Srinath and Keith Kraus

Machine Learning & Data Science

The Python data science stack is composed of a rich set of powerful libraries that work wonderfully well together, providing coherent, beautiful, Pythonic APIs that let data scientists think less about programming and more about their data. However, many of these libraries (e.g., Pandas, Scikit-Learn) are largely single-threaded, and as data workflows grow larger, users quickly run up against this limitation.

RAPIDS is a suite of open-source libraries whose APIs closely mirror those of popular existing Python libraries. By leveraging the massively parallel processing capabilities of GPUs, RAPIDS libraries can provide speedups of 50x or more over their CPU-only counterparts. cuDF is a GPU DataFrame library following the Pandas API, cuML is a GPU machine learning library following the Scikit-Learn API, and cuGraph is a GPU graph analytics library with an API inspired by NetworkX.

This talk will provide an overview of the RAPIDS ecosystem, focusing on the cuDF library, its features, and its design. We'll show how cuDF combines Numba, Cython, modern C++, CUDA, and Apache Arrow to build a high-performance DataFrame library that interoperates smoothly with other libraries in the PyData ecosystem. We'll walk through example workflows using cuDF both on a single GPU and across multiple GPUs with the Dask library. Finally, we'll share performance results, best practices, tips, and tricks.
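As a rough illustration of what "following the Pandas API" means in practice, here is a minimal sketch. It runs the same group-by code on cuDF when cuDF is installed and falls back to pandas otherwise. Several pieces are our own illustrative assumptions and do not come from the talk: the `xdf` alias, the hypothetical `mean_by_key` helper, and the import-fallback pattern itself. The sketch also assumes that a failed cuDF import surfaces as an `ImportError`. It sticks to calls that both libraries provide: `DataFrame`, `groupby(..., sort=True)`, and column selection, plus cuDF's `to_pandas()`.

```python
# Illustrative backend switch (an assumption, not part of the talk): use cuDF
# when it is installed, otherwise fall back to pandas so the code still runs.
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU and CUDA
except ImportError:
    import pandas as xdf  # CPU fallback exposing the same calls used below


def mean_by_key(keys, values):
    """Hypothetical helper: group `values` by `keys`, return {key: mean}."""
    df = xdf.DataFrame({"key": keys, "value": values})
    # sort=True makes the group order deterministic in both libraries.
    result = df.groupby("key", sort=True)["value"].mean()
    # A cuDF result lives in GPU memory; copy it to a host pandas Series first.
    if hasattr(result, "to_pandas"):
        result = result.to_pandas()
    return {k: float(v) for k, v in result.items()}


if __name__ == "__main__":
    print(xdf.__name__, mean_by_key(["a", "b", "a", "b"], [1.0, 2.0, 3.0, 4.0]))
```

This sketch keeps the DataFrame logic library-agnostic and isolates the GPU/CPU decision in a single import. Real workflows may still need adjustments wherever the two APIs differ.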

About the Authors

Ashwin Srinath is a Pythonista and Software Engineer at NVIDIA. He is part of the RAPIDS team, developing Python libraries for GPU-accelerated data science. He is also an enthusiastic teacher of Python in communities such as Software Carpentry.

Keith Kraus is a Manager on the AI Infrastructure team at NVIDIA, based in the greater New York City area. Keith is a maintainer and lead developer of cuDF, as well as a contributor to other RAPIDS libraries. He works extensively on the Python interface, API design, distributed computation architecture, and big data integration. Before joining NVIDIA, Keith worked in cybersecurity, building a GPU-accelerated big data solution for advanced threat detection. Keith holds a Master of Engineering in networked information systems from Stevens Institute of Technology.

Talk Details

Date: Saturday Nov. 16

Location: Sky Room

Begin time: 11:15

Duration: 25 minutes
