BDI Codemonkeys: Scaling pandas workloads with Dask
Dask is a framework for distributed computing that can scale pandas workloads across a cluster. This talk will start with an overview of the pandas 2.0 release and where pandas is headed, before diving into Dask DataFrames. Dask DataFrames offer an API based on the pandas API. They can step in when pandas struggles with the size of your data, parallelizing computations over a cluster with many workers. We will look at the core concepts of Dask DataFrames through some examples.
Date: 10 October 2023, 12:00 (Tuesday, 1st week, Michaelmas 2023)
Venue: Big Data Institute, Old Road Campus OX3 7LF
Venue Details: Seminar room 1
Speaker: Patrick Höfler (Coiled)
Organising department: Big Data Institute (NDPH)
Part of: BDI seminars
Booking required?: Not required
Audience: Members of the University only
Editor: Graham Bagley