BDI Codemonkeys: Scaling pandas workloads with Dask

Dask is a framework for parallel and distributed computing that can scale pandas workloads across a cluster. This talk will start with an overview of the pandas 2.0 release and where pandas is headed next, before diving into Dask DataFrames. Dask DataFrames offer an API modeled on the pandas API, and they can step in when your data grows too large for pandas by parallelizing the computation across a cluster of many workers. We will walk through the core concepts of Dask DataFrames using a few examples.