Dask (software)

Dask is an open-source library for parallel computing written in Python.[2][3] Originally developed by Matthew Rocklin, Dask is now a community project maintained and sponsored by developers and organizations.

Dask
Original author(s): Matthew Rocklin
Developer(s): Dask
Initial release: October 28, 2018 (2018-10-28)
Stable release: 2.25.0 / August 28, 2020 (2020-08-28)
Repository: Dask Repository
Written in: Python[1]
Operating system: Linux, Microsoft Windows, macOS
Available in: Python
Type: Data analytics
License: New BSD
Website: dask.org

Overview

Dask is a library composed of two parts. The first is a task scheduling component that builds dependency graphs and schedules the tasks in them. The second is a set of distributed data structures whose APIs resemble pandas DataFrames and NumPy arrays. Dask has a variety of use cases; it can run on a single node and scale to thousand-node clusters.[4]
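The dependency-graph idea can be illustrated with a minimal sketch in plain Python. It follows the description in reference 1 of a graph encoded with dictionaries, tuples, and callables. The `get` function below is a hypothetical toy evaluator written for this illustration, not Dask's scheduler: it runs serially, does not cache results, and assumes that every string argument naming a key in the graph refers to another task.

```python
from operator import add, mul

# A task graph as a plain dict: each value is either literal data
# or a tuple of the form (callable, *arguments).
graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),  # z depends on x and y
    "w": (mul, "z", 10),   # w depends on z
}


def get(graph, key):
    """Toy evaluator (hypothetical): compute `key` by resolving its dependencies."""
    value = graph[key]
    # A tuple whose first element is callable is treated as a task.
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        # Arguments that name other keys are computed first; others pass through.
        resolved = [
            get(graph, arg) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        return func(*resolved)
    # Anything else is literal data.
    return value


result = get(graph, "w")
print(result)
```

Because each task names its inputs explicitly, a scheduler can tell which tasks are independent of one another and could run in parallel. The sketch shows only the graph encoding and leaves that scheduling step out.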

References

  1. "Dask: Parallel Computation with Blocked algorithms and Task Scheduling" (PDF). This paper introduces dask, a specification to encode parallel algorithms, using primitive Python dictionaries, tuples, and callables.
  2. Daniel, Jesse C. (2019). Data Science at Scale with Python and Dask. Manning Publications. ISBN 9781617295607.
  3. Rocklin, Matthew (2015). "Dask: Parallel Computation with Blocked algorithms and Task Scheduling". Proceedings of the 14th Python in Science Conference: 126–132. doi:10.25080/Majora-7b98e3ed-013.
  4. "Dask documentation". https://docs.dask.org/en/latest/


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.