# Top 38 Python Libraries for Data Science, Data Visualization & Machine Learning

The categories in this post cover common data science libraries, meaning those likely to be used by practitioners for generalized, non-neural-network, non-research work. They are:

- **Data** - libraries for the management, manipulation, and other processing of data
- **Math** - while many libraries perform mathematical tasks, this small collection does so exclusively
- **Machine learning** - self-explanatory; excludes libraries primarily meant for building neural networks or for automating machine learning processes
- **Automated machine learning** - libraries that primarily function to automate processes related to machine learning
- **Data visualization** - libraries that primarily serve a function related to visualizing data, as opposed to modeling, preprocessing, etc.
- **Explanation & exploration** - libraries primarily for exploring and explaining models or data

__Best Python Libraries For: Data__

**1. Apache Spark**

Apache Spark is a unified analytics engine for large-scale data processing. It is used from Python through the PySpark API.

**2. Pandas**

Pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python.
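
To make the idea of "labeled" data concrete, here is a minimal sketch of selecting and aggregating by label with a DataFrame. The column names and values are invented for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, made up for this example.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima"],
        "year": [2020, 2021, 2020, 2021],
        "sales": [10.0, 12.0, 7.0, 9.0],
    }
)

# Select rows by a condition on a labeled column.
recent = df[df["year"] == 2021]

# Group rows by a label and aggregate a numeric column.
total_by_city = df.groupby("city")["sales"].sum()
print(total_by_city)
```

The same pattern (filter, group, aggregate) covers a large share of everyday analysis work.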

**3. Dask**

Parallel computing with task scheduling
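
As a rough sketch of what task scheduling means in practice, the snippet below uses `dask.delayed` to build a small task graph lazily and then runs it with `compute()`. The `square` function is a hypothetical stand-in for real work.

```python
import dask


@dask.delayed
def square(x):
    # Hypothetical placeholder for an expensive computation.
    return x * x


# Nothing runs yet: each call only records a task in the graph.
squares = [square(i) for i in range(5)]
total = dask.delayed(sum)(squares)

# compute() hands the graph to a scheduler, which may run tasks in parallel.
result = total.compute()
print(result)
```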

__Best Python Libraries For: Math__

**4. SciPy**

SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.
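
Two of the modules mentioned above, integration and optimization, can be tried in a few lines. This is only a small sketch; the functions being integrated and minimized are arbitrary choices for illustration.

```python
from scipy import integrate, optimize

# Numerically integrate x^2 over [0, 3]; the exact answer is 9.
area, abs_err = integrate.quad(lambda x: x**2, 0.0, 3.0)

# Find the minimum of (x - 2)^2, which lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area, result.x)
```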

**5. NumPy**

The fundamental package for scientific computing with Python.
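
A short sketch of the core NumPy idea, the n-dimensional array with vectorized operations and broadcasting. The array contents are arbitrary.

```python
import numpy as np

# A 2x3 array: [[0, 1, 2], [3, 4, 5]].
a = np.arange(6).reshape(2, 3)

# Reduce along axis 0 to get one mean per column.
col_means = a.mean(axis=0)

# Broadcasting subtracts the 1-D means from every row of the 2-D array.
centered = a - col_means
print(centered)
```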

__Best Python Libraries For: Machine Learning__

**6. Scikit-Learn**

Scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
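
For a feel of the estimator interface, here is a minimal sketch that trains a classifier on the bundled iris dataset. The choice of `RandomForestClassifier` and its settings is an assumption made for illustration, not a recommendation from the original list.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator follows the same fit / predict / score pattern.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```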

**7. XGBoost**

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow.

**8. LightGBM**

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

**9. CatBoost**

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

**10. Dlib**

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It can also be used from Python through dlib's Python API.

**11. Annoy**

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

**12. H2O.ai**

Open Source Fast Scalable Machine Learning Platform For Smarter Applications: Deep Learning, Gradient Boosting & XGBoost, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

**13. StatsModels**

Statsmodels: statistical modeling and econometrics in Python

**14. mlpack**

mlpack is an intuitive, fast, and flexible C++ machine learning library with bindings to other languages

**15. Pattern**

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

**16. Prophet**

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

__Best Python Libraries For: Automated Machine Learning__

**17. TPOT**

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

**18. auto-sklearn**

auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

**19. Hyperopt-sklearn**

Hyperopt-sklearn is Hyperopt-based model selection among machine learning algorithms in scikit-learn.

**20. SMAC-3**

Sequential Model-based Algorithm Configuration

**21. scikit-optimize**

Scikit-Optimize, or skopt, is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization.

**22. Nevergrad**

A Python toolbox for performing gradient-free optimization

**23. Optuna**

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning.

__Best Python Libraries For: Data Visualization__

**24. Apache Superset**

Apache Superset is a Data Visualization and Data Exploration Platform

**25. Matplotlib**

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
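
A minimal sketch of a static Matplotlib figure follows. It selects the non-interactive `Agg` backend and renders to an in-memory buffer so that it runs without a display; both choices are assumptions made to keep the example self-contained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# Create a figure with a single set of axes and draw one line on it.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Render the figure as PNG bytes; pass a filename instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```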

**26. Plotly**

Plotly.py is an interactive, open-source, and browser-based graphing library for Python

**27. Seaborn**

Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.

**28. folium**

Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in a Leaflet map via folium.

**29. Bqplot**

Bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar of Graphics.

**30. VisPy**

VisPy is a high-performance interactive 2D/3D data visualization library. VisPy leverages the computational power of modern Graphics Processing Units (GPUs) through the OpenGL library to display very large datasets.

**31. PyQtGraph**

Fast data visualization and GUI tools for scientific / engineering applications

**32. Bokeh**

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets.

**33. Altair**

Altair is a declarative statistical visualization library for Python. With Altair, you can spend more time understanding your data and its meaning.

__Best Python Libraries For: Explanation & Exploration__

**34. eli5**

A library for debugging/inspecting machine learning classifiers and explaining their predictions

**35. LIME**

Lime: Explaining the predictions of any machine learning classifier

**36. SHAP**

A game theoretic approach to explain the output of any machine learning model.

**37. Yellowbrick**

Visual analysis and diagnostic tools to facilitate machine learning model selection.

**38. pandas-profiling**

Create HTML profiling reports from pandas DataFrame objects. The project has since been renamed ydata-profiling.

**Source:** KDnuggets

The Tech Platform