
Getting Started With PANDAS
Introduction
Pandas is one of the most popular open-source Python libraries.
It is mainly used for data analysis.
It is a high-level data manipulation tool.
Pandas provides two main data structures: Series and DataFrame.
A one-dimensional data structure is called a Series.
A two-dimensional data structure is called a DataFrame.
These data structures are built on the NumPy package (you may refer to my article about NumPy).
The key data structure is the DataFrame (tabular data).
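As a minimal sketch of the two structures described above (the names prices and table, and all of the values, are made up for illustration):

```python
import numpy as np
import pandas as pd

# A Series: one-dimensional, labeled values (illustrative data).
prices = pd.Series([1.5, 2.0, 3.25], name="price")

# A DataFrame: two-dimensional, tabular data with named columns.
table = pd.DataFrame({
    "item": ["apple", "banana", "cherry"],
    "price": [1.5, 2.0, 3.25],
})

print(prices.ndim)   # number of dimensions of the Series
print(table.ndim)    # number of dimensions of the DataFrame
print(table.shape)   # (rows, columns)

# The underlying values can be exposed as a NumPy array.
print(type(prices.to_numpy()))
```

Each column of the DataFrame can itself be read back as a Series, for example table["price"].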
Data in Pandas is often used to feed statistical analysis in SciPy (refer to my article about SciPy), or plotting functions from Matplotlib (refer to my article about Matplotlib).
Installing Pandas: pip install pandas
After installation, import the package to start using it: import pandas as pd
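One quick way to check that the installation and import worked is to print the installed version. This is only a sanity check, and the version you see will depend on your environment:

```python
import pandas as pd

# Confirm that the import worked by printing the installed version string.
print(pd.__version__)
```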
Pandas is used for handling missing data and for merging, concatenating, and reshaping data, among other tasks.
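The tasks listed above can be sketched roughly as follows. All table names, column names, and values here are hypothetical, and each step shows only one of several ways Pandas offers to do it:

```python
import numpy as np
import pandas as pd

# Illustrative data; all names and values are made up.
sales = pd.DataFrame({
    "store": ["A", "B", "C"],
    "units": [10.0, np.nan, 7.0],
})
cities = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": ["Oslo", "Rome", "Lima"],
})

# Handling missing data: replace the NaN in "units" with 0.
filled = sales.fillna({"units": 0.0})

# Merging: join two tables on a shared key column.
merged = filled.merge(cities, on="store")

# Concatenating: stack the rows of two tables with the same columns.
more_sales = pd.DataFrame({"store": ["D"], "units": [4.0]})
stacked = pd.concat([filled, more_sales], ignore_index=True)

# Reshaping: turn a wide table (one column per month) into long form.
wide = pd.DataFrame({"store": ["A", "B"], "jan": [1, 2], "feb": [3, 4]})
long_form = wide.melt(id_vars="store", var_name="month", value_name="units")

print(merged)
print(stacked)
print(long_form)
```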
Did You Know??
The name Pandas is derived from "panel data", an econometrics term for multidimensional structured data sets.
Panel was a 3D labeled array; it was once one of the data structures in Pandas, but it was rarely used. It was deprecated in version 0.20 and removed in version 0.25.
What is a Series?
It is a one-dimensional labeled array that can hold values of any type. You can think of a Series as a list in which every element carries a label.
It has a single axis, called the index.
So, a Series holds one sequence of values together with its index.
A Series is size-immutable (you cannot change its length), although its values can be changed.
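The points above can be illustrated with a small sketch. The name scores and its labels are hypothetical, chosen only to show how the index works:

```python
import pandas as pd

# A Series with an explicit index (labels are illustrative).
scores = pd.Series([90, 85, 77], index=["alice", "bob", "carol"])

print(scores.index.tolist())  # the single axis of labels
print(scores["bob"])          # look up a value by its label
print(scores.iloc[0])         # look up a value by its position
print(len(scores))            # number of elements

# Values can be changed in place without changing the length.
scores["bob"] = 88
print(scores["bob"])
```

If no index is passed, Pandas assigns default integer labels starting at 0.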
In layman's terms, it is a list of contents, or a 1D array, which looks like:
Creating a simple Series (as shown in the picture above)
import pandas as pd
series_data = [10, 'a', '19462', 'Alpha', 0.255, 'B', 90, 'Beta', 'zzzz', 10000]
series_output = pd.Series(series_data)
print