Getting Started With PANDAS

Introduction

  • Pandas is one of the most popular open-source Python libraries.

  • It is mainly used for Data Analysis.

  • It is a high-level data manipulation tool.

  • Pandas provides two main data structures: Series and DataFrame.

  • A one-dimensional data structure is called a Series.

  • A two-dimensional, tabular data structure is called a DataFrame.

  • These data structures are built on the NumPy package (you may refer to my article about NumPy).

  • The key data structure is the DataFrame (tabular data).

  • Data in Pandas is often used to feed statistical analysis in SciPy (refer to my article about SciPy) or plotting functions from Matplotlib (refer to my article about Matplotlib).

  • Installing Pandas: pip install pandas

  • After installation, you need to import the package to start using it: import pandas as pd

  • Pandas is used for handling missing data and for merging, concatenating, and reshaping data, among other tasks.
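
As a rough illustration of the points above, the sketch below builds a Series and a DataFrame, fills in a missing value, and concatenates two frames. The column names and values are made up for the example; they are not from any real data set.

```python
import numpy as np
import pandas as pd

# A Series: one-dimensional, labeled values (sample data, invented for illustration)
ages = pd.Series([25, 32, 47], name="age")

# A DataFrame: two-dimensional, tabular data (hypothetical columns)
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [25.0, np.nan, 47.0],  # np.nan marks a missing value
})

# Handling missing data: replace the missing age with the column mean
filled = people.fillna({"age": people["age"].mean()})

# Concatenating: stack another frame with the same columns underneath
more_people = pd.DataFrame({"name": ["Dee"], "age": [51.0]})
combined = pd.concat([filled, more_people], ignore_index=True)

print(ages.ndim, people.ndim)  # number of dimensions of each structure
print(combined)
```

Filling with the mean is only one possible choice here; dropping the row with dropna() would be another reasonable way to deal with the gap.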


Did You Know??

The name Pandas is derived from "panel data", an econometrics term for multidimensional, structured data sets.

Panel was a 3D labeled array. It used to be one of the data structures in Pandas, but it was rarely used and has since been removed from the library.

What is a Series?

  • It is a one-dimensional labeled array. You can think of a Series as a list in which every item carries an index label.

  • It has a single axis, the index.

  • So a Series holds only one sequence of values, together with its index.

  • Series are size-immutable (you cannot change their size).
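
To make the idea of a labeled one-dimensional array concrete, here is a small sketch. The labels and values are invented for illustration. When no index is given, Pandas assigns the integer positions 0, 1, 2, ... as labels.

```python
import pandas as pd

# Default index: integer labels 0..n-1
plain = pd.Series([10, 20, 30])

# Custom index labels (hypothetical, chosen for this example)
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(list(plain.index))  # the default integer labels
print(labeled["b"])       # look up a value by its label
print(labeled.ndim)       # a Series has a single axis
```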


In layman's terms, it is a list of contents, or a 1D array, which looks like:

Creating a simple Series (as shown in the picture above)

import pandas as pd

# A Series can hold values of mixed types (integers, floats, and strings)
series_data = [10, 'a', '19462', 'Alpha', 0.255, 'B', 90, 'Beta', 'zzzz', 10000]
series_output = pd.Series(series_data)
print(series_output)