The Tech Platform

Sep 9, 2020 · 2 min

Introduction to PANDAS

Updated: Mar 14, 2023

The name Pandas is derived from "panel data", an econometrics term for structured, multidimensional data sets.

Pandas is an open-source data manipulation library in Python that provides high-performance, easy-to-use data structures and data analysis tools for working with structured data. Pandas is built on top of NumPy and is widely used in data science, machine learning, and finance.

Pandas provides two primary classes for working with data:

  1. Series: A one-dimensional labeled array that can hold any data type (integer, float, string, etc.).

  2. DataFrame: A two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table.

Pandas also provides many useful functions and methods for working with data, including:

  • Reading and writing data to and from various file formats (CSV, Excel, SQL, etc.)

  • Data selection, filtering, and manipulation

  • Descriptive statistics, aggregation, and summarization

  • Handling missing data

  • Data visualization
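The sketch below covers several of the tasks listed above in one place: reading CSV data, handling a missing value, filtering, computing summary statistics, and writing CSV back out. It assumes a small, made-up CSV held in a string (read through `io.StringIO`) so nothing needs to exist on disk. The column names and values are illustrative only, and filling the gap with the column mean is just one possible choice.

```python
import io

import pandas as pd

# Hypothetical CSV data kept in memory; Mary's age is deliberately left empty.
csv_text = """Name,Age,City
John,25,New York
Mary,,Paris
Peter,35,London
Lisa,40,Berlin
"""

# Reading: read_csv accepts a file-like object as well as a file path.
df = pd.read_csv(io.StringIO(csv_text))

# Handling missing data: the empty field was parsed as NaN.
missing_before = int(df["Age"].isna().sum())
df["Age"] = df["Age"].fillna(df["Age"].mean())  # replace NaN with the column mean

# Selection and filtering: the Name and City of everyone older than 30.
older = df.loc[df["Age"] > 30, ["Name", "City"]]
print(older)

# Descriptive statistics: count, mean, std, min, quartiles and max of Age.
print(df["Age"].describe())

# Writing: to_csv with no path returns the CSV text as a string.
csv_out = df.to_csv(index=False)
print(csv_out)
```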
     

To use Pandas, you first need to install it with pip (pip install pandas) or conda (conda install pandas). Here's an example of how to import Pandas and create a simple DataFrame:

import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
data = {'Name': ['John', 'Mary', 'Peter', 'Lisa'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Paris', 'London', 'Berlin']}

df = pd.DataFrame(data)
print(df)

This will output:

    Name  Age      City
0   John   25  New York
1   Mary   30     Paris
2  Peter   35    London
3   Lisa   40    Berlin

As you can see, Pandas makes it easy to create, manipulate, and analyze structured data in Python.
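To make "manipulate and analyze" a little more concrete, here is a small sketch that reuses the Name/Age/City data from the example above. The new column name `Age_in_5_years` is hypothetical, chosen only for illustration.

```python
import pandas as pd

# Same sample data as the example above.
data = {'Name': ['John', 'Mary', 'Peter', 'Lisa'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Paris', 'London', 'Berlin']}
df = pd.DataFrame(data)

# Manipulate: derive a new column from an existing one (vectorized, no loop).
df['Age_in_5_years'] = df['Age'] + 5

# Manipulate: sort rows by Age, oldest first (returns a new DataFrame).
by_age = df.sort_values('Age', ascending=False)
print(by_age)

# Analyze: a column-wide aggregate and a lookup of the row holding the maximum.
print(df['Age'].mean())                       # 32.5
oldest = df.loc[df['Age'].idxmax(), 'Name']
print(oldest)                                 # Lisa
```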

What is a Series?

A Series is a one-dimensional labeled array that can hold any data type, such as integers, floats, or strings. It is similar to a single column in a table or spreadsheet.

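A Series consists of two parts: the actual data (stored as an array) and the index. The index is a set of labels used to identify and look up each element; labels are usually unique, but pandas does not require them to be. The labels can be of any hashable type, but they are most often integers or strings. If you do not supply an index, pandas uses the integers 0, 1, 2, and so on.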
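A minimal sketch of the data/index split described above. The temperature values and weekday labels are hypothetical; the point is how `.index`, `.to_numpy()`, label lookup and position lookup relate.

```python
import pandas as pd

# A Series built with explicit string labels instead of the default 0, 1, 2, ...
temps = pd.Series([21.5, 19.0, 23.1], index=['Mon', 'Tue', 'Wed'])

print(temps.index)          # the labels
print(temps.to_numpy())     # the underlying data as a NumPy array

# Label-based and position-based access
print(temps['Tue'])         # 19.0 (by label)
print(temps.iloc[0])        # 21.5 (by position)

# Without an explicit index, pandas assigns the integers 0..n-1
default = pd.Series(['a', 'b', 'c'])
print(list(default.index))  # [0, 1, 2]

# Labels do not have to be unique: a repeated label returns every match
dup = pd.Series([1, 2, 3], index=['x', 'x', 'y'])
print(dup['x'])             # a Series with two elements
```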

In layman's terms, it is a list of values, or a one-dimensional (1D) array, which looks like this:

Creating a simple Series (as shown in the picture above)

import pandas as pd

# A Python list mixing integers, strings and a float, so the Series gets dtype object
series_data = [10, 'a', '19462', 'Alpha', 0.255, 'B', 90, 'Beta', 'zzzz', 10000]
series_output = pd.Series(series_data)
print(series_output)

Output

Creating a Series from a NumPy Array

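import pandas as pd
import numpy as np

# Note: np.array converts these mixed values to one common type (strings),
# so 58911 and 0.525 are stored as '58911' and '0.525'
list_of_series = np.array(['alpha', 58911, 'b', 0.525])
series_example = pd.Series(list_of_series)
print(series_example)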

Output
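Because a NumPy array has a single dtype, building a Series from a NumPy array can silently change your values. The sketch below contrasts the default behaviour (everything becomes text), passing `dtype=object` (original Python objects are kept), and a purely numeric array (the numeric dtype carries over into the Series).

```python
import numpy as np
import pandas as pd

values = ['alpha', 58911, 'b', 0.525]

# Default: NumPy picks one common type for all elements, here a Unicode string.
as_strings = np.array(values)
print(as_strings.dtype.kind)      # 'U' means Unicode string
s_str = pd.Series(as_strings)
print(s_str[1] == '58911')        # True: the number became text

# dtype=object stores references to the original Python objects instead.
s_obj = pd.Series(np.array(values, dtype=object))
print(s_obj[1] + 1)               # 58912: still an integer

# A homogeneous numeric array keeps its numeric dtype in the Series.
s_num = pd.Series(np.array([1.5, 2.5, 3.0]))
print(s_num.dtype)                # float64
```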

What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet or SQL table. It is the most commonly used Pandas object and provides a powerful and flexible way to work with structured data.

A DataFrame consists of three components: the data, the index, and the columns. The data can be supplied as a two-dimensional NumPy ndarray, a dictionary of lists or Series, a list of lists, or another DataFrame. The index is a set of labels that identify the rows, and the columns are a set of labels that identify the columns; as with a Series, these labels are usually, but not necessarily, unique.
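The three components can be inspected directly. This sketch builds a small DataFrame from a dictionary of lists (the scores and years are illustrative) and shows the index, the column labels, the data as a NumPy array, and the fact that a single column is a Series sharing the DataFrame's index.

```python
import pandas as pd

# Build a DataFrame from a dict of lists: keys become the column labels.
df = pd.DataFrame({'English': [85, 73], 'Math': [60, 80]},
                  index=['2018', '2019'])

print(df.index.tolist())     # ['2018', '2019']  (row labels)
print(df.columns.tolist())   # ['English', 'Math']  (column labels)
print(df.to_numpy())         # the values as a 2-D NumPy array

# Each column is itself a Series that shares the DataFrame's index.
math = df['Math']
print(type(math).__name__)           # Series
print(math.index.equals(df.index))   # True
```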

In layman's terms, it is a table (a collection of Series that share the same index), which looks like the following:

Creating a simple DataFrame (as in the picture above)

import pandas as pd

# Each inner list is one row; columns and index supply the labels
data = [[85, 60, 90, 95], [73, 80, 64, 87], [98, 58, 74, 92]]
df = pd.DataFrame(data, columns=['English', 'Math', 'Science', 'French'],
                  index=['2018', '2019', '2020'])
print(df)

Output
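Once the labels are in place, rows, columns and single cells can be selected by name, and aggregates can be taken along either axis. A short sketch using the same scores DataFrame as above:

```python
import pandas as pd

# The scores DataFrame from the example above.
data = [[85, 60, 90, 95], [73, 80, 64, 87], [98, 58, 74, 92]]
df = pd.DataFrame(data, columns=['English', 'Math', 'Science', 'French'],
                  index=['2018', '2019', '2020'])

print(df['Math'])                  # one column, as a Series indexed by year
print(df.loc['2019'])              # one row, as a Series indexed by subject
print(df.loc['2020', 'Science'])   # a single value: 74

# Averages per subject (down each column) and per year (across each row)
print(df.mean())                   # Math -> 66.0, Science -> 76.0, ...
print(df.mean(axis=1))             # 2018 -> 82.5, 2019 -> 76.0, 2020 -> 80.5
```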

Creating a DataFrame from a NumPy Array

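import pandas as pd
import numpy as np

# dtype=object keeps the mixed values as they are;
# without it, NumPy would convert every element to a string
array_2d = np.array([[1.055, 'beta'], ['a', 4]], dtype=object)
df = pd.DataFrame(array_2d)
print("Dataframe:")
print(df)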

Output
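With a purely numeric array, which is the more common case, the DataFrame simply inherits the array's dtype for every column. A minimal sketch with hypothetical values, also supplying column and row labels:

```python
import numpy as np
import pandas as pd

# A 3x2 array of floats (hypothetical measurements).
arr = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])

# Column names and row labels are optional; without them pandas uses 0, 1, 2, ...
df = pd.DataFrame(arr, columns=['x', 'y'], index=['a', 'b', 'c'])
print(df)
print(df.dtypes)        # both columns are float64, taken from the array
print(df.shape)         # (3, 2)
print(df.loc['b', 'y']) # 4.0
```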

Creating a NumPy Array from Pandas (Series/DataFrame)

import pandas as pd
import numpy as np

list_of_series = ['alpha', 58911, 'b', 0.525]
s = pd.Series(list_of_series)
numpy_array = np.array(s)  # s.to_numpy() returns an equivalent array
print("Numpy Array from Series:")
print(numpy_array)
print("\n")

list_of_dataframe = [[1.055, 'beta'], ['a', 4]]
df = pd.DataFrame(list_of_dataframe)
numpy_array = np.array(df)  # df.to_numpy() returns an equivalent array
print("Numpy Array from DataFrame:")
print(numpy_array)

Output
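The dtype of the resulting array depends on the column types: pandas chooses one type that can hold every value. This sketch uses `to_numpy()`, which pandas provides as the explicit way to get a NumPy array, and shows integer, upcast-to-float, and object results, plus requesting a dtype explicitly. The small DataFrames are illustrative.

```python
import pandas as pd

# All-integer columns give an integer array.
nums = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
arr_int = nums.to_numpy()
print(arr_int.dtype)              # int64

# Mixing integer and float columns upcasts everything to float.
mixed_num_arr = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]}).to_numpy()
print(mixed_num_arr.dtype)        # float64

# Mixing numbers and strings falls back to a generic object array.
mixed_arr = pd.DataFrame([[1.055, 'beta'], ['a', 4]]).to_numpy()
print(mixed_arr.dtype)            # object

# A target dtype can be requested explicitly, here on a single column (a Series).
col_as_float = nums['a'].to_numpy(dtype='float64')
print(col_as_float)               # [1. 2.]
```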

Conclusion

In this article, we discussed what Pandas is, what a Series is, what a DataFrame is, and how to install and import Pandas. This article was only meant to give you a basic idea.
