Pandas is one of the most popular open-source Python libraries.
It is mainly used for data analysis.
It is a high-level data manipulation tool.
Pandas provides two main data structures: Series and DataFrame.
A one-dimensional data structure is called a Series.
A two-dimensional data structure is called a DataFrame.
These data structures are built on the NumPy package (you may refer to my article about NumPy).
The key data structure is the DataFrame (tabular data).
Data in Pandas is often used to feed statistical analysis in SciPy (refer to my article about SciPy) or plotting functions from Matplotlib (refer to my article about Matplotlib).
Installing Pandas:
    pip install pandas
After installation, you need to import the package to start using it:
    import pandas as pd
Pandas is used for handling missing data and for merging, concatenating, and reshaping data, among other tasks.
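As a rough sketch of those four operations, the snippet below fills a missing value, merges two tables, concatenates rows, and reshapes the result. The tables, column names, and values (scores, names, student_id, and so on) are invented for illustration and do not come from a real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for this illustration.
scores = pd.DataFrame({"student_id": [1, 2, 3], "math": [80.0, np.nan, 64.0]})
names = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

# Missing data: replace the NaN with the mean of the column.
filled = scores.fillna({"math": scores["math"].mean()})

# Merging: join the two tables on their shared key column.
merged = filled.merge(names, on="student_id")

# Concatenating: stack an extra row under the existing ones.
extra = pd.DataFrame({"student_id": [4], "math": [90.0], "name": ["Di"]})
combined = pd.concat([merged, extra], ignore_index=True)

# Reshaping: go from wide form to long form (one score per row).
long_form = combined.melt(
    id_vars=["student_id", "name"], var_name="subject", value_name="score"
)

print(combined)
print(long_form)
```

Filling with the mean is only one possible choice here; dropping the incomplete rows with dropna() would be another.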
Did You Know?
The name Pandas is derived from "panel data", an econometrics term for multidimensional structured data sets.
Panel was a 3D labeled array and one of the original Pandas data structures, but it was rarely used; it was deprecated in version 0.20 and removed in version 0.25.
What is a Series?
It is a one-dimensional labeled array that can hold values of any type (integers, strings, floats, Python objects, and so on). You can think of a Series as a list whose items each carry a label.
It contains a single axis, the index.
So, a Series holds one sequence of values together with its index.
The Pandas documentation describes a Series as size-immutable (its length cannot be changed), although the values it contains can be changed.
In layman's terms, it is a list of values, or you can say a 1D array, which looks like:
Creating a simple Series (as shown in the picture above)
    import pandas as pd

    series_data = [10, 'a', '19462', 'Alpha', 0.255, 'B', 90, 'Beta', 'zzzz', 10000]
    series_output = pd.Series(series_data)
    print(series_output)
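Creating a Series from a NumPy Array
    import pandas as pd
    import numpy as np

    # A NumPy array holds a single type, so these mixed values
    # are all converted to strings when the array is created.
    list_of_series = np.array(['alpha', 58911, 'b', 0.525])
    series_example = pd.Series(list_of_series)
    print(series_example)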
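To make the "labeled" part of the definition more concrete, here is a small sketch of a Series with an explicit index. The subject names and marks are invented for illustration; index= is a standard parameter of pd.Series.

```python
import pandas as pd

# Hypothetical marks, with subject names as the index labels.
marks = pd.Series([85, 60, 90], index=["English", "Math", "Science"])

print(marks["Math"])   # look up a value by its label
print(marks.iloc[0])   # look up a value by its position
print(marks.index)     # the single axis of labels

# The values can be changed in place.
marks["Math"] = 65
print(marks)

# A Series built from mixed Python values falls back to the generic
# "object" dtype instead of a numeric one.
mixed = pd.Series([10, "a", 0.255])
print(mixed.dtype)
```

When no index is passed, as in the examples above, Pandas numbers the items 0, 1, 2, and so on.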
What is a DataFrame?
A DataFrame is a 2D labeled array that stores an ordered collection of columns. These columns can store data of different types.
It contains two axes, or indexes: a row index and a column index.
DataFrames are size-mutable (for example, columns can be added or removed).
In layman's terms, it is a table (a collection of Series that share one row index), which looks like the following:
Creating a simple DataFrame (As in the picture above)
    import pandas as pd

    data = [[85, 60, 90, 95], [73, 80, 64, 87], [98, 58, 74, 92]]
    df = pd.DataFrame(data,
                      columns=['English', 'Math', 'Science', 'French'],
                      index=['2018', '2019', '2020'])
    print(df)
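The two axes and the size-mutable behavior described earlier can be sketched on the same table. The "History" column and its values are made up for this example.

```python
import pandas as pd

data = [[85, 60, 90, 95], [73, 80, 64, 87], [98, 58, 74, 92]]
df = pd.DataFrame(
    data,
    columns=["English", "Math", "Science", "French"],
    index=["2018", "2019", "2020"],
)

print(df.index)    # row labels
print(df.columns)  # column labels

# Each column is itself a Series that shares the row index.
math = df["Math"]
print(type(math))

# Look up a single cell by row label and column label.
print(df.loc["2019", "Math"])

# Size-mutable: a new column can be added in place.
# "History" and its values are hypothetical.
df["History"] = [70, 75, 88]
print(df.shape)
```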
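Creating a DataFrame from a NumPy Array
    import pandas as pd
    import numpy as np

    # Build a real NumPy array first; because an array holds a single
    # type, the mixed values are all converted to strings.
    array_for_dataframe = np.array([[1.055, 'beta'], ['a', 4]])
    df = pd.DataFrame(array_for_dataframe)
    print("Dataframe:")
    print(df)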
Creating a NumPy Array from Pandas (Series/DataFrame)
    import pandas as pd
    import numpy as np

    # Series -> NumPy array
    list_of_series = ['alpha', 58911, 'b', 0.525]
    s = pd.Series(list_of_series)
    numpy_array = np.array(s)
    print("Numpy Array from Series:")
    print(numpy_array)
    print("\n")

    # DataFrame -> NumPy array
    list_of_dataframe = [[1.055, 'beta'], ['a', 4]]
    df = pd.DataFrame(list_of_dataframe)
    numpy_array = np.array(df)
    print("Numpy Array from DataFrame:")
    print(numpy_array)
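Besides np.array(...), Series and DataFrame objects have their own to_numpy() method (available since Pandas 0.24), which the Pandas documentation recommends for this conversion. A minimal sketch, reusing the same sample values:

```python
import pandas as pd

s = pd.Series(["alpha", 58911, "b", 0.525])
df = pd.DataFrame([[1.055, "beta"], ["a", 4]])

# Convert each Pandas object to a NumPy array.
series_array = s.to_numpy()
frame_array = df.to_numpy()

print(series_array.shape)  # one dimension for a Series
print(frame_array.shape)   # two dimensions for a DataFrame
print(frame_array.dtype)   # mixed values are kept as generic Python objects
```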
In this article, we discussed Pandas and learned what Pandas is, what a Series is, what a DataFrame is, and how to install and import Pandas. This article was just meant to give you a basic idea. If you still have questions about creating a DataFrame or anything else, don't worry! In my next article, we will learn about “Creating DataFrames” in different ways and go deeper into DataFrames.
Feedback or queries related to this article are most welcome.
Thanks for reading.
Source: C# Corner