How to analyze data with NumPy
In this post, I will discuss the features of the NumPy library and its applications in data science. Code samples illustrate specific techniques.
What is Data Science
Data science combines many fields, including statistics, scientific methods, and data analysis, to extract value from data.
Data: An underutilized resource for machine learning
Data science is one of the most exciting fields today. But why is it so important?
Because companies are sitting on a treasure trove of data. Data volumes have exploded as modern technology enables more and more information to be created and stored. It is estimated that 90 percent of the data in the world has been created in the last two years. For example, Facebook users upload 10 million photos every hour.
The wealth of data these technologies collect and store can offer transformational benefits to organizations and societies around the world. To realize those benefits, however, we need to be able to interpret the data. This is where data science comes into play.
The Relationship between Python and Data Science
Today, when it comes to data science, the first programming language that comes to mind is Python. It is easy to learn, easy to read, and easy to maintain, and it has many data science libraries, all of which have made the language very widely used.
As its popularity grows, existing libraries are developed further and new ones are added. In this article, I will give a brief introduction to Python from a data science perspective, describe NumPy, one of its most important packages, and work through examples.
There are situations for which Python is tailor-made as a data science tool. It is a strong fit when data analysis tasks involve integration with web apps or when statistical code needs to be incorporated into a production database. As a full-fledged programming language, Python is also well suited to implementing algorithms.
Its packages are built for specific data science jobs. NumPy, SciPy, and pandas work well for data analysis. When graphics are needed, matplotlib is a good choice, and for machine learning tasks, scikit-learn is an ideal option.
There are many reasons for choosing this powerful programming language, and which one matters most is up to you. Some of them, among others, are listed below:
- Useful libraries and frameworks (NumPy, pandas, SciPy, Matplotlib)
- Flexibility (scalability)
- Easy web development (Django, Flask)
- Powerful community
- Automation
- Graphics and visualization
The NumPy library takes its name from an abbreviation of Numerical Python. Its most important feature is the array, which supports fast mathematical operations. NumPy arrays are much faster than Python's built-in list for numerical work, and the library supports many mathematical operations such as random number generation, matrix multiplication, and linear algebra.
It provides high-level math functions along with data manipulation tools. The examples in this article cover creating a multidimensional array, calculating the difference, sum, and product of arrays, finding the indices of elements larger than a certain value, and printing elements to the screen.
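To make the operations named above concrete, here is a minimal sketch. The arrays and variable names (`prices`, `quantities`, `A`, `B`, `rhs`) are made up for illustration and do not come from any real dataset.

```python
import numpy as np

# Element-wise arithmetic on whole arrays, with no explicit Python loop
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])
totals = prices * quantities
print(totals)        # [10. 40. 90.]

# Matrix multiplication with the @ operator
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
product = A @ B
print(product)       # [[19 22]
                     #  [43 50]]

# Linear algebra: solve the system A x = rhs
rhs = np.array([5, 11])
solution = np.linalg.solve(A, rhs)
print(solution)      # approximately [1. 2.]

# Random number generation from a seeded generator (values depend on the seed)
rng = np.random.default_rng(seed=0)
samples = rng.random(3)   # three floats in the interval [0, 1)
print(samples)
```

Seeding the generator is only there to make the run repeatable; leave the seed out if you want different numbers each time.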
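The array arithmetic and index lookup mentioned above can be sketched as follows. The values and the `threshold` name are hypothetical, chosen only to keep the output easy to check.

```python
import numpy as np

a = np.array([4, 9, 2, 7])
b = np.array([1, 3, 5, 7])

# Difference, sum, and product are computed element by element
difference = a - b
total = a + b
product = a * b
print(difference)   # [ 3  6 -3  0]
print(total)        # [ 5 12  7 14]
print(product)      # [ 4 27 10 49]

# Indices of the elements larger than a given value
threshold = 3  # hypothetical cutoff
indices = np.where(a > threshold)[0]
print(indices)      # [0 1 3]

# Print the matching elements one by one
for value in a[indices]:
    print(value)
```

`np.where` with a single argument returns a tuple with one index array per dimension, which is why the sketch takes element `[0]` for this 1-D array.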
    import numpy as np

    # 1 dimensional
    x = np.array([1, 3, 5])
    # 2 dimensional
    y = np.array([(2, 4, 6), (8, 10, 12)])

    # np.arange(stop), np.arange(start, stop), np.arange(start, stop, step)
    x = np.arange(8)
    # array([0, 1, 2, 3, 4, 5, 6, 7])
    y = np.arange(4.0)
    # array([0., 1., 2., 3.])
    x = np.arange(3, 7)
    # array([3, 4, 5, 6])
    y = np.arange(1, 8, 2)
    # array([1, 3, 5, 7])
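Once an array exists, it often helps to inspect its dimensions before working with it. The following is a small sketch; the names `grid`, `zeros`, and `evenly_spaced` are illustrative, and `np.zeros` and `np.linspace` are shown only as two other common ways to create arrays.

```python
import numpy as np

grid = np.array([(2, 4, 6), (8, 10, 12)])
print(grid.ndim)    # 2, the number of dimensions
print(grid.shape)   # (2, 3), i.e. 2 rows and 3 columns
print(grid.size)    # 6, the total number of elements

# An array of zeros with a given shape
zeros = np.zeros((2, 3))

# Five evenly spaced values from 0 to 1, endpoints included
evenly_spaced = np.linspace(0.0, 1.0, 5)
print(evenly_spaced)   # [0.   0.25 0.5  0.75 1.  ]
```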
    import numpy as np

    # sort() sorts the array in place, in ascending order
    y = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1])
    y.sort()
    print(y)
    # [1 2 3 4 5 6 7 8 9]
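The `sort()` method changes the array itself. If the original order needs to be kept, a sketch along these lines may help; the sample values are arbitrary.

```python
import numpy as np

y = np.array([30, 10, 20])

# np.sort returns a sorted copy and leaves y untouched
sorted_copy = np.sort(y)
print(sorted_copy)   # [10 20 30]
print(y)             # [30 10 20]

# np.argsort returns the indices that would sort the array
order = np.argsort(y)
print(order)         # [1 2 0]

# Reversing the sorted copy gives descending order
descending = np.sort(y)[::-1]
print(descending)    # [30 20 10]
```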
    import numpy as np

    # Append items to array (without an axis argument the result is flattened)
    m = np.array([(10, 20, 30), (40, 50, 60)])
    n = np.append(m, [(70, 80, 90)])
    print(n)
    # [10 20 30 40 50 60 70 80 90]

    # Remove index 2 from previous array
    print(np.delete(n, 2))
    # [10 20 40 50 60 70 80 90]
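Note that `np.append` flattens its inputs unless an axis is given. A minimal sketch of keeping the 2-D layout, reusing the same values as the example above:

```python
import numpy as np

m = np.array([(10, 20, 30), (40, 50, 60)])

# Without axis, the result is a flat 1-D array
flat = np.append(m, [(70, 80, 90)])
print(flat.shape)      # (9,)

# With axis=0, the new row is stacked under the existing rows
stacked = np.append(m, [(70, 80, 90)], axis=0)
print(stacked.shape)   # (3, 3)

# np.delete also accepts an axis: here the first row is removed
without_first_row = np.delete(stacked, 0, axis=0)
print(without_first_row)
# [[40 50 60]
#  [70 80 90]]
```

Both functions return new arrays; neither modifies `m` or `stacked` in place.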
    # Split array into 3 groups of roughly equal size
    a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    print(np.array_split(a, 3))
    # [array([1, 2, 3]), array([4, 5, 6]), array([7, 8])]
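The second argument of `np.array_split` is the number of groups, not the group size. The sketch below, using a made-up array of ten values, shows how uneven lengths are handled and how to split at explicit positions instead.

```python
import numpy as np

a = np.arange(1, 11)   # the integers 1 through 10

# Three groups: the extra element goes to the first group
thirds = np.array_split(a, 3)
sizes = [len(part) for part in thirds]
print(sizes)           # [4, 3, 3]

# A list of indices splits the array at those positions instead
at_indices = np.array_split(a, [2, 5])
print(at_indices)
# [array([1, 2]), array([3, 4, 5]), array([ 6,  7,  8,  9, 10])]

# np.split is stricter and refuses an uneven division
split_failed = False
try:
    np.split(a, 3)
except ValueError:
    split_failed = True
print(split_failed)    # True
```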
    # Using comparison operators will create boolean NumPy arrays
    w = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    c = w < 6
    print(c)
    # [ True  True  True  True  True False False False]
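A boolean array like this is mostly useful as a mask. Here is a short sketch of filtering and counting with it, reusing the array `w` from the example above:

```python
import numpy as np

w = np.array([1, 2, 3, 4, 5, 6, 7, 8])
mask = w < 6

# Indexing with the mask keeps only the elements where it is True
selected = w[mask]
print(selected)       # [1 2 3 4 5]

# True counts as 1, so summing the mask counts the matches
count = mask.sum()
print(count)          # 5

# Conditions are combined with & and |, each wrapped in parentheses
in_range = w[(w > 2) & (w < 6)]
print(in_range)       # [3 4 5]
```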
    # Statistics of an array
    a = np.array([1, 1, 2, 5, 8, 10, 11, 12])

    # Median
    print(np.median(a))
    # 6.5

    # Standard deviation
    print(np.std(a))
    # 4.2938910093294167
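By default `np.std` computes the population standard deviation. The sketch below adds the sample version via `ddof=1` and shows per-column and per-row statistics on a small made-up 2-D array named `table`.

```python
import numpy as np

a = np.array([1, 1, 2, 5, 8, 10, 11, 12])

mean = np.mean(a)
print(mean)                      # 6.25

# ddof=0 (the default) divides by n; ddof=1 divides by n - 1
population_std = np.std(a)
sample_std = np.std(a, ddof=1)
print(sample_std > population_std)   # True

# axis=0 aggregates down the columns, axis=1 across the rows
table = np.array([(1, 2, 3), (4, 5, 6)])
column_means = np.mean(table, axis=0)
row_means = np.mean(table, axis=1)
print(column_means)              # [2.5 3.5 4.5]
print(row_means)                 # [2. 5.]
```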
7-Slicing and Subsetting:
    b = np.array([(1, 2, 3), (4, 5, 6)])

    # The index *before* the comma refers to *rows*,
    # the index *after* the comma refers to *columns*
    print(b[0:1, 2])
    # [3]
    print(b[:len(b), 2])
    # [3 6]
    print(b[0, :])
    # [1 2 3]
    print(b[0, 2:])
    # [3]
    print(b[:, 0])
    # [1 4]

    c = np.array([(1, 2, 3), (4, 5, 6)])
    d = c[1:2, 0:2]
    print(d)
    # [[4 5]]
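One detail worth knowing about slices is that they are views onto the original array rather than copies. A minimal sketch, with the names `view` and `independent` chosen only for illustration:

```python
import numpy as np

b = np.array([(1, 2, 3), (4, 5, 6)])

# A basic slice shares memory with the original array
view = b[:, 1:]
view[0, 0] = 99
print(b)
# [[ 1 99  3]
#  [ 4  5  6]]

# .copy() gives an independent array, so b is not affected
independent = b[:, 1:].copy()
independent[0, 0] = -1
print(b[0, 1])    # 99
```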
    # If a 1-D array is added to a 2-D array (or the other way around),
    # NumPy broadcasts the lower-dimensional array across the
    # higher-dimensional one, provided their shapes are compatible
    a = np.array([1, 2, 3])
    b = np.array([(1, 2, 3), (4, 5, 6)])
    print(np.add(a, b))
    # [[2 4 6]
    #  [5 7 9]]

    # Example of np.roots
    # Consider the polynomial (x-1)^2 = x^2 - 2*x + 1,
    # whose roots are 1, 1
    print(np.roots([1, -2, 1]))
    # [1. 1.]

    # Similarly, x^2 - 4 = 0 has roots x = ±2
    print(np.roots([1, 0, -4]))
    # [-2.  2.]
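Two follow-ups to the example above, as a hedged sketch: which shapes can be added together, and a way to check roots with `np.polyval`. The arrays `row` and `col` are made up for illustration.

```python
import numpy as np

b = np.array([(1, 2, 3), (4, 5, 6)])   # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
col = np.array([[100], [200]])          # shape (2, 1)

# A (3,) array is added to every row
print(b + row)
# [[11 22 33]
#  [14 25 36]]

# A (2, 1) array is added to every column
print(b + col)
# [[101 102 103]
#  [204 205 206]]

# A (2,) array does not line up with the last axis of length 3
broadcast_failed = False
try:
    b + np.array([1, 2])
except ValueError:
    broadcast_failed = True
print(broadcast_failed)   # True

# Evaluating the polynomial at its roots should give values close to 0
coefficients = [1, 0, -4]               # x^2 - 4
roots = np.roots(coefficients)
residuals = np.polyval(coefficients, roots)
print(np.allclose(residuals, 0))        # True
```

Because roots are computed numerically, comparing with `np.allclose` is safer than testing for exact equality.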
As you can see, the NumPy library has many advantages for data analysis. I hope you are now familiar with NumPy arrays and ready to use them in your daily analysis tasks.