Welcome back, guys. This is Day 3 of the "MY 7Day journey to Data Science" blog series. In this post, you are going to learn about a data type in the Pandas library called "Series".
So first thing, what is Pandas?
Pandas is a Python library that is mostly used for data handling, manipulation, and cleaning tasks. It remains one of the most widely used libraries across data science roles.
Pandas has two important data types for holding array-like values. They are:
Series
DataFrame
Series:
INDEX:
Declaring a Series in Pandas
Accessing values
Boolean series
A Series in pandas looks like a two-column table in Excel, where by default the first column is the index position of each element and the second column holds the values.
To import pandas,
import pandas
You can declare a series like,
a=pandas.Series([10,20,30,40])
Here the values we specify will be stored in the series datatype.
If we need to see the values inside the series,
a
# Entering just the name of the series displays its index and values, like
0 10
1 20
2 30
3 40
----------------------------------------------------------------------
a.values
# values is an attribute, not a method, so no parentheses; it returns the values as a NumPy array
array([10, 20, 30, 40])
We can also give the series a name for convenience by setting its name attribute like,
a.name="SAMPLE"
If we need the total number of elements present in the series we can use the size attribute (without parentheses),
a.size
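Putting the steps above together, here is a minimal sketch of declaring a Series and reading its basic attributes. The variable names `values` and `count` are only for illustration.

```python
import pandas

# Build a Series from a plain Python list; the index defaults to 0..n-1.
a = pandas.Series([10, 20, 30, 40])

# .values is an attribute (no parentheses) holding the underlying NumPy array.
values = a.values

# .name is also an attribute; assigning to it labels the Series.
a.name = "SAMPLE"

# .size is an attribute giving the number of elements; len(a) gives the same count.
count = a.size

print(a)
print(values)
print(count)
```

Calling `a.values()` or `a.size()` with parentheses raises a TypeError, because neither one is a method.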
Accessing values:
We can access the values based on the index as we always do in most of the other datatypes like,
a[0] # For a single value
a[[0, 2]] # For multiple values, pass a list of index labels
a[0:3] # For values in a range of positions
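As a small sketch of these three access styles on a Series with the default integer index (the names `first`, `picked`, and `head` are just illustrative):

```python
import pandas

a = pandas.Series([10, 20, 30, 40])

# Single value: look up the element whose index label is 0.
first = a[0]

# Multiple values: pass a list of index labels inside the brackets.
picked = a[[0, 2]]

# Range: an integer slice selects by position and excludes the stop position.
head = a[0:3]

print(first)
print(picked)
print(head)
```

Note the double brackets in `a[[0, 2]]`. Writing `a[0, 2]` passes a tuple as a single key, which a flat index cannot look up.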
This is the traditional way to access values, and it works here too. But a Series also lets us define a name (label) for each index position, such as
a.index=['First','Second','Third','Fourth']
This can also be done at the stage of Series creation such as
a=pandas.Series({'First':1,'Second':2,'Third':3,'Fourth':4})
Changing the default index (0 to n-1) to user-defined names makes it easier to retrieve data. We can do it like,
a['First'] # For a single value
a[['First', 'Fourth']] # For multiple values, pass a list of labels
a['First':'Fourth'] # For a range of labels, with both ends included
Now if we need to find an element's value, we can use its name instead of its index position.
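Here is a short sketch of both ways of naming the index and then reading values by label. The variable names `b`, `one`, `two`, and `span` are only for illustration.

```python
import pandas

# Option 1: create the Series from a dict; the keys become the index labels.
a = pandas.Series({'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4})

# Option 2: create it from a list, then assign the labels afterwards.
b = pandas.Series([10, 20, 30, 40])
b.index = ['First', 'Second', 'Third', 'Fourth']

one = a['First']                 # single label
two = a[['First', 'Fourth']]     # list of labels
span = a['First':'Fourth']       # label slice, both ends included

print(one)
print(two)
print(span)
print(b['Third'])
```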
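But after naming the index positions, if we try
a[0]
it is no longer a reliable way to get the first element: older pandas versions fall back to treating 0 as a position, while recent versions deprecate that fallback and may raise a KeyError because there is no label 0. Instead, we can use the "iloc" indexer like,
a.iloc[0] # iloc stands for integer location, i.e. access by position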
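A minimal sketch of position-based access with iloc on a Series that has string labels (the sample values and the names `first`, `last`, and `ends` are assumptions for illustration):

```python
import pandas

a = pandas.Series([10, 20, 30, 40],
                  index=['First', 'Second', 'Third', 'Fourth'])

# iloc always works by integer position, whatever the index labels are.
first = a.iloc[0]        # element at position 0
last = a.iloc[-1]        # negative positions count from the end
ends = a.iloc[[0, 3]]    # a list of positions returns a smaller Series

print(first)
print(last)
print(ends)
```

The result of `a.iloc[[0, 3]]` keeps the original labels, so `ends` is indexed by 'First' and 'Fourth'.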
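In plain Python, if we write
a[0:5]
the element in position 5 is not included; it returns only positions 0 to 4. Positional slicing in pandas (a[0:5] on the default index, or a.iloc[0:5]) behaves the same way. The difference appears with label-based slicing, such as a.loc[0:5] or a['First':'Fourth'], where the end label is included.
output
0,1,2,3,4 # plain Python and positional slicing in pandas
0,1,2,3,4,5 # label-based slicing in pandas, e.g. a.loc[0:5]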
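To see exactly when the end of a slice is included, here is a small comparison sketch. It assumes a seven-element Series with the default integer index; `nums`, `list_slice`, `pos_slice`, and `label_slice` are illustrative names. The end element is included only when slicing by label with loc, not when slicing by position.

```python
import pandas

nums = [0, 1, 2, 3, 4, 5, 6]
a = pandas.Series(nums)

# Plain Python list slice: the stop position 5 is excluded.
list_slice = nums[0:5]

# Positional slice with iloc: the stop position 5 is excluded as well.
pos_slice = a.iloc[0:5]

# Label-based slice with loc: the end label 5 is included.
label_slice = a.loc[0:5]

print(list_slice)
print(pos_slice.tolist())
print(label_slice.tolist())
```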
Boolean series:
Like NumPy Boolean arrays, pandas returns the result of a condition as a Boolean Series.
a>3
# If there are five elements in the series a and only the last element satisfies this condition, the output will be like
0 False
1 False
2 False
3 False
4 True
But if we need the matching values instead of the Boolean series, we can pass the condition inside the square brackets as
a[a>3]
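As a minimal sketch of the two steps, assuming a five-element Series in which only the last value is greater than 3 (the sample values and the names `mask` and `selected` are illustrative):

```python
import pandas

a = pandas.Series([1, 2, 3, 3, 5])

# The comparison is applied element by element and gives a Boolean Series.
mask = a > 3

# Using that Boolean Series inside the brackets keeps only the True rows.
selected = a[mask]

print(mask)
print(selected)
```

The selected values keep their original index labels, so here the single match still carries index 4.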