
Hello folks, I'm so happy to see you all in this blog series once again. As you know, this is Day 5 of the "My 7-day journey to Data Science" blog series. In the previous blogs, we learned about:

What is Numpy?
What is Pandas?
Datatypes and operations in Pandas.

In this blog, we are going to learn how to add, rename, and delete data in pandas data frames.

INDEX:

Adding a column
Renaming a column
Renaming a row
Deleting columns and rows
Immutable characteristic

Adding a column:

To add a column to an existing data frame, we first need a Series of values, for example:

import pandas

b = pandas.Series(['French','German','Italian'], index=['FIRST','SECOND','THIRD'])

Now, if "a" is an existing data frame, the following code creates a new column named "lang" filled with the values from the Series "b". The values are matched to the rows of "a" by index label, not by position: a row of "a" whose label does not appear in "b" gets NaN (Not a Number) in the new column, and labels in "b" that do not exist in "a" are ignored.

a['lang']=b
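To make the label matching concrete, here is a minimal sketch. The data frame "a" is hypothetical: its columns REG, MARK and SUB borrow names used later in this post, and the extra row "FOURTH" exists only so that one row has no matching label in "b".

```python
import pandas

# Hypothetical data frame "a" with four labelled rows
a = pandas.DataFrame(
    {'REG': [101, 102, 103, 104], 'MARK': [450, 380, 410, 300], 'SUB': [5, 4, 5, 3]},
    index=['FIRST', 'SECOND', 'THIRD', 'FOURTH'],
)

# The Series from above: it only has labels for three of the four rows
b = pandas.Series(['French', 'German', 'Italian'], index=['FIRST', 'SECOND', 'THIRD'])

# Assignment aligns on the index labels; "FOURTH" has no match, so it gets NaN
a['lang'] = b
print(a['lang'])
```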

If we want to add a new column with the same single value in every existing row, we can assign that value directly:

a['lang2']="ENGLISH"

This creates a new column "lang2" in the data frame "a" and sets its value to "ENGLISH" in every row.
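A small sketch of this broadcasting behaviour, assuming a hypothetical two-row data frame: the single string is repeated once per existing row, so the row count does not change.

```python
import pandas

# Hypothetical two-row data frame
a = pandas.DataFrame({'MARK': [450, 380]}, index=['FIRST', 'SECOND'])

# A scalar value is copied into every existing row of the new column
a['lang2'] = "ENGLISH"
print(list(a['lang2']))  # ['ENGLISH', 'ENGLISH']
```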

We can also create a new column from operations on existing columns, for example:

a['AVG']=a['MARK']/a['SUB']

The above code creates a new column "AVG" by dividing each row's "MARK" value by that same row's "SUB" value.
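Here is a minimal sketch with hypothetical MARK and SUB values, showing that the division is carried out element by element, row against row.

```python
import pandas

# Hypothetical marks and subject counts
a = pandas.DataFrame(
    {'MARK': [450, 380, 410], 'SUB': [5, 4, 5]},
    index=['FIRST', 'SECOND', 'THIRD'],
)

# Each row's MARK is divided by the same row's SUB
a['AVG'] = a['MARK'] / a['SUB']
print(list(a['AVG']))  # [90.0, 95.0, 82.0]
```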

Renaming a column:

To change a column name in an existing data frame, we can use rename():

a.rename(columns={'REG':'ID'})

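This returns a copy of "a" in which the column "REG" is renamed to "ID" (as explained in the immutability section below, "a" itself is not changed unless we store the result). We can also rename more than one column at a time by adding more old-name/new-name pairs to the dictionary.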
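A short sketch of renaming two columns in one call. The data frame and the new name "SCORE" are hypothetical; the last line shows that the original "a" keeps its old column names.

```python
import pandas

# Hypothetical data frame with the "REG" column used above
a = pandas.DataFrame({'REG': [101, 102], 'MARK': [450, 380]}, index=['FIRST', 'SECOND'])

# Several old-name -> new-name pairs in a single dictionary
renamed = a.rename(columns={'REG': 'ID', 'MARK': 'SCORE'})
print(list(renamed.columns))  # ['ID', 'SCORE']
print(list(a.columns))        # ['REG', 'MARK'] ("a" itself is unchanged)
```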

Renaming a row:

In the same way, we can also rename rows:

a.rename(index={'FIRST':'F'})

This returns a copy of "a" in which the row "FIRST" is renamed to "F".
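A minimal sketch with a hypothetical data frame. Row labels live in the index, so they are renamed through index=; rows and columns can also be renamed in the same call (the name "SCORE" is hypothetical).

```python
import pandas

# Hypothetical data frame with labelled rows
a = pandas.DataFrame({'MARK': [450, 380, 410]}, index=['FIRST', 'SECOND', 'THIRD'])

# Rename two rows; "THIRD" is not in the mapping, so it keeps its label
renamed_rows = a.rename(index={'FIRST': 'F', 'SECOND': 'S'})
print(list(renamed_rows.index))  # ['F', 'S', 'THIRD']

# Rename a row and a column together
renamed_both = a.rename(index={'THIRD': 'T'}, columns={'MARK': 'SCORE'})
print(list(renamed_both.index))    # ['FIRST', 'SECOND', 'T']
print(list(renamed_both.columns))  # ['SCORE']
```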

Deleting columns and rows:

a.drop('SECOND')

The above code deletes the whole row labelled "SECOND". We can also drop several rows at once, either by passing a list of labels or by selecting a range of labels with loc, much as we do when retrieving values:

a.drop(['FIRST','THIRD'])
a.drop(a.loc['FIRST':'THIRD'].index)
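A small sketch with a hypothetical four-row data frame. Note that a label slice with .loc includes both endpoints, so 'FIRST':'THIRD' covers three rows.

```python
import pandas

# Hypothetical data frame with four labelled rows
a = pandas.DataFrame(
    {'MARK': [450, 380, 410, 300]},
    index=['FIRST', 'SECOND', 'THIRD', 'FOURTH'],
)

# Several rows at once: pass the labels as a list
without_two = a.drop(['FIRST', 'THIRD'])
print(list(without_two.index))  # ['SECOND', 'FOURTH']

# A label range: select the slice with .loc, then drop its index labels
without_range = a.drop(a.loc['FIRST':'THIRD'].index)
print(list(without_range.index))  # ['FOURTH']
```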

In the same way, we can delete columns:

a.drop(columns=['MARK'])

This deletes the whole column named "MARK". We can delete more than one column at a time by passing a list of names:

a.drop(columns=['lang','lang2'])

Note the square brackets around the column names: to pass several labels at once, we wrap them in a list, just as we do when selecting several columns with a[['lang','lang2']].
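A minimal sketch with a hypothetical data frame holding the "lang" and "lang2" columns from the first section. Passing axis=1 with the list of labels is an equivalent way to say "these are columns".

```python
import pandas

# Hypothetical data frame with the two language columns added earlier
a = pandas.DataFrame(
    {'MARK': [450, 380], 'lang': ['French', 'German'], 'lang2': ['ENGLISH', 'ENGLISH']},
    index=['FIRST', 'SECOND'],
)

# Drop several columns by passing a list to columns=
print(list(a.drop(columns=['lang', 'lang2']).columns))  # ['MARK']

# Equivalent form: name the axis explicitly (axis=1 means columns)
print(list(a.drop(['lang', 'lang2'], axis=1).columns))  # ['MARK']
```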

Immutable characteristic:

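Just as most NumPy operations return new arrays, most pandas methods, such as drop() and rename(), do not change the object they are called on.

Instead, they return a new Series or DataFrame and leave the original values untouched unless we explicitly keep the change. (Assigning a column with a['lang'] = ..., as in the first section, is an exception: it modifies "a" directly.)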

a.drop('FIRST')

This does not delete the row "FIRST" from the data frame "a". Instead, it creates a new copy of "a", deletes the row from that copy, and returns it, leaving the original data frame unchanged.
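A minimal sketch, assuming a hypothetical three-row data frame, that checks this behaviour: the returned object lacks "FIRST", while "a" still has all three rows.

```python
import pandas

# Hypothetical data frame
a = pandas.DataFrame({'MARK': [450, 380, 410]}, index=['FIRST', 'SECOND', 'THIRD'])

# drop() returns a new data frame; "a" is not modified
result = a.drop('FIRST')
print(list(result.index))  # ['SECOND', 'THIRD']
print(list(a.index))       # ['FIRST', 'SECOND', 'THIRD']
```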

If we want the change to be stored in the original data frame, we have to assign the result back to it:

a=a.drop('THIRD')

Assigning the result with the = symbol replaces "a" with the modified copy, so the change is kept.
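A short sketch of two ways to keep the change, using a hypothetical data frame. Reassignment is the approach described above; the inplace=True parameter of drop() is an alternative that modifies the existing object and returns None, so its result should not be assigned back to "a".

```python
import pandas

# Hypothetical data frame
a = pandas.DataFrame({'MARK': [450, 380, 410]}, index=['FIRST', 'SECOND', 'THIRD'])

# Option 1: rebind the name "a" to the new, smaller data frame
a = a.drop('THIRD')
print(list(a.index))  # ['FIRST', 'SECOND']

# Option 2: modify "a" itself; the call returns None
returned = a.drop('SECOND', inplace=True)
print(returned)       # None
print(list(a.index))  # ['FIRST']
```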


Day 5 - Altering and Deleting data