This notebook is an exercise in the Data Cleaning course. You can reference the tutorial at this link.


In this exercise, you'll apply what you learned in the Inconsistent data entry tutorial.

Setup

The questions below will give you feedback on your work. Run the following cell to set up the feedback system.

Get our environment set up

The first thing we'll need to do is load in the libraries and dataset we'll be using. We use the same dataset from the tutorial.

Next, we'll redo all of the work that we did in the tutorial.

1) Examine another column

Write code below to take a look at all the unique values in the "Graduated from" column.
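One way to approach this is sketched below. The small `professors` DataFrame is a made-up stand-in (the real one comes from the dataset loaded earlier), and the names `unique_values` and `value` are only illustrative. Sorting the values places near-duplicates next to each other, and printing with `repr` makes leading or trailing white space visible.

```python
import pandas as pd

# Hypothetical stand-in for the professors DataFrame; these values are invented.
professors = pd.DataFrame({
    "Graduated from": [" Columbia University", "Columbia University", "MIT ", "MIT"],
})

# Collect the distinct values and sort them so similar entries sit side by side.
unique_values = sorted(professors["Graduated from"].unique())

# repr() shows the surrounding quotes, so stray spaces are easy to spot.
for value in unique_values:
    print(repr(value))
```

In this toy data, four distinct values appear even though only two institutions are named, which is the kind of inconsistency to look for.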

Do you notice any inconsistencies in the data? Can any of the inconsistencies in the data be fixed by removing white spaces at the beginning and end of cells?

Once you have answered these questions, run the code cell below to get credit for your work.

Yes, some of the problems can be fixed by removing white space at the beginning and end of the cells. In addition, some entries contain the character \xa0 (a non-breaking space), which looks like an encoding artifact.

2) Do some text pre-processing

Strip the white space from the beginning and end of every entry in the "Graduated from" column of the professors DataFrame.
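A minimal sketch of this step, assuming the column holds strings: the pandas `.str.strip()` accessor removes leading and trailing white space from each entry. The DataFrame below is a made-up stand-in for the real `professors` data, and `cleaned_values` is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the professors DataFrame; these values are invented.
professors = pd.DataFrame({
    "Graduated from": [" Columbia University", "Columbia University", "MIT ", "MIT"],
})

# Strip leading and trailing white space from every entry in the column.
professors["Graduated from"] = professors["Graduated from"].str.strip()

# After stripping, entries that differed only by padding collapse together.
cleaned_values = sorted(professors["Graduated from"].unique())
print(cleaned_values)
```

Assigning the result back to the column matters here, because `.str.strip()` returns a new Series rather than modifying the DataFrame in place.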

3) Continue working with countries

In the tutorial, we focused on cleaning up inconsistencies in the "Country" column. Run the code cell below to view the list of unique values that we ended with.

Take another look at the "Country" column and see if there's any more data cleaning we need to do.

It looks like 'usa' and 'usofa' should be the same country. Correct the "Country" column in the dataframe so that 'usofa' appears instead as 'usa'.

Use the most recent version of the DataFrame from question 2, with the white space at the beginning and end of cells removed.
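One possible approach is sketched below, using `Series.replace` to swap exact matches of 'usofa' for 'usa'. The DataFrame is a made-up stand-in with only a "Country" column, and `countries` is an illustrative name. In the exercise, the same call would be applied to the DataFrame produced in question 2.

```python
import pandas as pd

# Hypothetical stand-in for the professors DataFrame; these values are invented.
professors = pd.DataFrame({"Country": ["usa", "usofa", "germany", "usofa"]})

# Replace entries that are exactly 'usofa' with 'usa'; other values are untouched.
professors["Country"] = professors["Country"].replace("usofa", "usa")

# Confirm that 'usofa' no longer appears among the unique values.
countries = sorted(professors["Country"].unique())
print(countries)
```

Because `replace` with a scalar matches whole values only, entries that merely contain 'usofa' as a substring would not be changed.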

Congratulations!

Congratulations on completing the Data Cleaning course on Kaggle Learn!

To practice your new skills, you're encouraged to download and investigate some of Kaggle's Datasets.


Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.