This notebook is an exercise in the Data Cleaning course. You can refer back to the tutorial for this lesson.


In this exercise, you'll apply what you learned in the Scaling and normalization tutorial.

Setup

The questions below will give you feedback on your work. Run the following cell to set up the feedback system.

Get our environment set up

To practice scaling and normalization, we're going to use a dataset of Kickstarter campaigns. (Kickstarter is a website where people can ask others to invest in various projects and concept products.)

The next code cell loads in the libraries and dataset we'll be using.
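Here is a minimal sketch of that loading step. It assumes pandas and the Kaggle path `../input/kickstarter-projects/ks-projects-201801.csv`, which is an assumption; point it at wherever your copy of the CSV lives. The helper name `load_kickstarters` is hypothetical. Coercing the money columns to numeric is an extra safety step and not necessarily what the exercise cell does.

```python
import pandas as pd

# Assumed location of the Kickstarter CSV on Kaggle; adjust for your environment.
KICKSTARTER_PATH = "../input/kickstarter-projects/ks-projects-201801.csv"

# The money columns this exercise works with.
MONEY_COLUMNS = ["goal", "usd_goal_real", "pledged", "usd_pledged_real"]


def load_kickstarters(path_or_buffer=KICKSTARTER_PATH):
    """Read the campaigns CSV and make sure the money columns are numeric."""
    df = pd.read_csv(path_or_buffer)
    for col in MONEY_COLUMNS:
        if col in df.columns:
            # Unparseable entries become NaN instead of raising an error.
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

For example, `kickstarters_2017 = load_kickstarters()` would give a DataFrame to work with in the steps below. The variable name `kickstarters_2017` is also an assumption.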

Let's start by scaling the goal of each campaign, which is how much money it was asking for. The plots show histograms of the values in the "usd_goal_real" column, both before and after scaling.
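Min-max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The exercise uses the course's `minmax_scaling()` function (from `mlxtend.preprocessing`). The sketch below spells out the same formula in plain pandas so the arithmetic is visible. `min_max_scale` and `plot_before_after` are hypothetical helper names, and matplotlib is assumed only for the optional plot.

```python
import pandas as pd


def min_max_scale(values: pd.Series) -> pd.Series:
    """Linearly rescale a numeric Series so its minimum maps to 0 and its maximum to 1."""
    lo, hi = values.min(), values.max()  # missing values are ignored here
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    return (values - lo) / (hi - lo)


def plot_before_after(original: pd.Series, scaled: pd.Series) -> None:
    """Draw side-by-side histograms of the original and scaled values."""
    import matplotlib.pyplot as plt  # imported lazily so scaling works without matplotlib

    fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
    ax0.hist(original.dropna(), bins=50)
    ax0.set_title("Original data")
    ax1.hist(scaled.dropna(), bins=50)
    ax1.set_title("Scaled data")
    plt.show()
```

Only the horizontal axis changes: scaling is a linear transformation, so the histogram keeps its shape.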

After scaling, all values lie between 0 and 1 (you can see this on the horizontal axis of the second plot above, and we verify it in the code cell below).
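A quick way to check this is to look at the minimum and maximum of the scaled column. After min-max scaling they should be exactly 0 and 1. The helper names below are hypothetical.

```python
import pandas as pd


def describe_range(values: pd.Series) -> tuple:
    """Return (min, max) of a Series, ignoring missing values."""
    return float(values.min()), float(values.max())


def in_unit_interval(values: pd.Series) -> bool:
    """True if every non-missing value lies in the closed interval [0, 1]."""
    clean = values.dropna()
    return bool(((clean >= 0) & (clean <= 1)).all())
```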

1) Practice scaling

We just scaled the "usd_goal_real" column. What about the "goal" column?

Begin by running the code cell below to create a DataFrame original_goal_data containing the "goal" column.
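Here is a sketch of this step. It assumes the full dataset is held in a DataFrame named `kickstarters_2017`. That name is hypothetical here, and the DataFrame is filled with made-up values so the snippet runs on its own. Double brackets select the column as a one-column DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical stand-in for the full campaigns DataFrame (made-up values).
kickstarters_2017 = pd.DataFrame(
    {
        "goal": [1000.0, 5000.0, 250.0, 30000.0],
        "usd_goal_real": [1000.0, 5200.0, 310.0, 30000.0],
    }
)

# Keep only the "goal" column, as a DataFrame rather than a Series.
original_goal_data = kickstarters_2017[["goal"]]
```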

Use original_goal_data to create a new DataFrame scaled_goal_data with values scaled between 0 and 1. You must use the minmax_scaling() function.
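One way to complete this step is sketched below. It assumes `minmax_scaling` comes from `mlxtend.preprocessing`, whose signature is `minmax_scaling(array, columns, min_val=0, max_val=1)`. The `except` branch is a hypothetical pure-pandas fallback that exists only so the sketch runs where mlxtend is not installed; the exercise asks for the real function. The values in `original_goal_data` are made up.

```python
import pandas as pd

try:
    # mlxtend provides minmax_scaling(array, columns, min_val=0, max_val=1).
    from mlxtend.preprocessing import minmax_scaling
except ImportError:
    # Hypothetical fallback applying the same formula, only so this sketch runs without mlxtend.
    def minmax_scaling(array, columns, min_val=0, max_val=1):
        cols = array[columns].astype(float)
        scaled = (cols - cols.min()) / (cols.max() - cols.min())
        return scaled * (max_val - min_val) + min_val

# Hypothetical stand-in for the original_goal_data DataFrame described above.
original_goal_data = pd.DataFrame({"goal": [1000.0, 5000.0, 250.0, 30000.0]})

# Scale the "goal" column to the range [0, 1].
scaled_goal_data = minmax_scaling(original_goal_data, columns=["goal"])
```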

2) Practice normalization

Now you'll practice normalization. We begin by normalizing the amount of money pledged to each campaign.
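The normalization here is a Box-Cox transformation. Box-Cox is only defined for strictly positive inputs, so campaigns with zero pledged dollars have to be set aside first. Below is a minimal sketch assuming `scipy.stats.boxcox`, which fits the transform's λ parameter by maximum likelihood when none is given. `boxcox_positive` is a hypothetical helper name.

```python
import pandas as pd
from scipy import stats


def boxcox_positive(values: pd.Series) -> pd.Series:
    """Box-Cox transform the strictly positive entries of a Series.

    Zero, negative, and missing values are dropped first (NaN > 0 is False);
    the original index is kept so the result can be aligned back to the rows.
    """
    positive = values[values > 0]
    transformed, _lmbda = stats.boxcox(positive)  # lambda fit by maximum likelihood
    return pd.Series(transformed, index=positive.index, name=values.name)
```

For example, `boxcox_positive(kickstarters_2017["usd_pledged_real"])` would normalize the positive pledges. The DataFrame name is an assumption.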

It's not perfect (it looks like a lot of campaigns received very few pledges), but it is much closer to a normal distribution!
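Sample skewness is one way to put a number on "much closer to a normal distribution." It is near 0 for a symmetric distribution. The sketch below uses synthetic log-normal draws as a hypothetical stand-in for pledge amounts. This is not the Kickstarter data, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical right-skewed data standing in for pledge amounts (log-normal draws).
rng = np.random.default_rng(0)
pledges = pd.Series(rng.lognormal(mean=6.0, sigma=1.5, size=5_000))

# Box-Cox returns the transformed values and the fitted lambda.
normalized, lmbda = stats.boxcox(pledges)

# Skewness near 0 indicates a roughly symmetric distribution.
skew_before = pledges.skew()
skew_after = pd.Series(normalized).skew()
print(f"lambda={lmbda:.3f}, skew before={skew_before:.2f}, after={skew_after:.2f}")
```

On log-normal data, Box-Cox should pick a λ close to 0, which corresponds to a log transform. Skewness should drop from strongly positive to roughly 0.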

We used the "usd_pledged_real" column. Follow the same process to normalize the "pledged" column.
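Here is the same process applied to "pledged," written out step by step. `kickstarters_2017` is a hypothetical DataFrame name, and its values are made up so the snippet runs on its own.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the campaigns DataFrame (made-up values).
kickstarters_2017 = pd.DataFrame({"pledged": [0.0, 15.0, 240.0, 3100.0, 0.0, 58.0]})

# 1. Box-Cox needs strictly positive input, so select only the positive pledges.
index_of_positive_pledges = kickstarters_2017.pledged > 0
positive_pledges = kickstarters_2017.pledged.loc[index_of_positive_pledges]

# 2. Fit the transform and keep the original row labels.
normalized_pledges = pd.Series(
    stats.boxcox(positive_pledges)[0],
    name="pledged",
    index=positive_pledges.index,
)
```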

Does the normalized "usd_pledged_real" column look different from the normalized "pledged" column, or do they look mostly the same?
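The question is meant to be answered by comparing the two histograms. A side-by-side table of summary statistics can back up what you see. Below is a sketch with a hypothetical `compare_distributions` helper. Pass it the two normalized Series, such as the Box-Cox outputs for "usd_pledged_real" and "pledged."

```python
import pandas as pd


def compare_distributions(a: pd.Series, b: pd.Series, labels=("first", "second")) -> pd.DataFrame:
    """Summary statistics of two Series side by side, with a skewness row added."""
    summary = pd.concat([a.describe(), b.describe()], axis=1)
    summary.columns = list(labels)
    summary.loc["skew"] = [a.skew(), b.skew()]
    return summary
```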

Once you have an answer, run the code cell below.

(Optional) More practice

Try finding a new dataset and pretend you're preparing to perform a regression analysis.

These datasets are a good start!

Pick three or four variables and decide whether any of them need to be normalized or scaled. If you think so, practice applying the appropriate technique.
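If you try this, a small helper that applies your chosen technique to each column can save some repetition. This is a sketch, not a recipe. `prepare_features` is a hypothetical name, and the decision about which columns to scale and which to normalize is left to you. Box-Cox columns are assumed to be strictly positive with no missing values.

```python
import pandas as pd
from scipy import stats


def prepare_features(df, scale_cols=(), normalize_cols=()):
    """Return a copy of df with the chosen columns min-max scaled or Box-Cox normalized.

    Assumes no missing values in the Box-Cox columns; untouched columns are copied as-is.
    """
    out = df.copy()
    for col in scale_cols:
        # Min-max scaling: map the column's range onto [0, 1].
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo)
    for col in normalize_cols:
        # Box-Cox is only defined for strictly positive values.
        if (out[col] <= 0).any():
            raise ValueError(f"column {col!r} has non-positive values; Box-Cox needs x > 0")
        out[col] = stats.boxcox(out[col])[0]
    return out
```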

Keep going

In the next lesson, learn how to parse dates in a dataset.

