This notebook is an exercise in the Data Visualization course. You can refer to the accompanying tutorial as you work.


In this exercise, you will use your new knowledge to propose a solution to a real-world scenario. To succeed, you will need to import data into Python, answer questions using the data, and generate scatter plots to understand patterns in the data.

Scenario

You work for a major candy producer, and your goal is to write a report that your company can use to guide the design of its next product. Soon after starting your research, you stumble across this very interesting dataset containing results from a fun survey to crowdsource favorite candies.

Setup

Run the next cell to import and configure the Python libraries that you need to complete the exercise.

The questions below will give you feedback on your work. Run the following cell to set up our feedback system.

Step 1: Load the Data

Read the candy data file into candy_data. Use the "id" column to label the rows.
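
A minimal sketch of this step, assuming the file is a CSV with an "id" column. The inline text is made-up sample data standing in for the real file, and `candy.csv` in the comment is a hypothetical path.

```python
import io
import pandas as pd

# Made-up rows standing in for the real candy file (values are illustrative only).
sample_csv = io.StringIO(
    "id,chocolate,sugarpercent,pricepercent,winpercent\n"
    "0,Yes,0.73,0.86,67.0\n"
    "1,No,0.60,0.51,46.1\n"
    "2,No,0.01,0.12,32.3\n"
)

# index_col="id" uses the "id" column as the row labels.
# With a real file you would pass its path instead,
# e.g. pd.read_csv("candy.csv", index_col="id")  (hypothetical path).
candy_data = pd.read_csv(sample_csv, index_col="id")
print(candy_data.shape)
```

Because "id" becomes the index, it no longer appears among the regular columns.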

Step 2: Review the data

Use a Python command to print the first five rows of the data.
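
One way to do this is `DataFrame.head()`, which returns the first five rows by default. The sketch below uses a small made-up DataFrame in place of the real `candy_data`.

```python
import pandas as pd

# Made-up stand-in for candy_data (values are illustrative only).
candy_data = pd.DataFrame(
    {
        "sugarpercent": [0.73, 0.60, 0.01, 0.01, 0.91, 0.47, 0.60],
        "winpercent": [67.0, 67.6, 32.3, 46.1, 52.3, 50.3, 56.9],
    }
)
candy_data.index.name = "id"

# head() defaults to n=5; pass a different n to see more or fewer rows.
first_five = candy_data.head()
print(first_five)
```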

The dataset contains 83 rows, each corresponding to a different candy bar, and 13 columns.

Use the first five rows of the data to answer the questions below.

Step 3: The role of sugar

Do people tend to prefer candies with higher sugar content?

Part A

Create a scatter plot that shows the relationship between 'sugarpercent' (on the horizontal x-axis) and 'winpercent' (on the vertical y-axis). Don't add a regression line just yet -- you'll do that in the next step!
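
A possible sketch using `seaborn.scatterplot`. The DataFrame here is made-up sample data, not the survey results, so the resulting picture says nothing about the real answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for candy_data (values are illustrative only).
candy_data = pd.DataFrame(
    {
        "sugarpercent": [0.10, 0.25, 0.40, 0.55, 0.70, 0.90],
        "winpercent": [35.0, 52.0, 41.0, 60.0, 48.0, 66.0],
    }
)

# x goes on the horizontal axis, y on the vertical axis.
ax = sns.scatterplot(data=candy_data, x="sugarpercent", y="winpercent")
plt.close("all")
```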

Part B

Does the scatter plot show a strong correlation between the two variables? If so, are candies with more sugar relatively more or less popular with the survey respondents?

Step 4: Take a closer look

Part A

Create the same scatter plot you created in Step 3, but now with a regression line!
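
A possible sketch using `seaborn.regplot`, which draws the points together with a fitted line. The data is again made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for candy_data (values are illustrative only).
candy_data = pd.DataFrame(
    {
        "sugarpercent": [0.10, 0.25, 0.40, 0.55, 0.70, 0.90],
        "winpercent": [35.0, 52.0, 41.0, 60.0, 48.0, 66.0],
    }
)

# regplot overlays a linear regression fit on the scatter plot.
ax = sns.regplot(data=candy_data, x="sugarpercent", y="winpercent")
plt.close("all")
```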

Part B

According to the plot above, is there a slight correlation between 'winpercent' and 'sugarpercent'? What does this tell you about the candy that people tend to prefer?
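
If you want a number to go with the visual impression, one optional check (not part of the exercise itself) is the Pearson correlation coefficient from `Series.corr`. The sketch below runs on made-up data, so its value does not describe the real survey.

```python
import pandas as pd

# Made-up stand-in for candy_data (values are illustrative only).
candy_data = pd.DataFrame(
    {
        "sugarpercent": [0.10, 0.25, 0.40, 0.55, 0.70, 0.90],
        "winpercent": [35.0, 52.0, 41.0, 60.0, 48.0, 66.0],
    }
)

# Pearson correlation lies between -1 and 1; values near 0 mean a weak linear relationship.
r = candy_data["sugarpercent"].corr(candy_data["winpercent"])
print(f"correlation: {r:.2f}")
```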

Step 5: Chocolate!

In the code cell below, create a scatter plot to show the relationship between 'pricepercent' (on the horizontal x-axis) and 'winpercent' (on the vertical y-axis). Use the 'chocolate' column to color-code the points. Don't add any regression lines just yet -- you'll do that in the next step!

Can you see any interesting patterns in the scatter plot? We'll investigate this plot further by adding regression lines in the next step!
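
A possible sketch of the color-coded plot, passing the 'chocolate' column to the `hue` parameter of `seaborn.scatterplot`. The data is made up, and the "Yes"/"No" encoding of 'chocolate' is an assumption about how the column is stored.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for candy_data (values and "Yes"/"No" labels are illustrative only).
candy_data = pd.DataFrame(
    {
        "pricepercent": [0.20, 0.35, 0.50, 0.65, 0.80, 0.95, 0.15, 0.45],
        "winpercent": [40.0, 45.0, 55.0, 62.0, 70.0, 74.0, 38.0, 36.0],
        "chocolate": ["No", "No", "Yes", "Yes", "Yes", "Yes", "No", "No"],
    }
)

# hue assigns a different color to each value in the 'chocolate' column.
ax = sns.scatterplot(
    data=candy_data, x="pricepercent", y="winpercent", hue="chocolate"
)
plt.close("all")
```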

Step 6: Investigate chocolate

Part A

Create the same scatter plot you created in Step 5, but now with two regression lines, corresponding to (1) chocolate candies and (2) candies without chocolate.
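
One way to get a separate regression line per group is `seaborn.lmplot` with `hue`. This is a sketch on made-up data, with the "Yes"/"No" labels assumed as above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for candy_data (values and "Yes"/"No" labels are illustrative only).
candy_data = pd.DataFrame(
    {
        "pricepercent": [0.20, 0.35, 0.50, 0.65, 0.80, 0.95, 0.15, 0.45, 0.60, 0.30],
        "winpercent": [40.0, 45.0, 55.0, 62.0, 70.0, 74.0, 38.0, 36.0, 58.0, 42.0],
        "chocolate": ["No", "No", "Yes", "Yes", "Yes", "Yes", "No", "No", "Yes", "No"],
    }
)

# lmplot fits one regression line for each value of the hue column.
# Unlike scatterplot and regplot, it returns a FacetGrid rather than an Axes.
grid = sns.lmplot(data=candy_data, x="pricepercent", y="winpercent", hue="chocolate")
plt.close("all")
```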

Part B

Using the regression lines, what conclusions can you draw about the effects of chocolate and price on candy popularity?

Step 7: Everybody loves chocolate.

Part A

Create a categorical scatter plot to highlight the relationship between 'chocolate' and 'winpercent'. Put 'chocolate' on the (horizontal) x-axis, and 'winpercent' on the (vertical) y-axis.
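
A possible sketch using `seaborn.swarmplot`, one of seaborn's categorical scatter plots. The data is made up, with the "Yes"/"No" labels assumed as above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for candy_data (values and "Yes"/"No" labels are illustrative only).
candy_data = pd.DataFrame(
    {
        "chocolate": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No"],
        "winpercent": [40.0, 45.0, 55.0, 62.0, 70.0, 38.0, 74.0, 36.0],
    }
)

# Each candy becomes one point, grouped by category along the x-axis.
ax = sns.swarmplot(data=candy_data, x="chocolate", y="winpercent")
plt.close("all")
```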

Part B

You decide to dedicate a section of your report to the fact that chocolate candies tend to be more popular than candies without chocolate. Which plot is more appropriate to tell this story: the plot from Step 6, or the plot from Step 7?

Keep going

Explore histograms and density plots.

