This notebook is an exercise in the Data Visualization course. You can reference the tutorial at this link.


In this exercise, you will use your new knowledge to propose a solution to a real-world scenario. To succeed, you will need to import data into Python, answer questions using the data, and generate histograms and density plots to understand patterns in the data.

Scenario

You'll work with a real-world dataset containing information collected from microscopic images of breast cancer tumors, similar to the image below.

(Image: ex4_cancer_image)

Each tumor has been labeled as either benign (noncancerous) or malignant (cancerous).

To learn more about how this kind of data is used to create intelligent algorithms to classify tumors in medical settings, watch the short video at this link!

Setup

Run the next cell to import and configure the Python libraries that you need to complete the exercise.

The questions below will give you feedback on your work. Run the following cell to set up our feedback system.

Step 1: Load the data

In this step, you will load two data files.
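As a minimal sketch of this step, the snippet below reads two CSV sources into pandas DataFrames with `pd.read_csv`. The file contents, the variable names, and the use of an `Id` column as the index are assumptions made for illustration; in the notebook you would pass the actual file paths instead of the in-memory stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files; every value here is made up.
benign_csv = io.StringIO(
    "Id,Diagnosis,Area (mean),Radius (worst)\n"
    "101,B,520.0,14.2\n"
    "102,B,410.5,12.9\n"
    "103,B,600.3,15.1\n"
)
malignant_csv = io.StringIO(
    "Id,Diagnosis,Area (mean),Radius (worst)\n"
    "201,M,1001.0,25.4\n"
    "202,M,1326.0,24.9\n"
)

# In the notebook, replace the StringIO objects with the paths to the data files.
# index_col="Id" assumes the files carry an identifier column with that name.
benign_data = pd.read_csv(benign_csv, index_col="Id")
malignant_data = pd.read_csv(malignant_csv, index_col="Id")

print(benign_data.shape, malignant_data.shape)
```

Using the identifier column as the index keeps it out of the way when you later plot or summarize the measurement columns.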

Step 2: Review the data

Use a Python command to print the first 5 rows of the data for benign tumors.

Use a Python command to print the first 5 rows of the data for malignant tumors.
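One way to do this is with the DataFrame `head()` method, which returns the first five rows by default. The sketch below uses small made-up DataFrames in place of the real data; the names `benign_data` and `malignant_data` are assumptions.

```python
import pandas as pd

# Made-up placeholder data; the real DataFrames come from the loading step.
benign_data = pd.DataFrame(
    {"Diagnosis": ["B"] * 8, "Area (mean)": [520.0, 410.5, 600.3, 455.1, 380.2, 512.9, 498.0, 430.7]},
    index=pd.Index(range(1, 9), name="Id"),
)
malignant_data = pd.DataFrame(
    {"Diagnosis": ["M"] * 6, "Area (mean)": [1001.0, 1326.0, 1203.0, 886.1, 1297.0, 977.4]},
    index=pd.Index(range(101, 107), name="Id"),
)

# head() returns the first 5 rows unless another count is passed.
benign_head = benign_data.head()
malignant_head = malignant_data.head()

print(benign_head)
print(malignant_head)
```

In a notebook, leaving `benign_data.head()` as the last line of a cell displays the rows as a formatted table without calling `print`.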

In the datasets, each row corresponds to a different image. Each dataset has 31 different columns, corresponding to:

Use the first 5 rows of the data (for benign and malignant tumors) to answer the questions below.

Step 3: Investigating differences

Part A

Use the code cell below to create two histograms that show the distribution of 'Area (mean)' values for benign and malignant tumors. (To permit easy comparison, put both histograms in a single figure.)

Part B

A researcher approaches you for help with identifying how the 'Area (mean)' column can be used to understand the difference between benign and malignant tumors. Based on the histograms above,

Step 4: A very useful column

Part A

Use the code cell below to create two KDE plots that show the distribution of 'Radius (worst)' values for benign and malignant tumors. (To permit easy comparison, put both KDE plots in a single figure.)

Part B

A hospital has recently started using an algorithm that can diagnose tumors with high accuracy. Given a tumor with a value for 'Radius (worst)' of 25, do you think the algorithm is more likely to classify the tumor as benign or malignant?
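To back up a visual reading of the KDE plots with a number, one rough option is to compare the share of each group that falls near the value in question. The sketch below does this on synthetic data, so its output says nothing about the real dataset; the helper `share_near` and the window width are hypothetical choices.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder data; swap in the real 'Radius (worst)' columns in the notebook.
rng = np.random.default_rng(0)
benign_radius = pd.Series(rng.normal(13, 2, 300), name="Radius (worst)")
malignant_radius = pd.Series(rng.normal(21, 4, 200), name="Radius (worst)")


def share_near(values: pd.Series, target: float, half_width: float = 2.0) -> float:
    """Return the fraction of values within half_width of target."""
    return float(values.between(target - half_width, target + half_width).mean())


benign_share = share_near(benign_radius, 25)
malignant_share = share_near(malignant_radius, 25)

print(f"Benign share near 25:    {benign_share:.3f}")
print(f"Malignant share near 25: {malignant_share:.3f}")
```

The group with the larger share near the target value is the one whose density curve is higher there, which is the same comparison the KDE plot supports by eye.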

Keep going

Review all that you've learned and explore how to further customize your plots in the next tutorial!


Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.