This notebook is an exercise in the Data Visualization course. You can reference the tutorial at this link.
Now it's time for you to demonstrate your new skills with a project of your own!
In this exercise, you will work with a dataset of your choosing. Once you've selected a dataset, you'll design and create your own plot to tell interesting stories behind the data!
Run the next cell to import and configure the Python libraries that you need to complete the exercise.
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")
Setup Complete
The questions below will give you feedback on your work. Run the following cell to set up the feedback system.
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.data_viz_to_coder.ex7 import *
print("Setup Complete")
Setup Complete
Begin by selecting a CSV dataset from Kaggle Datasets. If you're unsure how to do this or would like to work with your own data, please revisit the instructions in the previous tutorial.
Once you have selected a dataset, click on the [+ Add Data] option in the top right corner. This will generate a pop-up window that you can use to search for your chosen dataset.
Once you have found the dataset, click on the [Add] button to attach it to the notebook. You can check that it was successful by looking at the Data dropdown menu to the right of the notebook -- look for an input folder containing a subfolder that matches the name of the dataset.
You can click on the caret to the left of the name of the dataset to double-check that it contains a CSV file. For instance, the image below shows that the example dataset contains two CSV files: (1) dc-wikia-data.csv, and (2) marvel-wikia-data.csv.
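The same check can also be done in code. The sketch below is a minimal example, not part of the exercise: `find_csv_files` is a hypothetical helper name, and it assumes attached datasets appear under `../input`, as the filepaths in this notebook suggest.

```python
import os

def find_csv_files(root):
    """Return the sorted paths of every .csv file found under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".csv"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Assumption: attached datasets are mounted under ../input.
# If the folder does not exist, os.walk yields nothing and no path is printed.
for path in find_csv_files("../input"):
    print(path)
```

If nothing is printed, the dataset is either not attached or contains no CSV file.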
Once you've uploaded a dataset with a CSV file, run the code cell below without changes to receive credit for your work!
# Check for a dataset with a CSV file
dataset = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report-2021.csv', index_col='Country name')
step_1.check()
Correct:
Now that the dataset is attached to the notebook, you can find its filepath. To do this, begin by clicking on the CSV file you'd like to use. This will open the CSV file in a tab below the notebook. You can find the filepath towards the top of this new tab.
After you find the filepath corresponding to your dataset, fill it in as the value for my_filepath in the code cell below, and run the code cell to check that you've provided a valid filepath. For instance, in the case of this example dataset, we would set
my_filepath = "../input/fivethirtyeight-comic-characters-dataset/dc-wikia-data.csv"
Note that you must enclose the filepath in quotation marks; otherwise, the code will return an error.
Once you've entered the filepath, you can close the tab below the notebook by clicking on the [X] at the top of the tab.
# Fill in the line below: Specify the path of the CSV file to read
my_filepath = '../input/world-happiness-report-2021/world-happiness-report-2021.csv'
# Check for a valid filepath to a CSV file in a dataset
step_2.check()
Correct:
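If you would like to sanity-check a path yourself, a small helper along these lines may be useful. This is only a sketch: `is_valid_csv_path` is a hypothetical name, and it is not how the exercise's own checker works (that is not shown in this notebook).

```python
from pathlib import Path

def is_valid_csv_path(filepath):
    """Return True if filepath points to an existing file with a .csv extension."""
    path = Path(filepath)
    return path.is_file() and path.suffix.lower() == ".csv"

# The path must be a string, i.e. enclosed in quotation marks.
print(is_valid_csv_path("../input/world-happiness-report-2021/world-happiness-report-2021.csv"))
```

The call prints True only when the file actually exists at that location, so it also catches typos in the folder or file name.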
Use the next code cell to load your data file into my_data. Use the filepath that you specified in the previous step.
# Fill in the line below: Read the file into a variable my_data
filepath = my_filepath  # reuse the path specified in the previous step
my_data = pd.read_csv(filepath, index_col='Country name')
# Check that a dataset has been uploaded into my_data
step_3.check()
Correct:
After the code cell above is marked correct, run the code cell below without changes to view the first five rows of the data.
# Print the first five rows of the data
columns = ['Country name', 'Regional indicator', 'Ladder score']
data = pd.read_csv(filepath, index_col='Country name', usecols=columns)
data.head()
Country name | Regional indicator | Ladder score |
---|---|---|
Finland | Western Europe | 7.842 |
Denmark | Western Europe | 7.620 |
Switzerland | Western Europe | 7.571 |
Iceland | Western Europe | 7.554 |
Netherlands | Western Europe | 7.464 |
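The read_csv call above combines index_col with usecols, which works only if the index column is itself listed in usecols. The sketch below illustrates this with an in-memory CSV built from the five rows shown above; the extra "Other column" and its values are hypothetical stand-ins for the columns that are not loaded.

```python
import io
import pandas as pd

# The five rows shown above, plus a hypothetical "Other column"
# standing in for the columns we do not want to load.
csv_text = """Country name,Regional indicator,Ladder score,Other column
Finland,Western Europe,7.842,x
Denmark,Western Europe,7.620,x
Switzerland,Western Europe,7.571,x
Iceland,Western Europe,7.554,x
Netherlands,Western Europe,7.464,x
"""

# The index column must appear in usecols, otherwise read_csv cannot find it.
wanted = ['Country name', 'Regional indicator', 'Ladder score']
preview = pd.read_csv(io.StringIO(csv_text), index_col='Country name', usecols=wanted)
print(preview.head())
```

After loading, 'Country name' becomes the index, so only 'Regional indicator' and 'Ladder score' remain as regular columns.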
Use the next code cell to create a figure that tells a story behind your dataset. You can use any chart type (line chart, bar chart, heatmap, etc) of your choosing!
# Create a plot
# Compare ladder scores: North America and ANZ vs. Western Europe
na = data[data['Regional indicator'] == 'North America and ANZ']
eu = data[data['Regional indicator'] == 'Western Europe']
# sns.distplot is deprecated; sns.histplot (seaborn 0.11 or later) is its replacement for histograms
sns.histplot(x=na['Ladder score'], label='North America and ANZ')
sns.histplot(x=eu['Ladder score'], label='Western Europe')
plt.legend()
# Check that a figure appears below
# step_4.check()
<matplotlib.legend.Legend at 0x7fb3e544c090>
In a final tutorial, learn how to use your new skills to create data visualizations after completing the micro-course.
Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.