Run the following cell to load your data and some utility functions.
import pandas as pd
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.data_types_and_missing_data import *
print("Setup complete.")
reviews.head()
What is the data type of the points column in the dataset?
# Your code here
dtype = reviews.points.dtype
# Check your answer
q1.check()
#q1.hint()
#q1.solution()
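As a minimal sketch of how dtype inspection works, the snippet below builds a tiny made-up DataFrame (`toy_reviews` and its values are hypothetical stand-ins, not the real wine data) and reads the type of one column as well as the types of all columns at once.

```python
import pandas as pd

# Hypothetical miniature stand-in for the wine reviews data
toy_reviews = pd.DataFrame({
    "points": [87, 90, 92],
    "price": [15.0, None, 65.0],
    "country": ["Italy", "US", "France"],
})

# .dtype on a single column (a Series) gives that column's type
points_dtype = toy_reviews.points.dtype

# .dtypes on the DataFrame gives one type per column
all_dtypes = toy_reviews.dtypes

print(points_dtype)
print(all_dtypes)
```

Note that the price column holds floats here: a column containing a missing numeric value is typically stored as a floating-point type so that NaN can be represented.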
Create a Series from entries in the points column, but convert the entries to strings. Hint: strings are str in native Python.
# point_strings = reviews.points.astype(str)
point_strings = reviews.points.map(str)
# Check your answer
q2.check()
#q2.hint()
q2.solution()
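The two conversions shown above, `astype(str)` and `map(str)`, should give the same string values for integer data. A small sketch on a hypothetical Series (`toy_points` is made up for illustration):

```python
import pandas as pd

# Hypothetical sample of integer scores
toy_points = pd.Series([87, 90, 92], name="points")

# astype converts the whole Series to the requested type in one call
via_astype = toy_points.astype(str)

# map applies Python's built-in str to each element in turn
via_map = toy_points.map(str)

print(via_astype.tolist())
print(via_map.tolist())
```

For a straightforward type conversion like this one, `astype` is usually the more direct choice, while `map` is the more general tool when each element needs custom handling.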
Sometimes the price column is null. How many reviews in the dataset are missing a price?
n_missing_prices = reviews.price.isnull().sum()
# Check your answer
q3.check()
q3.hint()
q3.solution()
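Counting missing values works because `isnull()` returns a boolean Series and summing it treats each True as 1. A minimal sketch with made-up prices (`toy_prices` is hypothetical):

```python
import pandas as pd

# Hypothetical prices, two of which are missing
toy_prices = pd.Series([15.0, None, 65.0, None], name="price")

# True where the value is missing, False otherwise
missing_mask = toy_prices.isnull()

# Summing booleans counts the True entries
n_missing = missing_mask.sum()

# notnull() is the complement and counts the present values
n_present = toy_prices.notnull().sum()

print(n_missing, n_present)
```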
What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the region_1 field. This field is often missing data, so replace missing values with Unknown. Sort in descending order. Your output should look something like this:
Unknown 21247
Napa Valley 4480
...
Bardolino Superiore 1
Primitivo del Tarantino 1
Name: region_1, Length: 1230, dtype: int64
region_1_with_missing_values_filled = reviews.region_1.fillna('Unknown')
reviews_per_region = region_1_with_missing_values_filled.value_counts()
# Check your answer
q4.check()
#q4.hint()
#q4.solution()
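The fill-then-count pattern can be tried on a small example. This is a sketch on a hypothetical Series (`toy_regions` and its values are made up); it relies on `value_counts()` sorting by count in descending order by default and on it skipping missing values unless they are filled first.

```python
import pandas as pd

# Hypothetical region labels, three of which are missing
toy_regions = pd.Series(
    ["Napa Valley", None, "Etna", None, "Napa Valley", None],
    name="region_1",
)

# Replace missing entries with a placeholder label so they get counted
filled = toy_regions.fillna("Unknown")

# Count occurrences of each label; the most frequent label comes first
counts = filled.value_counts()

# Without fillna, missing values are left out of the counts by default
counts_without_fill = toy_regions.value_counts()

print(counts)
```

Passing `dropna=False` to `value_counts()` is another way to keep missing values in the tally, though they would then appear under NaN rather than a readable label.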
Move on to renaming and combining.