{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**This notebook is an exercise in the [Data Cleaning](https://www.kaggle.com/learn/data-cleaning) course. You can reference the tutorial at [this link](https://www.kaggle.com/alexisbcook/handling-missing-values).**\n", "\n", "---\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, you'll apply what you learned in the **Handling missing values** tutorial.\n", "\n", "# Setup\n", "\n", "The questions below will give you feedback on your work. Run the following cell to set up the feedback system." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:04.967259Z", "iopub.status.busy": "2021-07-01T17:24:04.966426Z", "iopub.status.idle": "2021-07-01T17:24:09.349378Z", "shell.execute_reply": "2021-07-01T17:24:09.347725Z", "shell.execute_reply.started": "2021-07-01T17:24:04.967124Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3347: DtypeWarning: Columns (22,32) have mixed types.Specify dtype option on import or set low_memory=False.\n", " if (await self.run_code(code, result, async_=asy)):\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Setup Complete\n" ] } ], "source": [ "from learntools.core import binder\n", "binder.bind(globals())\n", "from learntools.data_cleaning.ex1 import *\n", "print(\"Setup Complete\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1) Take a first look at the data\n", "\n", "Run the next code cell to load in the libraries and dataset you'll use to complete the exercise." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:09.352245Z", "iopub.status.busy": "2021-07-01T17:24:09.351782Z", "iopub.status.idle": "2021-07-01T17:24:11.340420Z", "shell.execute_reply": "2021-07-01T17:24:11.339137Z", "shell.execute_reply.started": "2021-07-01T17:24:09.352199Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3156: DtypeWarning: Columns (22,32) have mixed types.Specify dtype option on import or set low_memory=False.\n", " interactivity=interactivity, compiler=compiler, result=result)\n" ] } ], "source": [ "# modules we'll use\n", "import pandas as pd\n", "import numpy as np\n", "\n", "# read in all our data\n", "sf_permits = pd.read_csv(\"../input/building-permit-applications-data/Building_Permits.csv\")\n", "\n", "# set seed for reproducibility\n", "np.random.seed(0) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the code cell below to print the first five rows of the `sf_permits` DataFrame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.342107Z", "iopub.status.busy": "2021-07-01T17:24:11.341802Z", "iopub.status.idle": "2021-07-01T17:24:11.384350Z", "shell.execute_reply": "2021-07-01T17:24:11.382757Z", "shell.execute_reply.started": "2021-07-01T17:24:11.342078Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<tr><th></th><th>Permit Number</th><th>Permit Type</th><th>Permit Type Definition</th><th>Permit Creation Date</th><th>Block</th><th>Lot</th><th>Street Number</th><th>Street Number Suffix</th><th>Street Name</th><th>Street Suffix</th><th>...</th><th>Existing Construction Type</th><th>Existing Construction Type Description</th><th>Proposed Construction Type</th><th>Proposed Construction Type Description</th><th>Site Permit</th><th>Supervisor District</th><th>Neighborhoods - Analysis Boundaries</th><th>Zipcode</th><th>Location</th><th>Record ID</th></tr>\n",
"<tr><th>0</th><td>201505065519</td><td>4</td><td>sign - erect</td><td>05/06/2015</td><td>0326</td><td>023</td><td>140</td><td>NaN</td><td>Ellis</td><td>St</td><td>...</td><td>3.0</td><td>constr type 3</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.0</td><td>Tenderloin</td><td>94102.0</td><td>(37.785719256680785, -122.40852313194863)</td><td>1380611233945</td></tr>\n",
"<tr><th>1</th><td>201604195146</td><td>4</td><td>sign - erect</td><td>04/19/2016</td><td>0306</td><td>007</td><td>440</td><td>NaN</td><td>Geary</td><td>St</td><td>...</td><td>3.0</td><td>constr type 3</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.0</td><td>Tenderloin</td><td>94102.0</td><td>(37.78733980600732, -122.41063199757738)</td><td>1420164406718</td></tr>\n",
"<tr><th>2</th><td>201605278609</td><td>3</td><td>additions alterations or repairs</td><td>05/27/2016</td><td>0595</td><td>203</td><td>1647</td><td>NaN</td><td>Pacific</td><td>Av</td><td>...</td><td>1.0</td><td>constr type 1</td><td>1.0</td><td>constr type 1</td><td>NaN</td><td>3.0</td><td>Russian Hill</td><td>94109.0</td><td>(37.7946573324287, -122.42232562979227)</td><td>1424856504716</td></tr>\n",
"<tr><th>3</th><td>201611072166</td><td>8</td><td>otc alterations permit</td><td>11/07/2016</td><td>0156</td><td>011</td><td>1230</td><td>NaN</td><td>Pacific</td><td>Av</td><td>...</td><td>5.0</td><td>wood frame (5)</td><td>5.0</td><td>wood frame (5)</td><td>NaN</td><td>3.0</td><td>Nob Hill</td><td>94109.0</td><td>(37.79595867909168, -122.41557405519474)</td><td>1443574295566</td></tr>\n",
"<tr><th>4</th><td>201611283529</td><td>6</td><td>demolitions</td><td>11/28/2016</td><td>0342</td><td>001</td><td>950</td><td>NaN</td><td>Market</td><td>St</td><td>...</td><td>3.0</td><td>constr type 3</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6.0</td><td>Tenderloin</td><td>94102.0</td><td>(37.78315261897309, -122.40950883997789)</td><td>144548169992</td></tr>\n",
"</table>\n",
"<p>5 rows × 43 columns</p>\n",
"</div>
" ], "text/plain": [ " Permit Number Permit Type Permit Type Definition \\\n", "0 201505065519 4 sign - erect \n", "1 201604195146 4 sign - erect \n", "2 201605278609 3 additions alterations or repairs \n", "3 201611072166 8 otc alterations permit \n", "4 201611283529 6 demolitions \n", "\n", " Permit Creation Date Block Lot Street Number Street Number Suffix \\\n", "0 05/06/2015 0326 023 140 NaN \n", "1 04/19/2016 0306 007 440 NaN \n", "2 05/27/2016 0595 203 1647 NaN \n", "3 11/07/2016 0156 011 1230 NaN \n", "4 11/28/2016 0342 001 950 NaN \n", "\n", " Street Name Street Suffix ... Existing Construction Type \\\n", "0 Ellis St ... 3.0 \n", "1 Geary St ... 3.0 \n", "2 Pacific Av ... 1.0 \n", "3 Pacific Av ... 5.0 \n", "4 Market St ... 3.0 \n", "\n", " Existing Construction Type Description Proposed Construction Type \\\n", "0 constr type 3 NaN \n", "1 constr type 3 NaN \n", "2 constr type 1 1.0 \n", "3 wood frame (5) 5.0 \n", "4 constr type 3 NaN \n", "\n", " Proposed Construction Type Description Site Permit Supervisor District \\\n", "0 NaN NaN 3.0 \n", "1 NaN NaN 3.0 \n", "2 constr type 1 NaN 3.0 \n", "3 wood frame (5) NaN 3.0 \n", "4 NaN NaN 6.0 \n", "\n", " Neighborhoods - Analysis Boundaries Zipcode \\\n", "0 Tenderloin 94102.0 \n", "1 Tenderloin 94102.0 \n", "2 Russian Hill 94109.0 \n", "3 Nob Hill 94109.0 \n", "4 Tenderloin 94102.0 \n", "\n", " Location Record ID \n", "0 (37.785719256680785, -122.40852313194863) 1380611233945 \n", "1 (37.78733980600732, -122.41063199757738) 1420164406718 \n", "2 (37.7946573324287, -122.42232562979227) 1424856504716 \n", "3 (37.79595867909168, -122.41557405519474) 1443574295566 \n", "4 (37.78315261897309, -122.40950883997789) 144548169992 \n", "\n", "[5 rows x 43 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TODO: Your code here!\n", "sf_permits.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Does the dataset have any missing values? 
Once you have an answer, run the code cell below to get credit for your work." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.387345Z", "iopub.status.busy": "2021-07-01T17:24:11.386838Z", "iopub.status.idle": "2021-07-01T17:24:11.398506Z", "shell.execute_reply": "2021-07-01T17:24:11.396983Z", "shell.execute_reply.started": "2021-07-01T17:24:11.387292Z" } }, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 4, \"questionId\": \"1_TakeFirstLook\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct: \n", "\n", "The first five rows of the data does show that several columns have missing values. You can see this in the \"Street Number Suffix\", \"Proposed Construction Type\" and \"Site Permit\" columns, among others." ], "text/plain": [ "Correct: \n", "\n", "The first five rows of the data does show that several columns have missing values. You can see this in the \"Street Number Suffix\", \"Proposed Construction Type\" and \"Site Permit\" columns, among others." 
] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Check your answer (Run this code cell to receive credit!)\n", "q1.check()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.403732Z", "iopub.status.busy": "2021-07-01T17:24:11.403213Z", "iopub.status.idle": "2021-07-01T17:24:11.410833Z", "shell.execute_reply": "2021-07-01T17:24:11.409654Z", "shell.execute_reply.started": "2021-07-01T17:24:11.403673Z" } }, "outputs": [], "source": [ "# Line below will give you a hint\n", "#q1.hint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2) How many missing data points do we have?\n", "\n", "What percentage of the values in the dataset are missing? Your answer should be a number between 0 and 100. (If 1/4 of the values in the dataset are missing, the answer is 25.)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.413899Z", "iopub.status.busy": "2021-07-01T17:24:11.413006Z", "iopub.status.idle": "2021-07-01T17:24:11.894777Z", "shell.execute_reply": "2021-07-01T17:24:11.893971Z", "shell.execute_reply.started": "2021-07-01T17:24:11.413779Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "26.26002315058403\n" ] }, { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 1, \"questionId\": \"2_PercentMissingValues\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# TODO: Your code here!\n", "total_values = 
np.prod(sf_permits.shape)\n", "missing_values = sf_permits.isnull().sum().sum()\n", "\n", "percent_missing = (missing_values/total_values) * 100\n", "\n", "print(percent_missing)\n", "\n", "# Check your answer\n", "q2.check()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.896718Z", "iopub.status.busy": "2021-07-01T17:24:11.896209Z", "iopub.status.idle": "2021-07-01T17:24:11.900253Z", "shell.execute_reply": "2021-07-01T17:24:11.899319Z", "shell.execute_reply.started": "2021-07-01T17:24:11.896676Z" } }, "outputs": [], "source": [ "# Lines below will give you a hint or solution code\n", "#q2.hint()\n", "#q2.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3) Figure out why the data is missing\n", "\n", "Look at the columns **\"Street Number Suffix\"** and **\"Zipcode\"** from the [San Francisco Building Permits dataset](https://www.kaggle.com/aparnashastry/building-permit-applications-data). Both of these contain missing values. \n", "- Which, if either, are missing because they don't exist? \n", "- Which, if either, are missing because they weren't recorded? \n", "\n", "Once you have an answer, run the code cell below." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.902176Z", "iopub.status.busy": "2021-07-01T17:24:11.901627Z", "iopub.status.idle": "2021-07-01T17:24:11.919580Z", "shell.execute_reply": "2021-07-01T17:24:11.918406Z", "shell.execute_reply.started": "2021-07-01T17:24:11.902091Z" } }, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 4, \"questionId\": \"3_WhyDataMissing\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct: \n", "\n", "If a value in the \"Street Number Suffix\" column is missing, it is likely because it does not exist. If a value in the \"Zipcode\" column is missing, it was not recorded." ], "text/plain": [ "Correct: \n", "\n", "If a value in the \"Street Number Suffix\" column is missing, it is likely because it does not exist. If a value in the \"Zipcode\" column is missing, it was not recorded." 
] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Check your answer (Run this code cell to receive credit!)\n", "q3.check()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.921657Z", "iopub.status.busy": "2021-07-01T17:24:11.921285Z", "iopub.status.idle": "2021-07-01T17:24:11.932083Z", "shell.execute_reply": "2021-07-01T17:24:11.930956Z", "shell.execute_reply.started": "2021-07-01T17:24:11.921589Z" } }, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"interactionType\": 2, \"questionType\": 4, \"questionId\": \"3_WhyDataMissing\", \"learnToolsVersion\": \"0.3.4\", \"valueTowardsCompletion\": 0.0, \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\", \"outcomeType\": 4}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Hint: Do all addresses generally have a street number suffix? Do all addresses generally have a zipcode?" ], "text/plain": [ "Hint: Do all addresses generally have a street number suffix? Do all addresses generally have a zipcode?" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Line below will give you a hint\n", "q3.hint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4) Drop missing values: rows\n", "\n", "If you removed all of the rows of `sf_permits` with missing values, how many rows are left?\n", "\n", "**Note**: Do not change the value of `sf_permits` when checking this. 
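
As a rough illustration of the note above, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its column names are hypothetical and are not part of the permits data. It relies on the fact that `dropna()` returns a new DataFrame by default, so the surviving rows can be counted without reassigning the original.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for sf_permits (not the real dataset)
toy = pd.DataFrame({
    'street': ['Ellis', 'Geary', None],
    'zipcode': [94102.0, np.nan, 94109.0],
})

# dropna() returns a new DataFrame; toy itself is not modified
rows_left = toy.dropna().shape[0]

print(rows_left)     # number of rows with no missing values
print(toy.shape[0])  # row count of the original, still unchanged
```

The same pattern should carry over to `sf_permits`: call `dropna()` and inspect the result rather than assigning it back to the same name.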
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:11.934922Z", "iopub.status.busy": "2021-07-01T17:24:11.934253Z", "iopub.status.idle": "2021-07-01T17:24:12.430605Z", "shell.execute_reply": "2021-07-01T17:24:12.429239Z", "shell.execute_reply.started": "2021-07-01T17:24:11.934850Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<tr><th></th><th>Permit Number</th><th>Permit Type</th><th>Permit Type Definition</th><th>Permit Creation Date</th><th>Block</th><th>Lot</th><th>Street Number</th><th>Street Number Suffix</th><th>Street Name</th><th>Street Suffix</th><th>...</th><th>Existing Construction Type</th><th>Existing Construction Type Description</th><th>Proposed Construction Type</th><th>Proposed Construction Type Description</th><th>Site Permit</th><th>Supervisor District</th><th>Neighborhoods - Analysis Boundaries</th><th>Zipcode</th><th>Location</th><th>Record ID</th></tr>\n",
"</table>\n",
"<p>0 rows × 43 columns</p>\n",
"</div>
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Permit Number, Permit Type, Permit Type Definition, Permit Creation Date, Block, Lot, Street Number, Street Number Suffix, Street Name, Street Suffix, Unit, Unit Suffix, Description, Current Status, Current Status Date, Filed Date, Issued Date, Completed Date, First Construction Document Date, Structural Notification, Number of Existing Stories, Number of Proposed Stories, Voluntary Soft-Story Retrofit, Fire Only Permit, Permit Expiration Date, Estimated Cost, Revised Cost, Existing Use, Existing Units, Proposed Use, Proposed Units, Plansets, TIDF Compliance, Existing Construction Type, Existing Construction Type Description, Proposed Construction Type, Proposed Construction Type Description, Site Permit, Supervisor District, Neighborhoods - Analysis Boundaries, Zipcode, Location, Record ID]\n", "Index: []\n", "\n", "[0 rows x 43 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TODO: Your code here!\n", "sf_permits.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have an answer, run the code cell below." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:12.432556Z", "iopub.status.busy": "2021-07-01T17:24:12.432239Z", "iopub.status.idle": "2021-07-01T17:24:12.440068Z", "shell.execute_reply": "2021-07-01T17:24:12.438960Z", "shell.execute_reply.started": "2021-07-01T17:24:12.432526Z" } }, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 4, \"questionId\": \"4_DropMissingRows\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct: \n", "\n", "There are no rows remaining in the dataset!" ], "text/plain": [ "Correct: \n", "\n", "There are no rows remaining in the dataset!" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Check your answer (Run this code cell to receive credit!)\n", "q4.check()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:12.441888Z", "iopub.status.busy": "2021-07-01T17:24:12.441542Z", "iopub.status.idle": "2021-07-01T17:24:12.450835Z", "shell.execute_reply": "2021-07-01T17:24:12.449945Z", "shell.execute_reply.started": "2021-07-01T17:24:12.441838Z" } }, "outputs": [], "source": [ "# Line below will give you a hint\n", "#q4.hint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5) Drop missing values: columns\n", "\n", "Now try removing all the columns with empty values. \n", "- Create a new DataFrame called `sf_permits_with_na_dropped` that has all of the columns with empty values removed. \n", "- How many columns were removed from the original `sf_permits` DataFrame? 
Use this number to set the value of the `dropped_columns` variable below." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:12.453467Z", "iopub.status.busy": "2021-07-01T17:24:12.452843Z", "iopub.status.idle": "2021-07-01T17:24:12.946898Z", "shell.execute_reply": "2021-07-01T17:24:12.945560Z", "shell.execute_reply.started": "2021-07-01T17:24:12.453416Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "31\n" ] }, { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 1, \"questionId\": \"5_DropMissingCols\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# TODO: Your code here\n", "sf_permits_with_na_dropped = sf_permits.dropna(axis=1)\n", "\n", "dropped_columns = sf_permits.shape[1] - sf_permits_with_na_dropped.shape[1]\n", "\n", "print(dropped_columns)\n", "\n", "# Check your answer\n", "q5.check()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:12.948966Z", "iopub.status.busy": "2021-07-01T17:24:12.948541Z", "iopub.status.idle": "2021-07-01T17:24:12.954068Z", "shell.execute_reply": "2021-07-01T17:24:12.952778Z", "shell.execute_reply.started": "2021-07-01T17:24:12.948923Z" } }, "outputs": [], "source": [ "# Lines below will give you a hint or solution code\n", "#q5.hint()\n", "#q5.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6) Fill in missing values automatically\n", "\n", "Try replacing all the NaN's in the 
`sf_permits` data with the value that comes directly after each one in the same column, and then replacing any remaining NaN's with 0. Assign the result to a new DataFrame `sf_permits_with_na_imputed`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:12.956457Z", "iopub.status.busy": "2021-07-01T17:24:12.956012Z", "iopub.status.idle": "2021-07-01T17:24:14.320688Z", "shell.execute_reply": "2021-07-01T17:24:14.319953Z", "shell.execute_reply.started": "2021-07-01T17:24:12.956417Z" } }, "outputs": [ { "data": { "application/javascript": [ "parent.postMessage({\"jupyterEvent\": \"custom.exercise_interaction\", \"data\": {\"outcomeType\": 1, \"valueTowardsCompletion\": 0.16666666666666666, \"interactionType\": 1, \"questionType\": 1, \"questionId\": \"6_ImputeAutomatically\", \"learnToolsVersion\": \"0.3.4\", \"failureMessage\": \"\", \"exceptionClass\": \"\", \"trace\": \"\"}}, \"*\")" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Correct" ], "text/plain": [ "Correct" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# TODO: Your code here\n", "sf_permits_with_na_imputed = sf_permits.bfill(axis=0).fillna(0)\n", "\n", "# Check your answer\n", "q6.check()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2021-07-01T17:24:14.322314Z", "iopub.status.busy": "2021-07-01T17:24:14.321841Z", "iopub.status.idle": "2021-07-01T17:24:14.325255Z", "shell.execute_reply": "2021-07-01T17:24:14.324494Z", "shell.execute_reply.started": "2021-07-01T17:24:14.322267Z" } }, "outputs": [], "source": [ "# Lines below will give you a hint or solution code\n", "#q6.hint()\n", "#q6.solution()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# More practice\n", "\n", "If you're looking for more practice handling missing values:\n", "\n", "* Check out [this 
notebook](https://www.kaggle.com/alexisbcook/missing-values) on handling missing values using scikit-learn's imputer. \n", "* Look back at the \"Zipcode\" column in the `sf_permits` dataset, which has some missing values. How would you go about figuring out what the actual zipcode of each address should be? (You might try using another dataset. You can search for datasets about San Francisco on the [Datasets listing](https://www.kaggle.com/datasets).) \n", "\n", "# Keep going\n", "\n", "In the next lesson, learn how to [**apply scaling and normalization**](https://www.kaggle.com/alexisbcook/scaling-and-normalization) to transform your data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "\n", "*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/172650) to chat with other Learners.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }