{ "cells": [ { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.012636, "end_time": "2020-10-01T01:24:12.822659", "exception": false, "start_time": "2020-10-01T01:24:12.810023", "status": "completed" }, "tags": [] }, "source": [ "# Introduction\n", "\n", "In the last tutorial, we learned how to select relevant data out of a DataFrame or Series. Plucking the right data out of our data representation is critical to getting work done, as we demonstrated in the exercises.\n", "\n", "However, the data we load does not always come in the format we want right off the bat. Sometimes we have to do some more work ourselves to reformat it for the task at hand. This tutorial covers different operations we can apply to our data to get the input \"just right\".\n", "\n", "**To start the exercise for this topic, please click [here](https://www.kaggle.com/kernels/fork/595524).**\n", "\n", "We'll use the Wine Magazine data for demonstration." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "_kg_hide-input": true, "execution": { "iopub.execute_input": "2020-10-01T01:24:12.855039Z", "iopub.status.busy": "2020-10-01T01:24:12.854360Z", "iopub.status.idle": "2020-10-01T01:24:13.755936Z", "shell.execute_reply": "2020-10-01T01:24:13.755315Z" }, "papermill": { "duration": 0.921711, "end_time": "2020-10-01T01:24:13.756059", "exception": false, "start_time": "2020-10-01T01:24:12.834348", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "# Display at most 5 rows of any DataFrame or Series\n", "pd.set_option('display.max_rows', 5)\n", "reviews = pd.read_csv(\"../input/wine-reviews/winemag-data-130k-v2.csv\", index_col=0)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2020-10-01T01:24:13.797262Z", "iopub.status.busy": "2020-10-01T01:24:13.796667Z", "iopub.status.idle": "2020-10-01T01:24:13.809939Z", "shell.execute_reply": "2020-10-01T01:24:13.810415Z" }, "papermill": {
"duration": 0.042714, "end_time": "2020-10-01T01:24:13.810574", "exception": false, "start_time": "2020-10-01T01:24:13.767860", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 90 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
| 129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 90 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
129971 rows × 13 columns

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | -1.447138 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | -1.447138 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 129969 | France | A dry style of Pinot Gris, this is crisp with ... | NaN | 1.552862 | 32.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Pinot Gris | Domaine Marcel Deiss |
| 129970 | France | Big, rich and off-dry, this is powered by inte... | Lieu-dit Harth Cuvée Caroline | 1.552862 | 21.0 | Alsace | Alsace | NaN | Roger Voss | @vossroger | Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... | Gewürztraminer | Domaine Schoffit |
129971 rows × 13 columns
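The second table has the same rows and columns as the first, but its `points` values have changed: 87 became -1.447138 and 90 became 1.552862. Both are consistent with subtracting a column mean of about 88.447138 from each score (87 - 88.447138 = -1.447138; 90 - 88.447138 = 1.552862). Assuming that is the transformation, the sketch below shows two ways to compute such a column in pandas. It uses a small hypothetical `toy` DataFrame in place of the real `reviews` data, so its mean (88.5) differs from the full dataset's.

```python
import pandas as pd

# Hypothetical stand-in for `reviews`: the four points values shown above,
# not the full 129,971-row dataset
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "France", "France"],
    "points": [87, 87, 90, 90],
})

points_mean = toy.points.mean()  # 88.5 for this toy frame

# Option 1: Series.map calls the function once per element
remeaned_map = toy.points.map(lambda p: p - points_mean)

# Option 2: vectorized arithmetic; the scalar mean is broadcast over the column
remeaned_vec = toy.points - points_mean

# Both produce new Series; `toy` itself is unchanged.
# DataFrame.assign returns a copy with `points` replaced, like the table above.
remeaned_df = toy.assign(points=remeaned_vec)
print(remeaned_df)
```

With the real data, the same pattern would be `reviews.points - reviews.points.mean()`. The vectorized form is usually preferred over `map` because pandas performs the arithmetic on the whole column at once instead of calling a Python function per value.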

| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |