{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.012636,
     "end_time": "2020-10-01T01:24:12.822659",
     "exception": false,
     "start_time": "2020-10-01T01:24:12.810023",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Introduction\n",
    "\n",
    "In the last tutorial, we learned how to select relevant data out of a DataFrame or Series. Plucking the right data out of our data representation is critical to getting work done, as we demonstrated in the exercises.\n",
    "\n",
    "However, the data does not always come out of memory in the format we want it in right out of the bat. Sometimes we have to do some more work ourselves to reformat it for the task at hand.  This tutorial will cover different operations we can apply to our data to get the input \"just right\". \n",
    "\n",
    "**To start the exercise for this topic, please click [here](https://www.kaggle.com/kernels/fork/595524).**\n",
    "\n",
    "We'll use the Wine Magazine data for demonstration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "_kg_hide-input": true,
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:12.855039Z",
     "iopub.status.busy": "2020-10-01T01:24:12.854360Z",
     "iopub.status.idle": "2020-10-01T01:24:13.755936Z",
     "shell.execute_reply": "2020-10-01T01:24:13.755315Z"
    },
    "papermill": {
     "duration": 0.921711,
     "end_time": "2020-10-01T01:24:13.756059",
     "exception": false,
     "start_time": "2020-10-01T01:24:12.834348",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "\n",
    "import pandas as pd\n",
    "pd.set_option('max_rows', 5)\n",
    "import numpy as np\n",
    "reviews = pd.read_csv(\"../input/wine-reviews/winemag-data-130k-v2.csv\", index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:13.797262Z",
     "iopub.status.busy": "2020-10-01T01:24:13.796667Z",
     "iopub.status.idle": "2020-10-01T01:24:13.809939Z",
     "shell.execute_reply": "2020-10-01T01:24:13.810415Z"
    },
    "papermill": {
     "duration": 0.042714,
     "end_time": "2020-10-01T01:24:13.810574",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.767860",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>designation</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region_1</th>\n",
       "      <th>region_2</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Italy</td>\n",
       "      <td>Aromas include tropical fruit, broom, brimston...</td>\n",
       "      <td>Vulkà Bianco</td>\n",
       "      <td>87</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Etna</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Nicosia 2013 Vulkà Bianco  (Etna)</td>\n",
       "      <td>White Blend</td>\n",
       "      <td>Nicosia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>This is ripe and fruity, a wine that is smooth...</td>\n",
       "      <td>Avidagos</td>\n",
       "      <td>87</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Douro</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Quinta dos Avidagos 2011 Avidagos Red (Douro)</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>Quinta dos Avidagos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129969</th>\n",
       "      <td>France</td>\n",
       "      <td>A dry style of Pinot Gris, this is crisp with ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>90</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Domaine Marcel Deiss 2012 Pinot Gris (Alsace)</td>\n",
       "      <td>Pinot Gris</td>\n",
       "      <td>Domaine Marcel Deiss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129970</th>\n",
       "      <td>France</td>\n",
       "      <td>Big, rich and off-dry, this is powered by inte...</td>\n",
       "      <td>Lieu-dit Harth Cuvée Caroline</td>\n",
       "      <td>90</td>\n",
       "      <td>21.0</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>Domaine Schoffit</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>129971 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         country                                        description  \\\n",
       "0          Italy  Aromas include tropical fruit, broom, brimston...   \n",
       "1       Portugal  This is ripe and fruity, a wine that is smooth...   \n",
       "...          ...                                                ...   \n",
       "129969    France  A dry style of Pinot Gris, this is crisp with ...   \n",
       "129970    France  Big, rich and off-dry, this is powered by inte...   \n",
       "\n",
       "                          designation  points  price           province  \\\n",
       "0                        Vulkà Bianco      87    NaN  Sicily & Sardinia   \n",
       "1                            Avidagos      87   15.0              Douro   \n",
       "...                               ...     ...    ...                ...   \n",
       "129969                            NaN      90   32.0             Alsace   \n",
       "129970  Lieu-dit Harth Cuvée Caroline      90   21.0             Alsace   \n",
       "\n",
       "       region_1 region_2    taster_name taster_twitter_handle  \\\n",
       "0          Etna      NaN  Kerin O’Keefe          @kerinokeefe   \n",
       "1           NaN      NaN     Roger Voss            @vossroger   \n",
       "...         ...      ...            ...                   ...   \n",
       "129969   Alsace      NaN     Roger Voss            @vossroger   \n",
       "129970   Alsace      NaN     Roger Voss            @vossroger   \n",
       "\n",
       "                                                    title         variety  \\\n",
       "0                       Nicosia 2013 Vulkà Bianco  (Etna)     White Blend   \n",
       "1           Quinta dos Avidagos 2011 Avidagos Red (Douro)  Portuguese Red   \n",
       "...                                                   ...             ...   \n",
       "129969      Domaine Marcel Deiss 2012 Pinot Gris (Alsace)      Pinot Gris   \n",
       "129970  Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...  Gewürztraminer   \n",
       "\n",
       "                      winery  \n",
       "0                    Nicosia  \n",
       "1        Quinta dos Avidagos  \n",
       "...                      ...  \n",
       "129969  Domaine Marcel Deiss  \n",
       "129970      Domaine Schoffit  \n",
       "\n",
       "[129971 rows x 13 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.01176,
     "end_time": "2020-10-01T01:24:13.836528",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.824768",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Summary functions\n",
    "\n",
    "Pandas provides many simple \"summary functions\" (not an official name) which restructure the data in some useful way. For example, consider the `describe()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:13.866679Z",
     "iopub.status.busy": "2020-10-01T01:24:13.866056Z",
     "iopub.status.idle": "2020-10-01T01:24:13.877438Z",
     "shell.execute_reply": "2020-10-01T01:24:13.876983Z"
    },
    "papermill": {
     "duration": 0.029087,
     "end_time": "2020-10-01T01:24:13.877571",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.848484",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    129971.000000\n",
       "mean         88.447138\n",
       "             ...      \n",
       "75%          91.000000\n",
       "max         100.000000\n",
       "Name: points, Length: 8, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.points.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.012402,
     "end_time": "2020-10-01T01:24:13.902283",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.889881",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input. The output above only makes sense for numerical data; for string data here's what we get:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:13.943600Z",
     "iopub.status.busy": "2020-10-01T01:24:13.942664Z",
     "iopub.status.idle": "2020-10-01T01:24:13.975798Z",
     "shell.execute_reply": "2020-10-01T01:24:13.975171Z"
    },
    "papermill": {
     "duration": 0.061259,
     "end_time": "2020-10-01T01:24:13.975907",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.914648",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count         103727\n",
       "unique            19\n",
       "top       Roger Voss\n",
       "freq           25514\n",
       "Name: taster_name, dtype: object"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.taster_name.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.012458,
     "end_time": "2020-10-01T01:24:14.001198",
     "exception": false,
     "start_time": "2020-10-01T01:24:13.988740",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "If you want to get some particular simple summary statistic about a column in a DataFrame or a Series, there is usually a helpful pandas function that makes it happen. \n",
    "\n",
    "For example, to see the mean of the points allotted (e.g. how well an averagely rated wine does), we can use the `mean()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:14.031018Z",
     "iopub.status.busy": "2020-10-01T01:24:14.030253Z",
     "iopub.status.idle": "2020-10-01T01:24:14.035090Z",
     "shell.execute_reply": "2020-10-01T01:24:14.034487Z"
    },
    "papermill": {
     "duration": 0.021332,
     "end_time": "2020-10-01T01:24:14.035191",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.013859",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "88.44713820775404"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.points.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.012731,
     "end_time": "2020-10-01T01:24:14.060838",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.048107",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "To see a list of unique values we can use the `unique()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:14.098681Z",
     "iopub.status.busy": "2020-10-01T01:24:14.097801Z",
     "iopub.status.idle": "2020-10-01T01:24:14.101208Z",
     "shell.execute_reply": "2020-10-01T01:24:14.101658Z"
    },
    "papermill": {
     "duration": 0.028,
     "end_time": "2020-10-01T01:24:14.101791",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.073791",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Kerin O’Keefe', 'Roger Voss', 'Paul Gregutt',\n",
       "       'Alexander Peartree', 'Michael Schachner', 'Anna Lee C. Iijima',\n",
       "       'Virginie Boone', 'Matt Kettmann', nan, 'Sean P. Sullivan',\n",
       "       'Jim Gordon', 'Joe Czerwinski', 'Anne Krebiehl\\xa0MW',\n",
       "       'Lauren Buzzeo', 'Mike DeSimone', 'Jeff Jenssen',\n",
       "       'Susan Kostrzewa', 'Carrie Dykes', 'Fiona Adams',\n",
       "       'Christina Pickard'], dtype=object)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.taster_name.unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.013004,
     "end_time": "2020-10-01T01:24:14.128078",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.115074",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "To see a list of unique values _and_ how often they occur in the dataset, we can use the `value_counts()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:14.172762Z",
     "iopub.status.busy": "2020-10-01T01:24:14.169655Z",
     "iopub.status.idle": "2020-10-01T01:24:14.176476Z",
     "shell.execute_reply": "2020-10-01T01:24:14.175919Z"
    },
    "papermill": {
     "duration": 0.035306,
     "end_time": "2020-10-01T01:24:14.176595",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.141289",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Roger Voss           25514\n",
       "Michael Schachner    15134\n",
       "                     ...  \n",
       "Fiona Adams             27\n",
       "Christina Pickard        6\n",
       "Name: taster_name, Length: 19, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.taster_name.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.013576,
     "end_time": "2020-10-01T01:24:14.203894",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.190318",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Maps\n",
    "\n",
    "A **map** is a term, borrowed from mathematics, for a function that takes one set of values and \"maps\" them to another set of values. In data science we often have a need for creating new representations from existing data, or for transforming data from the format it is in now to the format that we want it to be in later. Maps are what handle this work, making them extremely important for getting your work done!\n",
    "\n",
    "There are two mapping methods that you will use often. \n",
    "\n",
    "[`map()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html) is the first, and slightly simpler one. For example, suppose that we wanted to remean the scores the wines received to 0. We can do this as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:14.236343Z",
     "iopub.status.busy": "2020-10-01T01:24:14.235754Z",
     "iopub.status.idle": "2020-10-01T01:24:14.279303Z",
     "shell.execute_reply": "2020-10-01T01:24:14.278783Z"
    },
    "papermill": {
     "duration": 0.061567,
     "end_time": "2020-10-01T01:24:14.279420",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.217853",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        -1.447138\n",
       "1        -1.447138\n",
       "            ...   \n",
       "129969    1.552862\n",
       "129970    1.552862\n",
       "Name: points, Length: 129971, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "review_points_mean = reviews.points.mean()\n",
    "reviews.points.map(lambda p: p - review_points_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.013883,
     "end_time": "2020-10-01T01:24:14.307726",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.293843",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The function you pass to `map()` should expect a single value from the Series (a point value, in the above example), and return a transformed version of that value. `map()` returns a new Series where all the values have been transformed by your function.\n",
    "\n",
    "[`apply()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:14.341101Z",
     "iopub.status.busy": "2020-10-01T01:24:14.340488Z",
     "iopub.status.idle": "2020-10-01T01:24:27.501248Z",
     "shell.execute_reply": "2020-10-01T01:24:27.501697Z"
    },
    "papermill": {
     "duration": 13.179839,
     "end_time": "2020-10-01T01:24:27.501844",
     "exception": false,
     "start_time": "2020-10-01T01:24:14.322005",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>designation</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region_1</th>\n",
       "      <th>region_2</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Italy</td>\n",
       "      <td>Aromas include tropical fruit, broom, brimston...</td>\n",
       "      <td>Vulkà Bianco</td>\n",
       "      <td>-1.447138</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Etna</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Nicosia 2013 Vulkà Bianco  (Etna)</td>\n",
       "      <td>White Blend</td>\n",
       "      <td>Nicosia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>This is ripe and fruity, a wine that is smooth...</td>\n",
       "      <td>Avidagos</td>\n",
       "      <td>-1.447138</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Douro</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Quinta dos Avidagos 2011 Avidagos Red (Douro)</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>Quinta dos Avidagos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129969</th>\n",
       "      <td>France</td>\n",
       "      <td>A dry style of Pinot Gris, this is crisp with ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.552862</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Domaine Marcel Deiss 2012 Pinot Gris (Alsace)</td>\n",
       "      <td>Pinot Gris</td>\n",
       "      <td>Domaine Marcel Deiss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>129970</th>\n",
       "      <td>France</td>\n",
       "      <td>Big, rich and off-dry, this is powered by inte...</td>\n",
       "      <td>Lieu-dit Harth Cuvée Caroline</td>\n",
       "      <td>1.552862</td>\n",
       "      <td>21.0</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>Domaine Schoffit</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>129971 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         country                                        description  \\\n",
       "0          Italy  Aromas include tropical fruit, broom, brimston...   \n",
       "1       Portugal  This is ripe and fruity, a wine that is smooth...   \n",
       "...          ...                                                ...   \n",
       "129969    France  A dry style of Pinot Gris, this is crisp with ...   \n",
       "129970    France  Big, rich and off-dry, this is powered by inte...   \n",
       "\n",
       "                          designation    points  price           province  \\\n",
       "0                        Vulkà Bianco -1.447138    NaN  Sicily & Sardinia   \n",
       "1                            Avidagos -1.447138   15.0              Douro   \n",
       "...                               ...       ...    ...                ...   \n",
       "129969                            NaN  1.552862   32.0             Alsace   \n",
       "129970  Lieu-dit Harth Cuvée Caroline  1.552862   21.0             Alsace   \n",
       "\n",
       "       region_1 region_2    taster_name taster_twitter_handle  \\\n",
       "0          Etna      NaN  Kerin O’Keefe          @kerinokeefe   \n",
       "1           NaN      NaN     Roger Voss            @vossroger   \n",
       "...         ...      ...            ...                   ...   \n",
       "129969   Alsace      NaN     Roger Voss            @vossroger   \n",
       "129970   Alsace      NaN     Roger Voss            @vossroger   \n",
       "\n",
       "                                                    title         variety  \\\n",
       "0                       Nicosia 2013 Vulkà Bianco  (Etna)     White Blend   \n",
       "1           Quinta dos Avidagos 2011 Avidagos Red (Douro)  Portuguese Red   \n",
       "...                                                   ...             ...   \n",
       "129969      Domaine Marcel Deiss 2012 Pinot Gris (Alsace)      Pinot Gris   \n",
       "129970  Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...  Gewürztraminer   \n",
       "\n",
       "                      winery  \n",
       "0                    Nicosia  \n",
       "1        Quinta dos Avidagos  \n",
       "...                      ...  \n",
       "129969  Domaine Marcel Deiss  \n",
       "129970      Domaine Schoffit  \n",
       "\n",
       "[129971 rows x 13 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def remean_points(row):\n",
    "    row.points = row.points - review_points_mean\n",
    "    return row\n",
    "\n",
    "reviews.apply(remean_points, axis='columns')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.014526,
     "end_time": "2020-10-01T01:24:27.531342",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.516816",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "If we had called `reviews.apply()` with `axis='index'`, then instead of passing a function to transform each row, we would need to give a function to transform each *column*.\n",
    "\n",
    "Note that `map()` and `apply()` return new, transformed Series and DataFrames, respectively. They don't modify the original data they're called on. If we look at the first row of `reviews`, we can see that it still has its original `points` value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:27.576768Z",
     "iopub.status.busy": "2020-10-01T01:24:27.575999Z",
     "iopub.status.idle": "2020-10-01T01:24:27.580009Z",
     "shell.execute_reply": "2020-10-01T01:24:27.579533Z"
    },
    "papermill": {
     "duration": 0.034049,
     "end_time": "2020-10-01T01:24:27.580123",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.546074",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>designation</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region_1</th>\n",
       "      <th>region_2</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Italy</td>\n",
       "      <td>Aromas include tropical fruit, broom, brimston...</td>\n",
       "      <td>Vulkà Bianco</td>\n",
       "      <td>87</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Etna</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Nicosia 2013 Vulkà Bianco  (Etna)</td>\n",
       "      <td>White Blend</td>\n",
       "      <td>Nicosia</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  country                                        description   designation  \\\n",
       "0   Italy  Aromas include tropical fruit, broom, brimston...  Vulkà Bianco   \n",
       "\n",
       "   points  price           province region_1 region_2    taster_name  \\\n",
       "0      87    NaN  Sicily & Sardinia     Etna      NaN  Kerin O’Keefe   \n",
       "\n",
       "  taster_twitter_handle                              title      variety  \\\n",
       "0          @kerinokeefe  Nicosia 2013 Vulkà Bianco  (Etna)  White Blend   \n",
       "\n",
       "    winery  \n",
       "0  Nicosia  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.014873,
     "end_time": "2020-10-01T01:24:27.610139",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.595266",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Pandas provides many common mapping operations as built-ins. For example, here's a faster way of remeaning our points column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:27.645209Z",
     "iopub.status.busy": "2020-10-01T01:24:27.644317Z",
     "iopub.status.idle": "2020-10-01T01:24:27.651750Z",
     "shell.execute_reply": "2020-10-01T01:24:27.651238Z"
    },
    "papermill": {
     "duration": 0.026672,
     "end_time": "2020-10-01T01:24:27.651851",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.625179",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        -1.447138\n",
       "1        -1.447138\n",
       "            ...   \n",
       "129969    1.552862\n",
       "129970    1.552862\n",
       "Name: points, Length: 129971, dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "review_points_mean = reviews.points.mean()\n",
    "reviews.points - review_points_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.015174,
     "end_time": "2020-10-01T01:24:27.682371",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.667197",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "In this code we are performing an operation between a lot of values on the left-hand side (everything in the Series) and a single value on the right-hand side (the mean value). Pandas looks at this expression and figures out that we must mean to subtract that mean value from every value in the dataset.\n",
    "\n",
    "Pandas will also understand what to do if we perform these operations between Series of equal length. For example, an easy way of combining country and region information in the dataset would be to do the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-10-01T01:24:27.717413Z",
     "iopub.status.busy": "2020-10-01T01:24:27.716844Z",
     "iopub.status.idle": "2020-10-01T01:24:27.767424Z",
     "shell.execute_reply": "2020-10-01T01:24:27.766916Z"
    },
    "papermill": {
     "duration": 0.06988,
     "end_time": "2020-10-01T01:24:27.767529",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.697649",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0            Italy - Etna\n",
       "1                     NaN\n",
       "               ...       \n",
       "129969    France - Alsace\n",
       "129970    France - Alsace\n",
       "Length: 129971, dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews.country + \" - \" + reviews.region_1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.015738,
     "end_time": "2020-10-01T01:24:27.799153",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.783415",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "These operators are faster than `map()` or `apply()` because they uses speed ups built into pandas. All of the standard Python operators (`>`, `<`, `==`, and so on) work in this manner.\n",
    "\n",
    "However, they are not as flexible as `map()` or `apply()`, which can do more advanced things, like applying conditional logic, which cannot be done with addition and subtraction alone.\n",
    "\n",
    "# Your turn\n",
    "\n",
    "If you haven't started the exercise, you can **[get started here](https://www.kaggle.com/kernels/fork/595524)**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.015614,
     "end_time": "2020-10-01T01:24:27.830603",
     "exception": false,
     "start_time": "2020-10-01T01:24:27.814989",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "---\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/161299) to chat with other Learners.*"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "papermill": {
   "duration": 19.439876,
   "end_time": "2020-10-01T01:24:27.953096",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2020-10-01T01:24:08.513220",
   "version": "2.1.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}