{ "cells": [ { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.036894, "end_time": "2020-10-19T19:10:12.857060", "exception": false, "start_time": "2020-10-19T19:10:12.820166", "status": "completed" }, "tags": [] }, "source": [ "# Introduction\n", "\n", "Selecting specific values of a pandas DataFrame or Series to work on is an implicit step in almost any data operation you'll run, so one of the first things you need to learn in working with data in Python is how to go about selecting the data points relevant to you quickly and effectively." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "_kg_hide-input": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:12.935721Z", "iopub.status.busy": "2020-10-19T19:10:12.934939Z", "iopub.status.idle": "2020-10-19T19:10:14.543274Z", "shell.execute_reply": "2020-10-19T19:10:14.542567Z" }, "papermill": { "duration": 1.652041, "end_time": "2020-10-19T19:10:14.543405", "exception": false, "start_time": "2020-10-19T19:10:12.891364", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "\n", "import pandas as pd\n", "reviews = pd.read_csv(\"../input/wine-reviews/winemag-data-130k-v2.csv\", index_col=0)\n", "pd.set_option('max_rows', 5)" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.042936, "end_time": "2020-10-19T19:10:14.621420", "exception": false, "start_time": "2020-10-19T19:10:14.578484", "status": "completed" }, "tags": [] }, "source": [ "**To start the exercise for this topic, please click [here](https://www.kaggle.com/kernels/fork/587910).**" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.041952, "end_time": "2020-10-19T19:10:14.713532", "exception": false, "start_time": "2020-10-19T19:10:14.671580", "status": "completed" }, "tags": [] }, "source": [ "# Native accessors\n", "\n", "Native Python objects provide good ways of indexing data. Pandas carries all of these over, which helps make it easy to start with.\n", "\n", "Consider this DataFrame:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:14.813190Z", "iopub.status.busy": "2020-10-19T19:10:14.812396Z", "iopub.status.idle": "2020-10-19T19:10:14.831381Z", "shell.execute_reply": "2020-10-19T19:10:14.830700Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.082945, "end_time": "2020-10-19T19:10:14.831541", "exception": false, "start_time": "2020-10-19T19:10:14.748596", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
1PortugalThis is ripe and fruity, a wine that is smooth...Avidagos8715.0DouroNaNNaNRoger Voss@vossrogerQuinta dos Avidagos 2011 Avidagos Red (Douro)Portuguese RedQuinta dos Avidagos
..........................................
129969FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Marcel Deiss 2012 Pinot Gris (Alsace)Pinot GrisDomaine Marcel Deiss
129970FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Schoffit 2012 Lieu-dit Harth Cuvée Car...GewürztraminerDomaine Schoffit
\n", "

129971 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "0 Italy Aromas include tropical fruit, broom, brimston... \n", "1 Portugal This is ripe and fruity, a wine that is smooth... \n", "... ... ... \n", "129969 France A dry style of Pinot Gris, this is crisp with ... \n", "129970 France Big, rich and off-dry, this is powered by inte... \n", "\n", " designation points price province \\\n", "0 Vulkà Bianco 87 NaN Sicily & Sardinia \n", "1 Avidagos 87 15.0 Douro \n", "... ... ... ... ... \n", "129969 NaN 90 32.0 Alsace \n", "129970 Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace \n", "\n", " region_1 region_2 taster_name taster_twitter_handle \\\n", "0 Etna NaN Kerin O’Keefe @kerinokeefe \n", "1 NaN NaN Roger Voss @vossroger \n", "... ... ... ... ... \n", "129969 Alsace NaN Roger Voss @vossroger \n", "129970 Alsace NaN Roger Voss @vossroger \n", "\n", " title variety \\\n", "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", "1 Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red \n", "... ... ... \n", "129969 Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "129970 Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer \n", "\n", " winery \n", "0 Nicosia \n", "1 Quinta dos Avidagos \n", "... ... \n", "129969 Domaine Marcel Deiss \n", "129970 Domaine Schoffit \n", "\n", "[129971 rows x 13 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.041253, "end_time": "2020-10-19T19:10:14.912140", "exception": false, "start_time": "2020-10-19T19:10:14.870887", "status": "completed" }, "tags": [] }, "source": [ "In Python, we can access the property of an object by accessing it as an attribute. A `book` object, for example, might have a `title` property, which we can access by calling `book.title`. Columns in a pandas DataFrame work in much the same way. \n", "\n", "Hence to access the `country` property of `reviews` we can use:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:14.991201Z", "iopub.status.busy": "2020-10-19T19:10:14.990135Z", "iopub.status.idle": "2020-10-19T19:10:14.994848Z", "shell.execute_reply": "2020-10-19T19:10:14.994047Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.047648, "end_time": "2020-10-19T19:10:14.995017", "exception": false, "start_time": "2020-10-19T19:10:14.947369", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Italy\n", "1 Portugal\n", " ... \n", "129969 France\n", "129970 France\n", "Name: country, Length: 129971, dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.country" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.035695, "end_time": "2020-10-19T19:10:15.071316", "exception": false, "start_time": "2020-10-19T19:10:15.035621", "status": "completed" }, "tags": [] }, "source": [ "If we have a Python dictionary, we can access its values using the indexing (`[]`) operator. We can do the same with columns in a DataFrame:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.151374Z", "iopub.status.busy": "2020-10-19T19:10:15.150310Z", "iopub.status.idle": "2020-10-19T19:10:15.155302Z", "shell.execute_reply": "2020-10-19T19:10:15.154606Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.04843, "end_time": "2020-10-19T19:10:15.155420", "exception": false, "start_time": "2020-10-19T19:10:15.106990", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Italy\n", "1 Portugal\n", " ... \n", "129969 France\n", "129970 France\n", "Name: country, Length: 129971, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews['country']" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.035881, "end_time": "2020-10-19T19:10:15.227985", "exception": false, "start_time": "2020-10-19T19:10:15.192104", "status": "completed" }, "tags": [] }, "source": [ "These are the two ways of selecting a specific Series out of a DataFrame. Neither of them is more or less syntactically valid than the other, but the indexing operator `[]` does have the advantage that it can handle column names with reserved characters in them (e.g. if we had a `country providence` column, `reviews.country providence` wouldn't work).\n", "\n", "Doesn't a pandas Series look kind of like a fancy dictionary? It pretty much is, so it's no surprise that, to drill down to a single specific value, we need only use the indexing operator `[]` once more:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.310547Z", "iopub.status.busy": "2020-10-19T19:10:15.309407Z", "iopub.status.idle": "2020-10-19T19:10:15.318016Z", "shell.execute_reply": "2020-10-19T19:10:15.317236Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.054028, "end_time": "2020-10-19T19:10:15.318152", "exception": false, "start_time": "2020-10-19T19:10:15.264124", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'Italy'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews['country'][0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.03624, "end_time": "2020-10-19T19:10:15.394823", "exception": false, "start_time": "2020-10-19T19:10:15.358583", "status": "completed" }, "tags": [] }, "source": [ "# Indexing in pandas\n", "\n", "The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, `loc` and `iloc`. For more advanced operations, these are the ones you're supposed to be using.\n", "\n", "### Index-based selection\n", "\n", "Pandas indexing works in one of two paradigms. The first is **index-based selection**: selecting data based on its numerical position in the data. `iloc` follows this paradigm.\n", "\n", "To select the first row of data in a DataFrame, we may use the following:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.476675Z", "iopub.status.busy": "2020-10-19T19:10:15.475784Z", "iopub.status.idle": "2020-10-19T19:10:15.480850Z", "shell.execute_reply": "2020-10-19T19:10:15.480233Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.049165, "end_time": "2020-10-19T19:10:15.480978", "exception": false, "start_time": "2020-10-19T19:10:15.431813", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "country Italy\n", "description Aromas include tropical fruit, broom, brimston...\n", " ... \n", "variety White Blend\n", "winery Nicosia\n", "Name: 0, Length: 13, dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.036961, "end_time": "2020-10-19T19:10:15.555385", "exception": false, "start_time": "2020-10-19T19:10:15.518424", "status": "completed" }, "tags": [] }, "source": [ "Both `loc` and `iloc` are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.\n", "\n", "This means that it's marginally easier to retrieve rows, and marginally harder to get retrieve columns. To get a column with `iloc`, we can do the following:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.638158Z", "iopub.status.busy": "2020-10-19T19:10:15.637095Z", "iopub.status.idle": "2020-10-19T19:10:15.641218Z", "shell.execute_reply": "2020-10-19T19:10:15.640621Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.048741, "end_time": "2020-10-19T19:10:15.641340", "exception": false, "start_time": "2020-10-19T19:10:15.592599", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Italy\n", "1 Portugal\n", " ... \n", "129969 France\n", "129970 France\n", "Name: country, Length: 129971, dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[:, 0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.037376, "end_time": "2020-10-19T19:10:15.716766", "exception": false, "start_time": "2020-10-19T19:10:15.679390", "status": "completed" }, "tags": [] }, "source": [ "On its own, the `:` operator, which also comes from native Python, means \"everything\". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the `country` column from just the first, second, and third row, we would do:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.800208Z", "iopub.status.busy": "2020-10-19T19:10:15.799148Z", "iopub.status.idle": "2020-10-19T19:10:15.802799Z", "shell.execute_reply": "2020-10-19T19:10:15.803527Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.049106, "end_time": "2020-10-19T19:10:15.803686", "exception": false, "start_time": "2020-10-19T19:10:15.754580", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Italy\n", "1 Portugal\n", "2 US\n", "Name: country, dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[:3, 0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.038068, "end_time": "2020-10-19T19:10:15.880396", "exception": false, "start_time": "2020-10-19T19:10:15.842328", "status": "completed" }, "tags": [] }, "source": [ "Or, to select just the second and third entries, we would do:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:15.967440Z", "iopub.status.busy": "2020-10-19T19:10:15.966222Z", "iopub.status.idle": "2020-10-19T19:10:15.970710Z", "shell.execute_reply": "2020-10-19T19:10:15.971234Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.05113, "end_time": "2020-10-19T19:10:15.971392", "exception": false, "start_time": "2020-10-19T19:10:15.920262", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "1 Portugal\n", "2 US\n", "Name: country, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[1:3, 0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.039715, "end_time": "2020-10-19T19:10:16.050740", "exception": false, "start_time": "2020-10-19T19:10:16.011025", "status": "completed" }, "tags": [] }, "source": [ "It's also possible to pass a list:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:16.138667Z", "iopub.status.busy": "2020-10-19T19:10:16.137535Z", "iopub.status.idle": "2020-10-19T19:10:16.142760Z", "shell.execute_reply": "2020-10-19T19:10:16.142084Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.052567, "end_time": "2020-10-19T19:10:16.142892", "exception": false, "start_time": "2020-10-19T19:10:16.090325", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 Italy\n", "1 Portugal\n", "2 US\n", "Name: country, dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[[0, 1, 2], 0]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.03995, "end_time": "2020-10-19T19:10:16.223826", "exception": false, "start_time": "2020-10-19T19:10:16.183876", "status": "completed" }, "tags": [] }, "source": [ "Finally, it's worth knowing that negative numbers can be used in selection. This will start counting forwards from the _end_ of the values. So for example here are the last five elements of the dataset." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:16.327236Z", "iopub.status.busy": "2020-10-19T19:10:16.326131Z", "iopub.status.idle": "2020-10-19T19:10:16.331085Z", "shell.execute_reply": "2020-10-19T19:10:16.330416Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.067558, "end_time": "2020-10-19T19:10:16.331211", "exception": false, "start_time": "2020-10-19T19:10:16.263653", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
129966GermanyNotes of honeysuckle and cantaloupe sweeten th...Brauneberger Juffer-Sonnenuhr Spätlese9028.0MoselNaNNaNAnna Lee C. IijimaNaNDr. H. Thanisch (Erben Müller-Burggraef) 2013 ...RieslingDr. H. Thanisch (Erben Müller-Burggraef)
129967USCitation is given as much as a decade of bottl...NaN9075.0OregonOregonOregon OtherPaul Gregutt@paulgwineCitation 2004 Pinot Noir (Oregon)Pinot NoirCitation
129968FranceWell-drained gravel soil gives this wine its c...Kritt9030.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Gresser 2013 Kritt Gewurztraminer (Als...GewürztraminerDomaine Gresser
129969FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Marcel Deiss 2012 Pinot Gris (Alsace)Pinot GrisDomaine Marcel Deiss
129970FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Schoffit 2012 Lieu-dit Harth Cuvée Car...GewürztraminerDomaine Schoffit
\n", "
" ], "text/plain": [ " country description \\\n", "129966 Germany Notes of honeysuckle and cantaloupe sweeten th... \n", "129967 US Citation is given as much as a decade of bottl... \n", "129968 France Well-drained gravel soil gives this wine its c... \n", "129969 France A dry style of Pinot Gris, this is crisp with ... \n", "129970 France Big, rich and off-dry, this is powered by inte... \n", "\n", " designation points price province \\\n", "129966 Brauneberger Juffer-Sonnenuhr Spätlese 90 28.0 Mosel \n", "129967 NaN 90 75.0 Oregon \n", "129968 Kritt 90 30.0 Alsace \n", "129969 NaN 90 32.0 Alsace \n", "129970 Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace \n", "\n", " region_1 region_2 taster_name taster_twitter_handle \\\n", "129966 NaN NaN Anna Lee C. Iijima NaN \n", "129967 Oregon Oregon Other Paul Gregutt @paulgwine  \n", "129968 Alsace NaN Roger Voss @vossroger \n", "129969 Alsace NaN Roger Voss @vossroger \n", "129970 Alsace NaN Roger Voss @vossroger \n", "\n", " title variety \\\n", "129966 Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... Riesling \n", "129967 Citation 2004 Pinot Noir (Oregon) Pinot Noir \n", "129968 Domaine Gresser 2013 Kritt Gewurztraminer (Als... Gewürztraminer \n", "129969 Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "129970 Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer \n", "\n", " winery \n", "129966 Dr. H. Thanisch (Erben Müller-Burggraef) \n", "129967 Citation \n", "129968 Domaine Gresser \n", "129969 Domaine Marcel Deiss \n", "129970 Domaine Schoffit " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.iloc[-5:]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.056382, "end_time": "2020-10-19T19:10:16.436766", "exception": false, "start_time": "2020-10-19T19:10:16.380384", "status": "completed" }, "tags": [] }, "source": [ "### Label-based selection\n", "\n", "The second paradigm for attribute selection is the one followed by the `loc` operator: **label-based selection**. In this paradigm, it's the data index value, not its position, which matters.\n", "\n", "For example, to get the first entry in `reviews`, we would now do the following:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:16.531354Z", "iopub.status.busy": "2020-10-19T19:10:16.530586Z", "iopub.status.idle": "2020-10-19T19:10:16.535028Z", "shell.execute_reply": "2020-10-19T19:10:16.534300Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.053272, "end_time": "2020-10-19T19:10:16.535148", "exception": false, "start_time": "2020-10-19T19:10:16.481876", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'Italy'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[0, 'country']" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.04103, "end_time": "2020-10-19T19:10:16.617709", "exception": false, "start_time": "2020-10-19T19:10:16.576679", "status": "completed" }, "tags": [] }, "source": [ "`iloc` is conceptually simpler than `loc` because it ignores the dataset's indices. When we use `iloc` we treat the dataset like a big matrix (a list of lists), one that we have to index into by position. `loc`, by contrast, uses the information in the indices to do its work. Since your dataset usually has meaningful indices, it's usually easier to do things using `loc` instead. For example, here's one operation that's much easier using `loc`:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:16.707169Z", "iopub.status.busy": "2020-10-19T19:10:16.706358Z", "iopub.status.idle": "2020-10-19T19:10:16.721788Z", "shell.execute_reply": "2020-10-19T19:10:16.721045Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.062685, "end_time": "2020-10-19T19:10:16.721911", "exception": false, "start_time": "2020-10-19T19:10:16.659226", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
taster_nametaster_twitter_handlepoints
0Kerin O’Keefe@kerinokeefe87
1Roger Voss@vossroger87
............
129969Roger Voss@vossroger90
129970Roger Voss@vossroger90
\n", "

129971 rows × 3 columns

\n", "
" ], "text/plain": [ " taster_name taster_twitter_handle points\n", "0 Kerin O’Keefe @kerinokeefe 87\n", "1 Roger Voss @vossroger 87\n", "... ... ... ...\n", "129969 Roger Voss @vossroger 90\n", "129970 Roger Voss @vossroger 90\n", "\n", "[129971 rows x 3 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.041532, "end_time": "2020-10-19T19:10:16.805661", "exception": false, "start_time": "2020-10-19T19:10:16.764129", "status": "completed" }, "tags": [] }, "source": [ "### Choosing between `loc` and `iloc`\n", "\n", "When choosing or transitioning between `loc` and `iloc`, there is one \"gotcha\" worth keeping in mind, which is that the two methods use slightly different indexing schemes.\n", "\n", "`iloc` uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So `0:10` will select entries `0,...,9`. `loc`, meanwhile, indexes inclusively. So `0:10` will select entries `0,...,10`.\n", "\n", "Why the change? Remember that loc can index any stdlib type: strings, for example. If we have a DataFrame with index values `Apples, ..., Potatoes, ...`, and we want to select \"all the alphabetical fruit choices between Apples and Potatoes\", then it's a lot more convenient to index `df.loc['Apples':'Potatoes']` than it is to index something like `df.loc['Apples', 'Potatoet]` (`t` coming after `s` in the alphabet).\n", "\n", "This is particularly confusing when the DataFrame index is a simple numerical list, e.g. `0,...,1000`. In this case `df.iloc[0:1000]` will return 1000 entries, while `df.loc[0:1000]` return 1001 of them! To get 1000 elements using `loc`, you will need to go one lower and ask for `df.loc[0:999]`. \n", "\n", "Otherwise, the semantics of using `loc` are the same as those for `iloc`." ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.041569, "end_time": "2020-10-19T19:10:16.889346", "exception": false, "start_time": "2020-10-19T19:10:16.847777", "status": "completed" }, "tags": [] }, "source": [ "# Manipulating the index\n", "\n", "Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable. We can manipulate the index in any way we see fit.\n", "\n", "The `set_index()` method can be used to do the job. Here is what happens when we `set_index` to the `title` field:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:16.979618Z", "iopub.status.busy": "2020-10-19T19:10:16.978842Z", "iopub.status.idle": "2020-10-19T19:10:17.049607Z", "shell.execute_reply": "2020-10-19T19:10:17.048935Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.117454, "end_time": "2020-10-19T19:10:17.049748", "exception": false, "start_time": "2020-10-19T19:10:16.932294", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handlevarietywinery
title
Nicosia 2013 Vulkà Bianco (Etna)ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeWhite BlendNicosia
Quinta dos Avidagos 2011 Avidagos Red (Douro)PortugalThis is ripe and fruity, a wine that is smooth...Avidagos8715.0DouroNaNNaNRoger Voss@vossrogerPortuguese RedQuinta dos Avidagos
.......................................
Domaine Marcel Deiss 2012 Pinot Gris (Alsace)FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerPinot GrisDomaine Marcel Deiss
Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline Gewurztraminer (Alsace)FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerGewürztraminerDomaine Schoffit
\n", "

129971 rows × 12 columns

\n", "
" ], "text/plain": [ " country \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Italy \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Portugal \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) France \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... France \n", "\n", " description \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Aromas include tropical fruit, broom, brimston... \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) This is ripe and fruity, a wine that is smooth... \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) A dry style of Pinot Gris, this is crisp with ... \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Big, rich and off-dry, this is powered by inte... \n", "\n", " designation \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Vulkà Bianco \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Avidagos \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) NaN \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Lieu-dit Harth Cuvée Caroline \n", "\n", " points price \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) 87 NaN \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) 87 15.0 \n", "... ... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) 90 32.0 \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... 90 21.0 \n", "\n", " province \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Sicily & Sardinia \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Douro \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Alsace \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Alsace \n", "\n", " region_1 region_2 \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Etna NaN \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) NaN NaN \n", "... ... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Alsace NaN \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Alsace NaN \n", "\n", " taster_name \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Kerin O’Keefe \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Roger Voss \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Roger Voss \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Roger Voss \n", "\n", " taster_twitter_handle \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) @kerinokeefe \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) @vossroger \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) @vossroger \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... @vossroger \n", "\n", " variety \\\n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Gewürztraminer \n", "\n", " winery \n", "title \n", "Nicosia 2013 Vulkà Bianco (Etna) Nicosia \n", "Quinta dos Avidagos 2011 Avidagos Red (Douro) Quinta dos Avidagos \n", "... ... \n", "Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Domaine Marcel Deiss \n", "Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caro... Domaine Schoffit \n", "\n", "[129971 rows x 12 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.set_index(\"title\")" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.042893, "end_time": "2020-10-19T19:10:17.135818", "exception": false, "start_time": "2020-10-19T19:10:17.092925", "status": "completed" }, "tags": [] }, "source": [ "This is useful if you can come up with an index for the dataset which is better than the current one." ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.042907, "end_time": "2020-10-19T19:10:17.222188", "exception": false, "start_time": "2020-10-19T19:10:17.179281", "status": "completed" }, "tags": [] }, "source": [ "# Conditional selection\n", "\n", "So far we've been indexing various strides of data, using structural properties of the DataFrame itself. To do *interesting* things with the data, however, we often need to ask questions based on conditions. \n", "\n", "For example, suppose that we're interested specifically in better-than-average wines produced in Italy.\n", "\n", "We can start by checking if each wine is Italian or not:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:17.342805Z", "iopub.status.busy": "2020-10-19T19:10:17.341806Z", "iopub.status.idle": "2020-10-19T19:10:17.346618Z", "shell.execute_reply": "2020-10-19T19:10:17.346025Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.081462, "end_time": "2020-10-19T19:10:17.346747", "exception": false, "start_time": "2020-10-19T19:10:17.265285", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 False\n", " ... \n", "129969 False\n", "129970 False\n", "Name: country, Length: 129971, dtype: bool" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.country == 'Italy'" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.043432, "end_time": "2020-10-19T19:10:17.434487", "exception": false, "start_time": "2020-10-19T19:10:17.391055", "status": "completed" }, "tags": [] }, "source": [ "This operation produced a Series of `True`/`False` booleans based on the `country` of each record. This result can then be used inside of `loc` to select the relevant data:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:17.549820Z", "iopub.status.busy": "2020-10-19T19:10:17.548999Z", "iopub.status.idle": "2020-10-19T19:10:17.578106Z", "shell.execute_reply": "2020-10-19T19:10:17.577452Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.099522, "end_time": "2020-10-19T19:10:17.578247", "exception": false, "start_time": "2020-10-19T19:10:17.478725", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
6ItalyHere's a bright, informal red that opens with ...Belsito8716.0Sicily & SardiniaVittoriaNaNKerin O’Keefe@kerinokeefeTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
..........................................
129961ItalyIntense aromas of wild cherry, baking spice, t...NaN9030.0Sicily & SardiniaSiciliaNaNKerin O’Keefe@kerinokeefeCOS 2013 Frappato (Sicilia)FrappatoCOS
129962ItalyBlackberry, cassis, grilled herb and toasted a...Sàgana Tenuta San Giacomo9040.0Sicily & SardiniaSiciliaNaNKerin O’Keefe@kerinokeefeCusumano 2012 Sàgana Tenuta San Giacomo Nero d...Nero d'AvolaCusumano
\n", "

19540 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "0 Italy Aromas include tropical fruit, broom, brimston... \n", "6 Italy Here's a bright, informal red that opens with ... \n", "... ... ... \n", "129961 Italy Intense aromas of wild cherry, baking spice, t... \n", "129962 Italy Blackberry, cassis, grilled herb and toasted a... \n", "\n", " designation points price province region_1 \\\n", "0 Vulkà Bianco 87 NaN Sicily & Sardinia Etna \n", "6 Belsito 87 16.0 Sicily & Sardinia Vittoria \n", "... ... ... ... ... ... \n", "129961 NaN 90 30.0 Sicily & Sardinia Sicilia \n", "129962 Sàgana Tenuta San Giacomo 90 40.0 Sicily & Sardinia Sicilia \n", "\n", " region_2 taster_name taster_twitter_handle \\\n", "0 NaN Kerin O’Keefe @kerinokeefe \n", "6 NaN Kerin O’Keefe @kerinokeefe \n", "... ... ... ... \n", "129961 NaN Kerin O’Keefe @kerinokeefe \n", "129962 NaN Kerin O’Keefe @kerinokeefe \n", "\n", " title variety \\\n", "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", "6 Terre di Giurfo 2013 Belsito Frappato (Vittoria) Frappato \n", "... ... ... \n", "129961 COS 2013 Frappato (Sicilia) Frappato \n", "129962 Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... Nero d'Avola \n", "\n", " winery \n", "0 Nicosia \n", "6 Terre di Giurfo \n", "... ... \n", "129961 COS \n", "129962 Cusumano \n", "\n", "[19540 rows x 13 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[reviews.country == 'Italy']" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.044521, "end_time": "2020-10-19T19:10:17.667620", "exception": false, "start_time": "2020-10-19T19:10:17.623099", "status": "completed" }, "tags": [] }, "source": [ "This DataFrame has ~20,000 rows. The original had ~130,000. That means that around 15% of wines originate from Italy.\n", "\n", "We also wanted to know which ones are better than average. Wines are reviewed on a 80-to-100 point scale, so this could mean wines that accrued at least 90 points.\n", "\n", "We can use the ampersand (`&`) to bring the two questions together:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:17.785019Z", "iopub.status.busy": "2020-10-19T19:10:17.783977Z", "iopub.status.idle": "2020-10-19T19:10:17.829109Z", "shell.execute_reply": "2020-10-19T19:10:17.828514Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.117206, "end_time": "2020-10-19T19:10:17.829237", "exception": false, "start_time": "2020-10-19T19:10:17.712031", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
120ItalySlightly backward, particularly given the vint...Bricco Rocche Prapó9270.0PiedmontBaroloNaNNaNNaNCeretto 2003 Bricco Rocche Prapó (Barolo)NebbioloCeretto
130ItalyAt the first it was quite muted and subdued, b...Bricco Rocche Brunate9170.0PiedmontBaroloNaNNaNNaNCeretto 2003 Bricco Rocche Brunate (Barolo)NebbioloCeretto
..........................................
129961ItalyIntense aromas of wild cherry, baking spice, t...NaN9030.0Sicily & SardiniaSiciliaNaNKerin O’Keefe@kerinokeefeCOS 2013 Frappato (Sicilia)FrappatoCOS
129962ItalyBlackberry, cassis, grilled herb and toasted a...Sàgana Tenuta San Giacomo9040.0Sicily & SardiniaSiciliaNaNKerin O’Keefe@kerinokeefeCusumano 2012 Sàgana Tenuta San Giacomo Nero d...Nero d'AvolaCusumano
\n", "

6648 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "120 Italy Slightly backward, particularly given the vint... \n", "130 Italy At the first it was quite muted and subdued, b... \n", "... ... ... \n", "129961 Italy Intense aromas of wild cherry, baking spice, t... \n", "129962 Italy Blackberry, cassis, grilled herb and toasted a... \n", "\n", " designation points price province region_1 \\\n", "120 Bricco Rocche Prapó 92 70.0 Piedmont Barolo \n", "130 Bricco Rocche Brunate 91 70.0 Piedmont Barolo \n", "... ... ... ... ... ... \n", "129961 NaN 90 30.0 Sicily & Sardinia Sicilia \n", "129962 Sàgana Tenuta San Giacomo 90 40.0 Sicily & Sardinia Sicilia \n", "\n", " region_2 taster_name taster_twitter_handle \\\n", "120 NaN NaN NaN \n", "130 NaN NaN NaN \n", "... ... ... ... \n", "129961 NaN Kerin O’Keefe @kerinokeefe \n", "129962 NaN Kerin O’Keefe @kerinokeefe \n", "\n", " title variety \\\n", "120 Ceretto 2003 Bricco Rocche Prapó (Barolo) Nebbiolo \n", "130 Ceretto 2003 Bricco Rocche Brunate (Barolo) Nebbiolo \n", "... ... ... \n", "129961 COS 2013 Frappato (Sicilia) Frappato \n", "129962 Cusumano 2012 Sàgana Tenuta San Giacomo Nero d... Nero d'Avola \n", "\n", " winery \n", "120 Ceretto \n", "130 Ceretto \n", "... ... \n", "129961 COS \n", "129962 Cusumano \n", "\n", "[6648 rows x 13 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.045184, "end_time": "2020-10-19T19:10:17.920119", "exception": false, "start_time": "2020-10-19T19:10:17.874935", "status": "completed" }, "tags": [] }, "source": [ "Suppose we'll buy any wine that's made in Italy _or_ which is rated above average. For this we use a pipe (`|`):" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:18.042256Z", "iopub.status.busy": "2020-10-19T19:10:18.041055Z", "iopub.status.idle": "2020-10-19T19:10:18.080836Z", "shell.execute_reply": "2020-10-19T19:10:18.080094Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.114084, "end_time": "2020-10-19T19:10:18.080959", "exception": false, "start_time": "2020-10-19T19:10:17.966875", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
6ItalyHere's a bright, informal red that opens with ...Belsito8716.0Sicily & SardiniaVittoriaNaNKerin O’Keefe@kerinokeefeTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
..........................................
129969FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Marcel Deiss 2012 Pinot Gris (Alsace)Pinot GrisDomaine Marcel Deiss
129970FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Schoffit 2012 Lieu-dit Harth Cuvée Car...GewürztraminerDomaine Schoffit
\n", "

61937 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "0 Italy Aromas include tropical fruit, broom, brimston... \n", "6 Italy Here's a bright, informal red that opens with ... \n", "... ... ... \n", "129969 France A dry style of Pinot Gris, this is crisp with ... \n", "129970 France Big, rich and off-dry, this is powered by inte... \n", "\n", " designation points price province \\\n", "0 Vulkà Bianco 87 NaN Sicily & Sardinia \n", "6 Belsito 87 16.0 Sicily & Sardinia \n", "... ... ... ... ... \n", "129969 NaN 90 32.0 Alsace \n", "129970 Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace \n", "\n", " region_1 region_2 taster_name taster_twitter_handle \\\n", "0 Etna NaN Kerin O’Keefe @kerinokeefe \n", "6 Vittoria NaN Kerin O’Keefe @kerinokeefe \n", "... ... ... ... ... \n", "129969 Alsace NaN Roger Voss @vossroger \n", "129970 Alsace NaN Roger Voss @vossroger \n", "\n", " title variety \\\n", "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", "6 Terre di Giurfo 2013 Belsito Frappato (Vittoria) Frappato \n", "... ... ... \n", "129969 Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "129970 Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer \n", "\n", " winery \n", "0 Nicosia \n", "6 Terre di Giurfo \n", "... ... \n", "129969 Domaine Marcel Deiss \n", "129970 Domaine Schoffit \n", "\n", "[61937 rows x 13 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.046955, "end_time": "2020-10-19T19:10:18.174518", "exception": false, "start_time": "2020-10-19T19:10:18.127563", "status": "completed" }, "tags": [] }, "source": [ "Pandas comes with a few built-in conditional selectors, two of which we will highlight here. \n", "\n", "The first is `isin`. `isin` is lets you select data whose value \"is in\" a list of values. For example, here's how we can use it to select wines only from Italy or France:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:18.285547Z", "iopub.status.busy": "2020-10-19T19:10:18.284772Z", "iopub.status.idle": "2020-10-19T19:10:18.321314Z", "shell.execute_reply": "2020-10-19T19:10:18.320577Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.099369, "end_time": "2020-10-19T19:10:18.321436", "exception": false, "start_time": "2020-10-19T19:10:18.222067", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
0ItalyAromas include tropical fruit, broom, brimston...Vulkà Bianco87NaNSicily & SardiniaEtnaNaNKerin O’Keefe@kerinokeefeNicosia 2013 Vulkà Bianco (Etna)White BlendNicosia
6ItalyHere's a bright, informal red that opens with ...Belsito8716.0Sicily & SardiniaVittoriaNaNKerin O’Keefe@kerinokeefeTerre di Giurfo 2013 Belsito Frappato (Vittoria)FrappatoTerre di Giurfo
..........................................
129969FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Marcel Deiss 2012 Pinot Gris (Alsace)Pinot GrisDomaine Marcel Deiss
129970FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Schoffit 2012 Lieu-dit Harth Cuvée Car...GewürztraminerDomaine Schoffit
\n", "

41633 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "0 Italy Aromas include tropical fruit, broom, brimston... \n", "6 Italy Here's a bright, informal red that opens with ... \n", "... ... ... \n", "129969 France A dry style of Pinot Gris, this is crisp with ... \n", "129970 France Big, rich and off-dry, this is powered by inte... \n", "\n", " designation points price province \\\n", "0 Vulkà Bianco 87 NaN Sicily & Sardinia \n", "6 Belsito 87 16.0 Sicily & Sardinia \n", "... ... ... ... ... \n", "129969 NaN 90 32.0 Alsace \n", "129970 Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace \n", "\n", " region_1 region_2 taster_name taster_twitter_handle \\\n", "0 Etna NaN Kerin O’Keefe @kerinokeefe \n", "6 Vittoria NaN Kerin O’Keefe @kerinokeefe \n", "... ... ... ... ... \n", "129969 Alsace NaN Roger Voss @vossroger \n", "129970 Alsace NaN Roger Voss @vossroger \n", "\n", " title variety \\\n", "0 Nicosia 2013 Vulkà Bianco (Etna) White Blend \n", "6 Terre di Giurfo 2013 Belsito Frappato (Vittoria) Frappato \n", "... ... ... \n", "129969 Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "129970 Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer \n", "\n", " winery \n", "0 Nicosia \n", "6 Terre di Giurfo \n", "... ... \n", "129969 Domaine Marcel Deiss \n", "129970 Domaine Schoffit \n", "\n", "[41633 rows x 13 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[reviews.country.isin(['Italy', 'France'])]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.047, "end_time": "2020-10-19T19:10:18.417542", "exception": false, "start_time": "2020-10-19T19:10:18.370542", "status": "completed" }, "tags": [] }, "source": [ "The second is `isnull` (and its companion `notnull`). These methods let you highlight values which are (or are not) empty (`NaN`). For example, to filter out wines lacking a price tag in the dataset, here's what we would do:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:18.519026Z", "iopub.status.busy": "2020-10-19T19:10:18.518221Z", "iopub.status.idle": "2020-10-19T19:10:18.574234Z", "shell.execute_reply": "2020-10-19T19:10:18.573476Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.109326, "end_time": "2020-10-19T19:10:18.574360", "exception": false, "start_time": "2020-10-19T19:10:18.465034", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
1PortugalThis is ripe and fruity, a wine that is smooth...Avidagos8715.0DouroNaNNaNRoger Voss@vossrogerQuinta dos Avidagos 2011 Avidagos Red (Douro)Portuguese RedQuinta dos Avidagos
2USTart and snappy, the flavors of lime flesh and...NaN8714.0OregonWillamette ValleyWillamette ValleyPaul Gregutt@paulgwineRainstorm 2013 Pinot Gris (Willamette Valley)Pinot GrisRainstorm
..........................................
129969FranceA dry style of Pinot Gris, this is crisp with ...NaN9032.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Marcel Deiss 2012 Pinot Gris (Alsace)Pinot GrisDomaine Marcel Deiss
129970FranceBig, rich and off-dry, this is powered by inte...Lieu-dit Harth Cuvée Caroline9021.0AlsaceAlsaceNaNRoger Voss@vossrogerDomaine Schoffit 2012 Lieu-dit Harth Cuvée Car...GewürztraminerDomaine Schoffit
\n", "

120975 rows × 13 columns

\n", "
" ], "text/plain": [ " country description \\\n", "1 Portugal This is ripe and fruity, a wine that is smooth... \n", "2 US Tart and snappy, the flavors of lime flesh and... \n", "... ... ... \n", "129969 France A dry style of Pinot Gris, this is crisp with ... \n", "129970 France Big, rich and off-dry, this is powered by inte... \n", "\n", " designation points price province \\\n", "1 Avidagos 87 15.0 Douro \n", "2 NaN 87 14.0 Oregon \n", "... ... ... ... ... \n", "129969 NaN 90 32.0 Alsace \n", "129970 Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace \n", "\n", " region_1 region_2 taster_name \\\n", "1 NaN NaN Roger Voss \n", "2 Willamette Valley Willamette Valley Paul Gregutt \n", "... ... ... ... \n", "129969 Alsace NaN Roger Voss \n", "129970 Alsace NaN Roger Voss \n", "\n", " taster_twitter_handle \\\n", "1 @vossroger \n", "2 @paulgwine  \n", "... ... \n", "129969 @vossroger \n", "129970 @vossroger \n", "\n", " title variety \\\n", "1 Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red \n", "2 Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris \n", "... ... ... \n", "129969 Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris \n", "129970 Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer \n", "\n", " winery \n", "1 Quinta dos Avidagos \n", "2 Rainstorm \n", "... ... \n", "129969 Domaine Marcel Deiss \n", "129970 Domaine Schoffit \n", "\n", "[120975 rows x 13 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews.loc[reviews.price.notnull()]" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.04816, "end_time": "2020-10-19T19:10:18.671032", "exception": false, "start_time": "2020-10-19T19:10:18.622872", "status": "completed" }, "tags": [] }, "source": [ "# Assigning data\n", "\n", "Going the other way, assigning data to a DataFrame is easy. You can assign either a constant value:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:18.779275Z", "iopub.status.busy": "2020-10-19T19:10:18.773530Z", "iopub.status.idle": "2020-10-19T19:10:18.783909Z", "shell.execute_reply": "2020-10-19T19:10:18.783132Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.064627, "end_time": "2020-10-19T19:10:18.784046", "exception": false, "start_time": "2020-10-19T19:10:18.719419", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 everyone\n", "1 everyone\n", " ... \n", "129969 everyone\n", "129970 everyone\n", "Name: critic, Length: 129971, dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews['critic'] = 'everyone'\n", "reviews['critic']" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.048753, "end_time": "2020-10-19T19:10:18.882016", "exception": false, "start_time": "2020-10-19T19:10:18.833263", "status": "completed" }, "tags": [] }, "source": [ "Or with an iterable of values:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2020-10-19T19:10:19.028344Z", "iopub.status.busy": "2020-10-19T19:10:19.027259Z", "iopub.status.idle": "2020-10-19T19:10:19.032140Z", "shell.execute_reply": "2020-10-19T19:10:19.031381Z" }, "jupyter": { "outputs_hidden": true }, "papermill": { "duration": 0.101059, "end_time": "2020-10-19T19:10:19.032262", "exception": false, "start_time": "2020-10-19T19:10:18.931203", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "0 129971\n", "1 129970\n", " ... \n", "129969 2\n", "129970 1\n", "Name: index_backwards, Length: 129971, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews['index_backwards'] = range(len(reviews), 0, -1)\n", "reviews['index_backwards']" ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.049124, "end_time": "2020-10-19T19:10:19.130994", "exception": false, "start_time": "2020-10-19T19:10:19.081870", "status": "completed" }, "tags": [] }, "source": [ "# Your turn\n", "\n", "If you haven't started the exercise, you can **[get started here](https://www.kaggle.com/kernels/fork/587910)**." ] }, { "cell_type": "markdown", "metadata": { "papermill": { "duration": 0.049367, "end_time": "2020-10-19T19:10:19.230734", "exception": false, "start_time": "2020-10-19T19:10:19.181367", "status": "completed" }, "tags": [] }, "source": [ "---\n", "\n", "\n", "\n", "\n", "*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/161299) to chat with other Learners.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" }, "papermill": { "duration": 11.444387, "end_time": "2020-10-19T19:10:19.387935", "environment_variables": {}, "exception": null, "input_path": "__notebook__.ipynb", "output_path": "__notebook__.ipynb", "parameters": {}, "start_time": "2020-10-19T19:10:07.943548", "version": "2.1.0" } }, "nbformat": 4, "nbformat_minor": 4 }