{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.015636,
"end_time": "2020-10-01T00:23:43.827727",
"exception": false,
"start_time": "2020-10-01T00:23:43.812091",
"status": "completed"
},
"tags": []
},
"source": [
"In this tutorial you'll learn all about **histograms** and **density plots**.\n",
"\n",
"# Set up the notebook\n",
"\n",
"As always, we begin by setting up the coding environment. (_This code is hidden, but you can un-hide it by clicking on the \"Code\" button immediately below this text, on the right._)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"_kg_hide-input": true,
"_kg_hide-output": true,
"execution": {
"iopub.execute_input": "2020-10-01T00:23:43.864741Z",
"iopub.status.busy": "2020-10-01T00:23:43.864012Z",
"iopub.status.idle": "2020-10-01T00:23:45.479142Z",
"shell.execute_reply": "2020-10-01T00:23:45.479860Z"
},
"papermill": {
"duration": 1.638265,
"end_time": "2020-10-01T00:23:45.480072",
"exception": false,
"start_time": "2020-10-01T00:23:43.841807",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setup Complete\n"
]
}
],
"source": [
"\n",
"import pandas as pd\n",
"pd.plotting.register_matplotlib_converters()\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import seaborn as sns\n",
"print(\"Setup Complete\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.014121,
"end_time": "2020-10-01T00:23:45.509264",
"exception": false,
"start_time": "2020-10-01T00:23:45.495143",
"status": "completed"
},
"tags": []
},
"source": [
"# Select a dataset\n",
"\n",
"We'll work with a dataset of 150 different flowers: 50 each from three different species of iris (*Iris setosa*, *Iris versicolor*, and *Iris virginica*).\n",
"\n",
"\n",
"\n",
"# Load and examine the data\n",
"\n",
"Each row in the dataset corresponds to a different flower. There are four measurements: the sepal length and width, along with the petal length and width. We also keep track of the corresponding species. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:45.550515Z",
"iopub.status.busy": "2020-10-01T00:23:45.549498Z",
"iopub.status.idle": "2020-10-01T00:23:45.591486Z",
"shell.execute_reply": "2020-10-01T00:23:45.592268Z"
},
"papermill": {
"duration": 0.068478,
"end_time": "2020-10-01T00:23:45.592469",
"exception": false,
"start_time": "2020-10-01T00:23:45.523991",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Sepal Length (cm)</th>\n",
"      <th>Sepal Width (cm)</th>\n",
"      <th>Petal Length (cm)</th>\n",
"      <th>Petal Width (cm)</th>\n",
"      <th>Species</th>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>Id</th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"    <tr><th>2</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"    <tr><th>3</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"    <tr><th>4</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"    <tr><th>5</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>Iris-setosa</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sepal Length (cm) Sepal Width (cm) Petal Length (cm) Petal Width (cm) \\\n",
"Id \n",
"1 5.1 3.5 1.4 0.2 \n",
"2 4.9 3.0 1.4 0.2 \n",
"3 4.7 3.2 1.3 0.2 \n",
"4 4.6 3.1 1.5 0.2 \n",
"5 5.0 3.6 1.4 0.2 \n",
"\n",
" Species \n",
"Id \n",
"1 Iris-setosa \n",
"2 Iris-setosa \n",
"3 Iris-setosa \n",
"4 Iris-setosa \n",
"5 Iris-setosa "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Path of the file to read\n",
"iris_filepath = \"../input/iris.csv\"\n",
"\n",
"# Read the file into a variable iris_data\n",
"iris_data = pd.read_csv(iris_filepath, index_col=\"Id\")\n",
"\n",
"# Print the first 5 rows of the data\n",
"iris_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.015045,
"end_time": "2020-10-01T00:23:45.628334",
"exception": false,
"start_time": "2020-10-01T00:23:45.613289",
"status": "completed"
},
"tags": []
},
"source": [
"# Histograms\n",
"\n",
"Say we would like to create a **histogram** to see how petal length varies in iris flowers. We can do this with the `sns.distplot` command. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:45.666824Z",
"iopub.status.busy": "2020-10-01T00:23:45.666070Z",
"iopub.status.idle": "2020-10-01T00:23:45.958032Z",
"shell.execute_reply": "2020-10-01T00:23:45.956778Z"
},
"papermill": {
"duration": 0.314604,
"end_time": "2020-10-01T00:23:45.958186",
"exception": false,
"start_time": "2020-10-01T00:23:45.643582",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEGCAYAAAB8Ys7jAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAQOklEQVR4nO3df6zddX3H8edLikMBBcaFVZF1G4gjZhZyBZYuRq0QVCK4zR84XV3IusUfgehm0G0aZraRuRkTsxgbYOsiIPiDgW5RmwpRFJEWQWDF4BxiR0cr6oCNSID3/jifwvX2lnt6zr339FOfj+Tm+/1+zvf7+b6/bfrq937O+X5OqgpJUn+eNukCJEmjMcAlqVMGuCR1ygCXpE4Z4JLUqWVLebLDDz+8VqxYsZSnlKTubd68+YdVNTW7fUkDfMWKFWzatGkpTylJ3Uvy/bnaHUKRpE4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnRrqY4RJ7gYeBB4DHq2q6SSHAVcAK4C7gddX1Y8Xp0xJ0mx7cgf+sqpaWVXTbft8YGNVHQtsbNuSpCUyzhDKmcD6tr4eOGv8ciRJwxr2ScwCvpSkgI9X1TrgyKraBlBV25IcMdeBSdYCawGOPvrokQu97MZ7Rj62V286efQ/L0n7vmEDfFVV3dtCekOSO4c9QQv7dQDT09N+/Y8kLZChhlCq6t623A5cBZwE3JdkOUBbbl+sIiVJu5o3wJMcmOTgnevAacDtwDXAmrbbGuDqxSpSkrSrYYZQjgSuSrJz/8uq6gtJbgKuTHIOcA/wusUrU5I027wBXlXfA140R/v9wOrFKEqSND+fxJSkThngktQpA1ySOmWAS1KnDHBJ6pQBLkmdWtJvpZe0K+f50ai8A5ekThngktQpA1ySOmWAS1KnDHBJ6pQBLkmdMsAlqVMGuCR1ygCXpE4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnqlAEuSZ0ywCWpUwa4JHXKAJekTg0d4En2S/KtJJ9v24cl2ZDkrrY8dPHKlCTNtid34OcCW2Zsnw9srKpjgY1tW5K0RIYK8CRHAa8GLprRfCawvq2vB85a2NIkSU9l2DvwjwDvAR6f0XZkVW0DaMsj5jowydokm5Js2rFjx1jFSpKeNG+AJzkD2F5Vm0c5QVWtq6rpqpqempoapQtJ0hyWDbHPKuA1SV4FHAA8K8kngPuSLK+qbUmWA9sXs1BJ0s+a9w68qt5bVUdV1QrgjcCXq+rNwDXAmrbbGuDqRatSkrSLcT4HfiFwapK7gFPbtiRpiQwzhPKEqroOuK6t3w+sXviSJEnD8ElMSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnqlAEuSZ0ywCWpUwa4JHXKAJekThngktQpA1ySOmWAS1KnDHBJ6pQBLkmdMsAlqVN79J2Y0mK77MZ7Jl2C1A3vwCWpUwa4JHXKAJekThngktQpA1ySOmWAS1KnDHBJ6pQBLkmdMsAlqVMGuCR1ygCXpE4Z4JLUqXkDPMkBSb6Z5NYkdyS5oLUflmRDkrva8tDFL1eStNMwd+A/BV5eVS8CVgKnJzkFOB/YWFXHAhvbtiRpicwb4DXwUNvcv/0UcCawvrWvB85alAolSXMaagw8yX5JbgG2Axuq6kbgyKraBtCWRyxemZKk2YYK8Kp6rKpWAkcBJyV54bAnSLI2yaYkm3bs2DFqnZKkWfboUyhV9RPgOuB04L4kywHacvtujllXVdNVNT01NTVmuZKknYb5FMpUkkPa+jOAVwB3AtcAa9pua4CrF6tISdKuhvlOzOXA+iT7MQj8K6vq80luAK5Mcg5wD/C6RaxTkjTLvAFeVd8GTpij/X5g9WIUJUman09iSlKnDHBJ6pQBLkmdMsAlqVMGuCR1ygCXpE
4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnqlAEuSZ0ywCWpUwa4JHXKAJekThngktQpA1ySOmWAS1KnDHBJ6pQBLkmdMsAlqVMGuCR1ygCXpE4Z4JLUKQNckjplgEtSp+YN8CTPS3Jtki1J7khybms/LMmGJHe15aGLX64kaadh7sAfBd5dVb8OnAK8PcnxwPnAxqo6FtjYtiVJS2TeAK+qbVV1c1t/ENgCPBc4E1jfdlsPnLVYRUqSdrVHY+BJVgAnADcCR1bVNhiEPHDEbo5Zm2RTkk07duwYr1pJ0hOGDvAkBwGfAc6rqgeGPa6q1lXVdFVNT01NjVKjJGkOQwV4kv0ZhPelVfXZ1nxfkuXt9eXA9sUpUZI0l2E+hRLgYmBLVX14xkvXAGva+hrg6oUvT5K0O8uG2GcV8BbgtiS3tLb3ARcCVyY5B7gHeN3ilChJmsu8AV5V1wPZzcurF7YcSdKwfBJTkjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnq1DCTWUnSgrrsxnsmXcKSe9PJRy94n96BS1KnDHBJ6pQBLkmdMsAlqVMGuCR1ygCXpE4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnqlAEuSZ0ywCWpU/MGeJJLkmxPcvuMtsOSbEhyV1seurhlSpJmG+YO/J+A02e1nQ9srKpjgY1tW5K0hOYN8Kr6CvCjWc1nAuvb+nrgrAWuS5I0j1HHwI+sqm0AbXnE7nZMsjbJpiSbduzYMeLpJEmzLfqbmFW1rqqmq2p6ampqsU8nST83Rg3w+5IsB2jL7QtXkiRpGKMG+DXAmra+Brh6YcqRJA1rmI8RXg7cAByXZGuSc4ALgVOT3AWc2rYlSUto2Xw7VNXZu3lp9QLXIknaAz6JKUmdMsAlqVMGuCR1ygCXpE4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROGeCS1CkDXJI6ZYBLUqcMcEnqlAEuSZ0ywCWpUwa4JHXKAJekThngktQpA1ySOmWAS1Knlk26AO3eZTfeM+kSJO3FvAOXpE4Z4JLUKQNckjplgEtSpwxwSeqUAS5JnTLAJalTBrgkdcoAl6ROjRXgSU5P8p0k301y/kIVJUma38gBnmQ/4B+AVwLHA2cnOX6hCpMkPbVx7sBPAr5bVd+rqkeATwJnLkxZkqT5jDOZ1XOBH8zY3gqcPHunJGuBtW3zoSTfGfF8hwM/HPHYvdG+dD370rWA17M36/Zafm/u5mGv55fnahwnwDNHW+3SULUOWDfGeQYnSzZV1fS4/ewt9qXr2ZeuBbyevdm+dC0w/vWMM4SyFXjejO2jgHvH6E+StAfGCfCbgGOT/EqSpwNvBK5ZmLIkSfMZeQilqh5N8g7gi8B+wCVVdceCVbarsYdh9jL70vXsS9cCXs/ebF+6FhjzelK1y7C1JKkDPokpSZ0ywCWpU3t9gCe5JMn2JLdPupZxJXlekmuTbElyR5JzJ13TOJIckOSbSW5t13PBpGsaV5L9knwryecnXcu4ktyd5LYktyTZNOl6xpXkkCSfTnJn+zf0m5OuaRRJjmt/Jzt/Hkhy3kh97e1j4EleAjwE/HNVvXDS9YwjyXJgeVXdnORgYDNwVlX9+4RLG0mSAAdW1UNJ9geuB86tqm9MuLSRJXkXMA08q6rOmHQ940hyNzBdVV0++DJbkvXAV6vqovbJt2dW1U8mXdc42pQk/wWcXFXf39Pj9/o78Kr6CvCjSdexEKpqW1Xd3NYfBLYweKK1SzXwUNvcv/3s3XcETyHJUcCrgYsmXYt+VpJnAS8BLgaoqkd6D+9mNfAfo4Q3dBDg+6okK4ATgBsnW8l42pDDLcB2YENV9Xw9HwHeAzw+6UIWSAFfSr
K5TWnRs18FdgD/2Ia4Lkpy4KSLWgBvBC4f9WADfAKSHAR8Bjivqh6YdD3jqKrHqmolgydxT0rS5TBXkjOA7VW1edK1LKBVVXUigxlD396GI3u1DDgR+FhVnQD8L9D1FNZtGOg1wKdG7cMAX2JtrPgzwKVV9dlJ17NQ2q+z1wGnT7iUUa0CXtPGjT8JvDzJJyZb0niq6t623A5cxWAG0V5tBbbO+A3v0wwCvWevBG6uqvtG7cAAX0LtTb+LgS1V9eFJ1zOuJFNJDmnrzwBeAdw52apGU1XvraqjqmoFg19rv1xVb55wWSNLcmB7o5w21HAa0O0nuarqv4EfJDmuNa0Gunzzf4azGWP4BMabjXBJJLkceClweJKtwAeq6uLJVjWyVcBbgNvauDHA+6rq3yZY0ziWA+vbO+lPA66squ4/frePOBK4anDPwDLgsqr6wmRLGts7gUvb0MP3gD+YcD0jS/JM4FTgj8bqZ2//GKEkaW4OoUhSpwxwSeqUAS5JnTLAJalTBrgkdcoA16JJ8libbe32JJ9qH53a3b4rk7xqiD5fOtdMgbtrXyhtJry3jXK+JB9ZiKcgk/xdkpeP24/2HQa4FtPDVbWyzSL5CPDHT7HvSmDeAJ+gQ4C3zbvXLEkOA05pk7KN66N0/vi4FpYBrqXyVeCY9oTgJUluapMSndkezPhL4A3tjv0NSU5K8vW2z9dnPIG3R5KcluSGJDe33wIOau13J7mgtd+W5AWtfSrJhtb+8STfT3I4cCHwa62+D7XuD5oxP/Wl7Unb2X4XeOIBmiQvbtdzawZzqR+c5K1J/iXJ55L8Z5J3JHlXu/ZvtP8EaDPW/WKSXxrlz0L7HgNciy7JMgbzPtwG/BmDx9RfDLwM+BCDaWjfD1zR7tivYPBI/kvaxEXvB/56hPMeDvw58Io2qdMm4F0zdvlha/8Y8Cet7QOtvhMZzB9ydGs/n8G0nyur6k9b2wnAecDxDGbLWzVHGasYzPu+c/KiKxjMmf4iBlMPPNz2eyHwJgbzlfwV8H/t2m8Afn9Gfzfv5jz6ObTXP0qvrj1jxpQBX2UwD8zXGUwatTMwD+DJkJzp2Qwe0z+WwbSo+49w/lMYhOvX2s3x0xkE4k47JxPbDPx2W/8t4LUAVfWFJD9+iv6/WVVbAdp1rmDwpRYzLWcwDSrAccC2qrqp9f9AOxbg2jZH/INJ/gf4XDvmNuA3ZvS3HXjOU120fn4Y4FpMD7epZp/Qhhl+p6q+M6v95FnHfpBBqL22zZ1+3QjnD4M5ys/ezes/bcvHePLfwlzDILvz0xnrM/uY6WEG/0nt7Ht3c1fM7OvxGduPz+r3AJ68a9fPOYdQtNS+CLxz53hxkhNa+4PAwTP2ezaDr5oCeOuI5/oGsCrJMe1cz0zy/HmOuR54fdv/NODQ3dQ3rC3AMW39TuA5SV7c+j+4DS/tiefT8ayCWlgGuJbaBxkMh3w7gy+q/mBrvxY4fuebmMDfAn+T5GvAfkP2vTrJ1p0/DILzrcDlSb7NINBfME8fFwCnJbmZwbj9NuDBqrqfwVDM7TPexBzGvzKYTZOqegR4A/DRJLcCG3jy7nxeGcwlfwyDsXzJ2QilmZL8AvBYVT2awbeef2z2MNAIfV4PnDHudzgmeS1wYlX9xTj9aN/hGLj0s44GrkzyNAafXf/DBejz3a3fcb+Edxnw9+OXo32Fd+CS1CnHwCWpUwa4JHXKAJekThngktQpA1ySOvX/NzftIexlKbMAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Histogram \n",
"sns.distplot(a=iris_data['Petal Length (cm)'], kde=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.016218,
"end_time": "2020-10-01T00:23:45.992651",
"exception": false,
"start_time": "2020-10-01T00:23:45.976433",
"status": "completed"
},
"tags": []
},
"source": [
"We customize the behavior of the command with two additional pieces of information:\n",
"- `a=` chooses the column we'd like to plot (_in this case, we chose `'Petal Length (cm)'`_).\n",
"- `kde=False` is something we'll always provide when creating a histogram, as leaving it out also draws a KDE curve over the bars and changes the y-axis from counts to density.\n",
"\n",
"# Density plots\n",
"\n",
"The next type of plot is a **kernel density estimate (KDE)** plot. In case you're not familiar with KDE plots, you can think of one as a smoothed histogram.\n",
"\n",
"To make a KDE plot, we use the `sns.kdeplot` command. Setting `shade=True` colors the area below the curve (_and `data=` plays the same role that `a=` played when we made the histogram above_)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:46.040180Z",
"iopub.status.busy": "2020-10-01T00:23:46.039214Z",
"iopub.status.idle": "2020-10-01T00:23:46.334546Z",
"shell.execute_reply": "2020-10-01T00:23:46.332709Z"
},
"papermill": {
"duration": 0.325589,
"end_time": "2020-10-01T00:23:46.334742",
"exception": false,
"start_time": "2020-10-01T00:23:46.009153",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# KDE plot \n",
"sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)"
]
},
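{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the \"smoothed histogram\" idea concrete, the cell below is a minimal sketch of a Gaussian kernel density estimate written with NumPy. It is an illustration under stated assumptions, not seaborn's implementation of `sns.kdeplot`: the helper `gaussian_kde_sketch` and the bandwidth of 0.5 cm are hypothetical choices, and the ten petal lengths are copied from ten rows of the dataset (five *Iris setosa*, five *Iris versicolor*) rather than read from the file. A smaller bandwidth would give a bumpier curve, and a larger one a smoother curve."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Petal lengths (cm) copied from ten rows of the dataset (illustrative sample)\n",
"petal_lengths = np.array([1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6])\n",
"\n",
"# Hypothetical helper: average one Gaussian bump per sample at every grid point\n",
"def gaussian_kde_sketch(samples, grid, bandwidth):\n",
"    z = (grid[:, None] - samples[None, :]) / bandwidth\n",
"    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))\n",
"    return bumps.mean(axis=1)\n",
"\n",
"# Evaluate the estimate on an evenly spaced grid; 0.5 cm is an assumed bandwidth\n",
"grid = np.linspace(0, 7, 200)\n",
"density = gaussian_kde_sketch(petal_lengths, grid, bandwidth=0.5)\n",
"\n",
"# A histogram counts samples per bin, while the KDE adds up smooth bumps instead\n",
"counts, bin_edges = np.histogram(petal_lengths, bins=6)\n",
"print('Histogram counts per bin:', counts)\n",
"\n",
"# Plot the smoothed estimate\n",
"plt.plot(grid, density)\n",
"plt.xlabel('Petal Length (cm)')\n",
"plt.ylabel('Density')"
]
},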
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.024866,
"end_time": "2020-10-01T00:23:46.397329",
"exception": false,
"start_time": "2020-10-01T00:23:46.372463",
"status": "completed"
},
"tags": []
},
"source": [
"# 2D KDE plots\n",
"\n",
"We're not restricted to a single column when creating a KDE plot. We can create a **two-dimensional (2D) KDE plot** with the `sns.jointplot` command.\n",
"\n",
"In the plot below, the color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:46.456015Z",
"iopub.status.busy": "2020-10-01T00:23:46.454735Z",
"iopub.status.idle": "2020-10-01T00:23:48.385769Z",
"shell.execute_reply": "2020-10-01T00:23:48.384988Z"
},
"papermill": {
"duration": 1.963271,
"end_time": "2020-10-01T00:23:48.385912",
"exception": false,
"start_time": "2020-10-01T00:23:46.422641",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 2D KDE plot\n",
"sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind=\"kde\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.018782,
"end_time": "2020-10-01T00:23:48.425599",
"exception": false,
"start_time": "2020-10-01T00:23:48.406817",
"status": "completed"
},
"tags": []
},
"source": [
"Note that in addition to the 2D KDE plot in the center,\n",
"- the curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, `iris_data['Petal Length (cm)']`), and\n",
"- the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, `iris_data['Sepal Width (cm)']`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.019473,
"end_time": "2020-10-01T00:23:48.463985",
"exception": false,
"start_time": "2020-10-01T00:23:48.444512",
"status": "completed"
},
"tags": []
},
"source": [
"# Color-coded plots\n",
"\n",
"For the next part of the tutorial, we'll create plots to understand differences between the species. To accomplish this, we begin by breaking the dataset into three separate files, with one for each species."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:48.512500Z",
"iopub.status.busy": "2020-10-01T00:23:48.511368Z",
"iopub.status.idle": "2020-10-01T00:23:48.542725Z",
"shell.execute_reply": "2020-10-01T00:23:48.541878Z"
},
"papermill": {
"duration": 0.059858,
"end_time": "2020-10-01T00:23:48.542913",
"exception": false,
"start_time": "2020-10-01T00:23:48.483055",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Sepal Length (cm)</th>\n",
"      <th>Sepal Width (cm)</th>\n",
"      <th>Petal Length (cm)</th>\n",
"      <th>Petal Width (cm)</th>\n",
"      <th>Species</th>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>Id</th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"      <th></th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>51</th><td>7.0</td><td>3.2</td><td>4.7</td><td>1.4</td><td>Iris-versicolor</td></tr>\n",
"    <tr><th>52</th><td>6.4</td><td>3.2</td><td>4.5</td><td>1.5</td><td>Iris-versicolor</td></tr>\n",
"    <tr><th>53</th><td>6.9</td><td>3.1</td><td>4.9</td><td>1.5</td><td>Iris-versicolor</td></tr>\n",
"    <tr><th>54</th><td>5.5</td><td>2.3</td><td>4.0</td><td>1.3</td><td>Iris-versicolor</td></tr>\n",
"    <tr><th>55</th><td>6.5</td><td>2.8</td><td>4.6</td><td>1.5</td><td>Iris-versicolor</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sepal Length (cm) Sepal Width (cm) Petal Length (cm) Petal Width (cm) \\\n",
"Id \n",
"51 7.0 3.2 4.7 1.4 \n",
"52 6.4 3.2 4.5 1.5 \n",
"53 6.9 3.1 4.9 1.5 \n",
"54 5.5 2.3 4.0 1.3 \n",
"55 6.5 2.8 4.6 1.5 \n",
"\n",
" Species \n",
"Id \n",
"51 Iris-versicolor \n",
"52 Iris-versicolor \n",
"53 Iris-versicolor \n",
"54 Iris-versicolor \n",
"55 Iris-versicolor "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Paths of the files to read\n",
"iris_set_filepath = \"../input/iris_setosa.csv\"\n",
"iris_ver_filepath = \"../input/iris_versicolor.csv\"\n",
"iris_vir_filepath = \"../input/iris_virginica.csv\"\n",
"\n",
"# Read the files into variables \n",
"iris_set_data = pd.read_csv(iris_set_filepath, index_col=\"Id\")\n",
"iris_ver_data = pd.read_csv(iris_ver_filepath, index_col=\"Id\")\n",
"iris_vir_data = pd.read_csv(iris_vir_filepath, index_col=\"Id\")\n",
"\n",
"# Print the first 5 rows of the Iris versicolor data\n",
"iris_ver_data.head()"
]
},
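{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three per-species files are a convenience. As a hedged sketch, the same split can also be made from a single DataFrame by filtering on the `Species` column. The small `sample` frame below is an assumption for illustration: it is built from ten rows displayed in this tutorial rather than read from disk, and the names `sample`, `sample_set`, and `sample_ver` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Ten rows copied from the tables shown in this tutorial (illustrative sample)\n",
"sample = pd.DataFrame(\n",
"    {\n",
"        'Petal Length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6],\n",
"        'Species': ['Iris-setosa'] * 5 + ['Iris-versicolor'] * 5,\n",
"    },\n",
"    index=pd.Index([1, 2, 3, 4, 5, 51, 52, 53, 54, 55], name='Id'),\n",
")\n",
"\n",
"# Keep only the rows whose Species value matches\n",
"sample_set = sample[sample['Species'] == 'Iris-setosa']\n",
"sample_ver = sample[sample['Species'] == 'Iris-versicolor']\n",
"\n",
"# Or visit every species in turn with groupby\n",
"for species, rows in sample.groupby('Species'):\n",
"    print(species, len(rows))"
]
},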
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.019678,
"end_time": "2020-10-01T00:23:48.585834",
"exception": false,
"start_time": "2020-10-01T00:23:48.566156",
"status": "completed"
},
"tags": []
},
"source": [
"In the code cell below, we create a different histogram for each species by using the `sns.distplot` command (_as above_) three times. We use `label=` to set how each histogram will appear in the legend."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:48.641160Z",
"iopub.status.busy": "2020-10-01T00:23:48.639985Z",
"iopub.status.idle": "2020-10-01T00:23:48.945141Z",
"shell.execute_reply": "2020-10-01T00:23:48.943938Z"
},
"papermill": {
"duration": 0.339684,
"end_time": "2020-10-01T00:23:48.945279",
"exception": false,
"start_time": "2020-10-01T00:23:48.605595",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Histograms for each species\n",
"sns.distplot(a=iris_set_data['Petal Length (cm)'], label=\"Iris-setosa\", kde=False)\n",
"sns.distplot(a=iris_ver_data['Petal Length (cm)'], label=\"Iris-versicolor\", kde=False)\n",
"sns.distplot(a=iris_vir_data['Petal Length (cm)'], label=\"Iris-virginica\", kde=False)\n",
"\n",
"# Add title\n",
"plt.title(\"Histogram of Petal Lengths, by Species\")\n",
"\n",
"# Force legend to appear\n",
"plt.legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.021305,
"end_time": "2020-10-01T00:23:48.988296",
"exception": false,
"start_time": "2020-10-01T00:23:48.966991",
"status": "completed"
},
"tags": []
},
"source": [
"In this case, the legend does not automatically appear on the plot. To force it to show (for any plot type), we can always use `plt.legend()`.\n",
"\n",
"We can also create a KDE plot for each species by using `sns.kdeplot` (_as above_). Again, `label=` is used to set the values in the legend."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2020-10-01T00:23:49.049487Z",
"iopub.status.busy": "2020-10-01T00:23:49.041583Z",
"iopub.status.idle": "2020-10-01T00:23:49.375110Z",
"shell.execute_reply": "2020-10-01T00:23:49.374295Z"
},
"papermill": {
"duration": 0.364844,
"end_time": "2020-10-01T00:23:49.375244",
"exception": false,
"start_time": "2020-10-01T00:23:49.010400",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Distribution of Petal Lengths, by Species')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# KDE plots for each species\n",
"sns.kdeplot(data=iris_set_data['Petal Length (cm)'], label=\"Iris-setosa\", shade=True)\n",
"sns.kdeplot(data=iris_ver_data['Petal Length (cm)'], label=\"Iris-versicolor\", shade=True)\n",
"sns.kdeplot(data=iris_vir_data['Petal Length (cm)'], label=\"Iris-virginica\", shade=True)\n",
"\n",
"# Add title\n",
"plt.title(\"Distribution of Petal Lengths, by Species\")"
]
},
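{
"cell_type": "markdown",
"metadata": {},
"source": [
"Newer seaborn releases offer another route to the same per-species plots. The sketch below assumes seaborn 0.11 or later, where `sns.histplot` takes over from `sns.distplot(..., kde=False)`, `fill=` takes over from `shade=`, and `hue=` splits a single DataFrame by a column and adds the legend. The `sample` frame is an assumption for illustration: it holds ten rows copied from the tables shown earlier, not the full dataset, so the shapes will differ from the plots above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# Ten rows copied from the tables shown earlier (illustrative sample)\n",
"sample = pd.DataFrame({\n",
"    'Petal Length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6],\n",
"    'Species': ['Iris-setosa'] * 5 + ['Iris-versicolor'] * 5,\n",
"})\n",
"\n",
"# One histogram per species from a single call (assumes seaborn >= 0.11)\n",
"sns.histplot(data=sample, x='Petal Length (cm)', hue='Species')\n",
"plt.title('Histogram of Petal Lengths, by Species')\n",
"\n",
"# Start a new figure so the KDE plot is not drawn over the histogram\n",
"plt.figure()\n",
"\n",
"# fill=True colors the area below each curve, as shade=True did above\n",
"sns.kdeplot(data=sample, x='Petal Length (cm)', hue='Species', fill=True)\n",
"plt.title('Distribution of Petal Lengths, by Species')"
]
},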
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.022477,
"end_time": "2020-10-01T00:23:49.422527",
"exception": false,
"start_time": "2020-10-01T00:23:49.400050",
"status": "completed"
},
"tags": []
},
"source": [
"One interesting pattern that can be seen in the plots is that the plants seem to belong to one of two groups, where _Iris versicolor_ and _Iris virginica_ seem to have similar values for petal length, while _Iris setosa_ belongs in a category all by itself.\n",
"\n",
"In fact, according to this dataset, we might even be able to classify any iris plant as *Iris setosa* (as opposed to *Iris versicolor* or *Iris virginica*) just by looking at the petal length: if the petal length of an iris flower is less than 2 cm, it's most likely to be *Iris setosa*!"
]
},
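{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 2 cm rule of thumb can be written as a one-line check. The sketch below applies it to ten rows shown earlier in this tutorial. The function name `looks_like_setosa` and the `sample` frame are hypothetical, and the threshold is the one suggested by the plots, not a fitted model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Ten rows copied from the tables shown earlier (illustrative sample)\n",
"sample = pd.DataFrame({\n",
"    'Petal Length (cm)': [1.4, 1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0, 4.6],\n",
"    'Species': ['Iris-setosa'] * 5 + ['Iris-versicolor'] * 5,\n",
"})\n",
"\n",
"# Hypothetical helper: flag flowers whose petals are shorter than the threshold\n",
"def looks_like_setosa(petal_length_cm, threshold_cm=2.0):\n",
"    return petal_length_cm < threshold_cm\n",
"\n",
"# Compare the rule's guess with the recorded species\n",
"sample['Predicted setosa'] = looks_like_setosa(sample['Petal Length (cm)'])\n",
"sample['Correct'] = sample['Predicted setosa'] == (sample['Species'] == 'Iris-setosa')\n",
"\n",
"print(sample)\n",
"print('Every sample row classified correctly:', bool(sample['Correct'].all()))"
]
},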
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.022517,
"end_time": "2020-10-01T00:23:49.467914",
"exception": false,
"start_time": "2020-10-01T00:23:49.445397",
"status": "completed"
},
"tags": []
},
"source": [
"# What's next?\n",
"\n",
"Put your new skills to work in a **[coding exercise](https://www.kaggle.com/kernels/fork/2951534)**!"
]
},
{
"cell_type": "markdown",
"metadata": {
"papermill": {
"duration": 0.022592,
"end_time": "2020-10-01T00:23:49.513374",
"exception": false,
"start_time": "2020-10-01T00:23:49.490782",
"status": "completed"
},
"tags": []
},
"source": [
"---\n",
"\n",
"\n",
"\n",
"\n",
"*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/161291) to chat with other Learners.*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
},
"papermill": {
"duration": 11.488358,
"end_time": "2020-10-01T00:23:49.649098",
"environment_variables": {},
"exception": null,
"input_path": "__notebook__.ipynb",
"output_path": "__notebook__.ipynb",
"parameters": {},
"start_time": "2020-10-01T00:23:38.160740",
"version": "2.1.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}