{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**This notebook is an exercise in the [Introduction to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning) course. You can reference the tutorial at [this link](https://www.kaggle.com/alexisbcook/intro-to-automl).**\n", "\n", "---\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "In this notebook, you'll use **Google Cloud AutoML Tables** to generate a submission for a Kaggle competition.\n", "\n", "You'll work with the **[House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques)** competition. The competition is simple: we want you to use 79 different explanatory variables (such as the type of roof, number of bedrooms, and number of bathrooms) to predict home prices.\n", "\n", "# Note\n", "\n", "Before we begin, an important note! \n", "
\n", "Note: Google Cloud AutoML Tables is a paid service. At the time of publishing, it charges \\$19.32 per hour of compute during training and \\$1.16 per hour of compute for batch prediction. You can find more details here. \n", "

\n", "\n", "Furthermore, this notebook is **optional** and does not have to be completed to get full credit for the **[Intro to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning)** course.\n", "\n", "# Set up the notebook\n", "\n", "To begin, we'll need to make sure that your notebook is set up to run the code. Begin by looking at the **\"Settings\"** menu to the right of your notebook. Your menu will look like one of the following:\n", "\n", "
\n", "\n", "If your **\"Internet\"** setting appears as a **\"Requires phone verification\"** link, click on this link. This will bring you to a new window; then, follow the instructions to verify your account. After following this step, your **\"Internet\"** setting will appear **\"Off\"**, as in the example to the right.\n", "\n", "Once your **\"Internet\"** setting appears as **\"Off\"**, click to turn it on. You'll see a pop-up window that you'll need to **\"Accept\"** in order to complete the process and have the setting switched to **\"On\"**. \n", "\n", "
\n", "\n", "Once you have followed the steps above, you're ready to proceed!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Set up Google Cloud\n", "\n", "Next, create a Google Cloud account by following the instructions **[here](https://www.kaggle.com/alexisbcook/get-started-with-google-cloud-platform)**. You'll also learn how to claim $300 of free credits!\n", "\n", "Then, connect your Google Cloud account to this notebook by selecting **Add-ons > Google Cloud Services**.\n", "\n", "![](https://i.imgur.com/UHB4P5o.png)\n", "\n", "In the pop-up window, select **Cloud Storage** and **AutoML (beta)**. Then click on **Link Account**.\n", "\n", "![](https://i.imgur.com/IlbdbHD.png)\n", "\n", "You'll see another pop-up that tells you about Google AutoML pricing. Once you have reviewed this information, click on **ENABLE**. Then, sign in with the e-mail address that is linked to your Google Cloud account. \n", "\n", "Once your account is attached to the notebook, you can close the pop-up.\n", "\n", "# Get started with AutoML\n", "\n", "We have supplied values for the following variables for you:\n", "- `DATASET_DISPLAY_NAME` - The name of your dataset (should use at most 32 characters; allowed characters: ASCII Latin letters A-Z and a-z, an underscore (\"\\_\"), and ASCII digits 0-9). 
\n", "- `TRAIN_FILEPATH` - The filepath for the training data (`train.csv` file) from the competition.\n", "- `TEST_FILEPATH` - The filepath for the test data (`test.csv` file) from the competition.\n", "- `TARGET_COLUMN` - The name of the column in your training data that contains the values you'd like to predict.\n", "- `ID_COLUMN` - The name of the column containing IDs.\n", "- `MODEL_DISPLAY_NAME` - The name of your model (should use at most 32 characters; allowed characters: ASCII Latin letters A-Z and a-z, an underscore (\"\\_\"), and ASCII digits 0-9).\n", "- `TRAIN_BUDGET` - How long you want your model to train (use 1000 for 1 hour, 2000 for 2 hours, and so on). Because we filled in a value of 2000, the model will train for up to two hours. **At the time of publishing, AutoML charges \\$19.32 for each hour of training.** For some general guidelines about how to select this value, check out the notes on training budget at [this link](https://cloud.google.com/automl-tables/docs/train).\n", "\n", "You'll need to fill in values for:\n", "- `PROJECT_ID` - The project ID you created when following the instructions to create a Google Cloud account. \n", "- `BUCKET_NAME` - The name of your [Google Cloud storage bucket](https://cloud.google.com/storage/docs/creating-buckets). To work with AutoML, we'll need a storage bucket to upload the Kaggle dataset to. Running the code cell will create the bucket for you. Use these guidelines when choosing a bucket name:\n", "    - Bucket names must contain only lowercase letters, numbers, dashes (\"-\"), and underscores (\"\\_\"). Spaces are not allowed. \n", "    - Bucket names must start and end with a number or letter.\n", "    - Bucket names must contain 3-63 characters. \n", "\n", "Once you've done that, run the code cell."
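Before filling in the values, it can help to check them against the rules listed above. The following is a minimal sketch, not part of the course's wrapper: the helper names (`is_valid_bucket_name`, `is_valid_display_name`, `max_training_cost_usd`) are hypothetical, the patterns encode only the guidelines stated in this notebook (Google Cloud may enforce additional rules), and the hourly price is the one quoted here at the time of publishing, so it may be out of date.

```python
import re

# Patterns built only from the guidelines listed in this notebook (assumption:
# Google Cloud may apply further restrictions that are not checked here).
BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")
DISPLAY_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")

# Training price quoted in this notebook at the time of publishing.
USD_PER_TRAINING_HOUR = 19.32


def is_valid_bucket_name(name: str) -> bool:
    """3-63 characters, lowercase letters/digits/dashes/underscores,
    starting and ending with a letter or digit."""
    return BUCKET_NAME_PATTERN.fullmatch(name) is not None


def is_valid_display_name(name: str) -> bool:
    """1-32 characters drawn from A-Z, a-z, 0-9 and the underscore."""
    return DISPLAY_NAME_PATTERN.fullmatch(name) is not None


def max_training_cost_usd(train_budget: int) -> float:
    """Upper bound on the training cost, using 1000 budget units = 1 hour."""
    hours = train_budget / 1000
    return hours * USD_PER_TRAINING_HOUR


# Example checks using the placeholder and default values from this notebook.
print(is_valid_bucket_name("your-bucket-name-here"))
print(is_valid_display_name("house_prices_dataset"))
print(f"Training cost is at most ${max_training_cost_usd(2000):.2f}")
```

Because training runs for up to the budgeted time, the cost figure is an upper bound rather than an exact charge, and it does not include the separate batch prediction charge.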
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: Fill in your project ID and bucket name\n", "PROJECT_ID = 'your-project-id-here'\n", "BUCKET_NAME = 'your-bucket-name-here'\n", "\n", "# Do not change: The remaining variables are already filled in for you\n", "DATASET_DISPLAY_NAME = 'house_prices_dataset'\n", "TRAIN_FILEPATH = '../input/house-prices-advanced-regression-techniques/train.csv'\n", "TEST_FILEPATH = '../input/house-prices-advanced-regression-techniques/test.csv'\n", "TARGET_COLUMN = 'SalePrice'\n", "ID_COLUMN = 'Id'\n", "MODEL_DISPLAY_NAME = 'house_prices_model'\n", "TRAIN_BUDGET = 2000  # 1000 = 1 hour, so this allows up to 2 hours of training\n", "\n", "# Do not change: Create an instance of the wrapper\n", "from automl_tables_wrapper import AutoMLTablesWrapper\n", "\n", "amw = AutoMLTablesWrapper(project_id=PROJECT_ID,\n", "                          bucket_name=BUCKET_NAME,\n", "                          dataset_display_name=DATASET_DISPLAY_NAME,\n", "                          train_filepath=TRAIN_FILEPATH,\n", "                          test_filepath=TEST_FILEPATH,\n", "                          target_column=TARGET_COLUMN,\n", "                          id_column=ID_COLUMN,\n", "                          model_display_name=MODEL_DISPLAY_NAME,\n", "                          train_budget=TRAIN_BUDGET)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you get a `Ready to train model.` result, you're ready to move on!\n", "\n", "The next step is to commit your notebook.\n", "
\n", "Note: You should not run the next code cell before committing your notebook. These lines will be run for you when you commit your notebook. \n", "

\n", "\n", "To commit your notebook (and submit predictions to the competition), \n", "1. Begin by clicking on the **Save Version** button in the top right corner of the window. This will generate a pop-up window. \n", "2. Ensure that the **Save and Run All** option is selected, and then click on the **Save** button.\n", "3. This generates a window in the bottom left corner of the notebook. After it has finished running, click on the number to the right of the **Save Version** button. This pulls up a list of versions on the right of the screen. Click on the ellipsis **(...)** to the right of the most recent version, and select **Open in Viewer**. This brings you into view mode of the same page. You will need to scroll down to get back to these instructions.\n", "4. Click on the **Output** tab on the right of the screen. Then, click on the file you would like to submit, and click on the blue **Submit** button to submit your results to the leaderboard.\n", "\n", "You have now successfully submitted to the competition!\n", "\n", "If you want to keep working to improve your performance, select the **Edit** button in the top right of the screen. Then you can change your code and repeat the process. There's a lot of room to improve, and you will climb up the leaderboard as you work.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Do not change: Create and train the model\n", "amw.train_model()\n", "\n", "# Do not change: Get predictions\n", "amw.get_predictions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What's next?\n", "\n", "In this notebook, you used automated machine learning to generate a submission to a Kaggle competition. All of the steps up to generating predictions were completed for you!\n", "\n", "If you're interested in learning more, you can read about Google Cloud AutoML Tables **[here](https://cloud.google.com/automl-tables)**. 
\n", "\n", "To use the code presented here to train models on other datasets, you may need to make some changes to the wrapper, which you can find **[here](https://www.kaggle.com/alexisbcook/automl-tables-wrapper)**. (Currently, the code works only for regression tasks, but everything you need to know to amend it for a classification task can be found **[here](https://cloud.google.com/automl-tables/docs/predict-batch)**.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "\n", "\n", "\n", "*Have questions or comments? Visit the [Learn Discussion forum](https://www.kaggle.com/learn-forum/161285) to chat with other Learners.*" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 4 }