This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.
In this notebook, you'll use Google Cloud AutoML Tables to generate a submission for a Kaggle competition.
You'll work with the House Prices: Advanced Regression Techniques competition. The competition is simple: we want you to use 79 different explanatory variables (such as the type of roof, number of bedrooms, and number of bathrooms) to predict home prices.
Before we begin, an important note: this notebook is optional and does not have to be completed to get full credit for the Intro to Machine Learning course.
To begin, we'll need to make sure that your notebook is set up to run the code. Begin by looking at the "Settings" menu to the right of your notebook. Your menu will look like one of the following:
If your "Internet" setting appears as a "Requires phone verification" link, click on this link. This will bring you to a new window; then, follow the instructions to verify your account. After following this step, your "Internet" setting will appear "Off", as in the example to the right.
Once your "Internet" setting appears as "Off", click to turn it on. You'll see a pop-up window that you'll need to "Accept" in order to complete the process and have the setting switched to "On".
Once you have followed the steps above, you're ready to proceed!
Next, create a Google Cloud account by following the instructions here. You'll also learn how to claim $300 of free credits!
Then, connect your Google Cloud account to this notebook by selecting Add-ons > Google Cloud Services.
In the pop-up window, select Cloud Storage and AutoML (beta). Then click on Link Account.
You'll see another pop-up that tells you about Google AutoML pricing. Once you have reviewed this information, click on ENABLE. Then, sign in with the e-mail address that is linked to your Google Cloud account.
Once your account is attached to the notebook, you can close the pop-up.
We have supplied values for the following variables for you:
- DATASET_DISPLAY_NAME: The name of your dataset (should use at most 32 characters; allowed characters: ASCII Latin letters A-Z and a-z, an underscore ("_"), and ASCII digits 0-9).
- TRAIN_FILEPATH: The filepath for the training data (train.csv file) from the competition.
- TEST_FILEPATH: The filepath for the test data (test.csv file) from the competition.
- TARGET_COLUMN: The name of the column in your training data that contains the values you'd like to predict.
- ID_COLUMN: The name of the column containing IDs.
- MODEL_DISPLAY_NAME: The name of your model (should use at most 32 characters; allowed characters: ASCII Latin letters A-Z and a-z, an underscore ("_"), and ASCII digits 0-9).
- TRAIN_BUDGET: How long you want your model to train (use 1000 for 1 hour, 2000 for 2 hours, and so on). In this case, since we filled in a value of 2000, the model will train for up to two hours. At the time of publishing, AutoML charges $19.32 for each hour of training. For some general guidelines about how to select this value, check out the notes on training budget at this link.
You'll need to fill in values for:
- PROJECT_ID: The project ID you created when following the instructions to create a Google Cloud account.
- BUCKET_NAME: The name of your Google Cloud storage bucket. In order to work with AutoML, we'll need to create a storage bucket, where we'll upload the Kaggle dataset. Running the code cell will create the bucket for you. Use these guidelines when creating a bucket name.
Once you've done that, run the code cell.
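If you ever change DATASET_DISPLAY_NAME or MODEL_DISPLAY_NAME, the naming rule described above (at most 32 characters, using only ASCII letters, digits, and underscores) can be checked locally first. The sketch below is only an illustration: the helper name is_valid_display_name is hypothetical and is not part of the wrapper, and rejecting the empty string is an assumption on our part.

```python
import re

# Rule from the text: at most 32 characters, drawn from A-Z, a-z, 0-9 and "_".
# Assumption: an empty name is also treated as invalid (hence the lower bound of 1).
_DISPLAY_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_display_name(name):
    """Return True if `name` satisfies the display-name rule described above."""
    # fullmatch requires the whole string to match, not just a prefix.
    return _DISPLAY_NAME_PATTERN.fullmatch(name) is not None


# The two names supplied in this notebook both pass the check.
for candidate in ("house_prices_dataset", "house_prices_model"):
    print(candidate, is_valid_display_name(candidate))
```

A name containing a hyphen or a space, or one longer than 32 characters, would fail this check.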
# TODO: Fill in your project ID and bucket name
PROJECT_ID = 'your-project-id-here'
BUCKET_NAME = 'your-bucket-name-here'
# Do not change: Fill in the remaining variables
DATASET_DISPLAY_NAME = 'house_prices_dataset'
TRAIN_FILEPATH = "../input/house-prices-advanced-regression-techniques/train.csv"
TEST_FILEPATH = "../input/house-prices-advanced-regression-techniques/test.csv"
TARGET_COLUMN = 'SalePrice'
ID_COLUMN = 'Id'
MODEL_DISPLAY_NAME = 'house_prices_model'
TRAIN_BUDGET = 2000
# Do not change: Create an instance of the wrapper
from automl_tables_wrapper import AutoMLTablesWrapper
amw = AutoMLTablesWrapper(project_id=PROJECT_ID,
                          bucket_name=BUCKET_NAME,
                          dataset_display_name=DATASET_DISPLAY_NAME,
                          train_filepath=TRAIN_FILEPATH,
                          test_filepath=TEST_FILEPATH,
                          target_column=TARGET_COLUMN,
                          id_column=ID_COLUMN,
                          model_display_name=MODEL_DISPLAY_NAME,
                          train_budget=TRAIN_BUDGET)
Once you get a "Ready to train model." result, you're ready to move on!
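Before training starts, it can help to work out what TRAIN_BUDGET implies. The sketch below simply applies the two figures quoted earlier (1000 budget units per hour and $19.32 per hour at the time of publishing); the function names are hypothetical, and the current price may differ. Because the model trains for "up to" the budgeted time, the result is an upper bound rather than a bill.

```python
# Figures taken from the text above; the hourly price was quoted
# "at the time of publishing" and may have changed since.
BUDGET_UNITS_PER_HOUR = 1000
PRICE_PER_HOUR_USD = 19.32


def max_training_hours(train_budget):
    """Convert a TRAIN_BUDGET value into the maximum number of training hours."""
    return train_budget / BUDGET_UNITS_PER_HOUR


def max_training_cost_usd(train_budget):
    """Upper bound on the training charge, rounded to cents."""
    return round(max_training_hours(train_budget) * PRICE_PER_HOUR_USD, 2)


print(max_training_hours(2000))     # 2.0
print(max_training_cost_usd(2000))  # 38.64
```

With the supplied value of 2000, training can run for up to two hours, for at most $38.64 at the quoted rate.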
The next step is to commit your notebook.
To commit your notebook (and submit predictions to the competition),
You have now successfully submitted to the competition!
If you want to keep working to improve your performance, select the Edit button in the top right of the screen. Then you can change your code and repeat the process. There's a lot of room to improve, and you will climb up the leaderboard as you work.
# Do not change: Create and train the model
amw.train_model()
# Do not change: Get predictions
amw.get_predictions()
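Since the predictions are what gets submitted to the competition, a quick sanity check on the output can be useful. The sketch below assumes the predictions end up as CSV text with exactly the two columns named by ID_COLUMN and TARGET_COLUMN; where and how the wrapper stores its output is not covered here, so the check_submission helper and the sample rows are purely illustrative.

```python
import csv
import io

ID_COLUMN = "Id"
TARGET_COLUMN = "SalePrice"


def check_submission(csv_text):
    """Return a list of problems found in submission CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the file has exactly these two columns, in this order.
    if reader.fieldnames != [ID_COLUMN, TARGET_COLUMN]:
        return ["unexpected header: {}".format(reader.fieldnames)]

    problems = []
    seen_ids = set()
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        row_id = row[ID_COLUMN]
        if row_id in seen_ids:
            problems.append("line {}: duplicate id {}".format(line_number, row_id))
        seen_ids.add(row_id)
        try:
            price = float(row[TARGET_COLUMN])
        except (TypeError, ValueError):
            problems.append("line {}: price is missing or not a number".format(line_number))
            continue
        if price <= 0:
            problems.append("line {}: price is not positive".format(line_number))
    return problems


# Made-up rows for illustration only; these are not real predictions.
sample = "Id,SalePrice\n1,120000.0\n2,155000.5\n"
print(check_submission(sample))
```

An empty list means none of these particular checks failed; it says nothing about how accurate the predictions are.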
In this notebook, you used automated machine learning to generate a submission to a Kaggle competition. All of the steps up to generating predictions were completed for you!
If you're interested in learning more, you can read about Google Cloud AutoML Tables here.
To use the code presented here to train models on other datasets, you may need to make some changes to the wrapper, which you can find here. (Currently, the code works only for regression tasks, but everything you need to know to amend it for a classification task can be found here.)
Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.