docs: add snippet for creating boosted tree model #1142
import bigframes.ml.linear_model

# input_data is defined in an earlier step.
training_data = input_data[input_data["dataframe"] == "training"]
No action needed, but something to consider for the future: it would be nice to update the prepare section above to work without referencing an index (e.g. when ordering mode = "partial").
We have a few options, but the easiest will be to start with a string column and add (True, "training") as the last entry in the list of cases.
Aside: we have an issue open (349926559) to allow selecting any column in the dataframe (such as functional_weight, which would be a natural choice in this example) even if it's a different type, so long as a True (default) case is provided.
# input_data is defined in an earlier step.
training_data = input_data[input_data["dataframe"] == "training"]
X = training_data.drop(columns=["income_bracket", "dataframe"])
y = training_data["income_bracket"]
Presumably you ran this code sample and it worked OK? I remember we had some bugs in the past where y had to be a DataFrame, not a Series, so just double-checking.
The code sample seems to run! Not sure if I did it right so here's the colab: https://colab.sandbox.google.com/drive/10jA6zSRiptXWrTkCcmyCT_sYBjLqGJx0?resourcekey=0-0TrIkmDzAJw_F6ONFikwaA#scrollTo=wU367u1SAj3Y
census_model = bigframes.ml.linear_model.LogisticRegression(
    # model_type="BOOSTED_TREE_CLASSIFIER",
    # booster_type="gbtree",
    max_iterations=50,
)
I don't think we should be doing LogisticRegression here. In the SQL we do use model_type='BOOSTED_TREE_CLASSIFIER', but in BigQuery DataFrames we normally use separate Python classes to represent the different model types.
A few ways to discover which class to use:
- Search our code for BOOSTED_TREE_CLASSIFIER
- Google search for boosted trees BigFrames
These should give you some strong hints as to which class to use instead.
Copying this from an internal comment I made for visibility:
Just like scikit-learn, it's one of the "ensemble" methods: https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble
Normally we try to use the scikit-learn class names too, but I think we may have added this class before GradientBoostingClassifier was in scikit-learn.
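As a hedged sketch of where these hints point: bigframes.ml.ensemble includes an XGBClassifier class, which is one candidate replacement for LogisticRegression here. The function names below are hypothetical, the parameter values simply mirror the draft snippet under review, and nothing here is confirmed as the final version of the sample.

```python
def build_census_model(max_iterations=50):
    """Return an untrained boosted tree classifier (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require bigframes.
    # Assumption: XGBClassifier is the intended class from bigframes.ml.ensemble.
    from bigframes.ml import ensemble

    return ensemble.XGBClassifier(
        booster="gbtree",
        max_iterations=max_iterations,
    )


def train_census_model(input_data):
    """Fit the classifier on the rows labelled "training" (hypothetical helper)."""
    # input_data is the DataFrame prepared in an earlier step of the sample.
    training_data = input_data[input_data["dataframe"] == "training"]
    X = training_data.drop(columns=["income_bracket", "dataframe"])
    y = training_data["income_bracket"]

    census_model = build_census_model()
    census_model.fit(X, y)
    return census_model
```

Calling train_census_model(input_data) needs a BigQuery DataFrames session, so this is meant as an outline of the class swap rather than a tested sample.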
Here is the summary of changes. You are about to add 1 region tag.
This comment is generated by snippet-bot.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕