Wednesday, December 1, 2021

Jota's ML Advent Calendar - 01/December

Picking up on the idea of the “advent calendar” that is very typical in Germany (at least in Bavaria): every day this month I’ll share one link or piece of info I’ve read recently in the field of AI/ML that I found interesting.

To start, my suggestion of the day is FLAML (https://github.com/microsoft/FLAML). This is an AutoML Python library available on GitHub, created by people at MSR and based on research published at the end of last year. Why another one, especially considering we already have AutoML on Azure Machine Learning? A few reasons I like it:

  • It’s fully Python-based with a super-simple API. You control all the parameters of your experiments in code that you can source-control.
  • In my experiments (tabular data + classification/regression), I got consistently good results.
  • You can set a training budget, i.e., how long you want it to train.
  • You can pick the algorithms you want to use in the training (the most common being LightGBM, XGBoost and CatBoost) – and if you pick only one, you’re effectively doing hyperparameter tuning.
  • Supports sklearn pipelines (i.e., you can, for example, do AutoML as the final step of a training pipeline).
  • You can do an optimization run based on a previous run, to further improve results you’ve already obtained.
  • Has support for ensembles/stacks, where a set of models makes a first round of predictions, and then a final estimator builds on the outputs of those predictors to make the final prediction.
  • And obviously, it runs in AML (albeit without benefiting from clusters/parallelization).

Two other relevant links I’d also include are Optuna (https://optuna.org/), a library specifically for hyperparameter tuning (similar to HyperOpt), and LightAutoML (https://github.com/sberbank-ai-lab/LightAutoML), both widely used in Kaggle competitions.

Cheers!
