🐱 CatBoost Flashcards
A gradient boosting framework optimized for categorical data and ease of use
🐱 What is CatBoost?
CatBoost is an open-source gradient boosting library developed by Yandex. It handles categorical features natively and supports classification, regression, and ranking.
🎯 Key Strength
Excels at handling categorical variables without the need for preprocessing like one-hot encoding.
🧠 Use Cases
Widely used in fraud detection, customer churn prediction, credit scoring, and recommendation systems.
⚙️ Installation
Install with the command pip install catboost. No GPU configuration is needed unless you want GPU training.
📈 Built-in Visualization
CatBoost supports built-in model evaluation plots, feature importance, and cross-validation tools.
🧩 Native Categorical Handling
Simply specify categorical feature indices when creating the Pool object; no manual encoding is needed.
🎛️ Parameters
Important hyperparameters include depth, iterations, learning_rate, and l2_leaf_reg.
🖥️ GPU Training
Use task_type="GPU" in CatBoostClassifier or CatBoostRegressor for faster training on large data.
💡 Auto Feature Engineering
Automatically combines categorical features and encodes them with target statistics behind the scenes.
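To give a feel for the target encoding idea, here is a simplified pure-Python sketch of an "ordered" target statistic, where each row is encoded using only the rows before it. This is an illustration under assumptions, not CatBoost's implementation: the function name and the prior of 0.5 are made up, and the real library also works over random permutations of the data.

```python
from collections import defaultdict

def ordered_target_stats(categories, targets, prior=0.5):
    """Encode each category value from the targets of earlier rows only."""
    target_sum = defaultdict(float)  # sum of targets seen so far, per category
    seen_count = defaultdict(int)    # number of rows seen so far, per category
    encoded = []
    for cat, y in zip(categories, targets):
        # The current row's own target is not used, which limits target leakage.
        encoded.append((target_sum[cat] + prior) / (seen_count[cat] + 1))
        target_sum[cat] += y
        seen_count[cat] += 1
    return encoded

encoded = ordered_target_stats(["a", "b", "a", "a", "b"], [1, 0, 0, 1, 1])
print(encoded)  # [0.5, 0.5, 0.75, 0.5, 0.25]
```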
🔁 Model Saving
Use model.save_model("model.cbm") to save a model, and load_model to reload it.
🔍 Evaluation Metrics
Supports metrics like AUC, RMSE, Logloss, MAE, and allows custom metrics.
🤝 Integration
Integrates easily with scikit-learn pipelines, Optuna, and MLflow, and supports JSON, CSV, and NumPy formats.