CatBoost Flashcards

A gradient boosting framework optimized for categorical data and ease of use

🐱 What is CatBoost?

CatBoost is an open-source library for gradient boosting on decision trees, developed by Yandex. It handles categorical features natively and supports classification, regression, and ranking tasks.

🎯 Key Strength

Excels at handling categorical variables without the need for preprocessing like one-hot encoding.

🧠 Use Cases

Widely used in fraud detection, customer churn prediction, credit scoring, and recommendation systems.

⚙️ Installation

Install with pip install catboost. GPU setup is optional and only matters if you want to train on a GPU.

📈 Built-in Visualization

CatBoost ships with training and evaluation plots, feature importance scores, and a cross-validation helper.

🧩 Native Categorical Handling

Pass the categorical feature indices (or column names) as cat_features when creating the Pool object. No manual encoding is needed.

🎛️ Parameters

Important hyperparameters include depth, iterations, learning_rate, and l2_leaf_reg.

🖥️ GPU Training

Use task_type="GPU" in CatBoostClassifier or CatBoostRegressor for faster training on large data.

💡 Auto Feature Engineering

Automatically builds combinations of categorical features and encodes categories with ordered target statistics (a form of target encoding) behind the scenes.
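To make the target encoding idea concrete, here is a simplified pure-Python sketch of an ordered target statistic. It is not CatBoost's internal code. It assumes the row order stands in for the random permutation the library uses, and it assumes a prior of 0.5. Each row is encoded from earlier rows in the same category only, so a row's own label never leaks into its encoding.

```python
def ordered_target_stats(categories, targets, prior=0.5):
    """Encode each category value using only the rows that came before it."""
    target_sums = {}  # running sum of targets per category
    counts = {}       # running number of rows per category
    encoded = []
    for cat, target in zip(categories, targets):
        seen_sum = target_sums.get(cat, 0.0)
        seen_count = counts.get(cat, 0)
        # (sum of earlier targets + prior) / (number of earlier rows + 1)
        encoded.append((seen_sum + prior) / (seen_count + 1))
        # Update the running statistics only after encoding the current row.
        target_sums[cat] = seen_sum + target
        counts[cat] = seen_count + 1
    return encoded


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
enc = ordered_target_stats(cats, ys)
print(enc)  # [0.5, 0.5, 0.75, 0.5, 0.25]
```

The first occurrence of each category falls back to the prior, which is why the first two values are both 0.5.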

🔁 Model Saving

Use model.save_model("model.cbm") to save a trained model, and model.load_model("model.cbm") on a new model object to reload it.

🔍 Evaluation Metrics

Supports metrics like AUC, RMSE, Logloss, MAE, and allows custom metrics.

🤝 Integration

Works with scikit-learn pipelines, Optuna, and MLflow. Accepts NumPy arrays and CSV files as input, and can export trained models to JSON.