LightGBM Flashcards

A gradient boosting framework that uses tree-based learning algorithms.

🌳 What is LightGBM?

A high-performance gradient boosting framework optimized for speed and accuracy, especially on large datasets.

🚀 Speed Advantage

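Uses histogram-based split finding and leaf-wise (best-first) tree growth, which often makes training faster than level-wise, exact-split boosting. The gap to XGBoost depends on configuration, since XGBoost also offers a histogram-based tree method.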
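To make the histogram idea concrete, here is a minimal pure-Python sketch. It is not LightGBM's actual implementation: the function names, the equal-width binning, and the squared-error style gain formula are all assumptions chosen for illustration. Candidate splits are scanned per bin rather than per distinct feature value.

```python
def build_histogram(values, gradients, n_bins):
    """Accumulate gradient sums and sample counts per equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def best_split(grad_sum, count):
    """Scan bin boundaries; the left child holds bins 0..best_bin."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(len(grad_sum) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave samples on both sides
        gain = left_g**2 / left_n + right_g**2 / right_n - total_g**2 / total_n
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Toy data: two clusters of feature values with opposite gradient signs.
values = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
hist = build_histogram(values, gradients, n_bins=4)
split_bin, split_gain = best_split(*hist)
```

The scan in best_split costs one pass over the bins, not over the samples, which is where the speed advantage of histogram methods comes from once the histogram is built.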

📊 Dataset Compatibility

Best suited for large datasets, including those with thousands of features; on small datasets, leaf-wise growth can overfit. Handles both categorical and numerical features efficiently.

🧪 Use Cases

Fraud detection, click-through rate prediction, ranking tasks, and any tabular supervised learning problem.

⚙️ Core Features

Offers parallel and distributed learning, low memory usage, GPU training, and early stopping.

📦 Installation

Install with pip install lightgbm. Building from source, for example for a custom build, requires CMake and a C/C++ compiler.

📈 Tuning Parameters

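Key parameters: num_leaves, learning_rate, n_estimators (an alias of num_iterations), and max_depth. Because trees grow leaf-wise, num_leaves is the main control on model complexity and is usually kept below 2^max_depth.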

📉 Overfitting Control

Use max_depth, min_data_in_leaf, feature_fraction, and bagging_fraction to reduce overfitting. Note that bagging_fraction only takes effect when bagging_freq is set to a value greater than 0.

📂 File Format

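A trained model can be saved to a text file with booster.save_model() and reloaded with lightgbm.Booster(model_file=...). The binary format applies to datasets, which can be saved with Dataset.save_binary().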

🧮 Categorical Features

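Use the categorical_feature argument to handle categorical features natively without one-hot encoding. Values must be integer-coded or stored in a pandas category column; raw strings are not accepted.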

💻 GPU Support

Enable with device_type = 'gpu' (device is an alias) for faster training on large datasets. This requires a LightGBM build with GPU support.
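A small configuration sketch, assuming a GPU-enabled LightGBM build is installed. The helper name make_params and the max_bin value are assumptions for illustration; the block only builds a parameter dictionary and does not itself train on a GPU.

```python
def make_params(use_gpu: bool) -> dict:
    """Build a LightGBM parameter dict for CPU or GPU training (hypothetical helper)."""
    params = {
        "objective": "binary",
        "verbose": -1,
        # "device" is accepted as an alias of "device_type".
        "device_type": "gpu" if use_gpu else "cpu",
    }
    if use_gpu:
        # Assumption: a smaller max_bin is often suggested for GPU speed.
        params["max_bin"] = 63
    return params


gpu_params = make_params(use_gpu=True)
cpu_params = make_params(use_gpu=False)
```

The resulting dictionary would be passed as the first argument to lightgbm.train; on a build without GPU support, training with gpu_params is expected to fail.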

🔄 Integration

Provides scikit-learn-compatible estimators (LGBMClassifier, LGBMRegressor, LGBMRanker) and works with libraries such as Optuna for hyperparameter tuning and MLflow for experiment tracking.