Multiclass classification can be divided into two categories: nominal classification and ordinal classification.
- For nominal classification, think of a model that outputs a probability distribution such as [0.7, 0.1, 0.2] to distinguish between car, human, and tree.
- For ordinal classification, think of a problem that grades a child's computer addiction into four ordered levels: Very Severe, Severe, Moderate, and Good.
Solving Nominal Classification Problems
- No order or magnitude relationship between classes.
- Sum of outputs must be 1 (probability).
- Independent threshold setting for each class.
- Primarily uses the one-vs-rest approach.
- Typically uses the Softmax function.
# Example of a typical multiclass (nominal) classification network
import torch
import torch.nn as nn

input_size = 20  # number of input features (example value)

class NominalClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 3)  # 3 classes; raw logits, no Softmax here
        )

    def forward(self, x):
        return self.model(x)

# Loss function: nn.CrossEntropyLoss applies log-softmax internally,
# so the model must output raw logits; applying nn.Softmax before this
# loss would distort the gradients
criterion = nn.CrossEntropyLoss()
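A quick usage sketch (with a random mini-batch, purely illustrative): the loss consumes logits during training, and softmax is applied separately whenever probabilities that sum to 1 are needed.

model = NominalClassifier()
x = torch.randn(8, input_size)        # fake mini-batch (illustrative)
y = torch.randint(0, 3, (8,))         # fake integer labels

logits = model(x)
loss = criterion(logits, y)           # CrossEntropyLoss on raw logits
probs = torch.softmax(logits, dim=1)  # rows sum to 1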
What does threshold optimization mean in nominal classification problems:
- Generally, multi-class models output probability values for each class.
- By default, predictions are made based on the class with the highest probability.
- However, this default approach isn't always optimal.
- Therefore, we use threshold optimization to:
  - Address class imbalance problems
  - Adjust False Positive/Negative ratios for specific classes
1. Class Imbalance
# For imbalanced data
# Dogs: 1000 samples, Cats: 100 samples, Birds: 50 samples
# Lower thresholds for minority classes to increase prediction opportunities
thresholds = [0.6, 0.4, 0.3] # Higher for majority class, lower for minority classes
2. Different Misclassification Costs
# Example: misclassifying a bird as a dog is worse than misclassifying
# it as a cat, so the bird threshold is set lower
import numpy as np

def cost_sensitive_predict(probs):
    # Lower threshold for the bird class (class order: dog, cat, bird)
    thresholds = np.array([0.5, 0.5, 0.3])
    exceeds = probs >= thresholds
    # Decision logic (one simple rule): most probable class among those
    # over their threshold, falling back to plain argmax when none qualify
    masked = np.where(exceeds, probs, -np.inf)
    return np.where(exceeds.any(axis=1), masked.argmax(axis=1), probs.argmax(axis=1))
3. Adjusting Precision/Recall for Specific Classes
# To raise precision for the dog class,
# set the dog-class threshold higher
thresholds = [0.7, 0.5, 0.5]
# To raise recall for the dog class,
# set the dog-class threshold lower
thresholds = [0.3, 0.5, 0.5]
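A minimal sketch of this trade-off, treating the dog class one-vs-rest; here `probs` and `y_true` are placeholder names for a fitted model's probability matrix and the integer labels.

# Effect of the dog-class threshold on its precision/recall,
# treated one-vs-rest; `probs` / `y_true` are placeholder names
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_dog = (y_true == 0).astype(int)             # 1 if the sample is a dog
for t in (0.7, 0.3):                          # high vs. low threshold
    pred_dog = (probs[:, 0] >= t).astype(int)
    p = precision_score(y_dog, pred_dog, zero_division=0)
    r = recall_score(y_dog, pred_dog, zero_division=0)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")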
Types:
1. Optimization through Grid Search
- Set threshold values independently for each class.
- Search for optimal values by trying all possible combinations.
# No order relationship: treat each class independently
thresholds = np.arange(0.1, 0.9, 0.1)
for t1 in thresholds:
    for t2 in thresholds:
        pred = (proba > np.array([t1, t2])).astype(int)
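A fuller version of this search might score every combination and keep the best, as sketched below; `proba` (an (n_samples, 3) probability matrix) and `y_true` (integer labels) are assumed to exist already.

# Grid search over per-class thresholds scored by macro-F1 (a sketch)
import itertools
import numpy as np
from sklearn.metrics import f1_score

best_score, best_thresholds = -1.0, None
for combo in itertools.product(np.arange(0.1, 0.9, 0.1), repeat=proba.shape[1]):
    # Most probable class among those over threshold, else plain argmax
    masked = np.where(proba >= np.array(combo), proba, -np.inf)
    ok = np.isfinite(masked).any(axis=1)
    pred = np.where(ok, masked.argmax(axis=1), proba.argmax(axis=1))
    score = f1_score(y_true, pred, average='macro')
    if score > best_score:
        best_score, best_thresholds = score, combo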
2. ROC Curve Analysis
- Process by converting each class into a binary classification problem.
- Optimize performance of each class independently without considering order.
# Independent ROC analysis for each class (one-vs-rest);
# y_true is assumed to be one-hot (binarized) here
for class_idx in range(n_classes):
    fpr, tpr, thresholds = roc_curve(y_true[:, class_idx], y_pred[:, class_idx])
    # Youden's J statistic: the threshold that maximizes TPR - FPR
    j_scores = tpr - fpr
    optimal_threshold = thresholds[np.argmax(j_scores)]
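For a self-contained version, the targets can be binarized first; the synthetic data below is purely illustrative.

# Self-contained sketch of per-class ROC threshold selection
# (synthetic data, illustrative only)
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
n_classes = 3
labels = rng.integers(0, n_classes, size=200)         # integer labels
y_true = label_binarize(labels, classes=[0, 1, 2])    # one-hot targets
y_pred = rng.dirichlet(np.ones(n_classes), size=200)  # fake probabilities

optimal_thresholds = []
for class_idx in range(n_classes):
    fpr, tpr, thresholds = roc_curve(y_true[:, class_idx], y_pred[:, class_idx])
    optimal_thresholds.append(thresholds[np.argmax(tpr - fpr)])  # Youden's J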
3. Using Precision-Recall Curves
- Independent optimization for each class.
- Effective with imbalanced data.
# Independent PR-curve analysis for each class
for i in range(n_classes):
    precision, recall, thresholds = precision_recall_curve(y_true[:, i], y_pred[:, i])
    # precision/recall have one more element than thresholds;
    # the small epsilon avoids division by zero
    f1_scores = 2 * (precision * recall) / (precision + recall + 1e-8)
    best_threshold = thresholds[np.argmax(f1_scores[:-1])]
4. Cost Function-based Optimization
- Considers only misclassification costs without regard to order relationships.
- Enables independent cost setting for each class.
def custom_cost(threshold, proba, y_true):
    pred = (proba > threshold).astype(int)
    fp_cost = 1  # cost of a false positive
    fn_cost = 2  # cost of a false negative
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    return fp * fp_cost + fn * fn_cost
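One way to use this cost (a sketch assuming a single one-vs-rest class with binary arrays `proba_class` and `y_class`, both placeholder names) is a simple scan over candidate thresholds:

# Scan candidate thresholds and keep the cheapest one;
# `proba_class` / `y_class` are placeholder one-vs-rest arrays
candidates = np.arange(0.05, 0.95, 0.05)
costs = [custom_cost(t, proba_class, y_class) for t in candidates]
best_threshold = candidates[int(np.argmin(costs))]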
5. Validation through Cross-Validation
- Validation technique applicable to all optimization methods.
- Used regardless of order relationships.
for train_idx, val_idx in kf.split(X):
    fold_threshold = find_optimal_threshold(X[train_idx], y[train_idx])
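Fleshed out, the loop might average the per-fold thresholds; `find_optimal_threshold` is the undefined helper referenced above, so this remains a sketch.

# Average the threshold found on each fold (sketch; find_optimal_threshold
# is assumed to return a threshold or an array of per-class thresholds)
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_thresholds = [find_optimal_threshold(X[train_idx], y[train_idx])
                   for train_idx, val_idx in kf.split(X)]
final_threshold = np.mean(fold_thresholds, axis=0)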
Solving Ordinal Classification Problems
- Order relationship exists between classes.
- Relationships between adjacent classes are important.
- Requires special ordinal encoding/decoding.
class OrdinalClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_model = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU()
        )
        # One binary classifier per cut point
        self.thresholds = nn.Linear(64, 3)  # 4 classes -> 3 cut points

    def forward(self, x):
        features = self.base_model(x)
        # Cumulative probabilities P(y > k) for each cut point k
        cumulative_probs = torch.sigmoid(self.thresholds(features))
        # Class probabilities as differences of cumulative probabilities
        # (without a monotonicity constraint these can go slightly negative)
        probs = torch.zeros(x.size(0), 4, device=x.device)  # 4 classes
        probs[:, 0] = 1 - cumulative_probs[:, 0]
        probs[:, 1] = cumulative_probs[:, 0] - cumulative_probs[:, 1]
        probs[:, 2] = cumulative_probs[:, 1] - cumulative_probs[:, 2]
        probs[:, 3] = cumulative_probs[:, 2]
        return probs
# Loss function that takes class order into account: each sample is
# weighted by the distance between its predicted and true class, so
# far-off mistakes are penalized more heavily
import torch.nn.functional as F

class OrdinalLoss(nn.Module):
    def forward(self, predictions, targets):
        pred_classes = predictions.argmax(dim=1)
        # Distance-based weight (+1 so correct predictions still contribute)
        weights = 1 + torch.abs(pred_classes - targets).float()
        # Note: F.cross_entropy expects raw logits rather than probabilities
        per_sample = F.cross_entropy(predictions, targets, reduction='none')
        return torch.mean(weights * per_sample)
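A minimal training-step sketch with random data (illustrative only; `input_size` reuses the example value defined earlier):

# One training step on a fake mini-batch (illustrative only)
model = OrdinalClassifier()
criterion = OrdinalLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, input_size)  # fake features
y = torch.randint(0, 4, (32,))   # ordinal targets 0..3

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()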
What does threshold optimization mean in ordinal classification problems:
- Finding the optimal thresholds for converting continuous predicted values into actual classes.
- Finds the boundary values that best divide continuous model predictions into the four classes (0, 1, 2, 3).
- Aims, for example, to find thresholds that maximize the Quadratic Weighted Kappa score when that is the evaluation metric.
from scipy.optimize import minimize

KappaOptimizer = minimize(evaluate_predictions,
                          x0=[0.5, 1.5, 2.5],         # initial thresholds
                          args=(y, oof_non_rounded),  # actual and predicted values
                          method='Nelder-Mead')       # optimization algorithm
- Separation
  - x < 0.5 is class 0
  - 0.5 ≤ x < 1.5 is class 1
  - 1.5 ≤ x < 2.5 is class 2
  - x ≥ 2.5 is class 3
- Process
  - Uses the Nelder-Mead algorithm to iteratively adjust thresholds
  - On each attempt, calls the evaluate_predictions function to:
    - Convert predicted values to classes using the current thresholds
    - Calculate the Quadratic Weighted Kappa score
    - Return the negative score (since minimize minimizes, negating the score turns this into maximization)
  - tpTuned = threshold_Rounder(tpm, KappaOptimizer.x)
  - Uses the found optimal thresholds (KappaOptimizer.x) for the final predictions
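The helpers referenced above (evaluate_predictions, threshold_Rounder) are not shown in this post; a plausible sketch consistent with the description, with all details assumed, is:

# Plausible sketches of the helpers referenced above; their actual
# bodies are not shown here, so these are assumptions
import numpy as np
from sklearn.metrics import cohen_kappa_score

def threshold_Rounder(preds, thresholds):
    # Map continuous predictions to classes 0..3 via the cut points
    return np.digitize(preds, bins=np.sort(thresholds))

def evaluate_predictions(thresholds, y, preds):
    rounded = threshold_Rounder(preds, thresholds)
    # Negative QWK, because scipy.optimize.minimize minimizes
    return -cohen_kappa_score(y, rounded, weights='quadratic')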
Types:
1. Kappa Optimization using Nelder-Mead
- Also called the downhill simplex method.
- A derivative-free nonlinear optimization algorithm.
- Particularly useful for objective functions that are non-differentiable or otherwise hard to handle analytically.
2. Cumulative Probability-based Threshold Optimization
3. Cost Function Optimization Considering Order
4. Binary Classifier Combination using the Frank & Hall Method (a brief sketch follows this list)
5. Threshold Optimization through Cross-validation
6. Ensemble-based Threshold Optimization
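As a rough illustration of item 4 (the setup is assumed, not taken from this post): the Frank & Hall method trains one binary classifier per ordered cut point, asking "is the target greater than class k?", and combines the resulting cumulative probabilities.

# Rough sketch of the Frank & Hall decomposition for 4 ordered classes;
# model choice and data names are illustrative assumptions
import numpy as np
from sklearn.linear_model import LogisticRegression

def frank_hall_fit(X, y, n_classes=4):
    # One binary classifier per cut point k: target is 1 when y > k
    return [LogisticRegression().fit(X, (y > k).astype(int))
            for k in range(n_classes - 1)]

def frank_hall_predict(models, X, n_classes=4):
    cum = np.column_stack([m.predict_proba(X)[:, 1] for m in models])  # P(y > k)
    probs = np.empty((X.shape[0], n_classes))
    probs[:, 0] = 1 - cum[:, 0]
    for k in range(1, n_classes - 1):
        probs[:, k] = cum[:, k - 1] - cum[:, k]
    probs[:, -1] = cum[:, -1]
    return probs.argmax(axis=1)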
Needs to be updated with more details (2024.11.17)
I'm not here to take part; I'm here to take over.
- Conor McGregor -