CIBMTR - Equity in post-HCT Survival Predictions #10 A general Understanding for AFT Loss function


dongsunseng 2025. 2. 6. 22:17

Annotations on the Kaggle discussion about the AFT loss function:

https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/discussion/550563


A general Understanding for AFT Loss function

My notebook using the AFT loss function is "[CV0.665 LB0.666]cat+xgb with AFT loss function", based on @cdeotte's code. Thanks!
- My annotation on the kernel:
The Accelerated Failure Time (AFT) model is a parametric survival analysis model that describes how covariates influence the survival time of an event.
Unlike Proportional Hazards (PH) models such as the Cox PH model, which assume that covariates scale the hazard function by a constant factor, AFT models assume that covariates accelerate or decelerate the life course of a survival process by a multiplicative factor.
- Detailed explanation: Proportional Hazards (PH) model vs. Accelerated Failure Time (AFT) model
  - Proportional Hazards (PH) Model:
    - # Hazard-based approach
      # Example: Comparing two patients
      Patient A's hazard = baseline hazard × 2.0 # twice the baseline risk
      Patient B's hazard = baseline hazard × 0.5 # half the baseline risk
      # Feature: Hazard changes proportionally
  - Accelerated Failure Time (AFT) Model:
    - # Survival time-based approach
      # Example: Comparing two patients
      Patient A's survival time = baseline survival time × 0.5 # Progresses 2x faster than baseline
      Patient B's survival time = baseline survival time × 2.0 # Progresses 2x slower than baseline
      # Feature: Time scale is accelerated/decelerated
  - Example:
    - Situation: Effect of a specific treatment on disease progression
      PH Model Interpretation:
      - "Patients receiving this treatment have half the risk of death"
      AFT Model Interpretation:
      - "Disease progression is 2x slower in patients receiving this treatment"
      - i.e., it takes twice as long to reach the same stage
  - Key Differences:
    - PH Model: Focuses on hazard (risk)
    - AFT Model: Focuses on actual survival time
    - PH models answer "how risky?"
    - AFT models answer "how fast or slow does it progress?"
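To make the PH vs. AFT contrast concrete, here is a small sketch. It assumes a hypothetical Weibull baseline survival curve (scale 100 days, shape 1.5) that is not from the discussion. Under PH, a hazard ratio HR gives S(t) = S₀(t)^HR; under AFT, an acceleration factor θ gives S(t) = S₀(θt), because T = T₀ / θ. The sketch compares median survival times for both.

```python
import math

LAM, K = 100.0, 1.5  # hypothetical Weibull baseline: scale 100 days, shape 1.5

def baseline_survival(t):
    """Assumed baseline survival curve S0(t) = exp(-(t / LAM) ** K)."""
    return math.exp(-((t / LAM) ** K))

def ph_survival(t, hazard_ratio):
    """PH: hazard is multiplied by hazard_ratio, so S(t) = S0(t) ** hazard_ratio."""
    return baseline_survival(t) ** hazard_ratio

def aft_survival(t, theta):
    """AFT: the clock runs theta times faster, so S(t) = S0(theta * t)."""
    return baseline_survival(theta * t)

def median_time(surv, lo=0.0, hi=10_000.0):
    """Bisection for the time where a decreasing survival curve crosses 0.5."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if surv(mid) > 0.5:
            lo = mid  # more than half still alive: the median is later
        else:
            hi = mid
    return (lo + hi) / 2

base = median_time(baseline_survival)
aft_a = median_time(lambda t: aft_survival(t, 2.0))  # Patient A: progresses 2x faster
aft_b = median_time(lambda t: aft_survival(t, 0.5))  # Patient B: progresses 2x slower
ph_2x = median_time(lambda t: ph_survival(t, 2.0))   # PH patient with twice the baseline hazard

print(f"baseline median : {base:.1f} days")
print(f"AFT theta = 2   : {aft_a:.1f} days (x{aft_a / base:.2f})")
print(f"AFT theta = 0.5 : {aft_b:.1f} days (x{aft_b / base:.2f})")
print(f"PH  HR = 2      : {ph_2x:.1f} days (x{ph_2x / base:.2f})")
```

The AFT factor acts directly on time: θ = 2 halves the median and θ = 0.5 doubles it. A hazard ratio of 2 is a different effect; with this baseline it shrinks the median only to 0.5^(1/1.5) ≈ 0.63 of baseline. (For a Weibull baseline, a PH effect can be rewritten as an AFT effect with θ = HR^(1/shape), so the two numbers simply differ.)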

Detailed Explanation:
- Basic Model Equation:
  - log(T) = Xβ + ε
    where:
    T = survival time
    X = feature variables (age, gender, disease status, etc.)
    β = coefficients for each feature (impact)
    ε = error term (random variable following an assumed probability distribution)
- Acceleration Factor:
  - θ = exp(-Xβ)
    # Interpretation (since T = exp(ε) / θ = T₀ / θ):
    θ > 1: survival time decreases (disease progresses faster)
    θ < 1: survival time increases (disease progresses slower)
- Example:
  - # Example: Modeling treatment effect
    X = [treatment_dose]
    β = -0.7 # assumed coefficient (negative, so higher doses shorten survival)
  - # When treatment dose is 1 unit
    θ = exp(-1 × -0.7) = exp(0.7) ≈ 2.01
    # Interpretation: θ ≈ 2, so 1 unit of treatment roughly halves survival time (progression about 2x faster)
  - # When treatment dose is 2 units
    θ = exp(-2 × -0.7) = exp(1.4) ≈ 4.06
    # Interpretation: θ ≈ 4, so 2 units of treatment cut survival time to about a quarter
- Key Features:
  - Reasons for modeling log(T):
    - Survival time is always positive
      - Log transformation makes the normality assumption on the error term more plausible for right-skewed survival times
      - Effects that are additive on log(T) become multiplicative on T, which makes them easy to interpret
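The acceleration-factor arithmetic above can be checked in a few lines. This is a sketch for a single feature using the assumed coefficient β = -0.7 from the example; `acceleration_factor` is a hypothetical helper name. Since log(T) = Xβ + ε, we have T = exp(ε) / θ, so the survival-time multiplier is 1/θ = exp(Xβ).

```python
import math

BETA = -0.7  # assumed coefficient from the example above

def acceleration_factor(x, beta):
    """theta = exp(-x * beta) for one feature value x and its coefficient beta."""
    return math.exp(-x * beta)

for dose in (0, 1, 2):
    theta = acceleration_factor(dose, BETA)
    # log(T) = x*beta + eps  =>  T = exp(eps) / theta, so survival time scales by 1/theta
    multiplier = 1 / theta
    print(f"dose={dose}: theta={theta:.2f}, survival time multiplied by {multiplier:.2f}")
```

This reproduces θ ≈ 2.01 at dose 1 and θ ≈ 4.06 at dose 2, with survival-time multipliers of about 0.50 and 0.25. A positive β would flip both: θ < 1 and longer survival.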

Detailed Explanation (AFT Loss Function):
- Component Explanations:
  - tᵢ: observed survival time
    δᵢ: event occurrence indicator (1=occurred, 0=censored)
    μᵢ = Xᵢβ: predicted log-survival time
    σ: scale parameter controlling variance
    f(t): probability density function (PDF)
    S(t): survival function
- How the loss function works:
  - When event occurs (δᵢ = 1):
    - Loss = -log f(tᵢ; μᵢ, σ)
      # Minimizing this loss maximizes the PDF at tᵢ
      # Learns to increase probability density at the actual event time
    - PDF
      - PDF represents the probability density of an event occurring at a specific time point
      - Example: When a patient dies on day 100
      - f(t) = probability density of death at time t
      - A high f(100) means a high probability of death in a short window around day 100
  - When censored (δᵢ = 0):
    - Loss = -log S(tᵢ; μᵢ, σ)
      # Minimizing this loss maximizes S(tᵢ)
      # Learns to increase probability of survival beyond observed time
    - Survival function S
      - S(t) = P(T > t) = probability of survival beyond time point t
      - Characteristics:
        - Monotonically decreasing over time (never increases)
        - Initial value S(0) = 1 (everyone is alive at time 0)
        - As t approaches infinity, S(t) approaches 0
- Example:
  - # Patient A: death at day 100 (δ = 1)
    Loss_A = -log f(100; μ_A, σ)
    # Learns to increase the probability density of death at day 100
    - In this case, we know the exact time of death
    - So the model learns to assign a high probability of death around day 100
    - Thus, to make accurate predictions at the actual event time, the model learns to increase the probability density at that time
  - # Patient B: censored at day 80 (δ = 0)
    Loss_B = -log S(80; μ_B, σ)
    # Learns to increase probability of survival beyond day 80
    - In this case, we don't know when (or whether) death occurred after day 80
    - We only know for certain that the patient survived until day 80
    - Thus, it is reasonable to increase the predicted probability of surviving beyond day 80
    - Equivalently, the model lowers the predicted probability of death before day 80
- Key points:
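  - This loss function handles censored data properly instead of discarding censored patients
  - It uses the PDF for observed events and the survival function for censored patients, so both kinds of records contribute to training
  - The choice of distribution for ε (the random term) determines the distribution of the baseline survival time T₀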
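Putting the two cases together gives the censored negative log-likelihood Loss = -Σᵢ [δᵢ log f(tᵢ; μᵢ, σ) + (1 - δᵢ) log S(tᵢ; μᵢ, σ)]. Below is a minimal pure-Python sketch assuming a normal error term ε ~ N(0, σ²) (the AFT:Normal case), so f and S are the log-normal PDF and survival function. The function names are hypothetical, and this is not the actual XGBoost or CatBoost implementation.

```python
import math

def log_pdf_lognormal(t, mu, sigma):
    """log f(t; mu, sigma) for the log-normal density (normal error term)."""
    z = (math.log(t) - mu) / sigma
    return -math.log(t * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

def log_survival_lognormal(t, mu, sigma):
    """log S(t; mu, sigma) = log(1 - Phi(z)), using erfc to keep precision in the upper tail."""
    z = (math.log(t) - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

def aft_nll(times, events, mus, sigma=1.0):
    """Censored negative log-likelihood summed over patients (hypothetical helper)."""
    total = 0.0
    for t, d, mu in zip(times, events, mus):
        if d == 1:  # event observed at t: penalize low density at t
            total -= log_pdf_lognormal(t, mu, sigma)
        else:       # censored at t: penalize low probability of surviving past t
            total -= log_survival_lognormal(t, mu, sigma)
    return total

# Patient A: death at day 100 (δ = 1); Patient B: censored at day 80 (δ = 0)
for median_days in (40, 80, 100, 200, 400):
    mu = math.log(median_days)  # exp(mu) is the predicted median survival time
    loss_a = aft_nll([100], [1], [mu])
    loss_b = aft_nll([80], [0], [mu])
    print(f"predicted median {median_days:>3} days: Loss_A = {loss_a:.3f}, Loss_B = {loss_b:.3f}")
```

Loss_A is smallest when the predicted median exp(μ) equals 100 days. Loss_B keeps shrinking as the predicted median moves past 80 days. This matches the intuition for Patients A and B above.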

Basic Assumption (AFT:Normal):
- ε ~ N(0, σ²) # Error term follows normal distribution
- This means:
  - log(survival time) follows normal distribution
    - Why?
      - log(T) = Xβ + ε
      - When Y = a + bX
        - If X follows normal distribution N(μ, σ²)
        - Then Y follows normal distribution N(a + bμ, b²σ²)
      - log(T) = Xβ + ε
        # Since ε follows N(0, σ²)
        # log(T) follows N(Xβ, σ²)
        Because:
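        - Xβ is a constant for a given patient (it only shifts the mean)
        - The coefficient of ε is 1 (the variance stays σ²)
  - Survival time T itself therefore follows a log-normal distribution (log-symmetric: log(T) is symmetric, while T is right-skewed)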
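Here is a quick Monte Carlo check of this argument. It assumes the hypothetical values Xβ = 4.0 and σ = 0.8 (not from the discussion), simulates ε, forms log(T) = Xβ + ε, and compares the sample mean and standard deviation of log(T) with Xβ and σ.

```python
import math
import random
import statistics

random.seed(0)
X_BETA = 4.0  # hypothetical linear predictor X*beta for one patient
SIGMA = 0.8   # hypothetical standard deviation of the error term

# Simulate log(T) = X*beta + eps with eps ~ N(0, SIGMA^2)
log_t = [X_BETA + random.gauss(0.0, SIGMA) for _ in range(100_000)]
times = [math.exp(v) for v in log_t]

mean_log, sd_log = statistics.fmean(log_t), statistics.stdev(log_t)
print(f"log(T): mean {mean_log:.3f} (Xβ = {X_BETA}), std {sd_log:.3f} (σ = {SIGMA})")

# T itself is right-skewed: its mean exceeds its median, which sits near exp(X*beta)
print(f"T: median {statistics.median(times):.1f}, mean {statistics.fmean(times):.1f}, exp(Xβ) {math.exp(X_BETA):.1f}")
```

The simulated log(T) values follow N(Xβ, σ²), while T itself has a mean well above its median. This is the right skew of the log-normal distribution.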
Probability Density Function (PDF):
- f(t; μ, σ) = 1 / (tσ√(2π)) · exp(-(log(t) - μ)² / (2σ²))
  Components:
  - t: observed time
  - μ: predicted log-survival time (Xβ)
  - σ: parameter controlling variance
Survival Function:
- S(t; μ, σ) = 1 - Φ((log(t)-μ)/σ)
  where:
  - Φ: cumulative distribution function (CDF) of standard normal distribution
  - Represents probability of survival beyond time point t
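The two formulas can be sanity-checked numerically. S(t) should equal 0.5 at the median t = exp(μ), and f(t) should match -dS/dt. The sketch below uses only the standard library (Φ via `math.erf`). The values μ = 4.0 and σ = 0.8 are arbitrary illustrations.

```python
import math

def lognormal_pdf(t, mu, sigma):
    """f(t; mu, sigma) = 1/(t*sigma*sqrt(2*pi)) * exp(-(log(t) - mu)^2 / (2*sigma^2))"""
    return math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (t * sigma * math.sqrt(2 * math.pi))

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, expressed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_survival(t, mu, sigma):
    """S(t; mu, sigma) = 1 - Phi((log(t) - mu) / sigma)"""
    return 1.0 - std_normal_cdf((math.log(t) - mu) / sigma)

MU, SIGMA = 4.0, 0.8  # illustrative values only

# 1) At the median t = exp(mu), half of the patients have survived.
print(lognormal_survival(math.exp(MU), MU, SIGMA))

# 2) The density is the negative derivative of the survival function (central difference).
t, h = 50.0, 1e-4
numeric_density = -(lognormal_survival(t + h, MU, SIGMA) - lognormal_survival(t - h, MU, SIGMA)) / (2 * h)
print(numeric_density, lognormal_pdf(t, MU, SIGMA))
```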
Use Cases:
- # Suitable cases:
  - Symmetrically distributed log-survival times
  - Roughly constant spread (σ) across patients
  Example: Component lifetime in manufacturing
- # Unsuitable cases:
  - Distributions with very long tails
  - Highly asymmetric distributions
Advantages:
- Intuitive interpretation
- Relatively simple calculations
- Good fit when log(survival time) is roughly symmetric
Real Example:
- # Predicting medical device lifetime
  survival_time = exp(Xβ + ε)
  ε ~ N(0, σ²)
  # This means lifetime follows log-normal distribution
  # i.e., log(lifetime) follows normal distribution

Basic Assumption (AFT:Log):
- ε ~ Log-Normal(μ, σ²)
  # Error term follows log-normal distribution
  # This means survival time T directly follows log-normal distribution
Log-Normal Distribution Characteristics:
- # Properties:
  - Only takes positive values
  - Has a heavy right tail
  - Asymmetric distribution
Difference between AFT:Normal and AFT:Log:
- AFT:Normal
  - log(survival time) follows normal distribution
  - log(survival time) is symmetrically distributed
  - Example: Manufacturing component lifetime
- AFT:Log
  - Survival time directly follows log-normal distribution
  - Survival time is asymmetrically distributed (long tail)
  - Example: Cancer patient survival period
Use Cases:
- # Suitable cases:
  - When some patients survive much longer than others
  - Biological processes or reliability data
  - Cancer patient survival analysis
  # Reasons:
  - Most patients have similar survival periods, but some survive much longer
  - Can model such long-tail distributions well

In simple terms

In simple terms, AFT assumes that different factors (i.e., input variables or features of the model) affect the rate at which events occur by "stretching" or "compressing" the timeline.

It's like adjusting the playback speed while watching a video:

Double-speed playback (fast forward): time speeds up, so events happen sooner.
Slow playback (slow motion): time slows down, so events occur later.

Imagine you're studying the survival time of two cancer patients:

Patient A receives standard treatment.
Patient B receives a new experimental treatment.

Case 1: θ = 0.5 (Acceleration Factor < 1)

This means the timeline is stretched by 2x for Patient B compared to Patient A.

If Patient A survives for 1 year, Patient B is expected to survive for 2 years under the new treatment.

Case 2: θ = 2 (Acceleration Factor > 1)

This means the timeline is compressed for Patient B, reducing their survival time by half.

If Patient A survives for 1 year, Patient B is expected to survive for 6 months.
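Both cases reduce to one line of arithmetic. Under AFT, Patient B's expected survival time is Patient A's divided by θ. Here is a tiny sketch (the helper name is hypothetical):

```python
def aft_survival_time(reference_time, theta):
    """AFT: a clock that runs theta times faster reaches the event at reference_time / theta."""
    return reference_time / theta

months_a = 12  # Patient A survives 1 year
for theta in (0.5, 2.0):
    months_b = aft_survival_time(months_a, theta)
    print(f"theta={theta}: Patient B is expected to survive {months_b:.0f} months")
```

This prints 24 months for θ = 0.5 and 6 months for θ = 2.0, matching the two cases above.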

Diamonds are not made without pressure.
- Thomas Carlyle -

Attribution, NonCommercial, NoDerivatives


A Stony Brook University graduate taking on Kaggle challenges =)
