CIBMTR - Equity in post-HCT Survival Predictions #13 How to make sense of the race group distribution in the data?


dongsunseng 2025. 2. 10. 20:27

https://dongsunseng.com/entry/CIBMTR-Equity-in-post-HCT-Survival-Predictions-11-ESP-EDA-which-makes-sense-%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F-AFT-Loss-func-sol-1


Following up on my other blog post (linked above), this post is about the "further discussion": https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/discussion/550302


How to make sense of the race group distribution in the data?

Counting the values of the race groups, I get the following:
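
The original count output is not reproduced in this post. As a rough sketch of how such a count could be produced, the snippet below tallies group labels with the standard library. The labels `A`, `B`, `C` and the rows are hypothetical placeholders, not the competition data, and the column name `race_group` is an assumption.

```python
from collections import Counter

# Hypothetical placeholder values standing in for a `race_group` column
# (assumed name). These are NOT the competition data.
race_group = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]

counts = Counter(race_group)
total = sum(counts.values())

# Print the count and the share of each group, largest first.
for group, n in counts.most_common():
    print(f"{group:<5} {n:>5} {n / total:>7.1%}")
```

Roughly equal shares across groups are what the post below describes as a "balanced" dataset.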

Having worked on the topic of equity for sensitive applications, I have found one of the main problems to be imbalance in the data of interest.
Typically, less-represented races end up with wider (less certain) estimates.
However, the data at hand seems to have been resampled (or generated as balanced).
While this can be achieved on real data by downsampling the majority class, it usually kills the representativeness of the population.
I am concerned that a model optimised with this metric on this balanced dataset would perform worse on real-life 'race-imbalanced' data.
How does 'race-balancing' the dataset make sense in an equity competition?
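
To make the two concerns above concrete, here is a minimal sketch under stated assumptions: the group sizes are made up, and the "estimate" is a simple proportion with a normal-approximation 95% interval. It shows that smaller groups get wider intervals, and that downsampling every group to the smallest size equalizes the shares, so they no longer match the population.

```python
import math
import random

random.seed(0)

# Hypothetical imbalanced population (made-up sizes, not real data).
population = {"majority": 8000, "minority_1": 1500, "minority_2": 500}

def ci_half_width(n, p=0.5, z=1.96):
    """Approximate 95% half-width for a proportion estimated from n samples."""
    return z * math.sqrt(p * (1 - p) / n)

def shares(sizes):
    """Fraction of the total that each group represents."""
    total = sum(sizes.values())
    return {g: n / total for g, n in sizes.items()}

def downsample_to_balance(sizes):
    """Randomly keep as many members per group as the smallest group has."""
    target = min(sizes.values())
    return {g: len(random.sample(range(n), target)) for g, n in sizes.items()}

balanced = downsample_to_balance(population)

for g in population:
    print(
        f"{g:<11} original n={population[g]:>5} "
        f"(share {shares(population)[g]:.1%}, +/-{ci_half_width(population[g]):.3f})  "
        f"balanced n={balanced[g]:>4} (share {shares(balanced)[g]:.1%})"
    )
```

In this toy setup the majority group loses most of its rows after balancing, which is the loss of representativeness the post is worried about.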

Comments:

Maybe the idea behind balanced, synthetic data is to accentuate differences in risk prediction due only to the available features, by taking imbalance out of the problem.
- By removing the racial imbalance present in real data, one can more clearly see differences in risk predictions that are "purely attributable to available features"
- This allows actual differences in prediction performance to be evaluated, rather than differences that only reflect population ratios
This could suggest a need for additional predictors if certain groups are more poorly predicted.
- If predictions are less accurate for certain groups, this could indicate that current features don't adequately explain those groups
- This could signal the need for additional predictors that better characterize these groups
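
One way to check whether certain groups are "more poorly predicted" is to score each group separately. The sketch below is an illustration only: it uses a simple pure-Python version of Harrell's concordance index and hypothetical toy rows (groups `A` and `B`, made-up times, event flags, and risk scores). It is not the competition's official scoring code.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that are ordered correctly.

    A pair is comparable when the subject with the shorter time had an event.
    A higher risk score is expected to go with a shorter survival time.
    Ties in risk count as 0.5; pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Hypothetical toy rows: (group, time, event, predicted risk).
rows = [
    ("A", 2, 1, 0.9), ("A", 4, 1, 0.7), ("A", 6, 0, 0.4), ("A", 8, 1, 0.1),
    ("B", 3, 1, 0.2), ("B", 5, 1, 0.8), ("B", 7, 1, 0.5), ("B", 9, 0, 0.6),
]

per_group = {}
for group in sorted({r[0] for r in rows}):
    subset = [r for r in rows if r[0] == group]
    per_group[group] = concordance_index(
        [r[1] for r in subset], [r[2] for r in subset], [r[3] for r in subset]
    )

for group, score in per_group.items():
    print(f"group {group}: C-index = {score:.3f}")
```

A group with a clearly lower score than the others would be a candidate for the "additional predictors" mentioned in the comment.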

"Continuous improvement is better than delayed perfection."
- Mark Twain -

License: Attribution, NonCommercial, NoDerivatives
