There are several types of correlation analysis:
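- Pearson correlation
  - The most commonly used method
  - Measures the strength of a linear relationship
  - Suitable for continuous variables
  - Values range from -1 to 1
  - The coefficient itself needs no distributional assumption, but its standard significance test assumes roughly normally distributed data
- Spearman correlation
  - Rank-based: equivalent to the Pearson correlation of the ranked values
  - Captures monotonic relationships, including non-linear ones (it does not detect non-monotonic patterns such as a U-shape)
  - Suitable for ordinal data
  - Less sensitive to outliers than Pearson
  - Can be used even when the data are not normally distributed
- Kendall's tau
  - Rank-based: compares the number of concordant and discordant pairs
  - Often preferred for small samples
  - Less sensitive to outliers than Pearson
  - Costlier to compute naively than Spearman (every pair of observations is compared), but its p-values are generally considered more reliable for small samples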
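To make the "rank-based" descriptions above concrete, here is a minimal sketch that recomputes Spearman and Kendall by hand and compares them with SciPy. The data and variable names are made up for illustration, and the manual Kendall formula shown here is only valid when there are no ties.

```python
from itertools import combinations

from scipy import stats

# Illustrative data with no tied values (an assumption of this sketch)
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y = [1.2, 0.9, 3.5, 3.1, 8.0, 40.0]

# Spearman: Pearson correlation computed on the ranks
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)
pearson_on_ranks, _ = stats.pearsonr(rank_x, rank_y)
spearman_corr, _ = stats.spearmanr(x, y)

# Kendall: (concordant pairs - discordant pairs) / total pairs
concordant = 0
discordant = 0
for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
    sign = (x2 - x1) * (y2 - y1)
    if sign > 0:
        concordant += 1
    elif sign < 0:
        discordant += 1
n_pairs = len(x) * (len(x) - 1) // 2
tau_manual = (concordant - discordant) / n_pairs
kendall_corr, _ = stats.kendalltau(x, y)

print(f"Spearman: manual={pearson_on_ranks:.4f}, scipy={spearman_corr:.4f}")
print(f"Kendall:  manual={tau_manual:.4f}, scipy={kendall_corr:.4f}")
```

The pairwise loop is also why Kendall's tau is more expensive to compute naively: it visits every pair of observations.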
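import scipy.stats as stats

# Example data: five paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Each function returns (coefficient, p-value); the p-value is ignored here
# Pearson correlation (linear relationship)
pearson_corr, _ = stats.pearsonr(x, y)
# Spearman correlation (rank-based, monotonic relationship)
spearman_corr, _ = stats.spearmanr(x, y)
# Kendall's tau (rank-based, concordant vs. discordant pairs)
kendall_corr, _ = stats.kendalltau(x, y)

print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
print(f"Kendall:  {kendall_corr:.3f}")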
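The three coefficients can disagree sharply on the same data. The sketch below uses synthetic data chosen purely for illustration: a monotonic but non-linear curve, and a straight line with one extreme value. It should show Pearson dropping in both cases while the rank-based measures stay at 1.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)

# Case 1: monotonic but strongly non-linear relationship
y_curve = np.exp(x)
pearson_curve, _ = stats.pearsonr(x, y_curve)
spearman_curve, _ = stats.spearmanr(x, y_curve)
kendall_curve, _ = stats.kendalltau(x, y_curve)

# Case 2: perfectly linear data, then the same data with one extreme value
y_clean = 2 * x + 1
y_outlier = y_clean.copy()
y_outlier[-1] = 1000.0  # the ordering is unchanged, only the magnitude
pearson_clean, _ = stats.pearsonr(x, y_clean)
pearson_outlier, _ = stats.pearsonr(x, y_outlier)
spearman_outlier, _ = stats.spearmanr(x, y_outlier)

print(f"exp curve: pearson={pearson_curve:.3f}, "
      f"spearman={spearman_curve:.3f}, kendall={kendall_curve:.3f}")
print(f"outlier:   pearson {pearson_clean:.3f} -> {pearson_outlier:.3f}, "
      f"spearman={spearman_outlier:.3f}")
```

Note that the outlier here preserves the ordering of the values. An outlier that also changes the ranks would move Spearman and Kendall as well, only less than it moves Pearson.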
Choose the correlation coefficient that matches the characteristics of your data:
- Use Pearson for continuous data with a roughly linear relationship and an approximately normal distribution
- Use Spearman for ordinal data, or when the data are not normally distributed or the relationship is monotonic but not linear
- Use Kendall's tau for small samples or when rank relationships are important
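When the data already lives in a pandas DataFrame, one way to compare the options before settling on one is the `method` argument of `Series.corr`. This is a minimal sketch; the column names `x` and `y` are placeholders, and the values are just small example data.

```python
import pandas as pd

# Small example data; the column names are placeholders
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Compute all three coefficients side by side
results = {
    method: df["x"].corr(df["y"], method=method)
    for method in ("pearson", "spearman", "kendall")
}

for method, value in results.items():
    print(f"{method:>8}: {value:.3f}")
```

`DataFrame.corr(method=...)` accepts the same three method names and returns the full correlation matrix, which is convenient when screening many features at once.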
Success is not about achieving a destination, but rather enjoying the journey.
- Max Holloway -