[Kaggle Extra Study] 18. Types of Correlation Analysis

dongsunseng 2024. 11. 23. 15:14

There are several types of correlation analysis:

  1. Pearson Correlation
    • Most commonly used method
    • Measures linear relationships
    • Suitable for continuous variables
    • Values range from -1 to 1
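    • The coefficient itself can be computed for any numeric data, but its usual significance test (p-value) assumes the data are approximately normally distributed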
  2. Spearman Correlation
    • Measures rank-based correlation
    • Captures monotonic relationships, including non-linear ones (but not non-monotonic patterns such as a U-shape)
    • Suitable for ordinal data
    • Less sensitive to outliers
    • Can be used even when data is not normally distributed
  3. Kendall's Tau
    • Rank-based correlation
    • More suitable for small samples
    • Less sensitive to outliers
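    • More expensive to compute than Spearman because it compares pairs of observations, but often considered more reliable for small samples or data with many tied ranks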
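
To make "rank-based" concrete, here is a minimal sketch (not part of the original post). It assumes made-up data without ties and uses `scipy.stats.rankdata` to show that Spearman is Pearson applied to ranks, and that Kendall's tau (without ties) is the number of concordant pairs minus discordant pairs, divided by the total number of pairs.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical data with no tied values (chosen only for illustration)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.2, 0.9, 3.5, 2.8, 9.0])

# Spearman: replace values by their ranks, then compute Pearson on the ranks
rho_from_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
rho_scipy, _ = stats.spearmanr(x, y)

# Kendall: count pairs ordered the same way (concordant) vs. opposite (discordant)
concordant = 0
discordant = 0
for i, j in combinations(range(len(x)), 2):
    direction = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    if direction > 0:
        concordant += 1
    elif direction < 0:
        discordant += 1

n_pairs = len(x) * (len(x) - 1) / 2
tau_manual = (concordant - discordant) / n_pairs
tau_scipy, _ = stats.kendalltau(x, y)

print(f"Spearman via ranks: {rho_from_ranks:.4f}, scipy: {rho_scipy:.4f}")
print(f"Kendall by counting: {tau_manual:.4f}, scipy: {tau_scipy:.4f}")
```

With tied values the simple Kendall formula above no longer matches `stats.kendalltau`, which by default computes the tie-corrected tau-b variant.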

import scipy.stats as stats

# Example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Each function returns (coefficient, p-value); the p-value is discarded here
# Pearson correlation
pearson_corr, _ = stats.pearsonr(x, y)
# Spearman correlation
spearman_corr, _ = stats.spearmanr(x, y)
# Kendall's tau
kendall_corr, _ = stats.kendalltau(x, y)

print(pearson_corr, spearman_corr, kendall_corr)

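
In Kaggle work the data usually lives in a pandas DataFrame, so the same three coefficients can also be obtained through the `method` argument of `corr`. This is a small sketch reusing the example data above; the column names `x` and `y` are just placeholders.

```python
import pandas as pd

# Same example data as above, stored in a DataFrame
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Pairwise correlation between two columns
pearson = df["x"].corr(df["y"], method="pearson")
spearman = df["x"].corr(df["y"], method="spearman")
kendall = df["x"].corr(df["y"], method="kendall")

# Full correlation matrix for all numeric columns
corr_matrix = df.corr(method="spearman")

print(f"pearson={pearson:.4f}, spearman={spearman:.4f}, kendall={kendall:.4f}")
print(corr_matrix)
```

Note that `corr` returns only the coefficients. If you also need p-values, use the `scipy.stats` functions instead.
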
  • Choose the correlation coefficient based on the characteristics of your data:
    • Use Pearson for continuous data with a roughly linear relationship and an approximately normal distribution
    • Use Spearman for ordinal data, non-normal data, or relationships that are monotonic but not linear
    • Use Kendall's tau for small samples, data with many tied ranks, or when you care about how consistently pairs of observations are ordered
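
The sketch below illustrates why the choice matters. It uses synthetic data, which is an assumption for illustration and not from the original post: first a strictly increasing but non-linear relationship, then a linear relationship with one extreme outlier.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)

# Case 1: monotonic but strongly non-linear (exponential growth)
y_mono = np.exp(x)
r_mono, _ = stats.pearsonr(x, y_mono)
rho_mono, _ = stats.spearmanr(x, y_mono)
tau_mono, _ = stats.kendalltau(x, y_mono)

# Case 2: perfectly linear data, except the last point is replaced by an outlier
y_out = 2.0 * x
y_out[-1] = -100.0
r_out, _ = stats.pearsonr(x, y_out)
rho_out, _ = stats.spearmanr(x, y_out)
tau_out, _ = stats.kendalltau(x, y_out)

print(f"monotonic: pearson={r_mono:.3f}, spearman={rho_mono:.3f}, kendall={tau_mono:.3f}")
print(f"outlier:   pearson={r_out:.3f}, spearman={rho_out:.3f}, kendall={tau_out:.3f}")
```

In the first case the rank-based coefficients are exactly 1 because the ordering is perfectly preserved, while Pearson is lower because the relationship is not a straight line. In the second case the single outlier makes Pearson negative, while Spearman and Kendall remain positive.
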

Success is not about achieving a destination, but rather enjoying the journey.
- Max Holloway -