Coefficient of Determination (\(R^2\))
Description
\(R^2\) is a regression metric that measures the proportion of the variance in the actual values that is explained by the model. It provides insight into how well the model captures the patterns in the data.
- Typically ranges from 0 to 1 (interpretable as 0% to 100%), but can be negative when the model fits worse than simply predicting the mean (see the sketch after this list)
- Higher values indicate a better fit, with 1 meaning perfect prediction
- Lower values suggest the model explains little of the variance in the data
- Can be misleading in nonlinear relationships or when used with inappropriate models
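To make these range properties concrete, here is a minimal sketch, assuming NumPy and scikit-learn's `r2_score` are available. It shows that predicting the mean for every sample gives \(R^2 = 0\), and that predictions worse than the mean baseline give a negative score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([5.0, 7.0, 10.0])

# Predicting the mean of y for every sample gives R^2 of exactly 0
mean_pred = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_pred))   # 0.0

# Predictions worse than the mean baseline give a negative R^2
bad_pred = np.array([10.0, 2.0, 1.0])
print(r2_score(y_true, bad_pred))    # about -9.3
```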
Formula
\[ R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]
- \(y_i\) = actual value
- \(\hat{y}_i\) = predicted value
- \(\bar{y}\) = mean of actual values
- \(n\) = number of samples
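A direct translation of this formula as a short NumPy sketch (the function name `r_squared` is just illustrative, not a library API):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, computed directly from the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals (numerator)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean (denominator)
    return 1.0 - ss_res / ss_tot

# Values from the example below give roughly 0.763
print(r_squared([5, 7, 10], [4, 6, 9]))
```

Note that the denominator is the spread of the actual values around their mean, so the ratio is undefined only when every actual value is identical.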
Example
| Actual (\(y\)) | Predicted (\(\hat{y}\)) | Residual (\(y - \hat{y}\)) | Squared Residual |
|---|---|---|---|
| 5 | 4 | 1 | 1 |
| 7 | 6 | 1 | 1 |
| 10 | 9 | 1 | 1 |
Mean of actual values: \(\bar{y} = (5 + 7 + 10)/3 \approx 7.33\)

Total sum of squares (denominator): \((5 - 7.33)^2 + (7 - 7.33)^2 + (10 - 7.33)^2 \approx 12.67\)

Residual sum of squares (numerator): \(1 + 1 + 1 = 3\)

\[ R^2 = 1 - \dfrac{3}{12.67} \approx 0.763 \approx 76.3\% \]
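The same numbers can be checked in code. This sketch, assuming NumPy and scikit-learn, reproduces the values from the table above:

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([5.0, 7.0, 10.0])
predicted = np.array([4.0, 6.0, 9.0])

ss_res = np.sum((actual - predicted) ** 2)      # 1 + 1 + 1 = 3
ss_tot = np.sum((actual - actual.mean()) ** 2)  # ~12.67

print(1 - ss_res / ss_tot)           # ~0.763
print(r2_score(actual, predicted))   # same value from scikit-learn
```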