Title: | Corrected Test Statistics for Comparing Machine Learning Models on Correlated Samples |
---|---|
Description: | Calculate a set of corrected test statistics for cases when samples are not independent, such as when classification accuracy values are obtained over resamples or through k-fold cross-validation, as proposed by Nadeau and Bengio (2003) <doi:10.1023/A:1024068626366> and presented in Bouckaert and Frank (2004) <doi:10.1007/978-3-540-24775-3_3>. |
Authors: | Trent Henderson [cre, aut] |
Maintainer: | Trent Henderson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.1 |
Built: | 2024-11-08 04:12:43 UTC |
Source: | https://github.com/hendersontrent/correctr |
Compute correlated t-statistic and p-value for k-fold cross-validated results
kfold_ttest(x, y, n, k, tailed = c("two", "one"), greater = NULL)
x |
|
y |
|
n |
|
k |
|
tailed |
|
greater |
|
data.frame containing the test statistic and p-value
Trent Henderson
Nadeau, C., and Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, (2003).
Corani, G., Benavoli, A., Demsar, J., Mangili, F., and Zaffalon, M. Statistical comparison of classifiers through Bayesian hierarchical modelling. Machine Learning, 106, (2017).
x <- rnorm(100, mean = 95, sd = 0.5)
y <- rnorm(100, mean = 90, sd = 1)
kfold_ttest(x = x, y = y, n = 100, k = 5, tailed = "two")
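To make the correction concrete, the sketch below computes a correlated t-statistic by hand in the form described by Corani et al. (2017), where the correlation between folds is approximated as rho = 1 / k. It is an illustration under that assumption and may not match how `kfold_ttest` is implemented internally. The helper name `manual_kfold_t`, the choice of `n` as the number of paired differences, and the `n - 1` degrees of freedom are assumptions made for this example.

```r
# Hypothetical helper (not part of the package): correlated t-test on
# paired differences, assuming a between-fold correlation of rho = 1 / k.
manual_kfold_t <- function(x, y, k) {
  d <- x - y          # paired performance differences
  n <- length(d)      # assumed: n is the number of paired differences
  rho <- 1 / k        # assumed correlation heuristic
  # The usual 1 / n variance term is inflated by rho / (1 - rho)
  se <- sqrt(stats::var(d) * (1 / n + rho / (1 - rho)))
  t_stat <- mean(d) / se
  p_value <- 2 * stats::pt(-abs(t_stat), df = n - 1)  # two-tailed
  data.frame(statistic = t_stat, p.value = p_value)
}

set.seed(123)  # for reproducibility of the simulated scores
x <- rnorm(100, mean = 95, sd = 0.5)
y <- rnorm(100, mean = 90, sd = 1)
manual_kfold_t(x, y, k = 5)
```

Because the extra term rho / (1 - rho) does not shrink as more differences are added, the statistic stays well below what an uncorrected paired t-test would report on the same values.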
Compute correlated t-statistic and p-value for repeated k-fold cross-validated results
repkfold_ttest(data, n1, n2, k, r, tailed = c("two", "one"), greater = NULL)
data |
|
n1 |
|
n2 |
|
k |
|
r |
|
tailed |
|
greater |
value specifying which value in the |
data.frame containing the test statistic and p-value
Trent Henderson
Nadeau, C., and Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, (2003).
Bouckaert, R. R., and Frank, E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, 3056, (2004).
tmp <- data.frame(model = rep(c(1, 2), each = 60),
                  values = c(stats::rnorm(60, mean = 0.6, sd = 0.1),
                             stats::rnorm(60, mean = 0.4, sd = 0.1)),
                  k = rep(c(1, 1, 2, 2), times = 15),
                  r = rep(c(1, 2), times = 30))
repkfold_ttest(data = tmp, n1 = 80, n2 = 20, k = 2, r = 2, tailed = "two")
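For readers who want to see the arithmetic, the following sketch reproduces the corrected repeated k-fold statistic in the form given by Bouckaert and Frank (2004): the mean difference over all k * r fold-by-repeat pairs, divided by the square root of (1 / (k * r) + n2 / n1) times the variance of those differences. It is a hand-rolled illustration, not the package's code. The helper name `manual_repkfold_t`, the vector inputs `a` and `b`, and the assumption of exactly one score per model for each fold and repeat are choices made for this example.

```r
# Hypothetical helper (not part of the package): corrected repeated k-fold
# statistic, assuming one score per model for every (fold, repeat) pair.
manual_repkfold_t <- function(a, b, n1, n2, k, r) {
  d <- a - b                       # k * r paired differences
  stopifnot(length(d) == k * r)
  m <- mean(d)
  s2 <- stats::var(d)              # sample variance, denominator k * r - 1
  # n1 = training set size, n2 = test set size
  t_stat <- m / sqrt((1 / (k * r) + n2 / n1) * s2)
  p_value <- 2 * stats::pt(-abs(t_stat), df = k * r - 1)  # two-tailed
  data.frame(statistic = t_stat, p.value = p_value)
}

set.seed(123)  # for reproducibility of the simulated scores
k <- 5
r <- 10
a <- rnorm(k * r, mean = 0.6, sd = 0.1)  # model 1 score per fold and repeat
b <- rnorm(k * r, mean = 0.4, sd = 0.1)  # model 2 score per fold and repeat
manual_repkfold_t(a, b, n1 = 80, n2 = 20, k = k, r = r)
```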
Compute correlated t-statistic and p-value for resampled data
resampled_ttest(x, y, n, n1, n2, tailed = c("two", "one"), greater = NULL)
x |
|
y |
|
n |
|
n1 |
|
n2 |
|
tailed |
|
greater |
|
data.frame containing the test statistic and p-value
Trent Henderson
Nadeau, C., and Bengio, Y. Inference for the Generalization Error. Machine Learning, 52, (2003).
Bouckaert, R. R., and Frank, E. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms. Advances in Knowledge Discovery and Data Mining. PAKDD 2004. Lecture Notes in Computer Science, 3056, (2004).
x <- rnorm(100, mean = 95, sd = 0.5)
y <- rnorm(100, mean = 90, sd = 1)
resampled_ttest(x = x, y = y, n = 100, n1 = 80, n2 = 20, tailed = "two")
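One way to see why the correction matters is to compare it with an ordinary paired t-test on the same values. The sketch below does this by hand using the Nadeau and Bengio (2003) variance correction, which replaces the usual 1 / n factor with 1 / n + n2 / n1. It assumes `n` is the number of resampled differences and that `n1` and `n2` are the training and test set sizes; the variable names `naive_t` and `corrected_t` are introduced only for this illustration.

```r
set.seed(123)  # for reproducibility of the simulated scores
x <- rnorm(100, mean = 95, sd = 0.5)
y <- rnorm(100, mean = 90, sd = 1)

d <- x - y       # paired differences across resamples
n <- length(d)   # assumed: number of resamples
n1 <- 80         # assumed: training set size
n2 <- 20         # assumed: test set size

# Uncorrected paired t-test, which treats the resamples as independent
naive_t <- unname(stats::t.test(x, y, paired = TRUE)$statistic)

# Corrected statistic: variance scaled by 1 / n + n2 / n1 instead of 1 / n
corrected_t <- mean(d) / sqrt((1 / n + n2 / n1) * stats::var(d))

c(naive = naive_t, corrected = corrected_t)
```

The ratio of the two statistics depends only on `n`, `n1`, and `n2`, so the corrected value is always smaller in magnitude than the naive one, reflecting the overlap between training sets across resamples.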