This is a large update. Three big changes:
- Substantial speed improvements, which should improve both absolute speed and scaling.
- Added generics for `plot`, `summary`, and `print`.
- Switched from the package `Rcpp` to `cpp11` for the backend. This removes a runtime dependency on `Rcpp`, but adds one on `C++11` and adds a compile-time dependency on `cpp11`.
Together, all this led to many changes under the hood. As a consequence, `permutation_test_builder` is substantially different (and no longer exported), and `order_stl` no longer exists.
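Each run of a `*_test` function now only sorts the data one time. Denoting the joint sample size N and the number of bootstraps K, this update moves the code from sorting the data K times (once per bootstrap) to sorting it once per run, so the O(N log(N)) sorting cost is no longer paid for every bootstrap.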
- Particularly for large samples or large numbers of bootstraps, this means a substantial improvement in speed.
- This required reworking the underlying C++ `_stat` functions as well as `permutation_test_builder`.
  - Instead of breaking code in unpredictable ways, this function is no longer exported. If you used it, archived copies can be found on GitHub (particularly under the example R code versions), or by emailing me.
  - Functions `*_stat` that are syntactically identical to the old ones still exist, but are no longer what `permutation_test_builder` uses.
- These changes likely reduced memory requirements for most users, though this is offset by the new default of storing bootstrap outputs.
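The sort-once idea can be sketched in C++. This is a minimal illustration, not the package's backend code. The names `ks_sorted`, `ks_perm_test`, and `PermResult`, the seed argument, and the (count + 1) / (K + 1) p-value convention are all assumptions made for the example.

In this sketch, the pooled data is sorted once at O(N log(N)) cost. Each of the K replicates then only shuffles the group labels over the fixed sorted values, at O(N) cost, for roughly O(N log(N) + K N) overall. Because the labels are shuffled without replacement, the sketch is a proper permutation test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Two-sample Kolmogorov-Smirnov statistic from a joint sample that is ALREADY
// sorted. in_a[i] == 1 if sorted value i belongs to sample A. Runs in O(N).
double ks_sorted(const std::vector<double>& joint,
                 const std::vector<unsigned char>& in_a,
                 long long n_a, long long n_b) {
    assert(n_a > 0 && n_b > 0);
    long long c_a = 0, c_b = 0, best = 0;
    const std::size_t n = joint.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (in_a[i]) ++c_a; else ++c_b;
        // Compare the ECDFs only after the last value of a run of ties.
        if (i + 1 < n && joint[i + 1] == joint[i]) continue;
        // |c_a/n_a - c_b/n_b| scaled by n_a*n_b, kept in exact integers.
        best = std::max(best, std::llabs(c_a * n_b - c_b * n_a));
    }
    return static_cast<double>(best) / static_cast<double>(n_a * n_b);
}

struct PermResult {
    double stat;
    double p_value;
};

// Hypothetical permutation test that sorts the pooled data exactly once.
PermResult ks_perm_test(const std::vector<double>& a,
                        const std::vector<double>& b,
                        int n_perm, unsigned seed) {
    std::vector<std::pair<double, unsigned char>> pooled;
    for (double x : a) pooled.emplace_back(x, 1);
    for (double x : b) pooled.emplace_back(x, 0);
    std::sort(pooled.begin(), pooled.end());  // the only O(N log N) step

    std::vector<double> joint;
    std::vector<unsigned char> in_a;
    for (const auto& p : pooled) {
        joint.push_back(p.first);
        in_a.push_back(p.second);
    }
    const long long n_a = static_cast<long long>(a.size());
    const long long n_b = static_cast<long long>(b.size());
    const double observed = ks_sorted(joint, in_a, n_a, n_b);

    // Each replicate permutes the labels over the fixed sorted values,
    // so it costs O(N) rather than a fresh sort.
    std::mt19937 rng(seed);
    int extreme = 0;
    for (int k = 0; k < n_perm; ++k) {
        std::shuffle(in_a.begin(), in_a.end(), rng);
        if (ks_sorted(joint, in_a, n_a, n_b) >= observed) ++extreme;
    }
    // (count + 1) / (K + 1): a common convention, assumed here.
    return {observed, (extreme + 1.0) / (n_perm + 1.0)};
}
```

Computing the statistic from integer counts keeps the `>= observed` comparison exact, so replicates that tie the observed statistic are counted reliably.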
There is now a `twosamples` class, and generics for `print`, `summary`, and `plot`, as well as a function for combining outputs correctly. This should make the printed behavior much better, and makes it easy to see a fair bit of information using `summary`.
- Plotting currently shows a histogram of the bootstrap values and a red line where the test statistic is.
  - This required making the `*_test` functions export the bootstrap values. If you have memory-intensive applications, this can be turned off with the toggle `keep.boots`, at the cost of no longer being able to use the plotting.
- In the future I may add the ability to plot the ECDFs and the test stat images. This is the main reason for the `keep.samples` toggle, which is turned off by default.
- In order to only sort once, this is now a proper permutation test again. This should also resolve some classes of potential validity issues. Proofs in the associated paper are (at the moment) not relevant to this for the same reason.
- `order_stl` no longer exists. I do not believe anybody used this function outside its internal package use.
- `permutation_test_builder` is no longer exported. I am not aware of anyone using this function outside its internal package use. A similar function is still available, but will require changing the syntax of functions for its inputs.
- Dependency switch from `Rcpp` to `cpp11`, and an additional system requirement of `C++11`.
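Storing bootstrap values (the `keep.boots` default described above) is what allows a p-value to be recomputed, or two runs to be pooled, after the fact. The C++ sketch below is hypothetical. `TestOutput`, `p_value`, and `combine` are illustrative names, not the package's R API, and the (count + 1) / (K + 1) p-value convention is an assumption. Pooling is only meaningful for runs on the same data, so `combine` checks that the observed statistics agree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one test run: the observed statistic plus the
// stored bootstrap (permutation) statistics. Not the package's R class.
struct TestOutput {
    double stat;
    std::vector<double> boots;
};

// p-value from stored replicates, using the (count + 1) / (K + 1) convention.
double p_value(const TestOutput& out) {
    std::size_t extreme = 0;
    for (double s : out.boots)
        if (s >= out.stat) ++extreme;
    return (extreme + 1.0) / (out.boots.size() + 1.0);
}

// Pool two runs on the SAME data. This is only possible if both runs kept
// their bootstrap values, and only valid if their observed statistics match.
TestOutput combine(const TestOutput& x, const TestOutput& y) {
    assert(x.stat == y.stat);
    TestOutput out{x.stat, x.boots};
    out.boots.insert(out.boots.end(), y.boots.begin(), y.boots.end());
    return out;
}
```

The memory cost grows linearly with the number of stored replicates, which is the trade-off the `keep.boots` toggle controls.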
This version is primarily bug fixes and documentation updates. These bug fixes may affect outputs users see.
I expect this update to be purely cosmetic for the vast majority of users.
- For a few users of `ad_test` or `cvm_test`, it is possible that re-running code will make significant differences to conclusions.
- For the rare users of `ad_test` or `dts_test` (aka `two_sample`) who relied on the scale of the test stat (rather than merely the p-value), this update will change outputs substantially. In principle this change is merely re-scaling everything by $(n^p)/(2^{p/2})$.
- Fixed a major bug in how `ad_stat` and `cvm_stat` treated duplicates. This bug led to excessive power in some situations. Re-running code, p-values and test stats may change.
- Fixed a minor bug in how `ad_stat` and `dts_stat` calculated standard deviations. Re-running code will change the scale/location of the test stat, but should not affect p-values.
- Some minor performance improvements: e.g. eliminated some unnecessary `if (sd > 0)` comparisons.
- Renamed a function's internal variables to prevent an unlikely namespace conflict.
- Website using pkgdown now exists at https://twosampletest.com
- link to website in description
- Fixed an error in the documentation describing `ad_stat` and `dts_stat`, in which a square root term was dropped.
- updated discussion of `order_stl`
- added some notes about ability to use factors (ordered or not)
- added some automated testing of the basic functions
- added reverse dependency testing
- added automated R-CMD-CHECK for each github commit
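Here is a C++ sketch of the kind of problem behind the `ad_stat`/`cvm_stat` duplicates fix above. It shows how evaluating ECDFs in the middle of a run of tied values inflates a Cramer-von Mises-style statistic. This is not the package's code, and not necessarily the exact bug that was fixed.

`cvm_sum` is a hypothetical helper that returns an unnormalized sum of squared ECDF gaps over the sorted joint sample. With two identical samples, the tie-aware version returns 0, while the naive version does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unnormalized Cramer-von Mises-style sum of squared ECDF gaps over the
// sorted joint sample. in_a[i] == 1 if sorted value i came from sample A.
// naive == true evaluates the ECDFs after every element, even mid-tie.
// naive == false evaluates once per block of tied values and weights by the
// block size, since every tied point shares the same ECDF values.
double cvm_sum(const std::vector<double>& joint,
               const std::vector<unsigned char>& in_a,
               long long n_a, long long n_b, bool naive) {
    assert(n_a > 0 && n_b > 0);
    long long c_a = 0, c_b = 0, total = 0, block = 0;
    const std::size_t n = joint.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (in_a[i]) ++c_a; else ++c_b;
        ++block;
        const bool tie_next = i + 1 < n && joint[i + 1] == joint[i];
        if (!naive && tie_next) continue;  // wait for the end of the tie block
        const long long gap = c_a * n_b - c_b * n_a;  // scaled by n_a * n_b
        total += (naive ? 1 : block) * gap * gap;
        block = 0;
    }
    const double scale = static_cast<double>(n_a * n_b);
    return static_cast<double>(total) / (scale * scale);
}
```

Without duplicates every block has size 1, so both modes agree. The difference only appears when values are tied.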
This update only fixes documentation: it fixes a bug that led to poor formatting, improves the formatting of equations, adds graphs for test statistics, and adds links between help pages. See v1.1.0 for recent improvements to the codebase.
This update primarily fixes a bug which meant that the test stat sorting routine was O(N^2), not O(N log(N)).
- order_cpp was using an O(N^2) sort routine that was supposed to be ditched before package release. It is now deprecated.
- order_stl replaces order_cpp, using the STL sort function to run the required sorting routine.
- All test stat calculations were using three more length-N vectors than necessary. This has been fixed.
- A paper demonstrating package components was posted to arXiv, and linked to throughout the documentation.
- The folder R/Extras was updated to use the code for the simulations in the arXiv paper.
- permutation_test_builder is now sampling with replacement.
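Replacing an O(N^2) ordering routine with the STL can be sketched as an "argsort" built on `std::stable_sort`, which runs in O(N log(N)). This is a minimal illustration, not the package's `order_stl`. The name `order_indices` is hypothetical. The stable sort is chosen so that tied values keep their original order, as R's `order()` does by default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the 0-based indices that would sort x ascending, in O(N log N).
// R's order() returns 1-based indices; add 1 if mimicking it exactly.
std::vector<std::size_t> order_indices(const std::vector<double>& x) {
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    auto by_value = [&x](std::size_t i, std::size_t j) { return x[i] < x[j]; };
    // Stable: tied values stay in their original relative order.
    std::stable_sort(idx.begin(), idx.end(), by_value);
    // Debug-only sanity check that the indexed values are non-decreasing.
    assert(std::is_sorted(idx.begin(), idx.end(), by_value));
    return idx;
}
```

For example, ordering `{3, 1, 2, 1}` gives indices `{1, 3, 2, 0}`: the two tied 1s keep their original relative order.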
The package has been released. The package includes test statistic functions (written in C++) for the following two-sample distance measures:
- Kolmogorov-Smirnov
- Kuiper
- Cramer-Von Mises
- Anderson-Darling
- Wasserstein Metric
- An updated Wasserstein -- referred to as DTS
Each test statistic also has a corresponding permutation test function.
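To illustrate what such test statistic functions compute, the C++ sketch below evaluates textbook versions of two of the listed measures in one pass over the sorted pooled sample: the Kuiper statistic and the order-1 Wasserstein distance between the two ECDFs. `ecdf_stats` and `EcdfStats` are hypothetical names. The package's own functions may scale or weight these statistics differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct EcdfStats {
    double kuiper;       // max(F_A - F_B) + max(F_B - F_A)
    double wasserstein;  // integral of |F_A - F_B| dx (order 1)
};

// Textbook Kuiper and order-1 Wasserstein statistics from two samples.
EcdfStats ecdf_stats(const std::vector<double>& a, const std::vector<double>& b) {
    assert(!a.empty() && !b.empty());
    std::vector<std::pair<double, bool>> pooled;
    for (double x : a) pooled.emplace_back(x, true);
    for (double x : b) pooled.emplace_back(x, false);
    std::sort(pooled.begin(), pooled.end());

    double c_a = 0, c_b = 0, up = 0, down = 0, area = 0;
    for (std::size_t i = 0; i < pooled.size(); ++i) {
        if (pooled[i].second) ++c_a; else ++c_b;
        const double gap = c_a / a.size() - c_b / b.size();
        const bool last = i + 1 == pooled.size();
        // Tied values: wait until the whole block has been counted.
        if (!last && pooled[i + 1].first == pooled[i].first) continue;
        up = std::max(up, gap);
        down = std::max(down, -gap);
        // The ECDFs are constant on [x_i, x_{i+1}); add that strip's area.
        if (!last) area += std::fabs(gap) * (pooled[i + 1].first - pooled[i].first);
    }
    return {up + down, area};
}
```

For samples `{0, 1}` and `{2, 3}`, this gives a Kuiper statistic of 1 and a Wasserstein distance of 2, matching a shift of 2 between the two samples.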
In addition there are two functions:
- permutation_test_builder
- order_cpp

These are primarily intended for internal use, but there was no reason not to export them for others' use.