Title: | Quantify Dependence using Rearranged Dependence Measures |
---|---|
Description: | Estimates the rearranged dependence measure ('RDM') of two continuous random variables for different underlying measures. Furthermore, it provides a method to estimate the (SI)-rearrangement copula using empirical checkerboard copulas. It is based on the theoretical results presented in Strothmann et al. (2022) <arXiv:2201.03329> and Strothmann (2021) <doi:10.17877/DE290R-22733>. |
Authors: | Holger Dette [aut] |
Maintainer: | Christopher Strothmann <[email protected]> |
License: | GPL-2 |
Version: | 0.1.1 |
Built: | 2025-02-23 05:18:02 UTC |
Source: | https://github.com/christopherstrothmann/rdm |
Estimate a non-square checkerboard mass density
checkerboardDensity(X, Y, resolution1, resolution2)
X | First coordinate of the observations. |
Y | Second coordinate of the observations. |
resolution1 | A natural number specifying the resolution of the first component. |
resolution2 | A natural number specifying the resolution of the second component. |
This implementation modifies the code of build_checkerboard_weights() published in 'qad', version 1.0.4, available at https://CRAN.R-project.org/package=qad,
to allow for non-square checkerboard mass densities.
For more details on the implementation, see ECBC; for more information on the implemented changes, see the file 'src/code.cpp'.
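A minimal sketch of an empirical checkerboard mass density, assuming the usual construction: each observation, through its ranks $(r_i, s_i)$, spreads mass $1/n$ uniformly over the square $[(r_i-1)/n, r_i/n] \times [(s_i-1)/n, s_i/n]$, and entry $(k, l)$ collects the mass falling into the cell $[(k-1)/N_1, k/N_1] \times [(l-1)/N_2, l/N_2]$. The names `cell_overlap` and `ecb_density` are hypothetical; this is not the package's C++ code and is not claimed to reproduce its output exactly.

```r
# Hypothetical helper: length of the overlap of [a1, b1] and [a2, b2] (vectorised).
cell_overlap <- function(a1, b1, a2, b2) pmax(0, pmin(b1, b2) - pmax(a1, a2))

# Hypothetical sketch of an empirical checkerboard mass density with N1 x N2 cells.
ecb_density <- function(x, y, N1, N2) {
  n <- length(x)
  r <- rank(x, ties.method = "first")  # ranks of the first coordinate (ties by position)
  s <- rank(y, ties.method = "first")  # ranks of the second coordinate
  # ox[i, k]: overlap of [(r_i - 1)/n, r_i/n] with the k-th strip of the first component
  ox <- sapply(seq_len(N1), function(k) cell_overlap((r - 1) / n, r / n, (k - 1) / N1, k / N1))
  oy <- sapply(seq_len(N2), function(l) cell_overlap((s - 1) / n, s / n, (l - 1) / N2, l / N2))
  # Mass 1/n on a square of area 1/n^2 has density n, so cell (k, l) receives
  # sum_i n * ox[i, k] * oy[i, l]
  n * crossprod(ox, oy)
}

set.seed(1)
A <- ecb_density(runif(20), runif(20), 3, 4)
rowSums(A)  # all equal to 1/3 up to rounding
colSums(A)  # all equal to 1/4 up to rounding
```

Each row sums to $1/N_1$ and each column to $1/N_2$, so the entries sum to 1, which matches the scaling of matrices such as diag(n)/n used in the examples of this manual.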
The estimated checkerboard mass density.
checkerboardDensity(runif(20), runif(20), 3, 3)
Estimate the value of the non-square checkerboard mass density.
checkerboardDensityIndex(X, Y, k, l, resolution1, resolution2)
X | First coordinate of the observations. |
Y | Second coordinate of the observations. |
k | Index of the first component. |
l | Index of the second component. |
resolution1 | A natural number specifying the resolution of the first component. |
resolution2 | A natural number specifying the resolution of the second component. |
This implementation modifies the code of build_checkerboard_weights() published in 'qad', version 1.0.4, available at https://CRAN.R-project.org/package=qad,
to allow for the evaluation of a single index of the non-square checkerboard mass densities.
For more details on the implementation, see ECBC; for more information on the implemented changes, see the file 'src/code.cpp'.
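Evaluating a single entry only needs each observation's overlap with one strip of the first component and one strip of the second, so after ranking it takes a single pass over the data instead of building the whole matrix. A minimal sketch under the same assumed construction (mass $1/n$ spread uniformly over each observation's rank square); `cb_cell` is a hypothetical name and the package's implementation may differ.

```r
# Hypothetical sketch: mass of the single cell (k, l) of an N1 x N2 empirical checkerboard.
cb_cell <- function(x, y, k, l, N1, N2) {
  n <- length(x)
  r <- rank(x, ties.method = "first")
  s <- rank(y, ties.method = "first")
  # Overlap of each observation's rank interval with strip k (first) and strip l (second)
  ox <- pmax(0, pmin(r / n, k / N1) - pmax((r - 1) / n, (k - 1) / N1))
  oy <- pmax(0, pmin(s / n, l / N2) - pmax((s - 1) / n, (l - 1) / N2))
  # Density n times the overlapping area, summed over all observations
  sum(n * ox * oy)
}

set.seed(2)
U <- runif(20)
V <- runif(20)
cb_cell(U, V, 1, 2, 3, 3)
```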
The estimated value of the checkerboard mass density at index (k, l).
U <- runif(20)
V <- runif(20)
checkerboardDensity(U, V, 3, 3)
checkerboardDensityIndex(U, V, 1, 2, 3, 3)
An implementation of the cross-validation principle for the bandwidth selection as presented in Strothmann, Dette and Siburg (2022) <arXiv:2201.03329>.
computeBandwidth(X, sL, sU, method = c("cvsym", "cvasym"), reduce = TRUE)
X | A bivariate data.frame containing the observations. Each row contains one observation. |
sL | Lower bound for the exponent of the bandwidth. |
sU | Upper bound for the exponent of the bandwidth. |
method | "cvsym" uses a symmetric cross-validation principle (N_1 = N_2), and "cvasym" uses an asymmetric cross-validation principle (i.e. N_1 and N_2 may differ). |
reduce | If reduce is set to TRUE, the bandwidth is chosen from N, N+2, ... instead of N, N+1, N+2, ... |
This function computes the optimal bandwidth given the bivariate observations X of length $n$.
Currently, there are two different algorithms implemented:
"cvsym" - Computes the optimal bandwidth choice for a square checkerboard mass density according to the cross-validation principle. The bandwidth is a natural number between $n^{s_L}$ and $n^{s_U}$.
"cvasym" - Computes the optimal bandwidth choice for a non-square checkerboard mass density according to the cross-validation principle. The bandwidths $N_1$ and $N_2$ are natural numbers between $n^{s_L}$ and $n^{s_U}$ and may attain different values.
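The roles of sL, sU and reduce can be illustrated by the grid of candidate bandwidths over which a cross-validation criterion would be optimized. This sketch does not implement the criterion of Strothmann, Dette and Siburg (2022); `candidate_bandwidths` is a hypothetical name, and rounding $n^{s_L}$ up and $n^{s_U}$ down to integers is an assumption.

```r
# Hypothetical helper: candidate bandwidths between n^sL and n^sU.
# The rounding (ceiling for the lower, floor for the upper bound) is an assumption.
candidate_bandwidths <- function(n, sL, sU, reduce = TRUE) {
  lower <- ceiling(n^sL)
  upper <- floor(n^sU)
  if (lower > upper) return(numeric(0))  # empty grid if the bounds cross
  # reduce = TRUE: N, N+2, ...; reduce = FALSE: N, N+1, N+2, ...
  seq(lower, upper, by = if (reduce) 2 else 1)
}

candidate_bandwidths(50, sL = 0.25, sU = 0.5)                  # 3 5 7
candidate_bandwidths(50, sL = 0.25, sU = 0.5, reduce = FALSE)  # 3 4 5 6 7
```

For "cvasym", $N_1$ and $N_2$ would each range over such a grid, so the search space becomes the Cartesian product of two grids; halving each grid with reduce = TRUE therefore cuts the number of candidate pairs roughly by a factor of four.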
The bandwidth chosen for the observations in X.
n <- 20
X <- cbind(runif(n), runif(n))
computeBandwidth(X, sL = 0.25, sU = 0.5, method = "cvsym", reduce = TRUE)
Computes $\mu(C_A)$ for some underlying measure $\mu$ and the checkerboard copula $C_A$.
This measure depends only on the input matrix A.
computeCBMeasure(A, method = c("spearman", "kendall", "bkr", "dss", "zeta1"))
A | A (possibly non-square) checkerboard mass density. |
method | Determines the underlying dependence measure. Options include "spearman", "kendall", "bkr", "dss", "chatterjee" and "zeta1". |
This function computes $\mu(C_A)$ for one of several underlying measures $\mu$ and a given checkerboard copula $C_A$.
Most importantly, the value depends only on the (possibly non-square) matrix $A$ and implicitly assumes the form of $C_A$ given in Strothmann, Dette and Siburg (2022) <arXiv:2201.03329>.
Currently, the following underlying measures are implemented:
"spearman" Implements the concordance measure Spearman's $\rho$,
"kendall" Implements the concordance measure Kendall's $\tau$,
"bkr" Implements the Blum–Kiefer–Rosenblatt measure, also known as the $L^2$-Schweizer-Wolff measure <doi:10.1214/aos/1176345528>,
"dss" Implements the Dette-Siburg-Stoimenov measure of complete dependence <doi:10.1111/j.1467-9469.2011.00767.x>, also known as Chatterjee's $\xi$ <doi:10.1080/01621459.2020.1758115>,
"zeta1" Implements the $\zeta_1$-measure of complete dependence established by W. Trutschnig <doi:10.1016/j.jmaa.2011.06.013>.
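For the Spearman case, a short derivation shows what such a computation can look like. Using $\rho = 12\,E[UV] - 3$ and the fact that, under $C_A$, the pair $(U, V)$ lies in cell $(k, l)$ with probability $A_{kl}$ and is uniform there, one obtains

$$\rho(C_A) = 12 \sum_{k=1}^{N_1} \sum_{l=1}^{N_2} A_{kl}\, u_k v_l - 3, \qquad u_k = \frac{k - 1/2}{N_1}, \quad v_l = \frac{l - 1/2}{N_2}.$$

The sketch below evaluates this formula; `cb_spearman` is a hypothetical name, only Spearman's $\rho$ is covered, and whether computeCBMeasure uses exactly this expression is not stated here.

```r
# Hypothetical sketch: Spearman's rho of the checkerboard copula C_A.
# Assumes A is a non-negative N1 x N2 mass matrix whose entries sum to 1.
cb_spearman <- function(A) {
  u <- (seq_len(nrow(A)) - 0.5) / nrow(A)  # cell midpoints of the first component
  v <- (seq_len(ncol(A)) - 0.5) / ncol(A)  # cell midpoints of the second component
  # E[UV] = sum_{k,l} A[k, l] * u_k * v_l, since U and V are independent
  # and uniform within each cell
  12 * sum(A * outer(u, v)) - 3
}

n <- 10
cb_spearman(diag(n) / n)  # equals 1 - 1/n^2, i.e. 0.99
```

For A = diag(n)/n the formula gives $1 - 1/n^2$, so a diagonal checkerboard copula is close to, but not equal to, the comonotone copula at finite resolution.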
The value of $\mu(C_A)$. For a sorted A, this corresponds to the rearranged dependence measure.
n <- 10
A <- diag(n)/n
computeCBMeasure(A, method = "spearman")
This function estimates the asymmetric dependence between $X$ and $Y$ using the rearranged dependence measure for different possible underlying measures $\mu$.
A value of 0 characterizes independence of $X$ and $Y$, while a value of 1 characterizes a functional relationship between $X$ and $Y$, i.e. $Y = f(X)$ for some measurable function $f$.
rdm( X, method = c("spearman", "kendall", "dss", "zeta1", "bkr", "all"), bandwidth_method = c("fixed", "cv", "cvsym"), bandwidth_parameter = 0.5, permutation = FALSE, npermutation = 1000, checkInput = FALSE )
X | A bivariate data.frame containing the observations. Each row contains one bivariate observation. |
method | Options include "spearman", "kendall", "bkr", "dss", "chatterjee" and "zeta1". The option "all" returns the values for all aforementioned methods. |
bandwidth_method | A character string indicating the use of either a cross-validation principle (square or non-square) or a fixed bandwidth (oftentimes called resolution). |
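bandwidth_parameter | A numerical vector which contains the necessary parameters for the exponent of the chosen bandwidth method. In case of N observations, the bandwidth_parameter is used as an exponent of N: "fixed" takes a single exponent (e.g. 0.3), while "cv" and "cvsym" take a lower and an upper exponent (e.g. c(0.25, 0.5)). |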
permutation | Whether or not to perform a permutation test. |
npermutation | Number of repetitions of the permutation test. |
checkInput | Whether or not to perform validity checks of the input. |
This function estimates the rearranged dependence measure using the empirical checkerboard mass density $A$.
To arrive at the estimate, $A$ is appropriately sorted and then evaluated for the underlying measure.
The rearranged dependence measure always takes values between 0 and 1; it equals 0 if and only if $X$ and $Y$ are independent, and it equals 1 if and only if $Y = f(X)$ for some measurable function $f$.
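For the Spearman case, the estimation steps described above can be sketched end to end: build an empirical checkerboard mass density from the ranks, rearrange it so that the induced checkerboard copula is stochastically increasing, and evaluate Spearman's $\rho$ of the result. This is a sketch under stated assumptions, not the package's implementation: all function names are hypothetical, the resolution $N_1 \times N_2$ is passed directly instead of being derived from bandwidth_parameter, and the sorting step uses the decreasing rearrangement of the row-wise cumulative sums, which is our reading of the algorithm in Strothmann, Dette and Siburg (2022).

```r
# Hypothetical helper: overlap length of [a1, b1] and [a2, b2] (vectorised).
cell_overlap <- function(a1, b1, a2, b2) pmax(0, pmin(b1, b2) - pmax(a1, a2))

# Empirical checkerboard mass density (entries sum to 1), assumed construction.
ecb_density <- function(x, y, N1, N2) {
  n <- length(x)
  r <- rank(x, ties.method = "first")
  s <- rank(y, ties.method = "first")
  ox <- sapply(seq_len(N1), function(k) cell_overlap((r - 1) / n, r / n, (k - 1) / N1, k / N1))
  oy <- sapply(seq_len(N2), function(l) cell_overlap((s - 1) / n, s / n, (l - 1) / N2, l / N2))
  n * crossprod(ox, oy)
}

# Assumed SI rearrangement: sort each column of the row-wise cumulative sums decreasingly.
si_sort <- function(A) {
  B <- t(apply(A, 1, cumsum))
  B <- apply(B, 2, sort, decreasing = TRUE)
  B - cbind(0, B[, -ncol(B), drop = FALSE])
}

# Spearman's rho of a checkerboard copula via cell midpoints.
cb_spearman <- function(A) {
  u <- (seq_len(nrow(A)) - 0.5) / nrow(A)
  v <- (seq_len(ncol(A)) - 0.5) / ncol(A)
  12 * sum(A * outer(u, v)) - 3
}

# Hypothetical end-to-end estimate: density -> SI rearrangement -> Spearman's rho.
rdm_spearman_sketch <- function(x, y, N1, N2) cb_spearman(si_sort(ecb_density(x, y, N1, N2)))

set.seed(3)
x <- runif(20)
rdm_spearman_sketch(x, x, 4, 4)   # increasing relationship
rdm_spearman_sketch(x, -x, 4, 4)  # decreasing relationship gives the same value
```

With 20 observations on a 4 × 4 grid, both calls give $1 - 1/4^2 = 0.9375$ rather than 1: the rearrangement maps the anti-diagonal density onto the diagonal one, and a diagonal checkerboard copula only approximates the comonotone copula at finite resolution.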
Currently, the following underlying measures are implemented:
"spearman" Implements the concordance measure Spearman's $\rho$ (which is identical to the $L^1$-Schweizer-Wolff measure),
"kendall" Implements the concordance measure Kendall's $\tau$,
"bkr" Implements the Blum–Kiefer–Rosenblatt measure, also known as the $L^2$-Schweizer-Wolff measure <doi:10.1214/aos/1176345528>,
"dss" Implements the Dette-Siburg-Stoimenov measure of complete dependence <doi:10.1111/j.1467-9469.2011.00767.x>, also known as Chatterjee's $\xi$ <doi:10.1080/01621459.2020.1758115>,
"zeta1" Implements the $\zeta_1$-measure of complete dependence established by W. Trutschnig <doi:10.1016/j.jmaa.2011.06.013>.
The estimation of the checkerboard mass density depends on the choice of the bandwidth for the checkerboard copula.
For a detailed discussion of "cv" and "cvsym", see computeBandwidth.
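The permutation and npermutation arguments refer to a permutation test for independence. A generic sketch of such a test follows; how rdm computes its p-value is not described here, so permuting the second coordinate, the add-one p-value and the name `perm_test` are assumptions. Any dependence statistic can be plugged in as `stat`, for example an RDM estimate.

```r
# Hypothetical sketch of a permutation test for independence.
# `stat` is any function of two numeric vectors; larger values mean more dependence.
perm_test <- function(x, y, stat, npermutation = 1000) {
  observed <- stat(x, y)
  # Permuting y destroys any dependence while keeping both marginals intact
  permuted <- replicate(npermutation, stat(x, sample(y)))
  # Add-one correction: the observed statistic counts as one of the permutations
  p_value <- (1 + sum(permuted >= observed)) / (1 + npermutation)
  list(statistic = observed, p.value = p_value)
}

# Usage with a stand-in statistic (absolute Spearman correlation), not an RDM estimate
set.seed(42)
x <- runif(30)
perm_test(x, x^2, function(a, b) abs(cor(a, b, method = "spearman")), npermutation = 199)
```

The add-one correction keeps the p-value strictly positive, so with npermutation = 199 the smallest attainable value is 1/200.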
The estimated value of the rearranged dependence measure.
n <- 50
X <- cbind(runif(n), runif(n))
rdm(X, method = "spearman", bandwidth_method = "fixed", bandwidth_parameter = .3)
n <- 20
U <- runif(n)
rdm(cbind(U, U), method = "spearman", bandwidth_method = "cv", bandwidth_parameter = c(0.25, 0.5))
Sorts an arbitrary doubly stochastic matrix A into a matrix whose induced checkerboard copula is stochastically increasing.
sortDSMatrix(A)
A | A (possibly non-square) doubly stochastic matrix or (possibly non-square) checkerboard mass density. |
The algorithm to sort a doubly stochastic matrix is given in Strothmann, Dette and Siburg (2022) <arXiv:2201.03329>.
Since this implementation does not depend on the appropriate scaling of the matrix A, both doubly stochastic matrices and checkerboard mass densities are admissible inputs.
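A possible sorting step can be sketched as follows, assuming it is the decreasing rearrangement, in the first component, of the row-wise cumulative sums (the discrete conditional distribution functions); this reading of the algorithm in Strothmann, Dette and Siburg (2022) is ours, and `si_sort` is a hypothetical name. Column sums are preserved, entries stay non-negative, and only the order of the values matters, so the scaling of A plays no role.

```r
# Hypothetical sketch of an SI rearrangement of a non-negative N1 x N2 matrix
# (N1, N2 >= 2). Rows index the first component, columns the second.
si_sort <- function(A) {
  # B[k, l] = sum_{j <= l} A[k, j]: discrete conditional distribution functions
  B <- t(apply(A, 1, cumsum))
  # Decreasing rearrangement in the first component, separately for every l
  B <- apply(B, 2, sort, decreasing = TRUE)
  # Undo the cumulative sums; entries stay non-negative because B[, l] >= B[, l - 1]
  # holds entrywise and is preserved by sorting
  B - cbind(0, B[, -ncol(B), drop = FALSE])
}

n <- 4
A <- diag(n)[n:1, ]
si_sort(A)  # the identity matrix
```

After sorting, every column of cumulative sums is non-increasing in the row index, which is the discrete form of the stochastically increasing property.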
The sorted version of the matrix A.
n <- 4
A <- diag(n)[n:1, ]
print(A)
sortDSMatrix(A)