Letters

2021-09-12

Supplemental R code to “Exploratory inference: localizing relevant effects with confidence” by A. Solari and J. Goeman.

Set up

Install and load the hommel and repmis R packages:

require("hommel") || install.packages("hommel")
require("repmis") || install.packages("repmis")

Load the data

source_data("https://github.com/aldosolari/letters/blob/main/rawdata.RData?raw=true")
source_data("https://github.com/aldosolari/letters/blob/main/letters.RData?raw=true")

Introduction

Suppose we have a moderate-scale spatial problem, structured as a \(100 \times 100\) square grid of \(m = 10000\) variables (pixels), where for each pixel we measure the local magnitude of the effect. Assume a random sample of size \(n=30\) from the \(m\)-variate Gaussian distribution \(\mathcal{N}_m(\mu, \Sigma)\) with \(\mu=(\mu_1,\ldots,\mu_m)^\mathsf{T}\) and \(\mathrm{diag}(\Sigma) = (\sigma_1^2,\ldots,\sigma_m^2)^\mathsf{T}\).
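
For intuition only, data with this structure could be simulated as follows. This is a sketch assuming independent pixels (\(\Sigma = I\)) and a placeholder mean vector; it is not the code that generated rawdata.RData.

# Illustrative simulation only (NOT the code that generated rawdata.RData):
# n = 30 observations of m = 100 * 100 independent pixels, with mean mu_sim
set.seed(1)
n_sim <- 30
m_sim <- 100 * 100
mu_sim <- rep(0, m_sim)  # hypothetical mean pattern; nonzero entries would mark effects
x_sim <- matrix(rnorm(n_sim * m_sim), nrow = n_sim) +
  matrix(mu_sim, nrow = n_sim, ncol = m_sim, byrow = TRUE)
dim(x_sim)  # 30 x 10000, the same layout as the data matrix x used below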

To look at the \(n\times m\) data matrix (observations \(\times\) pixels):

x[1:3, 1:6]
          [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
[1,] -0.404556  0.3946698  0.6837976 -0.4468492 -1.0626433  0.23823091
[2,] -1.977747 -0.2800438 -0.5160666 -1.7695368 -0.4284432  0.09212929
[3,] -1.099681 -0.7257958 -1.2172958 -1.2947280 -0.5190015 -0.16559480

Store the sample size \(n\) and the dimension \(m\):

n <- nrow(x)
m <- ncol(x)
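
A quick check that the dimensions match the setting above (\(n = 30\) observations, \(m = 10000\) pixels):

c(n = n, m = m)  # should give n = 30 and m = 10000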

The parameter of interest is the effect size \(\theta_i = \mu_i/\sigma_i\) for \(i=1,\ldots,m\).

To get the point estimates \(\hat{\theta}_i= \hat{\mu}_i / \hat{\sigma}_i\):

muhat <- apply(x,2,mean)
sigmahat <- sqrt(apply(x,2,var))
thetahat <- muhat / sigmahat
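
Note that \(\sqrt{n}\,\hat{\theta}_i\) is the usual one-sample \(t\) statistic for pixel \(i\); a quick check for the first pixel, which should return TRUE:

# sqrt(n) * thetahat[1] coincides with the one-sample t statistic for pixel 1
all.equal(unname(t.test(x[, 1])$statistic), sqrt(n) * thetahat[1])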

We take the conventional value \(\Delta=0.2\) for an effect of small magnitude. For testing \(H_i : \theta_i \in [-\Delta,\Delta]\) we compute the \(p\)-values as \(p_i = \mathrm{pr}(\mathcal{F}_{1,n-1, n\Delta^2} \geq n \hat{\theta}_i^2)\), where \(\mathcal{F}_{1,n-1,n\Delta^2}\) denotes the non-central F distribution with 1 and \(n-1\) degrees of freedom and non-centrality parameter \(n\Delta^2\).

To compute the \(p\)-values \(p_i\):

Delta <- 0.2
pval <- sapply(1:m, function(i) 
  pf(n * thetahat[i]^2, df1 = 1, df2 = n - 1, ncp = n * Delta^2, lower.tail = FALSE) 
  )
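
Since pf() is vectorized, the sapply() loop can be avoided; the following alternative should agree with pval above:

# Equivalent vectorized computation (pf() is vectorized over its first argument)
pval_vec <- pf(n * thetahat^2, df1 = 1, df2 = n - 1, ncp = n * Delta^2, lower.tail = FALSE)
all.equal(pval, pval_vec)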

Plot showing the pixels with \(p\)-values \(p_i \leq 0.01\):

pval_mat <- matrix(pval, nrow=sqrt(m))
image(t(apply(pval_mat <= 0.01, 2, rev)), col=0:1, asp=1)
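
A quick count of the pixels below this threshold gives a first impression of the signal, but it comes with no multiplicity control; the next section addresses this.

# Number of pixels below the uncorrected 0.01 threshold (no multiplicity control)
sum(pval <= 0.01)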

Simultaneous inference

We set the significance level to \(\alpha=0.05\), i.e., a target confidence level of \(1-\alpha=0.95\).

alpha <- 0.05

We use the hommel package to calculate the lower bound \(\underline{r}_{S}\) for the number of relevant effects in any set \(S\) of pixels, obtained by closed testing with Simes local tests:

res <- hommel(pval)
summary(res)
#> A hommel object for 10000 hypotheses.
#> Simes inequality is assumed.
#> Use p.adjust(), discoveries() or localtest() to access this object.
#> 
#> With 0.95 confidence: at least 1271 discoveries.
#> 889 hypotheses with adjusted p-values below 0.05.

With 95% confidence, the overall number of relevant effects is at least 1271.
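
The same overall bound can be obtained directly with discoveries(); this should reproduce the 1271 reported by summary():

# Lower (1 - alpha)-confidence bound for the number of relevant effects overall
discoveries(res, alpha = alpha)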

In our toy example, we assume that relevant effects manifest themselves only in the shapes of the letters of the alphabet A, B, C, \(\ldots\), Z.

To plot the letter A:

names(lttrs)
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
A <- lttrs[["A"]]
image(t(apply(A, 2, rev)), col=0:1, asp=1)

To get the size of A:

sum(A)
[1] 2529

To get the lower bound \(\underline{r}_A\) for the number of relevant effects in \(A\):

discoveries(res, ix=which(A==TRUE), alpha=alpha)
#> [1] 307

To get the lower bound \(\underline{r}_A/|A|\) for the proportion of relevant effects in \(A\):

tdp(res, ix=which(A==TRUE), alpha=alpha)
#> [1] 0.1213919
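
This is simply the discovery bound divided by the size of the set, \(\underline{r}_A/|A| = 307/2529\), as can be checked directly:

# TDP lower bound in A recovered as discoveries / size of A
discoveries(res, ix = which(A == TRUE), alpha = alpha) / sum(A)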

To get the lower bound \(\underline{r}_{A^c}/|A^c|\) for the proportion of relevant effects outside \(A\), i.e., in its complement \(A^c\):

tdp(res, ix=which(A==FALSE), alpha=alpha)
#> [1] 0.1191273

Find the most likely letter!
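
One way to answer this, sketched below, is to compute the TDP lower bound inside each letter and pick the letter with the largest bound (assuming, as for A above, that each element of lttrs marks a letter's pixels on the same \(100 \times 100\) grid):

# TDP lower bound inside each letter, then rank the letters
tdp_by_letter <- sapply(lttrs, function(ltr) tdp(res, ix = which(ltr == TRUE), alpha = alpha))
sort(tdp_by_letter, decreasing = TRUE)
names(which.max(tdp_by_letter))  # the most likely letter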