using ClusterDepth
using Random
using CairoMakie
using UnfoldSim
using StatsBase
using Distributions
using DataFrames
using Unfold
using UnfoldMakie

How to use the ClusterDepth multiple comparison correction

Info

This tutorial focuses on single-channel data. For multichannel data, see the tutorial "Further EEG Example".

Simulating test-data

Let's setup an EEG simulation using UnfoldSim.jl. We simulate a one factor 1x2 design with 20 subjects, each with 40 trials

n_subjects = 20
design = MultiSubjectDesign(
    n_subjects=n_subjects,
    n_items=40,
    items_between=Dict(:condition => ["car", "face"]),
)
first(generate_events(design), 3)

3×3 DataFrame

Row	subject	item	condition
	String	String	String
1	S01	I01	car
2	S01	I02	face
3	S01	I03	car

Next we define a ground-truth signal based on a Linear Mixed Model.

We simulate a P100, a N170 and a P300 - but an effect only on the N170

p1 = MixedModelComponent(;
    basis=UnfoldSim.p100(; sfreq=250),
    formula=@formula(0 ~ 1 + (1 | subject)),
    β=[1.0],
    σs=Dict(:subject => [1]),
);
n170 = MixedModelComponent(;
    basis=UnfoldSim.n170(; sfreq=250),
    formula=@formula(0 ~ 1 + condition + (1 + condition | subject)),
    β=[1.0, -0.5], # condition effect - faces are more negative than cars
    σs=Dict(:subject => [1, 0.2]), # random slope yes please!
);

p300 = MixedModelComponent(;
    basis=UnfoldSim.p300(; sfreq=250),
    formula=@formula(0 ~ 1 + condition + (1 + condition | subject)),
    β=[1.0, 0], ## no p300 condition effect
    σs=Dict(:subject => [1, 1.0]), # but a random slope for condition
);

data, events = simulate(
    MersenneTwister(1),
    design,
    [p1, n170, p300],
    UniformOnset(; offset=500, width=100),
    RedNoise(noiselevel=1);
    return_epoched=true,
)
times = range(0, stop=size(data, 1) / 250, length=size(data, 1));

We now have data, but no multiple comparison problem yet - we have to fit one analysis regression model to each time-point of each subject. Thereby, we perform 113 tests and would (without multiple testing correction) expect 5-6 samples to be significant, even if there is no true effect (but there is one ;)).

Note

In principle, we do not need Unfold here - we could simply calculate (subjectwise) means of the conditions, and their time-resolved difference. Using Unfold.jl here simply generalizes it to more complex design, e.g. with continuous predictors etc.

models = map(
    (d, ev) -> (
        fit(UnfoldModel, @formula(0 ~ 1 + condition), DataFrame(ev), d, times),
        ev.subject[1],
    ),
    eachslice(data; dims=3),
    groupby(events, :subject),
);


Progress:   2%|▊                                        |  ETA: 0:00:19
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

now we can inspect the data easily, and extract the face-effect

function add_subject!(df, s)
    df[!, :subject] .= s
    return df
end
allEffects =
    map(
        (x) ->
            (effects(Dict(:condition => ["car", "face"]), x[1]), x[2]) |>
            (x) -> add_subject!(x[1], x[2]),
        models,
    ) |> e -> reduce(vcat, e)

plot_erp(allEffects; mapping=(color=:condition, group=:subject))

Every line is from one subject, the color indicates our two conditions.

It is easier to see potential differences if we would plot the difference:

First we extract all coefficients in a nice, tidy DataFrame

allCoefs =
    map(m -> (coeftable(m[1]), m[2]) |>
             (x) -> add_subject!(x[1], x[2]), models) |>
    e -> reduce(vcat, e)
plot_erp(allCoefs; mapping=(group=:subject, col=:coefname))

This plot now shows the intercept (in our contrast-case the condition:"car" ERP), and the difference curve.

Next we unstack the tidy-coef table into a matrix and put it to clusterdepth for clusterpermutation testing

faceCoefs = allCoefs |> x -> subset(x, :coefname => x -> x .== "condition: face")
erpMatrix =
    unstack(faceCoefs, :subject, :time, :estimate) |> x -> Matrix(x[:, 2:end])' |> collect
summary(erpMatrix)

"113×20 Matrix{Union{Missing, Float64}}"

Clusterdepth

pvals = clusterdepth(erpMatrix; τ=quantile(TDist(n_subjects - 1), 0.95), nperm=5000);

┌ Warning: Your data shows a cluster that starts before the first sample, or ends after the last sample. There exists a fundamental limit in the ClusterDepth method, that the clusterdepth for such a cluster cannot be determined. Maybe you can extend the epoch to include more samples? This 'half'-cluster will be ignored and cannot become significant
└ @ ClusterDepth ~/work/ClusterDepth.jl/ClusterDepth.jl/src/cluster.jl:62

well - that was fast, less than a second for a cluster permutation test. not bad at all!

Plotting

Some plotting, and we add the identified cluster

first calculate the ERP

faceERP =
    groupby(faceCoefs, [:time, :coefname]) |>
    x -> combine(x, :estimate => mean => :estimate, :estimate => (x -> std(x) / sqrt(length(x))) => :stderror);

put the pvalues into a nicer format

pvalDF =
    ClusterDepth.cluster(pvals .<= 0.05) |>
    x -> DataFrame(
        :from => x[1] ./ 250,
        :to => (x[1] .+ x[2]) ./ 250,
        :coefname => "condition: face",
    )
plot_erp(faceERP; stderror=true, significance=pvalDF)
hlines!([0])
current_figure()

Looks good! We identified the cluster :-)

This page was generated using Literate.jl.

How to use the ClusterDepth multiple comparison correction

Simulating test-data

Fit a model to each subject

Clusterdepth

Plotting