2024-05-03
The raw EEG can indicate general state, e.g., awake or asleep, but it can’t tell us much about specific neural processes that underlie perception, cognition or emotion.
Current methods
Grand ERP by averaging across subjects: loses subject-level information.
No uncertainty quantification.
Filtering to remove drift or noise: no rule of thumb for which frequency band or bandwidth to use.
Pre-specified component search windows for quantifying amplitude: no systematic way of selecting them.
Separate models for control-experimental comparison.
We solve these issues with a unified Bayesian model built on Gaussian processes. First, recall the bivariate (d = 2) Gaussian:
\mathbf{Y}= \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \boldsymbol \mu= \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \boldsymbol \Sigma= \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2\\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
Suppose we draw a number of samples from such a Gaussian and want to visualize them.
\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0\\ 0 & 1 \end{pmatrix}
\boldsymbol \mu= \begin{pmatrix} 0 \\ 1 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0\\ 0 & 0.2 \end{pmatrix}
\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0.9\\ 0.9 & 1 \end{pmatrix}
\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0.99\\ 0.99 & 1 \end{pmatrix}
d = 5
\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \\ 0\\0\\0\end{pmatrix} \quad \quad \boldsymbol \Sigma= \begin{pmatrix}1 & 0.99 & 0.98 & 0.97 & 0.96\\ 0.99 & 1 & 0.99 & 0.98 & 0.97 \\ 0.98 & 0.99 & 1 & 0.99 & 0.98 \\ 0.97 & 0.98 & 0.99 & 1 & 0.99\\0.96 & 0.97 & 0.98 & 0.99 & 1 \end{pmatrix}
d = 50
\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \\ \vdots \\0\\0\end{pmatrix} \quad \quad \boldsymbol \Sigma= \begin{pmatrix} 1 & 0.99 & 0.98 & 0.97 & 0.96 & \cdots \\ 0.99 & 1 & 0.99 & 0.98 & 0.97 & \cdots\\ 0.98 & 0.99 & 1 & 0.99 & 0.98 & \cdots\\ 0.97 & 0.98 & 0.99 & 1 & 0.99 & \cdots\\ 0.96 & 0.97 & 0.98 & 0.99 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}
Each line is one sample (path).
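As a concrete illustration, here is a minimal R sketch (using MASS::mvrnorm; the Toeplitz-style covariance below is an illustrative stand-in for the d = 50 matrix above) that draws a few samples and plots each one as a line:

```r
library(MASS)

d <- 50
## Toeplitz-style covariance matching the pattern above:
## 1 on the diagonal, decreasing by 0.01 per step away from it
Sigma <- outer(1:d, 1:d, function(i, j) 1 - 0.01 * abs(i - j))

paths <- mvrnorm(n = 5, mu = rep(0, d), Sigma = Sigma)

## one line per sample: with near-one correlations, each draw
## looks like a smooth function of the index
matplot(t(paths), type = "l", lty = 1, xlab = "index", ylab = "value")
```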
Think of a Gaussian process as an infinite-dimensional distribution over functions.
A Gaussian process (GP) is a process f(\cdot): \mathbb{R}^p \rightarrow \mathbb{R} whose finite-dimensional distributions are all Gaussian: for any x_1, \dots, x_n \in \mathbb{R}^p,
\left(f(x_1), \dots, f(x_n)\right) \sim N_n\left(\boldsymbol \mu, \boldsymbol \Sigma\right)
Write f(\cdot) \sim GP to denote that the function f is a GP.
The underlying true ERP waveform is assumed to be a GP.
\text{E}(f(x)) = m(x), \qquad \text{Cov}(f(x), f(x')) = k(x, x')
Popular choices of m(\cdot) are m(x) = 0 or a constant for all x, or m(x) = \beta'x.
We care more about the covariance function, or kernel, as it governs how the process looks by defining the similarity between data points.
k(x, x' \mid \tau, h) = \tau^2 \exp\left(-\frac{(x - x')^2}{2h^2} \right)
k(x, x') = \exp\left(-\frac{1}{2}(x - x')^2 \right)
k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{0.25}}^2} \right)
k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{4}}^2} \right)
k(x, x') = {\color{red}{10^2}}\exp\left(-\frac{1}{2}(x - x')^2 \right)
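To see how \tau and h shape the process, here is a minimal R sketch (se_kernel is an illustrative helper, not part of the slam package) drawing GP sample paths at several length-scales:

```r
## squared exponential kernel matrix on a grid
se_kernel <- function(x, tau = 1, h = 1) {
  tau^2 * exp(-outer(x, x, "-")^2 / (2 * h^2))
}

x <- seq(0, 5, length.out = 200)
par(mfrow = c(1, 3))
for (h in c(0.25, 1, 4)) {
  K <- se_kernel(x, tau = 1, h = h) + diag(1e-8, length(x))  # jitter for stability
  paths <- MASS::mvrnorm(3, mu = rep(0, length(x)), Sigma = K)
  matplot(x, t(paths), type = "l", lty = 1,
          main = paste("h =", h), xlab = "x", ylab = "f(x)")
}
```

Smaller h gives wigglier paths, while \tau^2 only rescales the vertical amplitude.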
\begin{aligned} y_i &= f(x_i) + \epsilon _i, ~~ \epsilon _i \sim N(0, \sigma^2), ~~ i = 1, \dots, n, \\ f(\cdot) &\sim GP\\ f'(t_m) &= 0, ~~m = 1, \dots, M. \end{aligned}
t_1, \dots, t_M are parameters to be estimated.
In ERP studies, M is usually known.
Use a univariate prior t \sim \pi(t), which corresponds to assuming that f has at least one critical point, and utilize the posterior of t to infer all critical points.
A univariate prior on t leads to:
Efficient computation.
Avoidance of possible misspecification of M that would distort the fitted result.
\begin{aligned} y_i &= f(x_i) + \epsilon_i, ~~ \epsilon_i \sim N(0, \sigma^2), ~~i = 1, \dots, n, \\ f &\sim GP(0, k(\cdot, \cdot; \boldsymbol \theta)) , ~~ f'(t) = 0,\\ t &\sim \pi(t), ~~ \sigma^2 \sim \pi(\sigma^2), ~~ \boldsymbol \theta\sim \pi(\boldsymbol \theta) \end{aligned}
For iteration i = 1, 2, \dots until convergence:
Monte Carlo E-step: Metropolis-Hastings on t and Gibbs sampling on \sigma^2.
M-step: Given samples of t and \sigma^2, update \boldsymbol \theta to \hat{\boldsymbol \theta}^{(i+1)} by setting \begin{aligned} \hat{\boldsymbol \theta}^{(i+1)} &= \arg \max_{\boldsymbol \theta} \frac{1}{J} \sum_{j=1}^J \log p(\bm{y} \mid t_j^{(i)}, (\sigma_j^2)^{(i)}, \boldsymbol \theta) \end{aligned}
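Below is a minimal R sketch of this Monte Carlo EM loop for a single subject with one critical point (M = 1), assuming the squared exponential kernel. The function names are illustrative, not the slam package API; for brevity, \sigma^2 is updated by Metropolis-Hastings here rather than the Gibbs step described above, and the implementation is deliberately naive (each likelihood call is O(n^3)):

```r
library(mvtnorm)

## Cov(f(x), f'(t)) for the SE kernel k(x, x') = tau^2 exp(-(x - x')^2 / (2 h^2))
dk_dt <- function(x, t, tau, h) {
  tau^2 * exp(-(x - t)^2 / (2 * h^2)) * (x - t) / h^2
}

## log p(y | t, sigma2, tau, h): condition the joint Gaussian (y, f'(t))
## on the event f'(t) = 0
loglik <- function(y, x, t, sigma2, tau, h) {
  K  <- tau^2 * exp(-outer(x, x, "-")^2 / (2 * h^2)) + diag(sigma2, length(x))
  c1 <- dk_dt(x, t, tau, h)        # Cov(y, f'(t))
  v  <- tau^2 / h^2                # Var(f'(t)) under the SE kernel
  S  <- K - tcrossprod(c1) / v     # covariance of y given f'(t) = 0
  dmvnorm(y, sigma = S, log = TRUE)
}

mcem <- function(y, x, n_iter = 20, J = 200) {
  tau <- 1; h <- 0.5                      # initial kernel hyperparameters
  t_cur <- mean(range(x)); s2_cur <- 0.1  # initial latent states
  for (i in seq_len(n_iter)) {
    ## Monte Carlo E-step: random-walk MH on (t, sigma2), flat prior on t
    t_s <- s2_s <- numeric(J)
    for (j in seq_len(J)) {
      t_new  <- t_cur + rnorm(1, sd = 0.05)
      s2_new <- abs(s2_cur + rnorm(1, sd = 0.01))   # reflect at zero
      if (t_new > min(x) && t_new < max(x)) {
        logr <- loglik(y, x, t_new, s2_new, tau, h) -
                loglik(y, x, t_cur, s2_cur, tau, h)
        if (log(runif(1)) < logr) { t_cur <- t_new; s2_cur <- s2_new }
      }
      t_s[j] <- t_cur; s2_s[j] <- s2_cur
    }
    ## M-step: maximize the Monte Carlo average of log p(y | t_j, sigma2_j, theta)
    obj <- function(p) -mean(vapply(seq_len(J), function(j)
      loglik(y, x, t_s[j], s2_s[j], exp(p[1]), exp(p[2])), 0.0))
    p <- optim(log(c(tau, h)), obj)$par
    tau <- exp(p[1]); h <- exp(p[2])
  }
  list(t_samples = t_s, sigma2_samples = s2_s, tau = tau, h = h)
}
```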
Simulated example with two stationary points: t_1 = 0.44 and t_2 = 1.46.
\pi(t) = \text{Unif}(0, 2)
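Continuing the MCEM sketch above on hypothetical simulated data (sin(\pi x) plus noise, whose critical points sit at 0.5 and 1.5; an illustrative stand-in for the example's exact generator), the posterior draws of t come out bimodal, recovering both stationary points without fixing M:

```r
set.seed(1)
x <- seq(0, 2, length.out = 100)
y <- sin(pi * x) + rnorm(length(x), sd = 0.25)  # critical points at 0.5 and 1.5
fit <- mcem(y, x)
hist(fit$t_samples, breaks = 40, main = "Posterior draws of t", xlab = "t")
```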
Compared with current ERP studies, SLAM:
Estimates all parameters in a unified framework.
Supports a multi-factor ANOVA setting for examining how multiple factors affect latency and amplitude.
Provides Bayesian probabilistic statements about the amplitude and latency.
The hierarchical model provides estimates at both group and subject levels.
y_{igs} = f_{gs}(x_i) + \epsilon_{igs}, ~~ \epsilon_{igs} \stackrel{\rm iid}{\sim}N(0, \sigma^2).
Time point i = 1, \dots, n
Factor level g = 1, \dots, G (g = \color{blue}{\text{young}, \text{old}} for the Age factor)
Subject s = 1, \dots, S_{g} (\color{blue}{S_{young} = 18; S_{old} = 11})
f_{gs} \mid \bm{t}_{gs} \sim \text{DGP}(0, k(\cdot, \cdot; \boldsymbol \theta), \bm{t}_{gs}), where \bm{t}_{gs} = \{t_{gs}^m\}_{m = 1}^M.
\begin{aligned} t_{gs}^m \mid r_{g}^m, \eta_{g}^m &\stackrel{\rm iid}{\sim}\text{gbeta}\left(r_{g}^m\eta_{g}^m, \, (1 - r_{g}^m)\eta_{g}^m, \, a^{m}, \, b^{m}\right), \, s = 1, \dots, S_{g}. \end{aligned}
Subject-level t_{gs}^m \in [a^{m}, b^{m}] (\color{blue}{t_{young, 1}^{N100} \in [60, 140]; t_{old, 2}^{P200} \in [140, 220]})
Group-level r_{g}^m \in (0, 1): prior mean location in [a^{m}, b^{m}]
{\mathbb E}\left(t_{gs}^m\right) = (1 - r_{g}^m)a^{m} + r_{g}^mb^{m}
With a link function \phi(\cdot): (0, 1) \rightarrow (-\infty, \infty) and dummy variables z_j \in \{0, 1\}, j = 1, \dots, G-1,
\phi\left(r_{g}^m\right) = \beta_0^m + \beta_{1}^mz_{1} + \cdots + \beta_{G-1}^mz_{G-1}.
\color{blue}{\phi\left(r_{young}^{N100}\right) = \beta_0^{N100}}.
\color{blue}{\phi\left(r_{old}^{N100}\right) = \beta_0^{N100} + \beta_{1}^{N100}}.
Accommodates any linear model with categorical and numerical variables.
Can describe interactions.
Any link function with domain (0, 1) can be used: logit, probit, complementary log-log, etc. (a numeric sketch follows below).
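A small numeric R sketch of this latent structure, assuming a logit link and illustrative (not fitted) coefficients; rgbeta is a hypothetical helper, not the slam package API:

```r
## generalized beta on [a, b]: a Beta(r*eta, (1-r)*eta) draw rescaled to the window
rgbeta <- function(n, r, eta, a, b) a + (b - a) * rbeta(n, r * eta, (1 - r) * eta)

beta0 <- -0.4; beta1 <- 0.8        # illustrative coefficients
r_young <- plogis(beta0)           # phi^{-1}(beta0), logit link
r_old   <- plogis(beta0 + beta1)   # phi^{-1}(beta0 + beta1)

a <- 60; b <- 140                  # N100 search window (msec)
t_young <- rgbeta(1000, r_young, eta = 10, a, b)  # subject-level latencies
mean(t_young)                      # approx (1 - r_young) * a + r_young * b
```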
\begin{aligned} \beta_{j}^m &\sim N(\mu_j^m, (\sigma_j^m)^2), ~~ j = 0, 1, \dots, G-1\\ \eta_{g}^m &\sim Ga(\alpha_{\eta}, \beta_{\eta}),\\ \pi(\sigma^2, \tau^2, h) &= \pi(\sigma^2) \pi(\tau^2) \pi(h) = IG(\alpha_{\sigma}, \beta_{\sigma})IG(\alpha_{\tau}, \beta_{\tau})Ga(\alpha_{h}, \beta_{h}). \end{aligned}
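For completeness, a sketch of drawing from these hyperpriors in R (the shape and rate values are assumptions, not the paper's defaults):

```r
## if X ~ Gamma(shape, rate), then 1/X ~ InvGamma(shape, rate)
rinvgamma <- function(n, shape, rate) 1 / rgamma(n, shape, rate = rate)

beta_j <- rnorm(1, mean = 0, sd = 1)        # coefficient prior N(mu_j, sigma_j^2)
eta_g  <- rgamma(1, shape = 2, rate = 0.1)  # concentration prior Ga(alpha_eta, beta_eta)
sigma2 <- rinvgamma(1, 2, 1)                # noise variance IG(alpha_sigma, beta_sigma)
tau2   <- rinvgamma(1, 2, 1)                # kernel amplitude IG(alpha_tau, beta_tau)
h      <- rgamma(1, 2, 2)                   # length-scale Ga(alpha_h, beta_h)
```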
10 subjects (regression functions) in each group
Data of size n=100 are generated with \sigma = 0.25
| Method | Sine | Cosine |
| --- | --- | --- |
| SLAM-posterior mean | 0.004 (0.0000) | 0.007 (0.0002) |
| LOESS | 0.012 (0.0021) | 0.025 (0.0036) |

| Method | Sine | Cosine |
| --- | --- | --- |
| SLAM-posterior mean | 0.04 (0.0004) | 0.06 (0.0016) |
| LOESS | 0.07 (0.0021) | 0.07 (0.0066) |
Experiment on speech recognition: participants choose the word they hear from a set of candidate words.
N100 (60–140 msec) component captures phonological (syllable) representation.
P200 (140–220 msec) component represents higher-order perceptual processing.
The older group has longer N100 and P200 latencies.
The time lag between N100 and P200 is shorter in the young group.
Novel Bayesian SLAM estimates the latency and amplitude of ERP components with uncertainty quantification.
Solves the search window specification problem by generating the posterior distribution of the latency.
Incorporates ANOVA, and more generally a latent generalized linear model structure, in a single unified framework.
Examines both individual and group differences, facilitating comparisons across characteristics or factors such as age and gender.
R package slam at https://github.com/chenghanyustats/slam
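To try it, assuming the standard GitHub installation workflow (see the repository README for authoritative instructions):

```r
# install.packages("devtools")
devtools::install_github("chenghanyustats/slam")
library(slam)
```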
Work in progress: