Electroencephalography (EEG)
Let’s first talk about EEG signals.
When an EEG experiment is conducted, a person’s brain activity is recorded from electrodes placed on the scalp.
When you look at the raw EEG, it’s a pretty complicated signal. It’s basically the sum of everything the brain is doing.
The raw EEG can indicate a general state, e.g., awake or asleep, but it cannot reveal the specific neural processes that underlie perception, cognition, or emotion.
What ERP Can Do
So what can we do with ERPs?
Read words…
Estimate ERP Components
Interested in estimating the amplitude and latency of some ERP component.
In a typical ERP study, we focus on estimating the amplitude and latency of ERP components, the important peaks or dips that are related to human sensory, cognitive, or motor processes. The amplitude is the magnitude of the component, and the latency is the time at which the component occurs.
Inference Methods
Current methods
grand ERP by averaging across subjects: loses subject-level information
no uncertainty quantification
filtering for removing drifts or noise: no rule of thumb for which frequency/bandwidth should be used
component search window specification
separate ANOVA for control-experimental comparison
Current methods have several issues or drawbacks.
In order to obtain precise point estimates of latency and amplitude, or of the entire ERP waveform, researchers typically compute the grand ERP waveform by averaging across subjects.
By doing so, subject-level information is lost.
There is also no uncertainty quantification.
To obtain a smooth ERP curve, some kind of filter is needed to remove signal drifts and noise, but there is no rule of thumb for which frequency band or bandwidth should be used.
To analyze ERP components, a time search window must be specified to frame the range of the components, and an inappropriate window may invalidate the analysis.
So far there are no systematic or consistent ways of selecting such windows; in practice, the window is chosen based on previous scientific findings.
We address these issues with a
Bayesian hierarchical model with a derivative Gaussian process prior : Yu et al., Biometrics (2022), Li et al., JASA T&M (2023), Yu et al., Data Science in Science (2024 expected)
Gaussian Process (GP)
A Gaussian process (GP) is a process f(\cdot): \mathbb{R}^D \rightarrow \mathbb{R} whose finite-dimensional distributions are all Gaussian: for any n and x_1, \dots, x_n \in \mathbb{R}^D ,
\left(f(x_1), \dots, f(x_n)\right) \sim N_n\left(\boldsymbol \mu, \boldsymbol \Sigma\right)
Before getting into our model, let me briefly talk about GP.
Here we show one-dimensional white noise alongside a realization, or sample path, of a GP.
Here we show two-dimensional white noise alongside a realization, or sample path, of a GP.
Mean and Covariance Function
To fully specify a GP, we need the mean and covariance functions, f(\cdot) \sim GP\left(m(\cdot), k(\cdot, \cdot)\right) , where
\text{E}(f(x)) = m(x), \quad \text{Cov}(f(x), f(x')) = k(x, x')
Popular choices of m(\cdot) are m(x) = 0 or \text{const} for all x , or m(x) = \beta'x
We care more about the covariance function, or kernel, as it governs how the process behaves by defining the similarity between data points.
Squared Exponential (SE) Kernel
k(x, x' \mid \tau, h) = \tau^2 \exp\left(-\frac{(x - x')^2}{2h^2} \right)
k(x, x') = \exp\left(-\frac{1}{2}(x - x')^2 \right)
The parameter h is the characteristic length-scale that controls the number of level-zero upcrossings.
Squared Exponential (SE) Kernel
k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{0.25}}^2} \right)
Squared Exponential (SE) Kernel
k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{4}}^2} \right)
Squared Exponential (SE) Kernel
k(x, x') = {\color{red}{10^2}}\exp\left(-\frac{1}{2}(x - x')^2 \right)
The parameter \tau^2 is the variance of f(x) that controls the vertical variation of the process.
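The roles of h and \tau^2 can be seen by sampling GP paths under different settings, matching the three kernel variants above. This is a minimal NumPy sketch, not part of the original slides:

```python
import numpy as np

def se_kernel(x1, x2, tau=1.0, h=1.0):
    """Squared exponential kernel k(x, x') = tau^2 exp(-(x - x')^2 / (2 h^2))."""
    d = x1[:, None] - x2[None, :]
    return tau**2 * np.exp(-0.5 * d**2 / h**2)

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 200)

# Small h -> wiggly paths (many zero-level upcrossings); large h -> slowly
# varying paths; large tau -> larger vertical range.
for tau, h in [(1.0, 0.25), (1.0, 4.0), (10.0, 1.0)]:
    K = se_kernel(x, x, tau, h) + 1e-8 * np.eye(len(x))  # jitter for stability
    path = rng.multivariate_normal(np.zeros(len(x)), K)
```

Each `path` is one draw from the finite-dimensional Gaussian N_n(0, K) on the grid, which is exactly the definition of a GP sample path restricted to those points.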
Derivative Gaussian Process (DGP)
Force f' = 0 at x = -4 , 0 , and 4 , and all paths follow the constraint.
The stationary points can be a maximum, minimum or inflection point.
We add 3 constraints, so each path has at least 3 stationary points (possibly more).
DGP is basically a GP with derivative constraints or information.
For example here on the left, we have GP sample paths of SE kernel without derivative constraints.
But we could add derivative information to the GP.
Like the one on the right that (…)
Shape-Constrained Regression
\begin{aligned}
y_i &= f(x_i) + \epsilon _i, ~~ \epsilon _i \sim N(0, \sigma^2), ~~ i = 1, \dots, n, \\
f'(t_m) &= 0, ~~m = 1, \dots, M,\\
f(\cdot) &\sim GP
\end{aligned}
In ERP applications,
\{y_i\}_{i=1}^n are the ERP time series observations.
f(\cdot) is the true underlying smooth ERP waveform.
The M peaks of an ERP waveform f at t_1, \dots, t_M are such that f'(t_m) = 0 .
Note that t_1, \dots, t_M are the parameters we are interested in and want to learn.
Derivative Gaussian Process Prior
DGP prior of f , DGP\left(\mu(\cdot), k(\cdot, \cdot), \mathbf{t}\right) , \mathbf{t}= (t_1, \dots, t_M)
Mean function is x_i \mapsto \mu(x_i) - k_{01}(x_i, \bm{t}) k_{11}^{-1}(\bm{t}, \bm{t}) \mu'(\bm{t}) where \mu'(\cdot) is the mean function of f' .
[k_{01}(\bm{x}, \bm{t})]_{ij} := \frac{\partial }{\partial t_j}k(x_i, t_j), ~~[k_{11}(\bm{t}, \bm{t})]_{ij} := \frac{\partial ^2 }{\partial t_i \partial t_j}k(t_i, t_j)
Covariance kernel is (x_i, x_j) \mapsto k(x_i, x_j) - k_{01}(x_i, \bm{t}) k_{11}^{-1}(\bm{t}, \bm{t}) k_{10}(\bm{t}, x_j)
k_{10}(\bm{x}, \bm{t}) = k_{01}^T(\bm{x}, \bm{t})
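The conditional mean and covariance above can be implemented directly for the SE kernel with zero mean, since k_{01} and k_{11} have closed forms. The sketch below (NumPy assumed, not from the original slides) samples paths constrained to have f' = 0 at -4, 0, and 4, as in the earlier DGP illustration:

```python
import numpy as np

def se(x1, x2, tau=1.0, h=1.0):
    d = x1[:, None] - x2[None, :]
    return tau**2 * np.exp(-0.5 * d**2 / h**2)

def k01(x, t, tau=1.0, h=1.0):
    # Cov(f(x_i), f'(t_j)) = d/dt_j k(x_i, t_j) for the SE kernel
    d = x[:, None] - t[None, :]
    return se(x, t, tau, h) * d / h**2

def k11(t, tau=1.0, h=1.0):
    # Cov(f'(t_i), f'(t_j)) = d^2/(dt_i dt_j) k(t_i, t_j)
    d = t[:, None] - t[None, :]
    return se(t, t, tau, h) * (1.0 / h**2 - d**2 / h**4)

x = np.linspace(-6, 6, 241)
t = np.array([-4.0, 0.0, 4.0])            # force f'(t) = 0 at these points

K = se(x, x)
A = k01(x, t)
# Conditional covariance given f'(t) = 0: k - k01 k11^{-1} k10
Kc = K - A @ np.linalg.solve(k11(t), A.T)

rng = np.random.default_rng(2)
L = np.linalg.cholesky(Kc + 1e-6 * np.eye(len(x)))
paths = L @ rng.normal(size=(len(x), 5))  # 5 paths, each with stationary points at t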
Prior on Stationary Points
t_1, \dots, t_M are parameters to be estimated.
In ERP studies, M is usually known.
Use a univariate prior t \sim \pi(t) , which corresponds to assuming that f has at least one stationary point, and utilize the posterior of t to infer all stationary points.
A univariate prior of t leads to
Theorem: The posterior of t converges to a mixture of Gaussians as n \rightarrow \infty , and the estimated number of mixture components converges to the number of stationary points.
Mixture of Gaussians
Posterior Inference via Monte Carlo EM
\begin{aligned}
y_i &= f(x_i) + \epsilon_i, ~~ \epsilon_i \sim N(0, \sigma^2), ~~i = 1, \dots, n, \\
f &\sim GP(0, k(\cdot, \cdot; \boldsymbol \theta)) , ~~ f'(t) = 0,\\
t &\sim \pi(t), ~~ \sigma^2 \sim \pi(\sigma^2), ~~ \boldsymbol \theta\sim \pi(\boldsymbol \theta)
\end{aligned}
For iteration i = 1, 2, \dots until convergence:
Monte Carlo E-step : Metropolis-Hastings on t and Gibbs sampling on \sigma^2 .
M-step : Given samples of t and \sigma^2 , update \boldsymbol \theta to \hat{\boldsymbol \theta}^{(i+1)} by setting \begin{aligned}
\hat{\boldsymbol \theta}^{(i+1)} &= \arg \max_{\boldsymbol \theta} \frac{1}{J} \sum_{j=1}^J \log p(\bm{y} \mid t_j^{(i)}, (\sigma_j^2)^{(i)}, \boldsymbol \theta^{(i)})
\end{aligned}
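The MCEM loop can be illustrated on a deliberately simplified model: a scalar latent t with y_i \mid t \sim N(t, \sigma^2) and prior t \sim N(0, \theta^2), where the E-step samples t by random-walk Metropolis-Hastings and the M-step updates \theta in closed form. This is a toy sketch of the algorithmic structure only, not the DGP model itself:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, t_true = 0.5, 1.2
y = rng.normal(t_true, sigma, size=50)   # synthetic data

def log_post(t, theta):
    # log p(t | y, theta) up to an additive constant:
    # Gaussian likelihood times the N(0, theta^2) prior on t
    return -0.5 * np.sum((y - t) ** 2) / sigma**2 - 0.5 * t**2 / theta**2

theta, t = 1.0, 0.0
for _ in range(20):                       # EM iterations
    samples = []
    for _ in range(500):                  # Monte Carlo E-step: random-walk MH on t
        prop = t + rng.normal(0.0, 0.2)
        if np.log(rng.uniform()) < log_post(prop, theta) - log_post(t, theta):
            t = prop
        samples.append(t)
    samples = np.array(samples[100:])     # drop burn-in
    # M-step: argmax_theta of the Monte Carlo average of log N(t_j | 0, theta^2),
    # which has the closed form theta^2 = mean(t_j^2)
    theta = np.sqrt(np.mean(samples**2))
```

In the actual model, the M-step maximizer over the kernel hyperparameters \boldsymbol\theta has no closed form and is found numerically, but the alternation between sampling the latent variables and maximizing the Monte Carlo objective is the same.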
Simulation: Uncertainty Quantification of t and f
f(x) = 0.3 + 0.4x+0.5\sin(3.2x) + 1.1 /(1 + x^2) , x\in [0, 2]
Two stationary points t_1 = 0.44 and t_2 = 1.46 .
\pi(t) = \text{Unif}(0, 2) .
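The two stationary points of the simulation function can be checked numerically by root-finding on f'. A short sketch (NumPy/SciPy assumed):

```python
import numpy as np
from scipy.optimize import brentq

def f(x):
    return 0.3 + 0.4 * x + 0.5 * np.sin(3.2 * x) + 1.1 / (1 + x**2)

def fprime(x):
    # d/dx of f: chain rule on the sine term and the rational term
    return 0.4 + 1.6 * np.cos(3.2 * x) - 2.2 * x / (1 + x**2) ** 2

# f' changes sign once in [0, 1] and once in [1, 2]
t1 = brentq(fprime, 0.0, 1.0)   # ~0.44, a local maximum
t2 = brentq(fprime, 1.0, 2.0)   # ~1.46, a local minimum
```

Rounded to two decimals these match the stated values t_1 = 0.44 and t_2 = 1.46.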
Oracle DGP
Better curve fitting if true stationary points were known.
One thing I’d like to mention is that if the true stationary points were known to us, which we call the oracle DGP, its curve fitting would be better than basic GP regression. The reason is that we then have more information about the shape of the regression function, and that information is 100% correct. So in general, the uncertainty band of the oracle DGP will be narrower than that of GPR, or of the DGP when the stationary points are uncertain.
Current ERP Studies
Multi-subject
Averaging across subjects to obtain the grand ERP waveform.
Difference in latency and/or amplitude of ERP components for factors such as age, gender, etc.
Basic ANOVA is conducted.
A time window or search window needs to be specified to frame the range of the components.
No systematic and consistent ways of selecting such windows.
The model we just described is the basic skeleton for ERP analysis, but we need a more sophisticated one to address the current ERP study issues.
Semiparametric Latent ANOVA Model (SLAM)
Simultaneously estimates the amplitude and latency as well as the ERP waveform in a unified framework.
Supports a multi-factor ANOVA setting that is capable of examining how multiple factors or covariates affect ERP components’ latency and amplitude.
Bayesian approach providing probabilistic statements about the amplitude and latency of ERP components.
The hierarchical nature of the model provides estimates at the grand group level as well as the individual subject level that is masked by grand averaging.
SLAM One-way ANOVA - Likelihood with DGP
y_{igs} = f_{gs}(x_i) + \epsilon_{igs}, ~~ \epsilon_{igs} \stackrel{\rm iid}{\sim}N(0, \sigma^2).
Time point i = 1, \dots, n
Factor level g = 1, \dots, G
Subject s = 1, \dots, S_{g}
f_{gs} \mid \bm{t}_{gs} \sim \text{DGP}(0, k(\cdot, \cdot; \boldsymbol \theta), \bm{t}_{gs}), where \bm{t}_{gs} = \{t_{gs}^m\}_{m = 1}^M .
SLAM One-way ANOVA - Prior
\begin{aligned}
t_{gs}^m \mid r_{g}^m, \eta_{g}^m &\stackrel{\rm iid}{\sim}\text{gbeta}\left(r_{g}^m\eta_{g}^m, \, (1 - r_{g}^m)\eta_{g}^m, \, a^{m}, \, b^{m}\right), \, s = 1, \dots, S_{g}.
\end{aligned}
Subject-level t_{gs}^m \in [a^{m}, b^{m}]
Group-level r_{g}^m \in (0, 1) : location of prior mean in [a^{m}, b^{m}]
{\mathbb E}\left(t_{gs}^m\right) = (1 - r_{g}^m)a^{m} + r_{g}^mb^{m}
For the subject-level latency parameters, we assume a generalized beta distribution with support [a^m, b^m] , which can be viewed as the coarsest possible search window for a latency.
The shape parameters are parameterized using r and \eta , so that the group-level parameter r indicates the location of the prior mean within the search window, i.e., whether the mean lies closer to a or to b .
It represents the relative location within the interval, or the weight between the two endpoints of the search window.
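A generalized beta draw is just a Beta(r\eta, (1-r)\eta) variable rescaled from (0, 1) to the window (a, b), which makes the prior-mean formula above easy to verify by simulation. A sketch with illustrative values (NumPy assumed; the window 60–140 ms mirrors the N100 search window mentioned later, and r = 0.4, \eta = 10 are arbitrary):

```python
import numpy as np

def rgbeta(n, r, eta, a, b, rng):
    """Draw from gbeta(r*eta, (1-r)*eta, a, b): a Beta(r*eta, (1-r)*eta)
    variable rescaled from (0, 1) to the search window (a, b)."""
    return a + (b - a) * rng.beta(r * eta, (1 - r) * eta, size=n)

rng = np.random.default_rng(3)
r, eta, a, b = 0.4, 10.0, 60.0, 140.0   # illustrative values, in ms
t = rgbeta(10000, r, eta, a, b, rng)
# E(t) = (1 - r) * a + r * b = 0.6 * 60 + 0.4 * 140 = 92 ms
```

The sample mean should sit 40% of the way through the window, matching {\mathbb E}(t_{gs}^m) = (1 - r)a + rb.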
SLAM One-way ANOVA - Link Function
With the link function \phi: (0, 1) \rightarrow (-\infty, \infty) , and z_j \in \{0, 1\}, j = 1, \dots, G-1,
\phi\left(r_{g}^m\right) = \beta_0^m + \beta_{1}^mz_{1} + \cdots + \beta_{G-1}^mz_{G-1}.
Accommodates any linear model with categorical and numerical variables
Can describe interactions
Any link function with domain (0, 1) can be used, logit , probit , complementary log-log , etc.
To do inference about r , we introduce an ANOVA structure to model how the factors affect the location of the ERP component.
Since r is bounded between 0 and 1, we use a link function \phi(\cdot) followed by an ANOVA, which is written as a regression or linear model using dummy variables.
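With a logit link, mapping the dummy-coded linear predictor back to r is one inverse-logit evaluation per group. A sketch for G = 2 groups (the coefficient values 0.3 and -0.5 are borrowed from the simulation truth used later; NumPy assumed):

```python
import numpy as np

def expit(x):
    """Inverse logit: maps (-inf, inf) back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# One-way ANOVA with two groups via a dummy variable z in {0, 1}:
# logit(r_g) = beta0 + beta1 * z, so group 1 (z = 0) is the baseline.
beta0, beta1 = 0.3, -0.5
r = expit(beta0 + beta1 * np.array([0.0, 1.0]))
# r[0] = expit(0.3) ~ 0.574 for group 1, r[1] = expit(-0.2) ~ 0.450 for group 2
```

Any other strictly increasing link with domain (0, 1), such as probit or complementary log-log, would work the same way; only `expit` changes.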
SLAM One-way ANOVA - Other Priors
\begin{aligned}
\beta_{j}^m &\sim N(\mu_j^m, (\sigma_j^m)^2), ~~ j = 0, 1, \dots, G-1\\
\eta_{g}^m &\sim Ga(\alpha_{\eta}, \beta_{\eta}),\\
\pi(\sigma^2, \tau^2, h) &= \pi(\sigma^2) \pi(\tau^2) \pi(h) = IG(\alpha_{\sigma}, \beta_{\sigma})IG(\alpha_{\tau}, \beta_{\tau})Ga(\alpha_{h}, \beta_{h}).
\end{aligned}
SLAM One-way ANOVA - Inference
For iteration i = 1, 2, \dots until convergence:
Monte Carlo E-step :
Metropolis-Hastings on \mathbf{t} , \beta_{j}^m , \eta_g^m , and Gibbs sampling on \sigma^2 .
M-step : Given samples of \mathbf{t} and \sigma^2 , update \boldsymbol \theta to \hat{\boldsymbol \theta}^{(i+1)} by setting \begin{aligned}
\hat{\boldsymbol \theta}^{(i+1)} &= \arg \max_{\boldsymbol \theta} \frac{1}{L} \sum_{l=1}^L \log p(\bm{y} \mid \mathbf{t}_l^{(i)}, (\sigma_l^2)^{(i)}, \boldsymbol \theta^{(i)})
\end{aligned}
where
\bm{y}_{gs} \mid \mathbf{t}_{gs}, \hat{\boldsymbol \theta}^{(i)} \stackrel{\rm ind}{\sim}N\left(\boldsymbol \mu_{gs}, \boldsymbol \Sigma_{gs} +\sigma^2\mathbf{I}\right).
Simulation - Data
For subject s = 1, 2, \dots, 10 , group g = 1, 2 ,
\begin{aligned}
f_{1s}(x) &= -2\sin(2 \pi x + s/15 - 0.3),\\
f_{2s}(x) &= \cos(2\pi x + s/10 + 1.2) - 3x
\end{aligned}
Data of size n=100 are generated with \sigma = 0.25
Point and Interval Estimation of Subject-level Latency
Uncertainty Quantification of Group-level Latency
Latency root mean square error
SLAM-posterior mean: 0.0037 (0.0000), 0.0068 (0.0002)
SLAM-median: 0.0036 (0.0001), 0.0068 (0.0002)
LOESS: 0.0119 (0.0021), 0.0251 (0.0036)
Amplitude Estimates
Amplitude root mean square error
SLAM-posterior mean: 0.0421 (0.0004), 0.0582 (0.0016)
SLAM-median: 0.0421 (0.0005), 0.0581 (0.0016)
LOESS: 0.0661 (0.0021), 0.0654 (0.0066)
The amplitude estimation relies on a search window that is specified by the posterior of t or r .
Simulation from the Model
| Parameter | True value | 10 subjects | 20 subjects | 10 subjects | 20 subjects |
| r_1^1 | 0.57 | 0.54 (0.033) | 0.56 (0.027) | 0.077 (0.016) | 0.050 (0.012) |
| r_1^2 | 0.43 | 0.46 (0.028) | 0.44 (0.024) | 0.077 (0.012) | 0.051 (0.011) |
| r_2^1 | 0.45 | 0.45 (0.035) | 0.45 (0.029) | 0.075 (0.010) | 0.051 (0.009) |
| r_2^2 | 0.67 | 0.63 (0.033) | 0.65 (0.025) | 0.084 (0.017) | 0.052 (0.010) |
| \beta_0^1 | 0.3 | 0.17 (0.135) | 0.24 (0.109) | 0.314 (0.064) | 0.205 (0.049) |
| \beta_0^2 | -0.3 | -0.17 (0.114) | -0.24 (0.100) | 0.317 (0.049) | 0.210 (0.043) |
| \beta_1^1 | -0.5 | -0.38 (0.276) | -0.44 (0.222) | 0.463 (0.115) | 0.319 (0.100) |
| \beta_1^2 | 1 | 0.72 (0.244) | 0.88 (0.203) | 0.538 (0.127) | 0.351 (0.093) |
| \sigma | 0.5 | 0.57 (0.003) | 0.57 (0.003) | 0.074 (0.002) | 0.074 (0.002) |
ERP Data
Experiment on speech recognition.
The N100 (60–140 ms) component captures phonological (syllable) representation.
The P200 (140–220 ms) component represents higher-order perceptual processing.
The experiment involved 18 college-age students and 11 older controls, as aging is known to cause difficulties in perceiving speech, especially in noisy environments.
The older subjects’ ages range from 47 to 91 years, with a mean age of 67.78 years.
The experiment resulted in a total of 2304 trials.
Six electrodes were used: Fp1, F3, F7, C3, FC5, and T7.
Latency Difference Between Subjects
Latency Difference Between Groups
Amplitude Difference Between Groups
Conclusion
Novel Bayesian SLAM estimates the latency and amplitude of ERP components with uncertainty quantification.
Solves the search-window specification problem by producing a posterior distribution of latency.
Incorporates the ANOVA and possibly a latent generalized linear model structure in a single unified framework.
Examines both individual and group differences, facilitating comparisons across characteristics or factors such as age and gender.
R package slam at https://github.com/chenghanyustats/slam
Work in progress:
autoregressive correlated noise and trial variability.
2D spatial modeling and mapping on electrodes.