Bayesian Modeling of Event-Related Potentials 🧠

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

2024-05-03

Electroencephalography (EEG)

Source: https://en.wikipedia.org/wiki/Electroencephalography

The raw EEG can indicate a general state, e.g., awake or asleep, but it cannot tell us much about the specific neural processes that underlie perception, cognition, or emotion.

What ERP Can Do

Source: ERP Boot Camp (Luck and Kappenman, 2020)

Estimate ERP Components

  • Interested in estimating the amplitude and latency of some ERP component.

Source: Rossion & Jacques (2012)

Inference Methods

Current methods

  • Grand ERP by averaging across subjects: loses subject-level information.

  • No uncertainty quantification.

  • Filtering to remove drift or noise: no rule of thumb for which frequency band or bandwidth to use.

  • Pre-specified component search window for quantifying amplitude: no systematic way to select it.

  • Separate models for control-experimental comparison.

Source: Fisher et al. (2010)

Solve these issues by

  • Bayesian hierarchical model with a derivative Gaussian process prior: Yu et al., Biometrics (2022), Li et al., JASA T & M (2023), Yu et al., Data Science in Science (2024)

Gaussian Variables

Multivariate Gaussian

  • \mathbf{Y}\in \mathbb{R}^d \sim N_d\left(\boldsymbol \mu, \boldsymbol \Sigma\right)
  • Bivariate Gaussian (d=2):

\mathbf{Y}= \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \boldsymbol \mu= \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \boldsymbol \Sigma= \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2\\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}

Suppose we draw a bunch of Gaussian variables, and we want to visualize them.
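For intuition, here is a minimal R sketch (the sample size, seed, and use of the 0.9-correlation case below are our illustrative choices) that draws bivariate Gaussian samples with MASS::mvrnorm() and scatter-plots them:

## Minimal sketch: draw and visualize correlated bivariate Gaussians.
## Sample size, seed, and plotting choices are illustrative.
library(MASS)   # for mvrnorm()

set.seed(1)
mu    <- c(0, 0)
Sigma <- matrix(c(1,   0.9,
                  0.9, 1), nrow = 2)   # the correlated case shown below
Y <- mvrnorm(n = 500, mu = mu, Sigma = Sigma)

plot(Y, pch = 20, col = "gray40",
     xlab = expression(Y[1]), ylab = expression(Y[2]))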

Two Independent Standard Normals

\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0\\ 0 & 1 \end{pmatrix}

Two Different Independent Normals

\boldsymbol \mu= \begin{pmatrix} 0 \\ 1 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0\\ 0 & 0.2 \end{pmatrix}

Two Correlated Standard Normals

\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix} \boldsymbol \Sigma= \begin{pmatrix}1 & 0.9\\ 0.9 & 1 \end{pmatrix}

Three or More Variables

  • Hard to visualize in dimensions > 2, so stack the coordinates next to each other and plot each draw as a path over the index.

\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \boldsymbol \Sigma= \begin{pmatrix}1 & 0.99\\ 0.99 & 1 \end{pmatrix}

d = 5

\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \\ 0\\0\\0\end{pmatrix} \quad \quad \boldsymbol \Sigma= \begin{pmatrix}1 & 0.99 & 0.98 & 0.97 & 0.96\\ 0.99 & 1 & 0.99 & 0.98 & 0.97 \\ 0.98 & 0.99 & 1 & 0.99 & 0.98 \\ 0.97 & 0.98 & 0.99 & 1 & 0.99\\0.96 & 0.97 & 0.98 & 0.99 & 1 \end{pmatrix}

  • Each line is one sample (path).

d = 50

\boldsymbol \mu= \begin{pmatrix} 0 \\ 0 \\ \vdots \\0\\0\end{pmatrix} \quad \quad \boldsymbol \Sigma= \begin{pmatrix} 1 & 0.99 & 0.98 & 0.97 & 0.96 & \cdots \\ 0.99 & 1 & 0.99 & 0.98 & 0.97 & \cdots\\ 0.98 & 0.99 & 1 & 0.99 & 0.98 & \cdots\\ 0.97 & 0.98 & 0.99 & 1 & 0.99 & \cdots\\ 0.96 & 0.97 & 0.98 & 0.99 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}

  • Each line is one sample (path); see the sketch after this list.

  • Think of a Gaussian process as an infinite-dimensional distribution over functions

    • all we need to do is change notation.
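A minimal R sketch of this stacking idea; the covariance below, with entries decaying slowly away from the diagonal, and the number of draws are illustrative choices:

## Sketch: draw from N(0, Sigma) in d = 50 dimensions with slowly decaying
## correlations, then plot each draw against its coordinate index as a path.
set.seed(1)
d     <- 50
idx   <- 1:d
Sigma <- outer(idx, idx, function(i, j) exp(-(i - j)^2 / 200)) +
         diag(1e-8, d)                       # ~0.995 off-diagonal; jitter for stability
R     <- chol(Sigma)                         # Sigma = t(R) %*% R
paths <- matrix(rnorm(5 * d), 5, d) %*% R    # 5 draws, one per row

matplot(idx, t(paths), type = "l", lty = 1,
        xlab = "coordinate index", ylab = "value")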

Gaussian Process (GP)

A Gaussian process (GP) is a process f(\cdot): \mathbb{R}^p \rightarrow \mathbb{R} whose finite-dimensional distributions are all Gaussian: for any x_1, \dots, x_n \in \mathbb{R}^p,

\left(f(x_1), \dots, f(x_n)\right)^\top \sim N_n\left(\boldsymbol \mu, \boldsymbol \Sigma\right)

  • Write f(\cdot) \sim GP to denote that the function f is a GP.

  • The underlying true ERP waveform is assumed a GP.

Mean and Covariance Function

  • To fully specify a GP, need the mean and covariance function, f(\cdot) \sim GP\left(m(\cdot), k(\cdot, \cdot)\right) where

\text{E}(f(x)) = m(x), \qquad \text{Cov}(f(x), f(x')) = k(x, x')

  • Popular choices of m(\cdot) are m(x) = 0 or \text{const} for all x, or m(x) = \beta'x

  • We care more about the covariance function, or kernel, since it governs how the process looks by defining the similarity between data points.

Squared Exponential (SE) Kernel

k(x, x' \mid \tau, h) = \tau^2 \exp\left(-\frac{(x - x')^2}{2h^2} \right)

With \tau = h = 1, this reduces to

k(x, x') = \exp\left(-\frac{1}{2}(x - x')^2 \right)

  • The parameter h is the characteristic length-scale: smaller h gives wigglier paths (the expected number of level-zero upcrossings per unit length is proportional to 1/h).

Squared Exponential (SE) Kernel

k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{0.25}}^2} \right)

Squared Exponential (SE) Kernel

k(x, x') = \exp\left(-\frac{1}{2}\frac{(x - x')^2}{ {\color{red}{4}}^2} \right)

Squared Exponential (SE) Kernel

k(x, x') = {\color{red}{10^2}}\exp\left(-\frac{1}{2}(x - x')^2 \right)

  • The parameter \tau^2 is the variance of f(x) that controls the vertical variation of the process.
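The following R sketch evaluates the SE kernel and draws one prior path for each of the length-scales shown above; the grid, seed, and jitter term are illustrative choices:

## Sketch: SE kernel and GP prior draws for several length-scales.
se_kernel <- function(x1, x2, tau = 1, h = 1) {
  tau^2 * exp(-outer(x1, x2, "-")^2 / (2 * h^2))
}

set.seed(1)
x <- seq(-5, 5, length.out = 200)
for (h in c(0.25, 1, 4)) {
  K <- se_kernel(x, x, h = h) + diag(1e-8, length(x))  # jitter for stability
  f <- t(chol(K)) %*% rnorm(length(x))                 # one prior draw
  plot(x, f, type = "l", main = paste("SE kernel, h =", h))
}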

Derivative Gaussian Process (DGP)

  • Force f' = 0 at x = -4, 0, and 4, and all paths follow the constraint.
  • The stationary points can be a maximum, minimum or inflection point.
  • Adding 3 constraints guarantees at least 3 stationary points; a path may have more (see the sketch after this list).
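A self-contained R sketch of this construction under the SE kernel with \tau = h = 1: condition prior draws on f'(t) = 0 at t = -4, 0, 4 via the kernel's derivative cross-covariances. This illustrates the idea only and is not the authors' implementation:

## Sketch: GP prior draws forced to satisfy f'(t) = 0 at t = -4, 0, 4.
se <- function(x1, x2, h = 1) exp(-outer(x1, x2, "-")^2 / (2 * h^2))

x  <- seq(-6, 6, length.out = 200)
t0 <- c(-4, 0, 4)
h  <- 1
Kxx <- se(x, x, h)
Kxd <- se(x, t0, h) * outer(x, t0, "-") / h^2                   # Cov(f(x), f'(t))
Kdd <- se(t0, t0, h) * (1 - outer(t0, t0, "-")^2 / h^2) / h^2   # Cov(f'(t), f'(t'))

## Conditioning on f'(t0) = 0 keeps a zero mean but shrinks the covariance.
Kc <- Kxx - Kxd %*% solve(Kdd, t(Kxd)) + diag(1e-8, length(x))
set.seed(1)
f <- t(chol(Kc)) %*% rnorm(length(x))
plot(x, f, type = "l")   # every draw has stationary points at -4, 0, 4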

Shape-Constrained DGP Regression

\begin{aligned} y_i &= f(x_i) + \epsilon _i, ~~ \epsilon _i \sim N(0, \sigma^2), ~~ i = 1, \dots, n, \\ f(\cdot) &\sim GP\\ f'(t_m) &= 0, ~~m = 1, \dots, M. \end{aligned}

  • In ERP applications,
    • \{y_i\}_{i=1}^n: the ERP time series observations.
    • f(\cdot): the true underlying smooth ERP waveform.
    • t_1, \dots, t_M: the latency of the ERP components.

Prior on Stationary Points

  • t_1, \dots, t_M are parameters to be estimated.

  • In ERP studies, M is usually known.

  • Use a univariate prior t \sim \pi(t), which corresponds to assuming that f has at least one critical point, and utilize the posterior of t to infer all critical points.

A univariate prior on t leads to

  • Efficient computation:

    • The covariance matrix grows by only one dimension, from n by n to (n+1) by (n+1).
    • Simple one-dimensional sampling on t.
  • Avoids possible misspecification of M, which would distort the fitted result.

  • Theorem: The posterior of t converges to a mixture of Gaussians as n \rightarrow \infty, and the estimated number of mixture components converges to the number of stationary points.

Mixture of Gaussians
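A toy R sketch of the theorem's flavor: fit a Gaussian mixture to samples of t using the mclust package and read off the selected number of components. The bimodal samples below are synthetic stand-ins centered at the simulation's stationary points, not posterior draws from the actual model:

## Toy sketch: a mixture fit to samples of t recovers the number of
## stationary points. The "posterior samples" here are synthetic.
library(mclust)   # Mclust() selects the number of components by BIC

set.seed(1)
t_samples <- c(rnorm(500, 0.44, 0.03), rnorm(500, 1.46, 0.03))
fit <- Mclust(t_samples, G = 1:5)
fit$G   # estimated number of components (2 for this well-separated toy)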

Posterior Inference via Monte Carlo EM

\begin{aligned} y_i &= f(x_i) + \epsilon_i, ~~ \epsilon_i \sim N(0, \sigma^2), ~~i = 1, \dots, n, \\ f &\sim GP(0, k(\cdot, \cdot; \boldsymbol \theta)) , ~~ f'(t) = 0,\\ t &\sim \pi(t), ~~ \sigma^2 \sim \pi(\sigma^2), ~~ \boldsymbol \theta\sim \pi(\boldsymbol \theta) \end{aligned}

For iteration i = 1, 2, \dots until convergence:

  • Monte Carlo E-step: Metropolis-Hastings on t and Gibbs sampling on \sigma^2.

  • M-step: Given samples of t and \sigma^2, update \boldsymbol \theta to \hat{\boldsymbol \theta}^{(i+1)} by setting \begin{aligned} \hat{\boldsymbol \theta}^{(i+1)} &= \arg \max_{\boldsymbol \theta} \frac{1}{J} \sum_{j=1}^J \log p(\bm{y} \mid t_j^{(i)}, (\sigma_j^2)^{(i)}, \boldsymbol \theta) \end{aligned}
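A heavily simplified, self-contained R sketch of this loop for a single stationary point, with \tau = 1 and \sigma^2 held fixed rather than Gibbs-sampled; the data and all tuning constants are illustrative choices, and this is not the authors' algorithm:

## Simplified MCEM sketch: MH on t (E-step), 1-d search over h (M-step).
se <- function(x1, x2, h) exp(-outer(x1, x2, "-")^2 / (2 * h^2))

## log marginal likelihood of y given t and h under the constraint f'(t) = 0
loglik <- function(y, x, t, h, sigma2) {
  Kxx <- se(x, x, h)
  Kxd <- se(x, t, h) * (x - t) / h^2          # Cov(f(x), f'(t))
  Kdd <- 1 / h^2                              # Var(f'(t)), SE kernel, tau = 1
  K   <- Kxx - Kxd %*% t(Kxd) / Kdd + diag(sigma2, length(x))
  K   <- (K + t(K)) / 2                       # symmetrize against rounding
  mvtnorm::dmvnorm(y, sigma = K, log = TRUE)
}

set.seed(1)
x <- seq(0, 2, length.out = 60)
y <- sin(pi * x / 2) + rnorm(60, sd = 0.25)   # toy data; true t = 1
h <- 0.5; sigma2 <- 0.25^2; t_cur <- runif(1, 0, 2)

for (iter in 1:10) {
  ## Monte Carlo E-step: random-walk MH on t with a Unif(0, 2) prior
  t_samp <- numeric(50)
  for (j in 1:50) {
    t_prop <- t_cur + rnorm(1, sd = 0.1)
    if (t_prop > 0 && t_prop < 2 &&
        log(runif(1)) < loglik(y, x, t_prop, h, sigma2) -
                        loglik(y, x, t_cur,  h, sigma2)) t_cur <- t_prop
    t_samp[j] <- t_cur
  }
  ## M-step: maximize the Monte Carlo average log likelihood over h
  t_use <- t_samp[seq(5, 50, by = 5)]         # thin for speed
  h <- optimize(function(hh) mean(sapply(t_use, function(tt)
         loglik(y, x, tt, hh, sigma2))),
       interval = c(0.1, 3), maximum = TRUE)$maximum
}
c(t_hat = mean(t_samp), h_hat = h)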

Simulation: Uncertainty Quantification of t and f

Two stationary points t_1 = 0.44 and t_2 = 1.46

\pi(t) = \text{Unif}(0, 2)

Oracle DGP

  • Better curve fitting if true critical points were known.

Model Extension

Current ERP Studies

  • Multi-subject
    • Averaging across subjects to obtain the grand ERP waveform.
  • Differences in the latency and/or amplitude of ERP components across factors such as age, gender, etc.
    • Basic ANOVA is conducted.
  • A time search window needs to be specified to frame the range of the components.
    • No systematic and consistent ways of selecting such windows.

Source: Rossion & Jacques (2012)

Semiparametric Latent ANOVA Model (SLAM)

  • Estimate all parameters in a unified framework.

  • Supports a multi-factor ANOVA setting for examining how multiple factors affect latency and amplitude.

  • Provides Bayesian probabilistic statements about the amplitude and latency.

  • The hierarchical model provides estimates at both group and subject levels.

SLAM One-way ANOVA - Likelihood with DGP

y_{igs} = f_{gs}(x_i) + \epsilon_{igs}, ~~ \epsilon_{igs} \stackrel{\rm iid}{\sim}N(0, \sigma^2).

  • Time point i = 1, \dots, n

  • Factor level g = 1, \dots, G, (g = \color{blue}{\text{young}, \text{old}} for Age factor)

  • Subject s = 1, \dots, S_{g} (\color{blue}{S_{young} = 18; S_{old} = 11})

f_{gs} \mid \bm{t}_{gs} \sim \text{DGP}(0, k(\cdot, \cdot; \boldsymbol \theta), \bm{t}_{gs}), where \bm{t}_{gs} = \{t_{gs}^m\}_{m = 1}^M.

  • M is pre-specified.

SLAM One-way ANOVA - Prior

\begin{aligned} t_{gs}^m \mid r_{g}^m, \eta_{g}^m &\stackrel{\rm iid}{\sim}\text{gbeta}\left(r_{g}^m\eta_{g}^m, \, (1 - r_{g}^m)\eta_{g}^m, \, a^{m}, \, b^{m}\right), \, s = 1, \dots, S_{g}. \end{aligned}

  • Subject-level t_{gs}^m \in [a^{m}, b^{m}] (\color{blue}{t_{young, 1}^{N100} \in [60, 140]; t_{old, 2}^{P200} \in [140, 220]})

  • Group-level r_{g}^m \in (0, 1): prior mean location in [a^{m}, b^{m}]

  • {\mathbb E}\left(t_{gs}^m\right) = (1 - r_{g}^m)a^{m} + r_{g}^mb^{m}
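A quick numerical check of this mean formula, reading gbeta(\alpha, \beta, a, b) as a Beta(\alpha, \beta) variable rescaled from (0, 1) to [a, b] (our assumption about the parameterization):

## Monte Carlo check of the prior mean under the rescaled-beta reading.
set.seed(1)
r <- 0.4; eta <- 30; a <- 60; b <- 140     # illustrative N100-style window
t_draws <- a + (b - a) * rbeta(1e5, r * eta, (1 - r) * eta)
mean(t_draws)          # Monte Carlo mean, approximately 92
(1 - r) * a + r * b    # the formula above: 92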

SLAM One-way ANOVA - Other Priors

\begin{aligned} \beta_{j}^m &\sim N(\mu_j^m, (\sigma_j^m)^2), ~~ j = 0, 1, \dots, G-1\\ \eta_{g}^m &\sim Ga(\alpha_{\eta}, \beta_{\eta}),\\ \pi(\sigma^2, \tau^2, h) &= \pi(\sigma^2) \pi(\tau^2) \pi(h) = IG(\alpha_{\sigma}, \beta_{\sigma})IG(\alpha_{\tau}, \beta_{\tau})Ga(\alpha_{h}, \beta_{h}). \end{aligned}

SLAM One-way ANOVA - Inference

  • Still use Monte Carlo EM for posterior inference.
  • Everybody’s tired. Check algorithm details in the paper. Let’s see some analysis results!

Simulation - Data

  • 10 subjects (regression functions) in each group

  • Data of size n=100 are generated with \sigma = 0.25

Point and Interval Estimation of Subject-level Latency

Uncertainty Quantification of Group-level Latency

Latency root mean square error (RMSE)

  Method               Sine             Cosine
  SLAM-posterior mean  0.004 (0.0000)   0.007 (0.0002)
  LOESS                0.012 (0.0021)   0.025 (0.0036)

  • Smaller RMSE with less variation.

Curve Fitting

Amplitude Estimates

Amplitude root mean square error (RMSE)

  Method               Sine            Cosine
  SLAM-posterior mean  0.04 (0.0004)   0.06 (0.0016)
  LOESS                0.07 (0.0021)   0.07 (0.0066)
  • The amplitude estimation relies on a search window that is specified by the posterior of t or r.

ERP Data

  • Experiment on speech recognition: subjects choose the word they hear from

    • date-tate vs. dape-tape
    • gate-cate vs. gake-cake.
  • The N100 component (60-140 ms) captures phonological (syllable) representation.

  • The P200 component (140-220 ms) represents higher-order perceptual processing.

Latency Difference Between Subjects

Latency Difference Between Groups

  • The older group has longer N100 and P200 latency.

  • The time lag between N100 and P200 is shorter in the young group.

Amplitude Difference Between Groups

  • The older group has larger N100 amplitude.

Conclusion

  • Novel Bayesian SLAM estimates the latency and amplitude of ERP components with uncertainty quantification.

  • Solves the search-window specification problem by producing a posterior distribution for the latency.

  • Incorporates the ANOVA and possibly a latent generalized linear model structure in a single unified framework.

  • Examines both individual and group differences, facilitating comparisons across characteristics or factors such as age and gender.

  • R package slam at https://github.com/chenghanyustats/slam

  • Work in progress:

    • trial variability
    • autoregressive correlated noise
    • spatial modeling and mapping on electrodes.