Bayesian Modeling of Event-Related Potentials 🧠

IISA-2023

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

6/3/23

Electroencephalography (EEG)

Source: https://en.wikipedia.org/wiki/Electroencephalography

The raw EEG can indicate general state, e.g., awake or asleep, but it can’t tell us much about specific neural processes that underlie perception, cognition or emotion.

EEG to Event-Related Potentials (ERP)

Source: ERP Boot Camp (Luck and Kappenman, 2020)

Event-Related Potentials (ERP)

Source: Luck (2012)

To pull out the brain’s consist response to some type of event, we can simply average across the epochs for that event type. When we average across enough epochs, any activity that’s consistent from trial to trial remains in the average, and any random noise simply averages out.

What ERP Can Do

Source: ERP Boot Camp (Luck and Kappenman, 2020)

Estimate ERP Components

  • Interested in estimating the amplitude and latency of some ERP component.

Source: Rossion & Jacques (2012)

Inference Methods

  • Current methods
    • grand ERP waveform by lots of averaging
    • no uncertainty quantification
    • filtering for removing drifts or noises
    • component search window specification
    • separate ANOVA for control-experimental comparison

Solve the issues by

  • Bayesian hierarchical model with the derivative Gaussian process prior
    • Yu et al., Biometrics (2022)
    • Li et al., JASA T&M (2023)
    • Yu et al., AoAS (2023 expected)

Derivative Gaussian Process (DGP)

  • Force f′=0f' = 0f′=0 at x=−4x = -4x=−4, 000, and 444, and all paths follow the constraint.
  • The stationary points can be a maximum, minimum or inflection point.
  • Add 3 constraints, but a path will have at least 3 stationary points.

Shaped-Constrained Regression

yi=f(xi)+ϵi,  ϵi∼N(0,σ2),  i=1,…,n,f′(tm)=0,  m=1,…,M,\begin{align*} y_i &= f(x_i) + \epsilon _i, ~~ \epsilon _i \sim N(0, \sigma^2), ~~ i = 1, \dots, n, \\ f'(t_m) &= 0, ~~m = 1, \dots, M, \end{align*}yi​f′(tm​)​=f(xi​)+ϵi​,  ϵi​∼N(0,σ2),  i=1,…,n,=0,  m=1,…,M,​

  • In ERP applications,
    • {yi}i=1n\{y_i\}_{i=1}^n{yi​}i=1n​ are the ERP time series observations.
    • f(⋅)f(\cdot)f(⋅) is the true underlying smooth ERP waveform.
    • MMM peaks at t1,…,tMt_1, \dots, t_Mt1​,…,tM​ of a ERP waveform fff are such that f′(tm)=0f'(t_m) = 0f′(tm​)=0.

Derivative Gaussian Process Prior

  • With the derivative-constrained regression, and a GP prior on fff, f∼GP(μ(⋅),k(⋅,⋅))f \sim GP(\mu(\cdot), k(\cdot, \cdot))f∼GP(μ(⋅),k(⋅,⋅)), we have DGP prior of fff, DGP(μ,k,t)DGP(\mu, k, \mathbf{t})DGP(μ,k,t), t=(t1,…,tM)\mathbf{t}= (t_1, \dots, t_M)t=(t1​,…,tM​)
  • Mean function is xi↦μ(xi)−k01(xi,t)k11−1(t,t)μ′(t)x_i \mapsto \mu(x_i) - k_{01}(x_i, \bm{t}) k_{11}^{-1}(\bm{t}, \bm{t}) \mu'(\bm{t})xi​↦μ(xi​)−k01​(xi​,t)k11−1​(t,t)μ′(t) where μ′()\mu'()μ′() is the mean function of f′f'f′.

[k01(x,t)]ij:=∂∂tjk(xi,tj),  [k11(t,t)]ij:=∂2∂ti∂tjk(ti,tj)[k_{01}(\bm{x}, \bm{t})]_{ij} := \frac{\partial }{\partial t_j}k(x_i, t_j), ~~[k_{11}(\bm{t}, \bm{t})]_{ij} := \frac{\partial ^2 }{\partial t_i \partial t_j}k(t_i, t_j)[k01​(x,t)]ij​:=∂tj​∂​k(xi​,tj​),  [k11​(t,t)]ij​:=∂ti​∂tj​∂2​k(ti​,tj​)

  • Covariance kernel is (xi,xj)↦k(xi,xj)−k01(xi,t)k11−1(t,t)k10(t,xj)(x_i, x_j) \mapsto k(x_i, x_j) - k_{01}(x_i, \bm{t}) k_{11}^{-1}(\bm{t}, \bm{t}) k_{10}(\bm{t}, x_j)(xi​,xj​)↦k(xi​,xj​)−k01​(xi​,t)k11−1​(t,t)k10​(t,xj​)

k10(x,t)=k01T(x,t)k_{10}(\bm{x}, \bm{t}) = k_{01}^T(\bm{x}, \bm{t})k10​(x,t)=k01T​(x,t)

Prior on Stationary Points

  • t1,…,tMt_1, \dots, t_Mt1​,…,tM​ are parameters to be estimated.

  • Challenge: Unknown number of stationary points MMM.

  • Use an univariate prior t∼π(t)t \sim \pi(t)t∼π(t) that corresponds to assuming that fff has at least one stationary point, and utilize the posterior of ttt to infer all stationary points.

An univariate prior of ttt leads to

  • Efficient computation:

    • The covariance matrix only adds one more dimension to from nnn by nnn to (n+1)(n+1)(n+1) by (n+1)(n+1)(n+1).
    • Simple one-dimensional sampling on ttt.
  • Avoids possible misspecification of MMM that distorts the fitted result.

  • Theorem: The posterior of ttt converges to a mixture of Gaussians as n→∞n \rightarrow \inftyn→∞, and the estimated number of mixture components converges to the number of stationary points.

Mixture of Gaussians

Posterior Inference via Monte Carlo EM

yi=f(xi)+ϵi,  ϵi∼N(0,σ2),  i=1,…,n,f∼GP(0,k(⋅,⋅;θ)),  f′(t)=0,t∼π(t),  σ2∼π(σ2),  θ∼π(θ)\begin{align*} y_i &= f(x_i) + \epsilon_i, ~~ \epsilon_i \sim N(0, \sigma^2), ~~i = 1, \dots, n, \\ f &\sim GP(0, k(\cdot, \cdot; \boldsymbol \theta)) , ~~ f'(t) = 0,\\ t &\sim \pi(t), ~~ \sigma^2 \sim \pi(\sigma^2), ~~ \boldsymbol \theta\sim \pi(\boldsymbol \theta) \end{align*}yi​ft​=f(xi​)+ϵi​,  ϵi​∼N(0,σ2),  i=1,…,n,∼GP(0,k(⋅,⋅;θ)),  f′(t)=0,∼π(t),  σ2∼π(σ2),  θ∼π(θ)​

For iteration i=1,2,…i = 1, 2, \dotsi=1,2,… until convergence:

  • Monte Carlo E-step: Metropolis-Hastings on ttt and Gibbs sampling on σ2\sigma^2σ2.

  • M-step: Given samples of ttt and σ2\sigma^2σ2, update θ\boldsymbol \thetaθ to θ^(i+1)\hat{\boldsymbol \theta}^{(i+1)}θ^(i+1) by setting θ^(i+1)=arg⁡max⁡θ1J∑j=1Jlog⁡p(y∣tj(i),(σj2)(i),θ(i))\begin{align*} \hat{\boldsymbol \theta}^{(i+1)} &= \arg \max_{\boldsymbol \theta} \frac{1}{J} \sum_{j=1}^J \log p(\bm{y} \mid t_j^{(i)}, (\sigma_j^2)^{(i)}, \boldsymbol \theta^{(i)}) \end{align*}θ^(i+1)​=argθmax​J1​j=1∑J​logp(y∣tj(i)​,(σj2​)(i),θ(i))​

Simulation: Uncertainty Quantification of ttt and fff

  • f(x)=0.3+0.4x+0.5sin⁡(3.2x)+1.1/(1+x2)f(x) = 0.3 + 0.4x+0.5\sin(3.2x) + 1.1 /(1 + x^2)f(x)=0.3+0.4x+0.5sin(3.2x)+1.1/(1+x2), x∈[0,2]x\in [0, 2]x∈[0,2]
  • Two stationary points t1=0.44t_1 = 0.44t1​=0.44 and t2=1.46t_2 = 1.46t2​=1.46.
  • π(t)=Unif(0,2)\pi(t) = Unif(0, 2)π(t)=Unif(0,2).

Oracle DGP

  • Better curve fitting if true stationary points were known.

Current ERP Studies

  • Multi-subject
    • Averaging across subjects to obtain the grand ERP waveform.
  • Difference in latency and/or amplitude of ERP components for factors such as age, gender, etc.
    • Basic ANOVA is conducted.
  • A time window or search window needs to be specified to frame the range of the components.
    • No systematic and consistent ways of selecting such windows.

Source: Rossion & Jacques (2012)

Semiparametric Latent ANOVA Model (SLAM)

  • Estimates the amplitude and latency as well as the ERP waveform in a unified framework.

  • Supports a multi-factor ANOVA setting that is capable of examining how multiple factors or covariates affect ERP components’ latency and amplitude.

  • Bayesian approach having probabilistic statements about the amplitude and latency of ERP components.

  • The hierarchical nature of the model provides estimates at both the grand group and individual subject levels.

  • Simultaneously estimates the amplitude and latency as well as the ERP waveform in a unified framework.

  • Supports a multi-factor ANOVA setting that is capable of examining how multiple factors or covariates affect ERP components’ latency and amplitude.

  • Bayesian approach having probabilistic statements about the amplitude and latency of ERP components.

  • The hierarchical nature of the model provides estimates at the grand group level as well as the individual subject level that is masked by grand averaging.

SLAM Two-way ANOVA example - Likelihood with DGP

yigls=fgls(xi)+ϵigls,  ϵigls∼iidN(0,σ2).\begin{equation*} y_{igls} = f_{gls}(x_i) + \epsilon_{igls}, ~~ \epsilon_{igls} \stackrel{\rm iid}{\sim}N(0, \sigma^2). \end{equation*}yigls​=fgls​(xi​)+ϵigls​,  ϵigls​∼iidN(0,σ2).​

  • Time point i=1,…,ni = 1, \dots, ni=1,…,n

  • Factor AAA level g=1,…,Gg = 1, \dots, Gg=1,…,G

  • Factor BBB level l=1,…,Ll = 1, \dots, Ll=1,…,L

  • Subject s=1,…,Sgls = 1, \dots, S_{gl}s=1,…,Sgl​

fgls∣tgls∼DGP(0,k(⋅,⋅;θ),tgls),\begin{equation*} f_{gls} \mid \bm{t}_{gls} \sim \text{DGP}(0, k(\cdot, \cdot; \boldsymbol \theta), \bm{t}_{gls}), \end{equation*}fgls​∣tgls​∼DGP(0,k(⋅,⋅;θ),tgls​),​ where tgls={tglsm}m=1M\bm{t}_{gls} = \{t_{gls}^m\}_{m = 1}^Mtgls​={tglsm​}m=1M​.

SLAM Two-way ANOVA example - Prior

tglsm∣rglm,ηglm∼iidgbeta(rglmηglm, (1−rglm)ηglm, am, bm), s=1,…,Sgl.\begin{align*} t_{gls}^m \mid r_{gl}^m, \eta_{gl}^m &\stackrel{\rm iid}{\sim}\text{gbeta}\left(r_{gl}^m\eta_{gl}^m, \, (1 - r_{gl}^m)\eta_{gl}^m, \, a^{m}, \, b^{m}\right), \, s = 1, \dots, S_{gl}. \end{align*}tglsm​∣rglm​,ηglm​​∼iidgbeta(rglm​ηglm​,(1−rglm​)ηglm​,am,bm),s=1,…,Sgl​.​

  • Subject-level tglsm∈[am,bm]t_{gls}^m \in [a^{m}, b^{m}]tglsm​∈[am,bm]

  • Group-level rglm∈(0,1)r_{gl}^m \in (0, 1)rglm​∈(0,1): location of prior mean in [am,bm][a^{m}, b^{m}][am,bm]

  • E(tglsm)=(1−rglm)am+rglmbm{\mathbb E}\left(t_{gls}^m\right) = (1 - r_{gl}^m)a^{m} + r_{gl}^mb^{m}E(tglsm​)=(1−rglm​)am+rglm​bm

SLAM Two-way ANOVA example - Link Function

Linear model and link function ϕglm∈(−∞,∞)\phi_{gl}^m \in (-\infty, \infty)ϕglm​∈(−∞,∞)

ϕglm(rglm)=β0m+β1,1mz1,1+⋯+β1,G−1mz1,G−1+β2,1mz2,1+⋯+β2,L−1mz2,L−1.\begin{equation*} \phi_{gl}^m\left(r_{gl}^m\right) = \beta_0^m + \beta_{1, 1}^mz_{1, 1} + \cdots + \beta_{1, G-1}^mz_{1, G-1}+ \beta_{2, 1}^mz_{2, 1} + \cdots + \beta_{2, L-1}^mz_{2, L-1}. \end{equation*}ϕglm​(rglm​)=β0m​+β1,1m​z1,1​+⋯+β1,G−1m​z1,G−1​+β2,1m​z2,1​+⋯+β2,L−1m​z2,L−1​.​

  • Accommodate any linear model with categorical and numerical variables

  • Being able to describe interactions

  • Any link function can be used, logit, probit, complementary log-log, etc.

SLAM Two-way ANOVA example - Other Priors

β0m∼N(μ0m,(σ0m)2),β1,am∼N(μ1m,(σ1m)2),β2,bm∼N(μ2m,(σ2m)2),ηglm∼Ga(αηm,βηm),σ2∼IG(ασ,βσ),θ∼π(θ)\begin{align*} \beta_0^m &\sim N(\mu_0^m, (\sigma_0^m)^2),\\ \beta_{1, a}^m &\sim N(\mu_1^m, (\sigma_1^m)^2), \\ \beta_{2, b}^m &\sim N(\mu_2^m, (\sigma_2^m)^2), \\ \eta_{gl}^m &\sim Ga(\alpha_{\eta}^m, \beta_{\eta}^m),\\ \sigma^2 &\sim IG(\alpha_{\sigma}, \beta_{\sigma}),\\ \boldsymbol \theta&\sim \pi(\boldsymbol \theta) \end{align*}β0m​β1,am​β2,bm​ηglm​σ2θ​∼N(μ0m​,(σ0m​)2),∼N(μ1m​,(σ1m​)2),∼N(μ2m​,(σ2m​)2),∼Ga(αηm​,βηm​),∼IG(ασ​,βσ​),∼π(θ)​

Simulation

Data are generated with σ=0.25\sigma = 0.25σ=0.25

Point and Interval Estimation of Subject-level Latency

Point and Interval Estimation of Subject-level Latency

Uncertainty Quantification of Group-level Latency

Curve Fitting

Amplitude Estimates

ERP Data

  • Experiment on speech recognition

  • The N100 (60 ~ 140 msec) component captures phonological (syllable) representation.

  • P200 (140 ~ 220 msec) component represents higher-order perceptual processing.

Latency Difference Between Groups

Amplitude Difference Between Groups

Conclusion

  • Novel Bayesian model SLAM estimates the amplitude and latency of ERP components with uncertainty quantification.

  • Solves the search window specification problem by generating posterior distribution of latency.

  • Incorporates the ANOVA and possibly a latent generalized linear model structure in a single unified framework.

  • Examines both the subjects’ individual differences and group-level differences that facilitate comparing different characteristics or factors, such as age/gender.

  • Extensions could include treatments of other noise distributions, for example, the autoregressive correlation or trial variability.

  • Spatial modeling and mapping on electrodes would be another extension.

1 / 31
Bayesian Modeling of Event-Related Potentials 🧠 IISA-2023 Dr. Cheng-Han Yu Department of Mathematical and Statistical Sciences Marquette University 6/3/23

  1. Slides

  2. Tools

  3. Close
  • Bayesian Modeling of Event-Related Potentials 🧠
  • Electroencephalography (EEG)
  • EEG to Event-Related Potentials (ERP)
  • Event-Related Potentials (ERP)
  • What ERP Can Do
  • Estimate ERP Components
  • Inference Methods
  • Derivative Gaussian Process (DGP)
  • Shaped-Constrained Regression
  • Derivative Gaussian Process Prior
  • Prior on Stationary Points
  • Mixture of Gaussians
  • Posterior Inference via Monte Carlo EM
  • Simulation: Uncertainty Quantification of t and f
  • Oracle DGP
  • Current ERP Studies
  • Semiparametric Latent ANOVA Model (SLAM)
  • SLAM Two-way ANOVA example - Likelihood with DGP
  • SLAM Two-way ANOVA example - Prior
  • SLAM Two-way ANOVA example - Link Function
  • SLAM Two-way ANOVA example - Other Priors
  • Simulation
  • Point and Interval Estimation of Subject-level Latency
  • Point and Interval Estimation of Subject-level Latency
  • Uncertainty Quantification of Group-level Latency
  • Curve Fitting
  • Amplitude Estimates
  • ERP Data
  • Latency Difference Between Groups
  • Amplitude Difference Between Groups
  • Conclusion
  • f Fullscreen
  • s Speaker View
  • o Slide Overview
  • e PDF Export Mode
  • ? Keyboard Help