+ - 0:00:00
Notes for current slide
Notes for next slide

Bayesian Modeling of Complex-valued fMRI 🧠

Spatiotemporal modeling via kernel convolution

Dr. Cheng-Han Yu

Mathematical and Statistical Sciences
Marquette University

Statistics group, KAUST March 21 2022

1 / 39

(Task-based) Functional Magnetic Resonance Imaging?

  • fMRI is a noninvasive neuroimaging method that measures the blood-oxygen level-dependent (BOLD) signals in the brain.
    • Large-dimensional
    • Low signal-to-noise ratios (SNRs)
    • Sophisticated spatio-temporal dependence of voxels.
Source: Martha Skup (2010)

Source: Martha Skup (2010)

2 / 39

Why Need Complex-valued Models

  • Raw fMRI data are complex-valued (CV) after Fourier transform and inverse Fourier transform image reconstruction.
  • Most fMRI studies use magnitude-only (MO) data and phase information is discarded.
Source: Lindquist (2008) and Adali at. el (2011)

Source: Lindquist (2008) and Adali at. el (2011)

3 / 39

Complex-valued fMRI (CV-fMRI) Data

4 / 39

Why Bayesian Models of CV-fMRI

  • Sophisticated Bayesian spatiotemporal models have been developed for MO-fMRI (Zhang Guindani, et al., 2016).
5 / 39

Why Bayesian Models of CV-fMRI

  • Sophisticated Bayesian spatiotemporal models have been developed for MO-fMRI (Zhang Guindani, et al., 2016).

... the most common software packages for fMRI analysis can result in false-positive rates of up to 70% --- Eklund et al. (PNAS 2016)

5 / 39

Why Bayesian Models of CV-fMRI

  • Sophisticated Bayesian spatiotemporal models have been developed for MO-fMRI (Zhang Guindani, et al., 2016).

... the most common software packages for fMRI analysis can result in false-positive rates of up to 70% --- Eklund et al. (PNAS 2016)

Can we make use of full information of CV-fMRI while keep the advantages of Bayesian nature?

5 / 39

Why Bayesian Models of CV-fMRI

  • Sophisticated Bayesian spatiotemporal models have been developed for MO-fMRI (Zhang Guindani, et al., 2016).

... the most common software packages for fMRI analysis can result in false-positive rates of up to 70% --- Eklund et al. (PNAS 2016)

Can we make use of full information of CV-fMRI while keep the advantages of Bayesian nature?

  • Start with the most common objective: detecting brain activations in task-based fMRI.
Source: https://blog.applysci.com

Source: https://blog.applysci.com

5 / 39

Goal: Computationally Effcient Models for CV-fMRI

Propose

  • computationally efficient models/algorithms that
    • jointly analyze real and imaginary parts of CV-fMRI data
    • better detect activation at the voxel level
    • infer brain connectivity (in preparation)
6 / 39

Goal: Computationally Effcient Models for CV-fMRI

Propose

  • computationally efficient models/algorithms that
    • jointly analyze real and imaginary parts of CV-fMRI data
    • better detect activation at the voxel level
    • infer brain connectivity (in preparation)



  • Complex-valued Expectation-Maximization Variable Selection with Autoregressive Processes (CV-EMVS-AR)
  • (Multi-subject) Bayesian Spatiotemporal Model via Kernel Convolution and Autoregressive Processes (CV-KC-AR)
6 / 39

Background: Rowe-Logan Constant Phase Model

From Rowe and Logan (2004), for time t=1:T, voxel v=1:V and p tasks, ytv=ρtvcos(ϕv)+iρtvsin(ϕv)+ηtv,ρtv=β0v+β1vx1,t++βpvxp,t

7 / 39

Background: Rowe-Logan Constant Phase Model

From Rowe and Logan (2004), for time t=1:T, voxel v=1:V and p tasks, ytv=ρtvcos(ϕv)+iρtvsin(ϕv)+ηtv,ρtv=β0v+β1vx1,t++βpvxp,t

[y1vyTv]=[1x11xp11x1TxpT][β0vcos(ϕv)+iβ0vsin(ϕv)βpvcos(ϕv)γRev+iβpvsin(ϕv)γImv]+[η1vηTv]

  • yv=XγRev+iXγImv+ηv
  • yv=Xγv+ηv, ηvCNT(0,Γv,Cv). ΓvCT×T, CvCT×T
  • ηv is circular normal if Cv=0
7 / 39

Background: Real-valued Representation

yv=Xγv+ηv, ηvCNT(0,Γv,Cv)



or equivalently, yrv=Xrγrv+ηrv, where ηrvN2T(0,Σv)


8 / 39

Background: Real-valued Representation

yv=Xγv+ηv, ηvCNT(0,Γv,Cv)



or equivalently, yrv=Xrγrv+ηrv, where ηrvN2T(0,Σv)



Source: Lindquist (2008)

Source: Lindquist (2008)

8 / 39

Brain Activation as Variable Selection

  • yv=Xγv+ηv
  • γjv0 iff voxel v at task j is activated (Xia, Liang, and Wang (2009); Zhang, Guindani, and Vannucci (2015)).

  • Complex normal spike-and-slab prior γjv(1ψjv)CN(0,ω0,λ0)spike+ψjvCN(0,ω1,λ1)slab, ω0<ω1C,λ0<λ1C.

9 / 39

Brain Activation as Variable Selection

  • Complex normal spike-and-slab prior γjv(1ψjv)CN(0,ω0,λ0)spike+ψjvCN(0,ω1,λ1)slab, ω0<ω1C,λ0<λ1C.
  • Non-active voxel: ψjv=0γjv=0
  • Active voxel: ψjv=1γjv0

  • Activation is inferred by borrowing information across voxels through a Bernoulli prior on ψjv with a common probability of activation for all voxels: ψjvBernoulli(θjv=θj), i.e., Pr(ψjv=1θj)=θj.

10 / 39

CV-EMVS (Yu, Prado, Ombao, and Rowe, 2018)

yv=Xγv+ηv,ηvCNT(0,2σv2I,0),v=1:V,γjvψjvindep(1ψjv)CN1(0,σv2ω0,σv2λ0)+γjvCN1(0,σv2ω1,σv2λ1),ω0<<ω1,λ0<<λ1,j=1:p,ψjvθjIIDBernoulli(θj),θjIIDBeta(aθ,bθ),σv2IG(aσ,bσ)

11 / 39

CV-EMVS (Yu, Prado, Ombao, and Rowe, 2018)

yv=Xγv+ηv,ηvCNT(0,2σv2I,0),v=1:V,γjvψjvindep(1ψjv)CN1(0,σv2ω0,σv2λ0)+γjvCN1(0,σv2ω1,σv2λ1),ω0<<ω1,λ0<<λ1,j=1:p,ψjvθjIIDBernoulli(θj),θjIIDBeta(aθ,bθ),σv2IG(aσ,bσ)

  • We determine if the real and imaginary parts of γjv are zero jointly: γjv0 if Pr(ψjv=1γ,θ,σ,y)>δ.
    • E-step: derive

      Pr(ψjv=1γ(l),θ(l),σ(l),y)

    • M-step: obtain maximum a posteriori

      γ(l+1),θ(l+1),σ(l+1)=maxargEψ|[logπ(γ,ψ,θ,σy)γ(l),θ(l),σ(l),y]

11 / 39

CV-EMVS (Yu, Prado, Ombao, et al., 2018)

yv=Xγv+ηv,ηvCNT(0,2σv2I,0),v=1:V,γjvψjvindep(1ψjv)CN1(0,σv2ω0,σv2λ0)+γjvCN1(0,σv2ω1,σv2λ1),ω0<<ω1,λ0<<λ1,j=1:p,ψjvθjIIDBernoulli(θj),θjIIDBeta(aθ,bθ),σv2IG(aσ,bσ)

  • Under circular normal prior on γjv, i.e., λ0=λ1=0,

    γ^Rev=(XX+2Dv)1(Xyv)Re,γ^Imv=(XX+2Dv)1(Xyv)Im.

12 / 39

Activation and Strength Maps: Low SNR

  • The CV model outperforms other MO models in terms of activation detection and strength.
  • CV: CV-EMVS
  • MO: MO-EMVS
  • ALA: Adaptive Lasso


13 / 39

Human CV-fMRI (Karaman, Bruce, and Rowe, 2015)

  • Unilateral finger tapping data of dimension 96 x 96 x 510.
  • KBR-CV: false positives outside the brain area.
  • KBR-MO: low detecting power.
  • DeTeCT-ING: Nonlinear model using the MR signal equation.
14 / 39

CV-EMVS with AR Noise

ytv=γ1v+γ2vt/T+γ3vxt+ηtv,ηtv=φvηt1v+ζtv,ζtviidCN1(0,2σv2,0),φvUniform(1,1).

15 / 39

General Bayesian spatio-temporal model

  • CV-EMVS-AR much improves detection but does not explicitly model the spatial dependence across voxels.
  • The execution of biological tasks involves populations of neurons spanning across several voxels, rather than a single voxel.
  • Propose Bayesian variable selection tools coupled with a spatial kernel convolution (KC) structure as well as temporal autoregressive processes for CV-fMRI data. (Yu, Prado, Ombao, and Rowe, 2022)
16 / 39

Kernel Convolution: Definition

  • Given a kernel k(z;ϕ),zS and a white noise process w(u),uS, the kernel convolution is S(z)=Sk(zu;ϕ)w(u)du.

  • In practice, for sites u1,...,uD, the process is defined as S(z)=d=1Dk(zud;ϕ)w(ud)

  • The spatial process S(z) governs the spatial dependence of voxels in the image, and affects the probability of voxels being activated.

17 / 39

Kernel Convolution: Properties

  • Dimension reduction: A small number D of parameters w(u1),...,w(uD) governs the entire process S(z) that may have tons of measurements at location z1,...,zV. (V>>D).

  • S=(S(z1),...,S(zV)) is the voxel-level spatial effects

  • w=(w(u1),...,w(uD)) is the D-dimensional latent spatial effect formed by the selected D sites.

18 / 39

Kernel convolution (KC) vs. Gaussian process (GP)

KC

  1. Select D sites to be representative of the image.
  2. Compute the voxel-level spatial effects: S(j)=(Sj1,...,SjV) via kernel convolution using the latent spatial process formed by the selected D sites.

GP

  1. Parcellate the image into G clusters of voxels.
  2. Compute the region-level spatial effects: S(j)=(Sj1,...,SjG) based on a Gaussian process and the "location" of the G regions. (Bezener et al. 2018)
19 / 39

Kernel convolution (KC) vs. Gaussian process (GP)

KC

  1. Select D sites to be representative of the image.
  2. Compute the voxel-level spatial effects: S(j)=(Sj1,...,SjV) via kernel convolution using the latent spatial process formed by the selected D sites.

GP

  1. Parcellate the image into G clusters of voxels.
  2. Compute the region-level spatial effects: S(j)=(Sj1,...,SjG) based on a Gaussian process and the "location" of the G regions. (Bezener et al. 2018)

DG but we use D=G for comparison.

19 / 39

Advantages of KC over GP

KC

  • Circumvents the need for assigning each voxel to some group.

GP

  • Needs to define and compute the location of regions.
  • Shape of region is sensitive to the detecting performance.
20 / 39

Complex-valued Bayesian Spatiotemporal Model

  • Likelihood: With the indicators ψjv such that γjv0 if ψjv=1 and γjv=0 if ψjv=0, yv=X(ψv)γv(ψv)+ηvηvCNT(0,2σv2Λv,0),

where Λv is the AR(1) correlation matrix.

  • Empirical Bayes estimator ρ^v (Λ^v) for AR coefficient for computation efficiency.
21 / 39

Complex-valued Bayesian Spatiotemporal Model

  • Likelihood: With the indicators ψjv such that γjv0 if ψjv=1 and γjv=0 if ψjv=0, yv=X(ψv)γv(ψv)+ηvηvCNT(0,2σv2Λv,0),

where Λv is the AR(1) correlation matrix.

  • Empirical Bayes estimator ρ^v (Λ^v) for AR coefficient for computation efficiency.

  • Complex-valued g-prior on γv: γv(ψv)ψv,σv2indCNp(γ^v(ψv),2Tσv2(X(ψv)Λ^v1X(ψv))1,0)γ^v(ψv)=(X(ψv)Λ^v1X(ψv))1X(ψv)yv

  • Integrate γv out for faster computation.

21 / 39

Spatial Priors

KC

π(ψ(j)|S(j))=v=1Vπ(ψjv|Sjv) ψjv|SjvindBernoulli(11+e(ajd+Sjv)) Sjv=d=1Dk(zvsd;ϕd)wjd

wjdτj2indN(0,τj2) τj2iidIG(aτ,bτ) ϕdiidGa(aϕ,bϕ)

22 / 39

Spatial Priors

KC

π(ψ(j)|S(j))=v=1Vπ(ψjv|Sjv) ψjv|SjvindBernoulli(11+e(ajd+Sjv)) Sjv=d=1Dk(zvsd;ϕd)wjd

wjdτj2indN(0,τj2) τj2iidIG(aτ,bτ) ϕdiidGa(aϕ,bϕ)

GP

π(ψ(j)|S(j))=g=1GvRgπ(ψjv|Sjg) ψjv|SjgindBernoulli(11+e(ajg+Sjg)) S(j)|δj2,rjindN(0,δj2Γj) Γj(i,k)=exp(||sisk||rj) π(δj2)δj2 rjGa(aj,bj)

22 / 39

Choice of Kernels

  • Gaussian kernel k(zvud;ϕ)=exp(zvud22ϕ)
  • Bezier kernel k(svud;ν,ϕ)={(1zvud2ϕ2)ν,zvud<ϕ0, otherwise , where ν is the smooth parameter and ϕ is the range parameter.
23 / 39

Choice of Kernels

  • Gaussian kernel k(zvud;ϕ)=exp(zvud22ϕ)
  • Bezier kernel k(svud;ν,ϕ)={(1zvud2ϕ2)ν,zvud<ϕ0, otherwise , where ν is the smooth parameter and ϕ is the range parameter.
  • The Bezier kernel has a compact support.
  • To capture neighboring spatial dependence
    • this avoids unrealistically relating any two voxels together
    • lets the model learn the neighboring structure of a given voxel by learning ϕ
23 / 39

Bezier Kernel: ν=2; ϕ=2,4,6

  • Here shows you what the Bezier kernel looks like with different value of parameters.
24 / 39

Bezier Kernel: ν=0.5,2,5; ϕ=3

25 / 39

Simulated data with AR coefficient 0.5

  • Estimate Pr(ψjv=1|y) by computing the number of 1s of ψjv in the posterior sample divided by the number of MCMC iterations.

Temporal vs. Non-temporal


Model Sensitivity Specificity Precision Accuracy F1 MCC
CV-KC-AR 0.92 0.99 0.99 0.99 0.95 0.95
CV-GP-AR 0.82 0.99 0.97 0.97 0.89 0.88
CV-EMVS-AR 0.96 0.99 0.93 0.98 0.94 0.93
CV-KC 1 0.79 0.47 0.82 0.63 0.61
26 / 39

Simulated data with AR coefficient 0.5

  • Estimate Pr(ψjv=1|y) by computing the number of 1s of ψjv in the posterior sample divided by the number of MCMC iterations.

Temporal vs. Non-temporal


27 / 39

Simulated data with AR coefficient 0

Spatial vs. Non-spatial

Model Sensitivity Specificity Precision Accuracy F1 MCC
CV-KC 0.79 0.99 0.99 0.97 0.87 0.86
CV-GP 0.58 0.99 0.99 0.93 0.73 0.72
CV-EMVS 0.66 0.99 0.92 0.94 0.77 0.75


28 / 39

KC

  • KC produces a finer and reasonable latent spatial effect map

ψv|SvindBernoulli(11+eSv)

Sv=d=1Dk(zvsd;ϕ)wd

GP

  • GP forces voxels in the same region share the same effect

ψjv|SjgindBernoulli(11+eSjg)

29 / 39

KC is Less Sensitive to Dimension Reduction


It matters because ...

30 / 39

Computing Time

  • Average computing time over 10 times of running the MCMC algorithms.
  • The time unit is seconds per 1000 MCMC iterations
Model 16 25 100
CV-KC 0.51 0.59 1.48
CV-GP 0.30 0.41 3.36
  • The example shows the KC model can use just about 15% to 20% of the computing time of the GP model per 1000 MCMC iterations to reach the similar detecting activation performance
31 / 39

CV vs. MO

  • The CV model is better than the MO model especially when signals are noisy.
  • The MO models improve more when an explicit spatial prior is included.

32 / 39

MO-GP Has a Region Probability Inflation

If a spatial region contains true activated voxels, other nonactivated voxels in the region will also have relatively high probability of activation increases false positives.

CV

MO

33 / 39

Spatial Models Encourage Activation in Clusters

34 / 39

Zoom In

35 / 39

36 / 39

Multi-resolution

  • D coarse-resolution sites (white dots) and H fine-resolution sites (green dots)

Sv=d=1Dkc(zvsd;ϕc)wd+h=1Hkf(zvsd;ϕf)bh

37 / 39

Take-home Message

  • Complex-valued modeling improves activation.

  • CV-EMVS and CV-KC-AR are computationally efficient.

  • Spatial models encourage activation in clusters.

  • The MO models need a sophisticated spatial structure to reach good performance as CV models.

  • The KC models are flexible and outperform the baseline GP models.

    • Any valid kernel can be used.
    • Avoid dealing with problems of shape of regions.
    • How a voxel is influenced by its neighboring voxels is model-based.
    • Less affected by dimension reduction, leading to fast computation.
38 / 39

References

[1] M. Karaman, I. P. Bruce, and D. B. Rowe. "Incorporating relaxivities to more accurately reconstruct MR images". In: Magnetic Resonance Imaging 33 (2015), pp. 374-384.

[2] D. B. Rowe and B. R. Logan. "A complex way to compute fMRI activation". In: NeuroImage 23 (2004), pp. 1078-1092.

[3] J. Xia, F. Liang, and Y. Wang. "FMRI analysis through Bayesian variable selection with a spatial prior". In: Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. Boston, MA, USA, 2009, pp. 714-717.

[4] C. Yu, R. Prado, H. Ombao, et al. "A Bayesian variable selection approach yields improved detection of brain activation from complex-valued fMRI". In: Journal of American Statistical Association: Applicaiton and Case Studies 113 (2018), pp. 1395-1410.

[5] C. Yu, R. Prado, H. Ombao, et al. "Bayesian spatiotemporal modeling on complex-valued fMRI signals via kernel convolutions". In: Biometrics (2022), pp. 1-13.

[6] L. Zhang, M. Guindani, and M. Vannucci. "Bayesian models for fMRI data analysis". In: WIREs Computational Statistics 7 (2015), pp. 21-41.

[7] L. Zhang, M. Guindani, F. Versace, et al. "A spatio-temporal non-parametric Bayesian model of multi-subject fMRI data". In: Annals of Applied Statistics (2016).

39 / 39

(Task-based) Functional Magnetic Resonance Imaging?

  • fMRI is a noninvasive neuroimaging method that measures the blood-oxygen level-dependent (BOLD) signals in the brain.
    • Large-dimensional
    • Low signal-to-noise ratios (SNRs)
    • Sophisticated spatio-temporal dependence of voxels.
Source: Martha Skup (2010)

Source: Martha Skup (2010)

2 / 39
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow