Scale Mixtures of Complex Gaussians and Bayesian Shrinkage

2025 Joint Statistical Meetings

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

2025-08-04

Complex-valued Data

  • Neuroimaging
    • (f)MRI
    • Frequency/spectral domain EEG

  • Electrical Engineering
    • Signal Processing
    • Communications

Complex-valued Data

  • Geosciences
    • Synthetic Aperture Radar
    • Seismic Imaging

  • Physics
    • Quantum Computing

and more!

Complex-valued fMRI (CV-fMRI)

CV-fMRI Data

  • Unilateral finger tapping CV-fMRI slice data of dimension 96 x 96 x 510 (Karaman, Bruce, and Rowe, 2015).
str(fmriC)
 cplx [1:96, 1:96, 1:510] -0.0226-0.0025i 0.0505-0.0475i 0.0722-0.0439i ...
  • Goal: Detect voxel activation, and estimate activation strength.
  • [M-Cplx] for CV-fMRI data vs. [M-Mag] for Magnitude-only (MO) data, i.e., y_{mo} = \sqrt{y_{re}^2 + y_{im}^2}
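
As a minimal sketch of how the M-Mag input relates to the complex data, the magnitude formula above can be computed with the base R function Mod(). The array toyC is a small simulated stand-in (an assumption for illustration), not the fmriC data.

```r
# Hypothetical stand-in for the 96 x 96 x 510 complex array (toy size here).
set.seed(1)
dims <- c(4, 4, 10)
toyC <- array(
  complex(real = rnorm(prod(dims)), imaginary = rnorm(prod(dims))),
  dim = dims
)

# Magnitude-only data: y_mo = sqrt(y_re^2 + y_im^2), computed elementwise.
toyMag <- Mod(toyC)

# Mod() agrees with the explicit formula and keeps the array shape.
all.equal(toyMag, sqrt(Re(toyC)^2 + Im(toyC)^2))
dim(toyMag)
```

The phase, Arg(toyC), is the part of the signal that the magnitude-only model discards.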

Models for Complex-valued Data

  • Modeling complex-valued data is still uncommon in statistics.
  • Using the entire complex-valued (CV) data yields higher statistical power and better inference and prediction than using only part of the data, such as magnitude-only (MO) data or only the real or imaginary part of the signal.
  • Complex-valued Linear Regression
    • Rowe & Logan (2004, 2005), Rowe (2005a, 2005b), Rowe (2009), Kociuba and Rowe (2016), Karaman et al. (2016), etc.
  • Complex-valued Gaussian Processes & Kernel Methods
    • Tobar & Turner (2014), Berg et al. (2015), Berg et al. (2015), Ambrogioni & Maris (2016), Devonport et al. (2023), etc.
  • Complex-Valued Deep Neural Networks
    • Dramsch et al. (2019), Singhal et al. (2021), Abdalla (2023), ChiYan Lee et al. (2025), etc.
  • Few Bayesian models exist for complex-valued data; this work develops one with shrinkage priors built from scale mixtures of complex Gaussian distributions.

Complex Gaussian Distribution

  • \mathbf{Z}\in \mathcal{C}^n \sim \text{CN}_n\left(\mathbf{0}, \boldsymbol \Omega, \boldsymbol \Lambda\right)
    • \boldsymbol \Omega\in \mathcal{C}^{n \times n} is the covariance matrix, which is positive definite and Hermitian, i.e., \boldsymbol \Omega= (\boldsymbol \Omega')^*.
    • \boldsymbol \Lambda\in \mathcal{C}^{n \times n} is the relation matrix, which is symmetric.
  • The pdf is f(\mathbf{z}) = \pi^{-n}\left[|\boldsymbol \Omega||\mathbf{P}|\right]^{-1/2} \exp \left\{ -q(\mathbf{z})/2\right\}, where q(\mathbf{z}) = 2\left[\mathbf{z}^H\mathbf{P}^{-*}\mathbf{z} - \mathrm{Re}\left(\mathbf{z}'\mathbf{R}' \mathbf{P}^{-*} \mathbf{z}\right) \right], \mathbf{P}:= \boldsymbol \Omega^* - \boldsymbol \Lambda^H\boldsymbol \Omega^{-1}\boldsymbol \Lambda is Hermitian and positive definite, \mathbf{P}^{-*} denotes \left( \mathbf{P}^{-1} \right)^*, and \mathbf{R} = \boldsymbol \Lambda^H\boldsymbol \Omega^{-1}.
  • When \boldsymbol \Lambda= \mathbf{0}, the second-order statistics of \mathbf{Z} and its rotation e^{i\alpha}\mathbf{Z} are identical for any angle \alpha \in [-\pi, \pi). In that case \mathbf{Z} is called circular (or circularly symmetric), and proper if its variance is finite.
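
The roles of the covariance and the relation can be illustrated with a short simulation. This is a sketch for the scalar case only, with a hypothetical helper rcn1() and illustrative parameter values. It relies on the identities E|Z|^2 = Var(Re Z) + Var(Im Z) and E[Z^2] = Var(Re Z) - Var(Im Z) + 2i Cov(Re Z, Im Z).

```r
# Draw from a scalar CN(0, omega, lambda) through its real-valued representation.
# rcn1 is a hypothetical helper written for this sketch.
rcn1 <- function(n, omega, lambda) {
  # Covariance of (Re Z, Im Z) implied by E|Z|^2 = omega and E[Z^2] = lambda.
  S <- matrix(c(omega + Re(lambda), Im(lambda),
                Im(lambda), omega - Re(lambda)), 2, 2) / 2
  xy <- matrix(rnorm(2 * n), n, 2) %*% chol(S)
  complex(real = xy[, 1], imaginary = xy[, 2])
}

set.seed(2025)
z_circ <- rcn1(1e5, omega = 2, lambda = 0)            # circular
z_nonc <- rcn1(1e5, omega = 2, lambda = 1.2 + 0.6i)   # non-circular

# Sample covariance E|Z|^2 (close to omega in both cases).
c(mean(Mod(z_circ)^2), mean(Mod(z_nonc)^2))

# Sample relation E[Z^2] (close to 0 and to lambda, respectively).
c(mean(z_circ^2), mean(z_nonc^2))

# Rotating Z by alpha multiplies the relation by exp(2i * alpha),
# so only a zero relation is invariant to rotation.
alpha <- pi / 3
rel_rotated <- mean((exp(1i * alpha) * z_nonc)^2)
rel_rotated
```

The covariance is unchanged by rotation in both cases; the relation is what distinguishes the circular from the non-circular draw.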

Complex Gaussian Distribution

  • Circular

  • Non-Circular

Scale Mixture of Complex Gaussians

  • Suppose \mathbf{Z}\sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) and the scale parameter \tau has the density h(\tau).

  • The scale mixture of complex Gaussians has marginal density f(\mathbf{z}) = \int_{0}^{\infty} \text{CN}_p\left(\mathbf{z}; \mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) h(\tau) \, d\tau.

  • Complex multivariate normal-gamma: \mathbf{Z}\mid \tau^2 \sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) and \tau^2 \sim \text{Ga}(\alpha, \beta)

    • Complex multivariate Laplace with \tau^2 \sim \text{Ga}(1, 1) = \text{Exp}(1)

    • Complex group Lasso with \tau^2 \sim \text{Ga}\left(\frac{1+2p}{2}, \frac{\lambda^2}{4}\right)

    • Group Lasso (Xu and Ghosh (2015), Bai and Ghosh (2021)) with \tau^2 \sim \text{Ga}\left(\frac{1+2p}{2}, \frac{\lambda^2}{4}\right), \boldsymbol \Omega= \mathbf{I} and \boldsymbol \Lambda= \mathbf{0}

  • Complex multivariate generalized double Pareto (GDP): \mathbf{Z}\mid \tau^2 \sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right), \tau^2 \sim \text{Ga}\left(\frac{1}{2}+p, \frac{\lambda^2}{4}\right), and \lambda \sim \text{Ga}\left(\alpha, \eta\right)

    • GDP(\eta/\alpha, \alpha) (Armagan et al., 2013) when p=1, \boldsymbol \Omega= \mathbf{I} and \boldsymbol \Lambda= \mathbf{0}.
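
The mixture construction can be sketched by simulation: draw the scale first, then a complex Gaussian given that scale. The example below assumes the scalar, circular case with \tau^2 \sim \text{Exp}(1) (the complex Laplace above) and unit covariance. The helper names are hypothetical, and the fourth-moment ratio is only one convenient way to compare tail weight.

```r
set.seed(42)
n <- 2e5
omega <- 1

# Circular scalar CN(0, omega, 0): independent real and imaginary parts,
# each with variance omega / 2. Hypothetical helper for this sketch.
rcn_circ <- function(n, omega) {
  complex(real = rnorm(n, sd = sqrt(omega / 2)),
          imaginary = rnorm(n, sd = sqrt(omega / 2)))
}

# Step 1: draw the mixing scale tau^2 ~ Ga(1, 1) = Exp(1).
tau2 <- rexp(n, rate = 1)

# Step 2: Z | tau^2 ~ CN(0, tau^2 * omega, 0), i.e. sqrt(tau^2) times a CN(0, omega, 0) draw.
z_lap <- sqrt(tau2) * rcn_circ(n, omega)
z_gauss <- rcn_circ(n, omega)

# Tail-weight summary E|Z|^4 / (E|Z|^2)^2: theoretically 2 for the circular
# complex Gaussian and 4 for this mixture (since E[tau^4] = 2 under Exp(1)).
kurt <- function(z) mean(Mod(z)^4) / mean(Mod(z)^2)^2
c(gaussian = kurt(z_gauss), laplace = kurt(z_lap))
```

Swapping the rexp() line for another mixing density gives the other members of the family with the same two-step recipe.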

Complex t, Laplace, GDP - Circular

  • All three have heavier tails than the complex Gaussian.

  • Laplace has the most pronounced peak and decays at the fastest rate.

  • GDP has the fattest tails.

Complex t, Laplace, GDP - Non-Circular

Shrinkage in Complex Bayesian Linear Regression

  • Complex Bayesian (group) Lasso Regression: \mathbf{y}\in \mathcal{C}^n, \boldsymbol \beta\in \mathcal{C}^p,

\begin{align*}
\mathbf{y}&= \mathbf{X}\boldsymbol \beta+ \boldsymbol{\epsilon}, ~~ \boldsymbol{\epsilon}\sim \text{CN}(\mathbf{0}, 2\sigma^2\mathbf{I}_n, 2\sigma^2\rho\mathbf{I}_n),\\
\boldsymbol \beta&\sim \text{CN}(0, 2\sigma^2D_{\tau}, 0), ~~ \tau_j^2 \sim \text{Ga}\left(\frac{1 + 2}{2}, \frac{\lambda^2}{2}\right), ~~ \lambda^2 \sim \text{Ga} (r, \delta), ~~ \sigma^2 \sim \text{IG}(a, b),
\end{align*}
where D_{\tau} = \text{diag}(\tau_1^2, \dots, \tau_p^2).

  • \boldsymbol \beta, \tau_j^2, \lambda^2, and \sigma^2 have closed-form full conditionals and are updated by Gibbs sampling.
  • \rho is sampled via a Metropolis-Hastings step embedded in the MCMC algorithm.
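
To make the noise model concrete, the following worked identity translates the covariance 2\sigma^2 and relation 2\sigma^2\rho of a single error \epsilon_i = \epsilon_{re,i} + i\,\epsilon_{im,i} into real-valued moments. It is a sketch under the assumption that \rho is a scalar with |\rho| < 1 (possibly complex), and it uses E|\epsilon_i|^2 = \mathrm{Var}(\epsilon_{re,i}) + \mathrm{Var}(\epsilon_{im,i}) and E[\epsilon_i^2] = \mathrm{Var}(\epsilon_{re,i}) - \mathrm{Var}(\epsilon_{im,i}) + 2i\,\mathrm{Cov}(\epsilon_{re,i}, \epsilon_{im,i}):

\begin{align*}
\mathrm{Var}(\epsilon_{re,i}) &= \tfrac{1}{2}\left(2\sigma^2 + \mathrm{Re}(2\sigma^2\rho)\right) = \sigma^2\left(1 + \mathrm{Re}\,\rho\right),\\
\mathrm{Var}(\epsilon_{im,i}) &= \tfrac{1}{2}\left(2\sigma^2 - \mathrm{Re}(2\sigma^2\rho)\right) = \sigma^2\left(1 - \mathrm{Re}\,\rho\right),\\
\mathrm{Cov}(\epsilon_{re,i}, \epsilon_{im,i}) &= \tfrac{1}{2}\,\mathrm{Im}(2\sigma^2\rho) = \sigma^2\,\mathrm{Im}\,\rho.
\end{align*}

With \rho = 0 the real and imaginary errors are independent N(0, \sigma^2). By the same identity, the zero relation in the prior gives \beta_j independent N(0, \sigma^2\tau_j^2) real and imaginary parts.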

Simulation

  • Examine performance on (1) variable selection, (2) parameter learning and (3) predictive accuracy.
  • M-Cplx: \boldsymbol{\epsilon}\sim \text{CN}(\mathbf{0}, 2\sigma^2\mathbf{I}_n, 2\sigma^2\rho\mathbf{I}_n) and \beta_j \sim \text{CN}(0, 2\sigma^2\tau_j^2, 0)

  • M-Re-Im: \boldsymbol{\epsilon}_{a} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n), \beta_{a, j} \sim N(0, \sigma^2\tau_j^2), a = re, im.

  • With p coefficients, the first three are non-zero and the rest are zero: \boldsymbol \beta_{a} = (3, 1.5, 2, 0, \dots, 0), a = re, im.

  • p = 10, 50, 200

  • n = 40, 200, 1000

  • Simulate 100 data replicates.
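
One replicate of such a design could be generated as sketched below. Several ingredients are assumptions, since the slides do not specify them: a real-valued design matrix with standard normal entries, \sigma^2 = 1, and a real \rho = 0.5. The function name simulate_cplx is hypothetical.

```r
# Sketch of one simulated data set; X, sigma2, and rho are illustrative assumptions.
simulate_cplx <- function(n, p, sigma2 = 1, rho = 0.5) {
  stopifnot(p >= 3, abs(rho) < 1)
  X <- matrix(rnorm(n * p), n, p)              # assumed real-valued design

  # First three coefficients non-zero, identical real and imaginary parts.
  b <- c(3, 1.5, 2, rep(0, p - 3))
  beta <- complex(real = b, imaginary = b)

  # Errors with covariance 2 * sigma2 and relation 2 * sigma2 * rho (real rho):
  # Var(Re) = sigma2 * (1 + rho), Var(Im) = sigma2 * (1 - rho), zero covariance.
  eps <- complex(real = rnorm(n, sd = sqrt(sigma2 * (1 + rho))),
                 imaginary = rnorm(n, sd = sqrt(sigma2 * (1 - rho))))

  y <- as.vector(X %*% beta) + eps
  list(y = y, X = X, beta = beta)
}

set.seed(7)
dat <- simulate_cplx(n = 40, p = 10)
str(dat$y)
```

Calling the function 100 times over the grid of n and p values would reproduce the structure, though not necessarily the exact settings, of the study.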

Variable Selection

  • Check whether the credible interval for each \beta_j includes zero, then use the F-score to evaluate selection performance.

  • M-Cplx is more consistent and robust than M-Re-Im across different n and p.

  • Variation in the F-score decreases as n increases.
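
The selection rule and its score can be sketched as follows. The interval endpoints are made-up numbers for illustration, and the function f_score is hypothetical. How intervals are formed for a complex coefficient (for example, from its real and imaginary parts) is not stated in the slides, so the sketch works with generic real-valued intervals.

```r
# F-score for interval-based selection: a variable is selected when its
# credible interval excludes zero. Hypothetical helper for this sketch.
f_score <- function(lower, upper, truth_nonzero) {
  selected <- lower > 0 | upper < 0
  tp <- sum(selected & truth_nonzero)    # true positives
  fp <- sum(selected & !truth_nonzero)   # false positives
  fn <- sum(!selected & truth_nonzero)   # false negatives
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Toy intervals for five coefficients; the first three are truly non-zero.
lower <- c(2.1, 0.8, -0.2, -0.5, 0.1)
upper <- c(3.9, 2.2,  1.9,  0.4, 0.6)
truth <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

f_score(lower, upper, truth)
```

Here two true signals are selected, one is missed, and one null coefficient is selected, so the score is 2 * 2 / (2 * 2 + 1 + 1) = 2/3.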


Figure panels: Cauchy, Bayesian Lasso, GDP.

Parameter Learning (\beta)

  • Use mean squared error (MSE) to measure the inference performance on coefficients.

  • M-Cplx performs better than M-Re-Im in terms of MSE.

  • For non-zero coefficients, the M-Re-Im distributions are more right-skewed, with larger variation.

  • The estimation gap between M-Cplx and M-Re-Im shrinks as n gets large.

Figure panels: Cauchy, Bayesian Lasso.

Predictive Accuracy (\mathbf{y})

  • Use mean squared prediction error (MSPE) to assess posterior predictive accuracy for 100 test data sets.

  • The median MSPE is computed over the 100 data sets, and its uncertainty is quantified with 1000 bootstrap samples.

  • M-Cplx has better out-of-sample prediction than M-Re-Im in terms of MSPE.

  • MSPE increases as p increases.
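
The summary described above can be sketched in a few lines. The MSPE values are simulated placeholders rather than results from the study, the squared error for complex responses is taken as the squared modulus (an assumption), and a percentile interval is used as one common choice of bootstrap summary.

```r
# MSPE for complex responses, using the squared modulus of the prediction error.
# Hypothetical helper for this sketch.
mspe_cplx <- function(y, y_hat) mean(Mod(y - y_hat)^2)

# Tiny check: errors of modulus 1 and 2 give (1 + 4) / 2.
mspe_cplx(c(1 + 1i, 2 + 0i), c(1 + 0i, 2 + 2i))

# Placeholder MSPE values standing in for 100 test data sets.
set.seed(11)
mspe <- rgamma(100, shape = 4, rate = 2)

# Median MSPE and a bootstrap percentile interval from 1000 resamples.
med <- median(mspe)
boot_med <- replicate(1000, median(sample(mspe, replace = TRUE)))
ci <- quantile(boot_med, c(0.025, 0.975))

med
ci
```

Resampling the 100 MSPE values, rather than refitting the model, keeps the uncertainty summary cheap once the test-set errors have been computed.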

Figure panels: Cauchy, Bayesian Lasso.

CV-fMRI Data Analysis

  • Activation detection is treated as variable selection: a voxel is declared active when the credible interval of its coefficient excludes zero.

  • Activation strength under M-Cplx is measured by \sqrt{\beta_{re}^2 + \beta_{im}^2} \in (0, \infty); under M-Mag it is the \beta from the real-valued model.

Figure panels: CV-GDP, CV-Bayesian Lasso, MO-GDP, MO-Bayesian Lasso.

  • Given the same credible level, M-Cplx tends to generate more positives than M-Mag.

Activation and Strength Maps

Figure panels: CV-GDP, CV-Bayesian Lasso, MO-GDP, MO-Bayesian Lasso.

  • Whether complex- or real-valued, GDP shrinks coefficients more strongly when their intensity is small.

  • Activation strength estimated by MO models is weaker.

  • The NON-spatial NON-temporal CV models can perform as well as sophisticated spatiotemporal real-valued MO models (Yu et al., 2018, 2023).

Selection of Best Activation

  • We are working on selecting the best activation via information criteria such as WAIC and DIC.

  • No single credible level works for all data sets and prior types.

  • Using a Gaussian likelihood for M-Mag distorts WAIC/DIC because the magnitude is positive and its distribution is closer to Rician.
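
A small simulation illustrates the concern about the magnitude. It assumes circular Gaussian noise with unit standard deviation in each component, and the signal levels are hypothetical. The magnitude of signal plus noise is then Rician, and its mean sits above the true signal, most visibly at low signal-to-noise ratio.

```r
set.seed(3)
n <- 1e5
sigma <- 1

# Circular complex noise: independent N(0, sigma^2) real and imaginary parts.
noise <- complex(real = rnorm(n, sd = sigma), imaginary = rnorm(n, sd = sigma))

# Average observed magnitude when the true (real, non-negative) signal is mu.
mean_mag <- function(mu) mean(Mod(mu + noise))

signal <- c(0, 1, 5)
data.frame(signal = signal,
           mean_magnitude = sapply(signal, mean_mag))
```

With no signal the magnitude is Rayleigh distributed with mean sigma * sqrt(pi / 2), so a Gaussian likelihood centered at the signal is a poor fit exactly where activation is weakest.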

Conclusion

  • Extend the scale mixture of Gaussians from the real-valued domain into the complex-valued domain, deriving the general form of scale mixtures.

  • Demonstrate how the complex-valued scale mixtures can be used as a shrinkage prior in Bayesian regression.

    • Cauchy, Laplace, GDP
    • Applying complex-valued shrinkage leads to better variable selection, regression coefficient estimation, and posterior predictive accuracy
  • Contribute to CV-fMRI activation studies.

  • Developing R package cplxrv (https://github.com/chenghanyustats/cplxrv) for simulating complex-valued random variables, and fitting complex Bayesian shrinkage regression.

  • Future work includes complex-valued horseshoes and global-local shrinkage priors, and spatiotemporal modeling.