Scale Mixtures of Complex Gaussians and Bayesian Shrinkage

2025 Joint Statistical Meetings

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

2025-08-04

Complex-valued Data

Neuroimaging
- (f)MRI
- Frequency/spectral domain EEG

Electrical Engineering
- Signal Processing
- Communications

Geosciences
- Synthetic Aperture Radar
- Seismic Imaging

Physics
- Quantum Computing

and more!

Complex-valued fMRI (CV-fMRI)

CV-fMRI Data

Unilateral finger-tapping CV-fMRI slice data of dimension 96 x 96 x 510 (Karaman, Bruce, and Rowe, 2015).

str(fmriC)
 cplx [1:96, 1:96, 1:510] -0.0226-0.0025i 0.0505-0.0475i 0.0722-0.0439i ...

Goal: Detect voxel activation, and estimate activation strength.

[M-Cplx] for CV-fMRI data vs. [M-Mag] for Magnitude-only (MO) data, i.e., y_{mo} = \sqrt{y_{re}^2 + y_{im}^2}
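A minimal Python sketch of the magnitude-only reduction, applied to the three values printed by str(fmriC) above. The variable names are illustrative and not taken from the original analysis.

```python
import cmath
import math

# First three complex-valued entries shown in the str(fmriC) output.
y_cv = [complex(-0.0226, -0.0025), complex(0.0505, -0.0475), complex(0.0722, -0.0439)]

# Magnitude-only data: y_mo = sqrt(y_re^2 + y_im^2), i.e. abs() of a complex number.
y_mo = [abs(y) for y in y_cv]

# The phase is the part of the signal that the magnitude-only reduction discards.
phase = [cmath.phase(y) for y in y_cv]

for y, m, ph in zip(y_cv, y_mo, phase):
    print(f"y = {y:.4f}  magnitude = {m:.4f}  phase = {ph:.4f} rad")
```

M-Mag models only y_mo, while M-Cplx keeps both the real and imaginary parts.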

Models for Complex-valued Data

Modeling complex-valued data remains uncommon in statistics.
Using the entire complex-valued (CV) data gives higher statistical power and better inference and prediction performance than using a subset of the data, such as magnitude-only (MO) data or only the real or imaginary part of the signals.

Complex-valued Linear Regression
- Rowe & Logan (2004, 2005), Rowe (2005a, 2005b), Rowe (2009), Kociuba and Rowe (2016), Karaman et al. (2016), etc.

Complex-valued Gaussian Processes & Kernel Methods
- Tobar & Turner (2014), Berg et al. (2015), Ambrogioni & Maris (2016), Devonport et al. (2023), etc.

Complex-valued Deep Neural Networks
- Dramsch et al. (2019), Singhal et al. (2021), Abdalla (2023), ChiYan Lee et al. (2025), etc.

There are not many Bayesian models for complex-valued data. This work creates one, with shrinkage priors built from scale mixtures of complex Gaussian distributions.

Complex Gaussian Distribution

\mathbf{Z}\in \mathcal{C}^n \sim \text{CN}_n\left(\mathbf{0}, \boldsymbol \Omega, \boldsymbol \Lambda\right)
- \boldsymbol \Omega\in \mathcal{C}^{n \times n} is the covariance matrix, which is positive definite and Hermitian, i.e., \boldsymbol \Omega= (\boldsymbol \Omega')^*.
- \boldsymbol \Lambda\in \mathcal{C}^{n \times n} is the relation matrix, which is symmetric.
The pdf is f(\mathbf{z}) = \pi^{-n}[|\boldsymbol \Omega||\mathbf{P}|]^{-1/2} \exp \left\{ -q(\mathbf{z})/2\right\}, where
- q(\mathbf{z}) = 2\left[\mathbf{z}^H\mathbf{P}^{-*}\mathbf{z} - \mathrm{Re}\left(\mathbf{z}'\mathbf{R}' \mathbf{P}^{-*} \mathbf{z}\right) \right],
- \mathbf{P}:= \boldsymbol \Omega^* - \boldsymbol \Lambda^H\boldsymbol \Omega^{-1}\boldsymbol \Lambda is Hermitian and positive definite,
- \mathbf{P}^{-*} means \left( \mathbf{P}^{-1} \right)^*,
- \mathbf{R} = \boldsymbol \Lambda^H\boldsymbol \Omega^{-1}.

When \boldsymbol \Lambda= \mathbf{0}, the 2nd-order statistics of \mathbf{Z} and its rotated variable e^{i\alpha}\mathbf{Z} are identical for any rotation \alpha \in [-\pi, \pi). In that case \mathbf{Z} is called circular or circularly symmetric, and proper if the variance is finite.
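The role of the relation matrix under rotation can be checked numerically. Rotating by \alpha leaves the covariance E[ZZ^*] unchanged but multiplies the relation E[ZZ] by e^{2i\alpha}, so the 2nd-order statistics agree for every \alpha only when the relation is zero. The sketch below is an illustration in the scalar case; the function names and variances are assumptions chosen for the example.

```python
import cmath
import random

random.seed(1)

def sample_noncircular(n, var_re=1.5, var_im=0.5):
    """Scalar complex Gaussians with independent real/imaginary parts of unequal variance.
    True covariance is var_re + var_im = 2.0 and true relation is var_re - var_im = 1.0."""
    return [complex(random.gauss(0.0, var_re ** 0.5), random.gauss(0.0, var_im ** 0.5))
            for _ in range(n)]

def second_order(z):
    """Sample covariance E[Z Z^*] and sample relation E[Z Z] of a zero-mean sample."""
    n = len(z)
    cov = sum(abs(v) ** 2 for v in z) / n
    rel = sum(v * v for v in z) / n
    return cov, rel

z = sample_noncircular(5000)
alpha = cmath.pi / 3
z_rot = [cmath.exp(1j * alpha) * v for v in z]

omega, lam = second_order(z)
omega_rot, lam_rot = second_order(z_rot)

print("covariance:", omega, "after rotation:", omega_rot)   # unchanged by rotation
print("relation:  ", lam, "after rotation:", lam_rot)       # multiplied by exp(2i * alpha)
```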

[Figure: complex Gaussian distribution, panels: Circular, Non-Circular]

Scale Mixture of Complex Gaussians

Suppose \mathbf{Z}\mid \tau \sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) and the scale parameter \tau has the density h(\tau).
The scale mixture of complex Gaussians is the marginal density of \mathbf{Z}, f(\mathbf{z}) = \int_{0}^{\infty} \text{CN}_p\left(\mathbf{z}; \mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) h(\tau) \, d\tau.

Complex multivariate normal-gamma: \mathbf{Z}\mid \tau^2 \sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right) and \tau^2 \sim \text{Ga}(\alpha, \beta)
- Complex multivariate Laplace with \tau^2 \sim \text{Ga}(1, 1) = \text{Exp}(1)
- Complex group Lasso with \tau^2 \sim \text{Ga}\left(\frac{1+2p}{2}, \frac{\lambda^2}{4}\right)
- Group Lasso (Xu and Ghosh, 2015; Bai and Ghosh, 2021) with \tau^2 \sim \text{Ga}\left(\frac{1+2p}{2}, \frac{\lambda^2}{4}\right), \boldsymbol \Omega= \mathbf{I} and \boldsymbol \Lambda= \mathbf{0}
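Sampling from a scale mixture is a two-stage draw: first the scale, then a complex Gaussian given the scale. The Python sketch below does this for the scalar circular case with Ga(1, 1) mixing (the complex Laplace case). It is an illustration only: the function name is hypothetical, and Ga(\alpha, \beta) is read as shape-rate, an assumption that does not matter for Ga(1, 1).

```python
import random

random.seed(2025)

def r_complex_normal_gamma(n, shape=1.0, rate=1.0, omega=1.0):
    """Draw n scalar values with tau^2 ~ Gamma(shape, rate) and
    Z | tau^2 ~ CN(0, tau^2 * omega, 0), i.e. a circular scale mixture."""
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau2 = random.gammavariate(shape, 1.0 / rate)
        # Circular case: real and imaginary parts are independent N(0, tau^2 * omega / 2).
        sd = (tau2 * omega / 2.0) ** 0.5
        draws.append(complex(random.gauss(0.0, sd), random.gauss(0.0, sd)))
    return draws

z = r_complex_normal_gamma(20000)

# E|Z|^2 = E[tau^2] * omega, which equals 1 for Ga(1, 1) mixing and omega = 1.
mean_sq_modulus = sum(abs(v) ** 2 for v in z) / len(z)
print("sample E|Z|^2:", mean_sq_modulus)
```

A non-circular draw would additionally need the real and imaginary parts to have unequal variances or non-zero correlation, as determined by the relation \tau^2\boldsymbol \Lambda.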

Complex multivariate generalized double Pareto (GDP): \mathbf{Z}\mid \tau^2 \sim \text{CN}_p\left(\mathbf{0}, \tau^2\boldsymbol \Omega, \tau^2\boldsymbol \Lambda\right), \tau^2 \sim \text{Ga}\left(\frac{1}{2}+p, \frac{\lambda^2}{4}\right) and \lambda \sim \text{Ga}\left(\alpha, \eta\right)
- GDP(\eta/\alpha, \alpha) (Armagan et al., 2013) when p=1, \boldsymbol \Omega= \mathbf{I} and \boldsymbol \Lambda= \mathbf{0}.

Complex t, Laplace, GDP - Circular

All three have heavier tails than the complex Gaussian.
Laplace has the most pronounced peak, and its tails decay at the fastest rate.
GDP has the fattest tails.

Complex t, Laplace, GDP - Non-Circular

Shrinkage in Complex Bayesian Linear Regression

Complex Bayesian (group) Lasso Regression with \mathbf{y}\in \mathcal{C}^n and \boldsymbol \beta\in \mathcal{C}^p:

\begin{align*}
\mathbf{y}&= \mathbf{X}\boldsymbol \beta+ \boldsymbol{\epsilon}, ~~ \boldsymbol{\epsilon}\sim \text{CN}(\mathbf{0}, 2\sigma^2\mathbf{I}_n, 2\sigma^2\rho\mathbf{I}_n),\\
\boldsymbol \beta&\sim \text{CN}(\mathbf{0}, 2\sigma^2D_{\tau}, \mathbf{0}), ~~ \tau_j^2 \sim \text{Ga}\left(\frac{1 + 2}{2}, \frac{\lambda^2}{2}\right), ~~ \lambda^2 \sim \text{Ga} (r, \delta), ~~ \sigma^2 \sim \text{IG}(a, b),
\end{align*}
where D_{\tau} = \text{diag}(\tau_1^2, \dots, \tau_p^2).

\boldsymbol \beta, \tau^2, \lambda^2, and \sigma^2 have closed-form full conditionals and are updated by Gibbs sampling.
\rho is sampled via a Metropolis-Hastings step embedded in the MCMC algorithm.
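One way such a Metropolis-Hastings step could look is sketched below in Python. This is not the sampler used in this work: it assumes a real \rho \in (-1, 1), a flat prior on \rho, a random-walk proposal, and that \sigma^2 and the residuals \mathbf{y}- \mathbf{X}\boldsymbol \beta are held at their current Gibbs values. Under those assumptions, a scalar residual from \text{CN}(0, 2\sigma^2, 2\sigma^2\rho) has independent real and imaginary parts with variances \sigma^2(1+\rho) and \sigma^2(1-\rho).

```python
import math
import random

random.seed(7)

def log_lik_rho(rho, resid, sigma2):
    """Log-likelihood (up to a constant) of real rho for i.i.d. residuals
    ~ CN(0, 2*sigma2, 2*sigma2*rho): Re ~ N(0, sigma2*(1+rho)), Im ~ N(0, sigma2*(1-rho))."""
    v_re, v_im = sigma2 * (1.0 + rho), sigma2 * (1.0 - rho)
    ss_re = sum(e.real ** 2 for e in resid)
    ss_im = sum(e.imag ** 2 for e in resid)
    return -0.5 * len(resid) * math.log(v_re * v_im) - 0.5 * ss_re / v_re - 0.5 * ss_im / v_im

def mh_step_rho(rho, resid, sigma2, step=0.1):
    """One random-walk Metropolis update of rho under a flat prior on (-1, 1) (assumed)."""
    prop = rho + random.gauss(0.0, step)
    if not -1.0 < prop < 1.0:
        return rho  # proposal has zero prior density: reject
    log_ratio = log_lik_rho(prop, resid, sigma2) - log_lik_rho(rho, resid, sigma2)
    accept = random.random() < math.exp(min(0.0, log_ratio))
    return prop if accept else rho

# Toy check: residuals simulated with a known rho stand in for y - X beta.
true_rho, sigma2 = 0.5, 1.0
resid = [complex(random.gauss(0.0, math.sqrt(sigma2 * (1.0 + true_rho))),
                 random.gauss(0.0, math.sqrt(sigma2 * (1.0 - true_rho))))
         for _ in range(500)]

rho, chain = 0.0, []
for _ in range(2000):
    rho = mh_step_rho(rho, resid, sigma2)
    chain.append(rho)

kept = chain[500:]                      # discard burn-in
rho_hat = sum(kept) / len(kept)
print("posterior mean of rho:", rho_hat)
```

In a full sampler this update would sit inside the loop that also refreshes \boldsymbol \beta, \tau^2, \lambda^2, and \sigma^2.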

Simulation

Examine performance on (1) variable selection, (2) parameter learning and (3) predictive accuracy.

M-Cplx: \boldsymbol{\epsilon}\sim CN(\mathbf{0}, 2\sigma^2\mathbf{I}_n, 2\sigma^2\rho\mathbf{I}_n) and \beta_j \sim CN(0, 2\sigma^2\tau_j^2, 0)
M-Re-Im: \boldsymbol{\epsilon}_{a} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n), \beta_{a, j} \sim N(0, \sigma^2\tau_j^2), a = re, im.

With p coefficients, the first three are non-zero and the rest are zero: \boldsymbol \beta_{a} = (3, 1.5, 2, 0, \dots, 0), a = re, im.
p = 10, 50, 200
n = 40, 200, 1000
Simulate 100 data replicates.

Variable Selection

Check whether the credible interval for each coefficient in \boldsymbol \beta includes zero, then use the F-score to evaluate performance.
M-Cplx is more consistent and robust than M-Re-Im across different n and p.
Variation decreases as n increases.
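The selection rule and the F-score can be sketched in Python as follows. The credible intervals are made-up numbers for illustration, and treating a complex coefficient as selected when either its real or imaginary interval excludes zero is an assumption, not necessarily the rule used in the simulation.

```python
def excludes_zero(interval):
    """True when a (lower, upper) credible interval does not contain zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

def is_selected(ci_re, ci_im):
    # Assumed rule: select when either part's interval excludes zero.
    return excludes_zero(ci_re) or excludes_zero(ci_im)

def f_score(selected, truly_nonzero):
    """F-score = 2TP / (2TP + FP + FN) for two equal-length lists of booleans."""
    pairs = list(zip(selected, truly_nonzero))
    tp = sum(s and t for s, t in pairs)
    fp = sum(s and not t for s, t in pairs)
    fn = sum(t and not s for s, t in pairs)
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) > 0 else 0.0

# Hypothetical intervals (real part, imaginary part) for five coefficients.
intervals = [
    ((2.5, 3.4), (2.6, 3.5)),     # non-zero, selected
    ((-0.2, 2.9), (-0.1, 3.0)),   # non-zero, missed
    ((1.2, 2.7), (1.4, 2.6)),     # non-zero, selected
    ((-0.4, 0.3), (-0.2, 0.5)),   # zero, not selected
    ((0.05, 0.6), (-0.3, 0.4)),   # zero, falsely selected
]
truth = [True, True, True, False, False]

selected = [is_selected(ci_re, ci_im) for ci_re, ci_im in intervals]
score = f_score(selected, truth)
print("selected:", selected, "F-score:", round(score, 3))
```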

[Figure panels: Cauchy, Bayesian Lasso, GDP]

Parameter Learning (\beta)

Use mean squared error (MSE) to measure the inference performance on coefficients.
M-Cplx performs better than M-Re-Im in terms of MSE.
For non-zero \betas, the M-Re-Im MSE distributions are more right-skewed with larger variation.
The estimation gap between M-Cplx and M-Re-Im shrinks as n gets large.
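For complex coefficients, one natural reading of the MSE is the average squared modulus of the estimation error, which adds the squared errors of the real and imaginary parts. The Python sketch below uses that definition as an assumption; the estimates are invented, and the true values follow the simulation design with p = 5.

```python
def mse_complex(beta_hat, beta_true):
    """Mean of |beta_hat_j - beta_j|^2 over coefficients (assumed definition of MSE)."""
    return sum(abs(bh - b) ** 2 for bh, b in zip(beta_hat, beta_true)) / len(beta_true)

# True coefficients: beta_re = beta_im = (3, 1.5, 2, 0, 0).
beta_true = [3 + 3j, 1.5 + 1.5j, 2 + 2j, 0j, 0j]

# Hypothetical posterior-mean estimates.
beta_hat = [2.9 + 3.1j, 1.4 + 1.5j, 2 + 1.8j, 0.1j, 0j]

mse = mse_complex(beta_hat, beta_true)
print("MSE:", mse)
```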

[Figure panels: Cauchy, Bayesian Lasso]

Predictive Accuracy (\hbox{\bf y})

Use mean squared prediction error (MSPE) to assess posterior predictive accuracy for 100 test data sets.
The median MSPE is computed over the 100 data sets, and its uncertainty is quantified with 1000 bootstrap samples.
M-Cplx has better out-of-sample prediction than M-Re-Im in terms of MSPE.
MSPE grows as p increases.
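The bootstrap summary of the median MSPE can be sketched in Python as below. The MSPE values are random placeholders rather than simulation results, the percentile interval is one common choice that may differ from the one used here, and the squared-modulus MSPE for complex responses is an assumed definition.

```python
import random
import statistics

random.seed(11)

def mspe(y_test, y_pred):
    """Mean squared prediction error for complex responses: mean of |y - y_pred|^2."""
    return sum(abs(y - yp) ** 2 for y, yp in zip(y_test, y_pred)) / len(y_test)

def bootstrap_median(values, n_boot=1000, level=0.95):
    """Sample median with a percentile bootstrap interval for it."""
    n = len(values)
    medians = sorted(
        statistics.median([values[random.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = medians[round(tail * (n_boot - 1))]
    upper = medians[round((1.0 - tail) * (n_boot - 1))]
    return statistics.median(values), lower, upper

# Placeholder MSPE values standing in for the 100 test data sets.
mspe_values = [random.gammavariate(4.0, 0.5) for _ in range(100)]

med, lo, hi = bootstrap_median(mspe_values)
print(f"median MSPE = {med:.3f}, 95% bootstrap interval = ({lo:.3f}, {hi:.3f})")
```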

[Figure panels: Cauchy, Bayesian Lasso]

CV-fMRI Data Analysis

Activation detection is viewed as variable selection: a voxel is declared active when the credible interval of its coefficient excludes zero.
Activation strength under M-Cplx is measured by \sqrt{\beta_{re}^2 + \beta_{im}^2} \in (0, \infty), and under M-Mag by the \beta from the real-valued model.

[Figure panels: CV-GDP, CV-Bayesian Lasso, MO-GDP, MO-Bayesian Lasso]

Given the same credible level, M-Cplx tends to generate more positives than M-Mag.

Activation and Strength Maps

[Figure panels: CV-GDP, CV-Bayesian Lasso, MO-GDP, MO-Bayesian Lasso]

Whether complex- or real-valued, GDP shrinks coefficients more when their intensity is small.
Strength estimated by the MO models is weaker.
The NON-spatial, NON-temporal CV models could perform as well as sophisticated spatiotemporal real-valued MO models (Yu et al., 2018, 2023).

Selection of Best Activation

We are working on selecting the best activation via information criteria such as WAIC and DIC.
No single credible level works for all data and prior types.
Using a Gaussian likelihood for M-Mag distorts WAIC/DIC, since the magnitude is positive and closer to a Rician distribution.

Conclusion

Extend the scale mixture of Gaussians from the real-valued domain into the complex-valued domain, deriving the general form of scale mixtures.
Demonstrate how the complex-valued scale mixtures can be used as a shrinkage prior in Bayesian regression.
- Cauchy, Laplace, GDP
- Applying complex-valued shrinkage leads to better variable selection, regression coefficient estimation, and posterior predictive accuracy
Contribute to CV-fMRI activation studies.
Developing R package cplxrv (https://github.com/chenghanyustats/cplxrv) for simulating complex-valued random variables, and fitting complex Bayesian shrinkage regression.
Future work includes complex-valued horseshoes and global-local shrinkage priors, and spatiotemporal modeling.