When Deep? When Bayesian? Learn from COVID X-rays

2025 Statistical Methods in Imaging Conference

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

2025-05-21

Image Data for Benchmarking

  • MNIST

  • CIFAR-10/CIFAR-100

  • ImageNet

When people do image classification research, they tend to use famous data sets such as MNIST, CIFAR, and ImageNet as benchmarks.

What Is Lacking in Current Image Classification Studies

  • High accuracy makes it difficult to differentiate model performance.

  • Benchmark data for high-risk fields such as medical imaging needs adequate uncertainty quantification.

  • However, we feel that those data mostly depict basic objects, and accuracy on them is already very high, making it difficult to differentiate model performance.
  • When uncertainty quantification is the main focus, we need data from high-risk fields, such as medical imaging, that really care about UQ.

Data: COVID-19 Radiography Database from Kaggle

  • X-ray images of 4 lung conditions: Normal, COVID, lung opacity (LO), and viral pneumonia (VP)

  • Each image is 299 by 299 pixels; the four classes have 10192, 3616, 6012, and 1345 images, respectively.

[Figure: sample X-ray images for Normal, COVID, Lung Opacity, and Viral Pneumonia]
  • Use open-source COVID database from Kaggle

What Is Lacking in Current Image Classification Studies

Classical deep learning leads to

Overfitting

Source: Julian (2019)

Overconfidence/Inadequate Calibration/Uncertainty

Source: Guo et al. (2017)

How much can BNNs improve prediction performance and the quality of uncertainty quantification?

Bayesian Neural Networks (BNNs) Come Into Play

  • Several methods have been proposed to overcome such issues (regularization, dropout, early stopping, temperature scaling, data augmentation, etc.).
  • Bayesian approaches come into play because they

    • incorporate regularization

    • offer a systematic and unified framework for quantifying uncertainty in predictions via Bayesian Model Averaging (BMA)

    • BMA can be considered an ensembling method (Jospin et al. 2022; Wilson and Izmailov 2020) that has been shown to improve predictive performance

Research Questions

Bayesian deep learning is an emerging field, making it worthwhile to survey existing methods and compare and contrast their differences, strengths, and weaknesses.

👉 How do BNN algorithms compare to classical deep neural networks (DNNs) and machine learning (ML) algorithms?


👉 Which methods perform best in terms of accuracy and uncertainty quality?


👉 When is it better to use which methods?

  • Using lung X-ray images, under convolutional neural network (CNN), we implement 14 BNN algorithms, the classical stochastic gradient descent (SGD), and 8 ML algorithms to compare
    • Predictive accuracy on binary and multiclass classification
    • Uncertainty quantification quality on in-sample and out-of-distribution (OOD) data

Models and Learning Algorithms

| Method | Full Method Name | Method | Full Method Name |
| --- | --- | --- | --- |
| SGLD | Stochastic gradient Langevin dynamics | SWAG | Stochastic weight averaging Gaussian |
| pSGLD | Preconditioned SGLD | MSWAG | MultiSWAG |
| SGHMC | Stochastic gradient Hamiltonian Monte Carlo | SGD | Stochastic gradient descent |
| BBB | Bayes by backpropagation | GB | Gradient boosting |
| MCD | Monte Carlo dropout | RF | Random forests |
| LIVI | Linearized implicit variational inference | GP | Gaussian process classification |
| KFL | Kronecker factored Laplace | SVM | Support vector machine |
| LL | Low-rank Laplace | KNN | K-nearest neighbors |
| DL | Diagonal Laplace | LR | Logistic regression |
| SL | Subnetwork Laplace | DT | Decision tree classifier |
| DE | Deep ensembles | NB | Naive Bayes |

Neural Network Architecture [1]

  • We use AlexNet (2012) (Krizhevsky, Sutskever, and Hinton 2017) for its historical importance, its simplicity, and its use as a benchmark.

  • It still outperforms some modern architectures such as VGG16 and ResNet-50 on lung image data (Jaffar, Khan, and Mosavi 2022).

  1. Performance of ResNet is discussed in Yu and Wang (2025).

Bayesian Model Averaging (BMA)

With a prior distribution on the weights $\pi(\boldsymbol{\theta})$ and training data $\mathbf{D}$,

$$p\left(\mathbf{y}^* \mid \mathbf{x}^*, \mathbf{D}, \mathcal{BNN}\right) = \int_{\Theta} p\left(\mathbf{y}^* \mid \mathbf{x}^*, \boldsymbol{\theta}', \mathcal{BNN}\right) \pi\left(\boldsymbol{\theta}' \mid \mathbf{D}, \mathcal{BNN}\right) \, d\boldsymbol{\theta}'$$

Source: Gawlikowski et al. (2021)
  • [Pros] Aggregate different sets of weights that produce various high-performing networks that interpret data differently yet with equal accuracy.

  • BMA assumes there is exactly one true model described by only one set of parameters that generates the data. (Minka 2002; Monteith et al. 2011)

  • [Cons] When data are generated from a combination of models, overfitting remains even when BMA is performed. (Domingos 2000)
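
The BMA integral is rarely available in closed form, so it is commonly approximated by averaging predictive probabilities over draws from the (approximate) posterior. The sketch below illustrates that Monte Carlo average only. The linear softmax classifier, the function name `bma_predict`, and the random stand-in "posterior draws" are assumptions for illustration, not the networks or samplers used in the study.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with a max-shift for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=-1, keepdims=True)

def bma_predict(x, weight_samples):
    """Monte Carlo BMA: average class probabilities over posterior weight draws.

    x: (n, d) inputs.
    weight_samples: (S, d, C) draws of a linear classifier's weights (hypothetical model).
    """
    # probs[s] = p(y | x, theta_s) for each draw theta_s
    probs = np.stack([softmax(x @ w) for w in weight_samples])
    return probs.mean(axis=0)  # (n, C)

rng = np.random.default_rng(0)
x_new = rng.normal(size=(5, 3))
theta_draws = rng.normal(size=(50, 3, 4))  # stand-in for posterior samples
p_bma = bma_predict(x_new, theta_draws)
```

Note that the average is taken over probabilities, not over weights, so draws that disagree about a test point lower the confidence of the averaged prediction.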

Bayesian Learning: Stochastic Gradient MCMC

Stochastic Gradient MCMC (SG-MCMC) is a marriage of SGD and MCMC that uses minibatch for feasible computation.

| SGLD (Welling and Teh 2011) | pSGLD (Li et al. 2016) | SGHMC (Chen, Fox, and Guestrin 2014) |
| --- | --- | --- |
| SGD + Langevin Dynamics | SGLD + preconditioner | SGD + modified Hamiltonian Dynamics |
| suffer from pathological curvature | the curvature is similar in all directions | noise injected by SG breaks the dynamics |
| mode collapse | moves faster away from ill-conditioned areas | add a friction to the momentum update in HMC |
  • SGD + Langevin Dynamics just adds Gaussian noise
  • The learning rate $\epsilon_t$ decreases toward zero, resulting in highly autocorrelated samples.

  • In practice, $\epsilon_t$ is fixed at a small value for later steps.

    • The MH correction can then be treated as negligible, but the posterior samples are no longer exact.
  • In practice, the posterior sample size is small, leading to more variable and biased results.

  • Computationally intensive and memory hungry.
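
To make "SGD + Langevin dynamics" concrete, here is a minimal SGLD sketch with a fixed small step size. The toy model (Gaussian data with unknown mean and a wide Gaussian prior), the step size, and the batch size are assumptions chosen so that the exact posterior mean is available for comparison. This is not the CNN setting of the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumption): x_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
N, batch = 1000, 100
data = rng.normal(loc=2.0, scale=1.0, size=N)
prior_var = 100.0

def grad_log_prior(theta):
    return -theta / prior_var

def grad_log_lik(theta, x):
    # Sum of per-observation gradients of log N(x_i | theta, 1)
    return np.sum(x - theta)

def sgld(n_steps=2500, eps=1e-4, burn_in=500):
    theta, draws = 0.0, []
    for t in range(n_steps):
        xb = rng.choice(data, size=batch, replace=False)
        # Minibatch gradient rescaled by N / batch to estimate the full-data gradient
        grad = grad_log_prior(theta) + (N / batch) * grad_log_lik(theta, xb)
        # Gradient step of size eps / 2 plus injected N(0, eps) noise
        theta = theta + 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
        if t >= burn_in:
            draws.append(theta)
    return np.array(draws)

draws = sgld()
exact_mean = data.sum() / (N + 1.0 / prior_var)  # conjugate posterior mean
```

Without the `np.sqrt(eps) * rng.normal()` term the loop is plain SGD on the log posterior. With it, the iterates after burn-in are treated as approximate posterior draws, and the fixed step size is what makes them approximate.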

Bayesian Learning: Variational Inference (VI)

VI methods embrace the stochasticity introduced by SGD, leading to stochastic variational inference (SVI): Bayes by backpropagation (BBB), Monte Carlo dropout (MCD), and Linearized implicit variational inference (LIVI).

| BBB (Blundell et al. 2015) | MCD (Gal and Ghahramani 2016) | LIVI (Uppal et al. 2023) |
| --- | --- | --- |
| maximize approximated ELBO by $\boldsymbol{\theta}^{(s)} \sim q(\boldsymbol{\theta} \mid \boldsymbol{\eta})$ | $q(\mathbf{W})$ is such that $\mathbf{W}_l = \mathbf{M}_l \cdot \text{diag}\left( \left[ z_{lj} \right]_{j=1}^{K_{l-1}} \right)$, $z_{lj} \sim \text{Bernoulli}(p_l)$ | $q(\boldsymbol{\theta} \mid \boldsymbol{\eta}) = \int q(\boldsymbol{\theta} \mid \boldsymbol{\eta}, \mathbf{z}) q(\mathbf{z}) \, d\mathbf{z}$ with $q(\boldsymbol{\theta} \mid \boldsymbol{\eta}, \mathbf{z}) = N\left( g_{\boldsymbol{\eta}}(\mathbf{z}), \sigma^2 \mathbf{I} \right)$ and $q(\mathbf{z}) = N(\mathbf{0}, \mathbf{I})$ |
| reparameterization trick to ensure backpropagation | equiv. to approx. deep Gaussian process (GP) (Damianou and Lawrence 2013) | specify the correlation of parameters |
| the number of parameters is doubled | adding normal prior on $\mathbf{W}$ equiv. to $L_2$ regularization of dropout NN or GP | approximate ELBO by computable LIVI bound |
| increased computational costs | | high memory usage |
| convergence issues | | small sample size lead to variability in predictive accuracy and underconfident predictions |
| mode collapse | | |
  • BBB needs sampling, usually 20 per minibatch
  • In LIVI the variational distribution is implicitly defined through another generating process. $g_{\boldsymbol{\eta}}(\mathbf{z})$ is linearized
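
As an illustration of the MCD construction $\mathbf{W}_l = \mathbf{M}_l \cdot \text{diag}(\mathbf{z}_l)$, the sketch below keeps dropout active at prediction time and averages the class probabilities over repeated stochastic forward passes. The two-layer ReLU network, the name `mc_dropout_predict`, and the reading of $p_l$ as a keep probability (`keep_prob`) are assumptions for illustration. The usual `1 / keep_prob` rescaling is omitted to stay close to the formula.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, M1, M2, keep_prob=0.8, n_passes=200, seed=0):
    """x: (n, d); M1: (h, d) and M2: (C, h) are the variational parameters M_l."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_passes):
        # One Bernoulli mask per layer, one entry per input unit of that layer
        z1 = rng.binomial(1, keep_prob, size=M1.shape[1])
        z2 = rng.binomial(1, keep_prob, size=M2.shape[1])
        W1 = M1 * z1  # equals M1 @ diag(z1): zeroes the columns of dropped units
        W2 = M2 * z2
        hidden = np.maximum(x @ W1.T, 0.0)  # ReLU
        probs.append(softmax(hidden @ W2.T))
    probs = np.stack(probs)  # (n_passes, n, C)
    return probs.mean(axis=0), probs.std(axis=0)

rng = np.random.default_rng(1)
x_new = rng.normal(size=(4, 6))
M1 = rng.normal(size=(8, 6))
M2 = rng.normal(size=(3, 8))
mean_prob, sd_prob = mc_dropout_predict(x_new, M1, M2)
```

The per-class standard deviation across passes gives a rough view of how much the prediction moves under different masks.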

Bayesian Learning: Laplace Approximation

Gaussian approximation includes Laplace methods and Stochastic Weight Averaging Gaussian.

Laplace Approximation: $\pi(\boldsymbol{\theta} \mid \mathbf{D}) \approx N\left(\boldsymbol{\theta}^{*}, H(\boldsymbol{\theta}^{*})^{-1} \right)$, where $H(\boldsymbol{\theta}^{*})$ is the Hessian matrix of $-\log \pi(\boldsymbol{\theta} \mid \mathbf{D})$ evaluated at the MAP estimate $\boldsymbol{\theta}^{*}$.

  • Infeasible to learn the entire $H$.

  • Hessian approximation methods:

    • Diagonal Laplace (DL)
    • Kronecker factored approximate curvature (KFAC, KFL)
    • Subnetwork Laplace (SL)
    • Low-rank Laplace (LL)
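
A small worked example may help show what the full and diagonal approximations do. The sketch below fits a toy Bayesian logistic regression (an assumption; three parameters instead of a CNN), finds the MAP by Newton's method, and builds the Laplace covariance from the Hessian of the negative log posterior, both in full and with the diagonal approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_theta = np.array([1.0, -2.0, 0.5])  # hypothetical generating weights
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)
prior_prec = 1.0  # N(0, I) prior on the weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neg_log_post_grad_hess(theta):
    """Gradient and Hessian of the negative log posterior."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - y) + prior_prec * theta
    hess = (X * (p * (1.0 - p))[:, None]).T @ X + prior_prec * np.eye(d)
    return grad, hess

# Newton iterations to the MAP estimate theta*
theta_map = np.zeros(d)
for _ in range(25):
    grad, hess = neg_log_post_grad_hess(theta_map)
    theta_map = theta_map - np.linalg.solve(hess, grad)

grad_at_map, H = neg_log_post_grad_hess(theta_map)
cov_full = np.linalg.inv(H)             # full Laplace covariance H^{-1}
cov_diag = np.diag(1.0 / np.diag(H))    # diagonal Laplace: invert diag(H) only
```

For a positive definite $H$, each diagonal-Laplace variance `1 / H[i, i]` is no larger than the corresponding full-covariance variance, so dropping the off-diagonal curvature tends to understate marginal uncertainty in this toy case.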

Bayesian Learning: Laplace Approximation

| DL (Farquhar, Smith, and Gal 2020) | KFL (Ritter, Botev, and Barber 2018) | SL (Daxberger et al. 2022) |
| --- | --- | --- |
| $H$ is diagonal | approximate $H$ by a block-diagonal matrix | treats a subset of weights probabilistically |
| naive for CNN | $H_l \approx \mathcal{V}_{l} \otimes \mathcal{U}_{l}$ | the remaining parameters at their MAP |
| diagonal elements equal to the diagonal of $F$ | captures essential second-order curvature | infers a full-covariance Gaussian posterior over a subnetwork |
| | $\pi\left(\mathbf{W}_l \mid \mathbf{D}\right) \sim MN\left(W^{MAP}, \mathcal{V}_{l}^{-1}, \mathcal{U}_{l}^{-1}\right)$ | |

  • LL (Daxberger et al. 2021) approximates $H$ by its first $k$ eigenvectors corresponding to the largest $k$ eigenvalues.

Bayesian Learning: Stochastic Weight Averaging

  • Stochastic Weight Averaging (SWA) (Izmailov et al. 2018) averages the weights traversed by SGD with a cyclical or constant learning rate.

Source: (Izmailov et al. 2018)

Bayesian Learning: Stochastic Weight Averaging Gaussian

Source: (Maddox et al. 2019)
  • SWAG approximates the posterior by a Gaussian distribution $N\left( \boldsymbol{\theta}_{SWA}, \frac{1}{2} \left( \Sigma_{\text{diag}} + \Sigma_{\text{low-rank}}\right) \right)$
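
The SWAG Gaussian can be sketched from a set of stored SGD iterates as below. The helper names `swag_fit` and `swag_sample` are hypothetical, and the random stand-in iterates are not from a trained network. As a simplification, deviations are taken from the final average instead of the running average.

```python
import numpy as np

def swag_fit(iterates, rank):
    """Moments of the SGD path. iterates: (T, p) weight vectors collected during training."""
    iterates = np.asarray(iterates)
    theta_swa = iterates.mean(axis=0)                  # SWA mean
    second_moment = (iterates ** 2).mean(axis=0)
    diag_var = np.clip(second_moment - theta_swa ** 2, 0.0, None)  # Sigma_diag
    deviations = (iterates[-rank:] - theta_swa).T      # (p, K) columns for the low-rank part
    return theta_swa, diag_var, deviations

def swag_sample(theta_swa, diag_var, deviations, rng):
    """One draw from N(theta_swa, 0.5 * (Sigma_diag + D D^T / (K - 1)))."""
    p, K = deviations.shape
    z1 = rng.normal(size=p)
    z2 = rng.normal(size=K)
    return (theta_swa
            + np.sqrt(diag_var / 2.0) * z1
            + deviations @ z2 / np.sqrt(2.0 * (K - 1)))

rng = np.random.default_rng(0)
sgd_path = 1.0 + 0.1 * rng.normal(size=(30, 10))  # stand-in for stored SGD iterates
theta_swa, diag_var, deviations = swag_fit(sgd_path, rank=5)
theta_draw = swag_sample(theta_swa, diag_var, deviations, rng)
```

Each call to `swag_sample` yields one weight vector. Predictions from several such draws would then be averaged as in BMA.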

Bayesian Learning?! Ensembling

Ensembling has been interpreted as an approximation to BMA (Wilson and Izmailov 2020; Jospin et al. 2022)

| DE (Lakshminarayanan, Pritzel, and Blundell 2017) | MultiSWAG (Wilson and Izmailov 2020) |
| --- | --- |
| $\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_M$ learned independently from $M$ models (non-Bayesian MultiSGD) | Gaussian mixture approximation to the posterior |
| $p(\mathbf{y} \mid \mathbf{x}) = M^{-1}\sum_{m=1}^M p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\theta}_m)$ approximates the BMA | each Gaussian component centered around a different basin of attraction |
  • Ensembling methods can be non-Bayesian

Binary Accuracy (Normal vs. COVID)

Method Accuracy ↑ MCC ↑
MSWAG 1 1
SL 2 2
DE 3.62 3.62
MCD 3.62 3.62
KFL 4.75 4.75
GB 6.12 6.12
LL 6.88 6.88
SWAG 8.25 8.12
BBB 9 8.88
pSGLD 9.75 10
SGD 11.2 13.2
GP 11.8 11
RF 13.1 12.2
SGHMC 13.9 13.5
SGLD 15.2 15.2
LIVI 16.2 16.2
SVM 16.5 16.5
KNN 18 18
LR 19 19
DT 20 20
NB 21 21
DL 22 22

Entries are ranks (1 = best) averaged over the 4 thresholds.
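
For reference, the Matthews correlation coefficient (MCC) reported alongside accuracy can be computed from the four cells of a binary confusion matrix. The function below is a minimal sketch of the standard formula. Returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC uses all four cells, which matters when the two classes differ in size, as Normal and COVID do here.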

  • MSWAG, SL, DE, MCD, and KFL perform consistently across different metrics.

  • Preconditioning is needed for MCMC-based BNNs.

  • Because of its small sample size, LIVI shows large variability, resulting in low accuracy on average.

  • SGD performance falls between that of the BNN and ML methods.

  • DNNs outperform ML methods, but Gradient Boosting (GB) stands out.

  • Random Forests (RF) and Gaussian Process (GP) perform better than MCMC-based BNNs.

Important

  • The ability to capture multiple basins of attraction is crucial (SWAG to MSWAG, SGD to DE)

  • For Laplace methods, a well-approximated $H$ using a subnetwork is better than a full network with a poorly approximated $H$.

Multiclass Prediction (Normal, COVID, LO, VP)

Method Accuracy (Avg)
MSWAG .932
MCD .905
SWAG .895
DE .882
KFL .877
SL .876
LL .875
GB .872
SGD .868
GP .848
LIVI .842
RF .833
SVM .807
pSGLD .782
BBB .768
SGLD .767
LR .753
SGHMC .735
  • MCMC methods and BBB demonstrate much lower accuracy, potentially suffering from the sampling inaccuracy or approximation error.

  • MCMC methods require a larger parameter sample size during training (accuracy increased by 5% when the sample size grew from 20 to 100)

  • GB achieves the highest accuracy among the ML methods, followed by GP and RF.

ATTENTION

  • BNN performance depends on architecture.

  • ResNet-50 improves accuracy by 4 - 5% across BNN methods

    • MCMC achieving a 6% increase
    • SL improving by 8%

FINDINGS

Predictive accuracy is influenced by three key factors

  • 👉 How effectively high-dimensional data is reduced or better represented.
    • CNN extracts essential features containing spatial structures.
    • CNN features + ML improves accuracy by 1 to 3%.
  • 👉 How well over-parameterized DNN models are regularized, and the level of sophistication in shallow ML models.
    • Subnetwork and dropout DNN
    • Flexible GP
  • 👉 How accurately posterior or BMA is approximated.
    • Ensembling that captures the multimodality of the posterior results in a better approximation.

Uncertainty Quantification

  • We use negative log-likelihood (NLL), Brier score, and expected calibration error (ECE) to measure uncertainty quality.
  • Evaluate uncertainty quality using in-sample and out-of-distribution (OOD) data.
    • In-sample: training (Normal, COVID, LO, VP); test (Normal, COVID, LO, VP)
    • OOD: training (Normal, COVID); test (LO, VP)
  • For OOD prediction, we expect a more ambiguous conclusion: a predictive probability that is not close to zero or one.
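
The three uncertainty metrics can be sketched as follows. Several variants of each exist, so the specific choices here are assumptions: equal-width confidence bins for ECE (`n_bins=10`), and a multiclass Brier score summed over classes and averaged over samples.

```python
import numpy as np

def nll(probs, labels):
    """Average negative log-likelihood of the true class."""
    eps = 1e-12  # guards against log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def brier(probs, labels):
    """Squared distance between predicted probabilities and one-hot labels."""
    onehot = np.eye(probs.shape[1])[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

def ece(probs, labels, n_bins=10):
    """Expected calibration error: bin-weighted |accuracy - confidence|."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            total += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return total

# An overconfident toy classifier: always 95% sure of class 0, right half the time
probs_over = np.tile([0.95, 0.05], (4, 1))
labels_over = np.array([0, 0, 1, 1])
ece_over = ece(probs_over, labels_over)
```

In the toy case the confidence is 0.95 but the accuracy is 0.5, so the ECE is their gap. A well-calibrated model would have confidence close to accuracy within every bin.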

Uncertainty Quality: In-sample Prediction

Method ECE Avg ↓ Confidence (correct) Avg ↑ Confidence (incorrect) Avg ↓
MSWAG .011 .928 .768
MCD .008 .961 .877
SWAG .048 .951 .878
DE .008 .922 .904
KFL .230 .485 .354
GB .012 .972 .847
SL .105 .986 .907
SGD .066 .924 .898
LL .436 .259 .256
GP .111 .699 .483
RF .095 .718 .501
SVM .021 .796 .615
LIVI .312 .256 .246
pSGLD .032 .954 .870
BBB .063 .998 .998
SGLD .042 .972 .930
LR .065 .853 .690
SGHMC .251 .472 .422
  • KFL, LL, LIVI, and SGHMC have the worst ECE due to their lack of confidence, and inability to assign high (low) probabilities to true positives (negatives).

  • Despite its low ECE, DE tends to be overly confident in its incorrect predictions.

  • KFL is conservative. For incorrect predictions, it is much less certain about the result.

  • BBB shows poor calibration by assigning extremely high probabilities to the wrong class. ~An arrogant and opinionated one!~

  • LL and LIVI exhibit poor calibration by distributing probabilities almost evenly across each class. ~One with no self-confidence and decidophobia!~

  • GB is slightly overconfident in the incorrectly labeled group, whereas GP and RF are better calibrated.

Uncertainty Quality: Out-of-distribution Prediction

  • VP images are likely to be labeled as COVID. Both involve lung inflammation with similar symptoms?

  • DNN methods tend to be overly confident when labeling OOD images.

    • pSGLD appears too rigid in adjusting its predictions for OOD data
    • KFL is prudent and assigns moderate predictive probabilities.

The figure shows the distribution of the estimated probability that an image belongs to the COVID class.

Uncertainty Quality: Out-of-distribution Prediction

  • GB and GP tend to assign a higher probability of being COVID to a VP image.

  • They show only a minor degree of the OOD overconfidence seen in DNN methods.

Summary

  • DNNs tend to be overconfident on either in-sample or OOD data.

  • Ensembling, purely Bayesian or not, substantially improves predictive accuracy, but not calibration.

  • Bayesian methods that experience mode collapse, such as VI methods with unimodal variational distributions, may perform worse than non-Bayesian or ensemble methods.

  • When the data-generating process involves multiple hypotheses, prediction using BMA could result in overfitting/overconfidence. (Consider Bayesian model combination, Bayesian ensembling)

  • Fully connected networks often underperform compared to those incorporating regularization such as dropout or subnetwork inference.

Practical Guidelines

No single method outperforms in all performance measures!


❓ Computing power and resources are a concern

👉 ML ensembles, such as (extreme) gradient boosting.


❓ Powerful computing resources are available

👉 MCD, SL, MSWAG, and DE (ordered by computing time) are the best for predictive accuracy.

👉 For multiclass tasks, deep learning is preferred with more advanced architectures like ResNet-50 for improved performance.


❓ The data come from different sources or may be contaminated, or overconfidence is a primary concern

👉 KFL or GP provides more reasonable confidence and calibration.

References

Blundell, Charles, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. 2015. “Weight Uncertainty in Neural Network.” In Proceedings of the 32nd International Conference on Machine Learning, edited by Francis Bach and David Blei, 37:1613–22. Proceedings of Machine Learning Research. Lille, France: PMLR. https://proceedings.mlr.press/v37/blundell15.html.
Chen, Tianqi, Emily Fox, and Carlos Guestrin. 2014. “Stochastic Gradient Hamiltonian Monte Carlo.” In Proceedings of the 31st International Conference on Machine Learning, edited by Eric P. Xing and Tony Jebara, 32:1683–91. Proceedings of Machine Learning Research. Beijing, China: PMLR. https://proceedings.mlr.press/v32/cheni14.html.
Damianou, Andreas, and Neil D. Lawrence. 2013. “Deep Gaussian Processes.” In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, edited by Carlos M. Carvalho and Pradeep Ravikumar, 31:207–15. Proceedings of Machine Learning Research. Scottsdale, Arizona, USA: PMLR. https://proceedings.mlr.press/v31/damianou13a.html.
Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. “Laplace Redux–Effortless Bayesian Deep Learning.” In NeurIPS.
Daxberger, Erik, Eric Nalisnick, James Urquhart Allingham, Javier Antorán, and José Miguel Hernández-Lobato. 2022. “Bayesian Deep Learning via Subnetwork Inference.” https://arxiv.org/abs/2010.14689.
Domingos, Pedro. 2000. “Bayesian Averaging of Classifiers and the Overfitting Problem.” In Proceedings of the Seventeenth International Conference on Machine Learning, 223–30. ICML ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Farquhar, Sebastian, Lewis Smith, and Yarin Gal. 2020. “Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc.
Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Machine Learning, edited by Maria Florina Balcan and Kilian Q. Weinberger, 48:1050–59. Proceedings of Machine Learning Research. New York, New York, USA: PMLR. https://proceedings.mlr.press/v48/gal16.html.
Gawlikowski, Jakob, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, et al. 2021. “A Survey of Uncertainty in Deep Neural Networks.” CoRR. https://arxiv.org/abs/1506.02157.
Guo, Chuan, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. “On Calibration of Modern Neural Networks.” CoRR abs/1706.04599. http://arxiv.org/abs/1706.04599.
Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.” In arXiv Preprint. https://arxiv.org/abs/1704.04861.
Izmailov, Pavel, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. “Averaging Weights Leads to Wider Optima and Better Generalization.” CoRR. https://arxiv.org/abs/1803.05407.
Jaffar, Arfan, Muhammad Adnan Khan, and Amir Mosavi. 2022. “Performance Analysis of State-of-the-Art CNN Architectures for LUNA16.” Sensors 22 (12): 4426. https://doi.org/10.3390/s22124426.
Jospin, Laurent Valentin, Hamid Laga, Farid Boussaid, Wray Buntine, and Mohammed Bennamoun. 2022. “Hands-On Bayesian Neural Networks — a Tutorial for Deep Learning Users.” IEEE Computational Intelligence Magazine 17 (2): 29–48. https://doi.org/10.1109/MCI.2022.3155327.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2017. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60 (6): 84–90. https://doi.org/10.1145/3065386.
Kumar, M. D., G. V. Sivanarayana, D. Indira, et al. 2023. “Skin Cancer Segmentation with the Aid of Multi-Class Dilated d-Net (MD2N) Framework.” Multimedia Tools and Applications 82: 35995–6018. https://doi.org/10.1007/s11042-023-14605-9.
Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. 2017. “Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, edited by I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 30:6405–16. NIPS’17. Long Beach, California, USA: Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf.
Li, Chunyuan, Changyou Chen, David Carlson, and Lawrence Carin. 2016. “Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks.” In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 1788–94.
Maddox, Wesley J, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon Wilson. 2019. “A Simple Baseline for Bayesian Uncertainty in Deep Learning.” In Proceedings of the 33rd International Conference on Neural Information Processing Systems, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett, 32:13153–64. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2019/file/118921efba23fc329e6560b27861f0c2-Paper.pdf.
Minka, Thomas P. 2002. “Bayesian Model Averaging Is Not Model Combination.” In. https://api.semanticscholar.org/CorpusID:116598428.
Monteith, Kristine, James L. Carroll, Kevin Seppi, and Tony Martinez. 2011. “Turning Bayesian Model Averaging into Bayesian Model Combination.” In The 2011 International Joint Conference on Neural Networks, 2657–63. https://doi.org/10.1109/IJCNN.2011.6033566.
Rayed, Md. Eshmam, S. M. Sajibul Islam, Sadia Islam Niha, Jamin Rahman Jim, Md Mohsin Kabir, and M. F. Mridha. 2024. “Deep Learning for Medical Image Segmentation: State-of-the-Art Advancements and Challenges.” Informatics in Medicine Unlocked 47: 101504. https://doi.org/10.1016/j.imu.2024.101504.
Ritter, Hippolyt, Aleksandar Botev, and David Barber. 2018. “A Scalable Laplace Approximation for Neural Networks.” In International Conference on Learning Representations. https://openreview.net/forum?id=Skdvd2xAZ.
Simonyan, Karen, and Andrew Zisserman. 2014. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” arXiv Preprint arXiv:1409.1556.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. “Going Deeper with Convolutions.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9.
Tan, Mingxing, and Quoc V. Le. 2019. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” In Proceedings of the 36th International Conference on Machine Learning (ICML), 6105–14. https://arxiv.org/abs/1905.11946.
Uppal, Anshuk, Kristoffer Stensbo-Smidt, Wouter Boomsma, and Jes Frellsen. 2023. “Implicit Variational Inference for High-Dimensional Posteriors.” In Thirty-Seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id=Sxu7xlUJGx.
Wang, Yaping, Hongjun Jia, Pew-Thian Yap, Bo Cheng, Chong-Yaw Wee, Lei Guo, and Dinggang Shen. 2012. “Groupwise Segmentation Improves Neuroimaging Classification Accuracy.” In Multimodal Brain Image Analysis, edited by Pew-Thian Yap, Tianming Liu, Dinggang Shen, Carl-Fredrik Westin, and Li Shen, 185–93. Berlin, Heidelberg: Springer Berlin Heidelberg.
Welling, Max, and Yee W Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681–88.
Wilson, Andrew G, and Pavel Izmailov. 2020. “Bayesian Deep Learning and a Probabilistic Perspective of Generalization.” In Advances in Neural Information Processing Systems, edited by H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, 33:4697–4708. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/file/322f62469c5e3c7dc3e58f5a4d1ea399-Paper.pdf.
Yu, Cheng-Han, and Shuaizhou Wang. 2025. “A Comparative Study of Bayesian Neural Networks and Machine Learning Based on COVID-19 Image Classification.” Statistics and Data Science in Imaging. https://doi.org/10.1080/29979676.2025.2497555.

Computational Time

Discussion

  • When medical imaging is the primary focus, tasks such as segmentation, registration, and reconstruction play crucial roles in enhancing inference and analysis quality. (Wang et al. 2012; Kumar et al. 2023; Rayed et al. 2024)

  • The development of BNNs specifically tailored for segmentation or general image pre-processing remains limited.

  • There are other popular architectures such as VGGNet (Simonyan and Zisserman 2014), GoogLeNet (Szegedy et al. 2015), MobileNet (Howard et al. 2017), EfficientNet (Tan and Le 2019).

  • A comprehensive study of how BNNs and associated algorithms perform on different architectures could be a valuable work that benefits Bayesian and deep learning communities.

https://chenghanyuslides.netlify.app/25-bnn-smi/
