2025 Statistical Methods in Imaging Conference
2025-05-21
High accuracy makes it difficult to differentiate model performance.
Benchmark data for high-risk fields such as medical imaging needs adequate uncertainty quantification.
Data: COVID-19 Radiography Database from Kaggle
X-ray images of 4 lung conditions: Normal, COVID, lung opacity (LO), and viral pneumonia (VP)
Each image is 299 × 299 pixels; the classes contain 10,192, 3,616, 6,012, and 1,345 images, respectively.
Classical deep learning leads to
Overfitting
Overconfidence / inadequate calibration / poor uncertainty quantification
Bayesian approaches come into play because
they incorporate regularization through the prior
they provide a systematic and unified framework for quantifying uncertainty in predictions via Bayesian Model Averaging (BMA)
BMA can be considered an ensembling method (Jospin et al. 2022; Wilson and Izmailov 2020) that has been shown to improve predictive performance
Bayesian deep learning is an emerging field, making it worthwhile to survey existing methods and compare and contrast their differences, strengths, and weaknesses.
👉 How do BNN algorithms compare to classical deep neural networks (DNNs) and machine learning (ML) algorithms?
👉 Which methods perform best in terms of accuracy and uncertainty quality?
👉 When is it better to use which methods?
Method | Full Method Name | Method | Full Method Name |
---|---|---|---|
SGLD | Stochastic gradient Langevin dynamics | SWAG | Stochastic weight averaging Gaussian |
pSGLD | Preconditioned SGLD | MSWAG | MultiSWAG |
SGHMC | Stochastic gradient Hamiltonian Monte Carlo | SGD | Stochastic gradient descent |
BBB | Bayes by backpropagation | GB | Gradient boosting |
MCD | Monte Carlo dropout | RF | Random forests |
LIVI | Linearized implicit variational inference | GP | Gaussian process classification |
KFL | Kronecker factored Laplace | SVM | Support vector machine |
LL | Low-rank Laplace | KNN | K-nearest neighbors |
DL | Diagonal Laplace | LR | Logistic regression |
SL | Subnetwork Laplace | DT | Decision tree classifier |
DE | Deep ensembles | NB | Naive Bayes |
AlexNet (2012) (Krizhevsky, Sutskever, and Hinton 2017) is chosen for its historical importance, simplicity, and common use as a benchmark.
It still outperforms some modern architectures such as VGG16 and ResNet-50 on lung image data (Jaffar, Khan, and Mosavi 2022).
With a prior distribution on the weights $\pi(\theta)$ and training data $\mathcal{D}$,
$$p(y^* \mid x^*, \mathcal{D}, \text{BNN}) = \int_{\Theta} p(y^* \mid x^*, \theta', \text{BNN})\, \pi(\theta' \mid \mathcal{D}, \text{BNN})\, d\theta'$$
[Pros] Aggregates different sets of weights, each producing a high-performing network that interprets the data differently while achieving comparable accuracy.
BMA assumes there is exactly one true model described by only one set of parameters that generates the data. (Minka 2002; Monteith et al. 2011)
[Cons] When data are generated from a combination of models, overfitting remains even when BMA is performed. (Domingos 2000)
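As a minimal illustration of how the BMA integral above is approximated in practice, the sketch below averages softmax outputs over posterior weight samples. It is a hedged PyTorch sketch, not the code used in this study; `bma_predict`, `model`, and `posterior_samples` (a list of state dicts) are hypothetical names.

```python
# Sketch only (assumed PyTorch setup): Monte Carlo approximation of the BMA
# predictive by averaging class probabilities over S posterior weight samples.
import torch

@torch.no_grad()
def bma_predict(model, posterior_samples, x):
    """posterior_samples: list of state_dicts, each a draw from pi(theta | D)."""
    probs = []
    for state_dict in posterior_samples:
        model.load_state_dict(state_dict)   # plug in one sampled set of weights
        model.eval()
        probs.append(torch.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)   # approximate p(y* | x*, D)
```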
Stochastic Gradient MCMC (SG-MCMC) is a marriage of SGD and MCMC that uses minibatches to make computation feasible.
SGLD (Welling and Teh 2011) | pSGLD (Li et al. 2016) | SGHMC (Chen, Fox, and Guestrin 2014) |
---|---|---|
SGD + Langevin Dynamics | SGLD + preconditioner | SGD + modified Hamiltonian Dynamics |
suffers from pathological curvature | makes the curvature similar in all directions | the noise injected by the stochastic gradient breaks the dynamics |
mode collapse | moves faster away from ill-conditioned areas | adds a friction term to the momentum update in HMC |
The learning rate $\epsilon_t$ decreases toward zero, resulting in highly autocorrelated samples.
In practice, $\epsilon_t$ is fixed at a small value for later steps.
In practice, the posterior sample size is small, leading to more variable and biased results.
Computationally intensive and memory hungry.
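For concreteness, a minimal sketch of one SGLD update is given below; it is an illustration under assumed PyTorch conventions, not the authors' implementation, and `sgld_step` with its arguments is a hypothetical helper. It shows how the injected Gaussian noise, scaled by the step size, turns an SGD step into an approximate posterior sampling step.

```python
# Minimal SGLD sketch: theta <- theta + (eps/2) * grad_log_posterior + N(0, eps).
import torch

def sgld_step(params, grad_log_post, step_size):
    """grad_log_post: minibatch estimate of the gradient of log pi(theta | D),
    i.e., grad log prior + (N / batch_size) * sum of per-example log-likelihood
    gradients. The injected noise has variance equal to the step size eps."""
    for p, g in zip(params, grad_log_post):
        noise = torch.randn_like(p) * step_size ** 0.5
        p.data.add_(0.5 * step_size * g + noise)
```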
VI methods embrace the stochasticity introduced by SGD, leading to stochastic variational inference (SVI): Bayes by backpropagation (BBB), Monte Carlo dropout (MCD), and Linearized implicit variational inference (LIVI).
BBB (Blundell et al. 2015) | MCD (Gal and Ghahramani 2016) | LIVI (Uppal et al. 2023) |
---|---|---|
maximize the approximated ELBO using samples $\theta^{(s)} \sim q(\theta \mid \eta)$ | $q(\mathbf{W})$ is such that $\mathbf{W}_l = \mathbf{M}_l \cdot \mathrm{diag}\big([z_{l,j}]_{j=1}^{K_{l-1}}\big)$ with $z_{l,j} \sim \mathrm{Bernoulli}(p_l)$ | $q(\theta \mid \eta) = \int q(\theta \mid \eta, z)\, q(z)\, dz$ with $q(\theta \mid \eta, z) = N(g_\eta(z), \sigma^2 I)$ and $q(z) = N(0, I)$ |
reparameterization trick to enable backpropagation | equivalent to an approximate deep Gaussian process (GP) (Damianou and Lawrence 2013) | specifies the correlation of parameters |
the number of parameters is doubled | adding a normal prior on $\mathbf{W}$ is equivalent to L2 regularization of the dropout NN or GP | approximates the ELBO by a computable LIVI bound |
increased computational costs | high memory usage | |
convergence issues | | small sample sizes lead to variability in predictive accuracy and underconfident predictions |
mode collapse | | |
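Among the VI methods in the table, MCD is the simplest to sketch: dropout is left active at test time and several stochastic forward passes are averaged. The snippet below is an assumed-setup illustration for any PyTorch model containing `nn.Dropout` layers; `mc_dropout_predict` and `n_samples` are illustrative names, not part of the study's code.

```python
# MC dropout sketch: average T stochastic forward passes with dropout left on.
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keeps nn.Dropout stochastic (caveat: also affects batch norm)
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    )
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and spread
```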
Gaussian approximation includes Laplace methods and Stochastic Weight Averaging Gaussian.
Laplace approximation: $\pi(\theta \mid \mathcal{D}) \approx N\big(\theta^*, H(\theta^*)^{-1}\big)$, where $H(\theta^*)$ is the Hessian of the negative log posterior $-\log \pi(\theta \mid \mathcal{D})$ evaluated at the MAP estimate $\theta^*$.
Infeasible to compute the entire $H$.
Hessian approximation methods:
DL (Farquhar, Smith, and Gal 2020) | KFL (Ritter, Botev, and Barber 2018) | SL (Daxberger et al. 2022) |
---|---|---|
$H$ is diagonal | approximates $H$ by a block-diagonal matrix | treats a subset of weights probabilistically |
naive for CNNs | $H_l \approx V_l \otimes U_l$ | keeps the remaining parameters at their MAP values |
 | captures essential second-order curvature | infers a full-covariance Gaussian posterior over the subnetwork |
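To make the table concrete, here is a rough sketch of the diagonal (DL) variant, assuming a PyTorch classifier already trained to its MAP estimate. The Hessian diagonal is approximated with squared minibatch gradients (a crude stand-in for the per-example empirical Fisher) plus the prior precision; `diagonal_laplace_std`, `loader`, and `prior_precision` are illustrative names, not the study's code.

```python
# Diagonal Laplace sketch: posterior precision ~ prior precision + squared gradients.
import torch
import torch.nn.functional as F

def diagonal_laplace_std(model, loader, prior_precision=1.0):
    """Returns a per-weight posterior standard deviation around the MAP weights."""
    precision = {n: torch.full_like(p, prior_precision)
                 for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        F.cross_entropy(model(x), y, reduction='sum').backward()
        for n, p in model.named_parameters():
            precision[n] += p.grad.detach() ** 2  # crude diagonal curvature term
    return {n: prec.rsqrt() for n, prec in precision.items()}
```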
[Figure omitted. Source: Izmailov et al. (2018)]
Ensembling has been interpreted as an approximation to BMA (Wilson and Izmailov 2020; Jospin et al. 2022)
DE (Lakshminarayanan, Pritzel, and Blundell 2017) | MultiSWAG (Wilson and Izmailov 2020) |
---|---|
$\theta_1, \ldots, \theta_M$ learned independently from $M$ models (non-Bayesian MultiSGD) | Gaussian mixture approximation to the posterior |
$p(y \mid x) = M^{-1} \sum_{m=1}^{M} p(y \mid x, \theta_m)$ approximates the BMA | each Gaussian component is centered around a different basin of attraction |
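The diagonal part of SWAG can be sketched in a few lines: keep running first and second moments of the weights along the SGD trajectory and sample from the fitted Gaussian; MultiSWAG repeats this from several independent runs and averages the resulting predictives. The class below is an illustrative sketch under assumed PyTorch conventions (`SwagDiagonal`, `collect`, and `sample` are hypothetical names) and omits SWAG's low-rank covariance term.

```python
# SWAG-diagonal sketch: fit N(mean, diag(sq_mean - mean^2)) to SGD iterates.
import torch

class SwagDiagonal:
    def __init__(self, model):
        self.mean = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.sq_mean = {n: p.detach().clone() ** 2 for n, p in model.named_parameters()}
        self.count = 1

    def collect(self, model):
        """Call periodically (e.g., once per epoch) along the SGD trajectory."""
        self.count += 1
        for n, p in model.named_parameters():
            w = p.detach()
            self.mean[n] += (w - self.mean[n]) / self.count
            self.sq_mean[n] += (w ** 2 - self.sq_mean[n]) / self.count

    def sample(self):
        """One weight draw from the fitted diagonal Gaussian."""
        return {n: self.mean[n]
                   + (self.sq_mean[n] - self.mean[n] ** 2).clamp_min(0).sqrt()
                   * torch.randn_like(self.mean[n])
                for n in self.mean}
```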
Method | Rank by accuracy ↓ | Rank by MCC ↓ |
---|---|---|
MSWAG | 1 | 1 |
SL | 2 | 2 |
DE | 3.62 | 3.62 |
MCD | 3.62 | 3.62 |
KFL | 4.75 | 4.75 |
GB | 6.12 | 6.12 |
LL | 6.88 | 6.88 |
SWAG | 8.25 | 8.12 |
BBB | 9 | 8.88 |
pSGLD | 9.75 | 10 |
SGD | 11.2 | 13.2 |
GP | 11.8 | 11 |
RF | 13.1 | 12.2 |
SGHMC | 13.9 | 13.5 |
SGLD | 15.2 | 15.2 |
LIVI | 16.2 | 16.2 |
SVM | 16.5 | 16.5 |
KNN | 18 | 18 |
LR | 19 | 19 |
DT | 20 | 20 |
NB | 21 | 21 |
DL | 22 | 22 |
Ranks are averaged over the 4 thresholds (lower is better).
MSWAG, SL, DE, MCD, and KFL show consistent performance across the different metrics.
Preconditioning is needed for MCMC-based BNNs.
Due to its small posterior sample size, LIVI shows large variability, resulting in low accuracy on average.
SGD performance falls between the BNN and ML methods.
DNNs outperform ML methods, but Gradient Boosting (GB) stands out.
Random Forests (RF) and Gaussian Process (GP) classification perform better than the MCMC-based BNNs.
Important
The ability to capture multiple basins of attraction is crucial (SWAG to MSWAG, SGD to DE)
For Laplace methods, a well-approximated H using a subnetwork is better than a full network with a poorly approximated H.
Method | Accuracy (Avg) |
---|---|
MSWAG | .932 |
MCD | .905 |
SWAG | .895 |
DE | .882 |
KFL | .877 |
SL | .876 |
LL | .875 |
GB | .872 |
SGD | .868 |
GP | .848 |
LIVI | .842 |
RF | .833 |
SVM | .807 |
pSGLD | .782 |
BBB | .768 |
SGLD | .767 |
LR | .753 |
SGHMC | .735 |
MCMC methods and BBB show much lower accuracy, potentially suffering from sampling inaccuracy or approximation error.
MCMC methods require a larger parameter sample size during training (accuracy increases by 5% when the sample size grows from 20 to 100).
GB achieves the highest accuracy among the ML methods, followed by GP and RF.
ATTENTION
BNN performance depends on architecture.
ResNet-50 improves accuracy by 4 - 5% across BNN methods
FINDINGS
Predictive accuracy is influenced by three key factors
Method | ECE Avg ↓ | Confidence (correct) Avg ↑ | Confidence (incorrect) Avg ↓ |
---|---|---|---|
MSWAG | .011 | .928 | .768 |
MCD | .008 | .961 | .877 |
SWAG | .048 | .951 | .878 |
DE | .008 | .922 | .904 |
KFL | .230 | .485 | .354 |
GB | .012 | .972 | .847 |
SL | .105 | .986 | .907 |
SGD | .066 | .924 | .898 |
LL | .436 | .259 | .256 |
GP | .111 | .699 | .483 |
RF | .095 | .718 | .501 |
SVM | .021 | .796 | .615 |
LIVI | .312 | .256 | .246 |
pSGLD | .032 | .954 | .870 |
BBB | .063 | .998 | .998 |
SGLD | .042 | .972 | .930 |
LR | .065 | .853 | .690 |
SGHMC | .251 | .472 | .422 |
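For reference, ECE in the table is computed along these lines: predictions are binned by their top-class confidence and the bin-weighted |accuracy - confidence| gaps are summed. The snippet is a standard equal-width-bin sketch (15 bins assumed), not necessarily the exact implementation used here.

```python
# Equal-width-bin ECE sketch: weighted average of |accuracy - confidence| per bin.
import torch

def expected_calibration_error(probs, labels, n_bins=15):
    conf, preds = probs.max(dim=-1)       # top-class confidence and predicted label
    correct = preds.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap
    return ece.item()
```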
KFL, LL, LIVI, and SGHMC have the worst ECE due to their lack of confidence and their inability to assign high (low) probabilities to true positives (negatives).
Despite its low ECE, DE tends to be overly confident in mislabeled predictions.
KFL is conservative: for incorrect predictions, it is much less certain about the result.
BBB shows poor calibration by assigning extremely high probabilities to the wrong class. ~An arrogant and opinionated one!~
LL and LIVI exhibit poor calibration by distributing probabilities almost evenly across each class. ~One with no self-confidence and decidophobia!~
GB is slightly overconfident in the incorrectly labeled group, whereas GP and RF are better calibrated.
VP images are likely to be labeled as COVID. Both involve lung inflammation with similar symptoms?
DNN methods tend to be overly confident when labeling OOD images.
GB and GP tend to assign a higher probability of being COVID to a VP image.
They show only minor OOD overconfidence compared with the DNN methods.
DNNs tend to be overconfident on both in-sample and OOD data.
Ensembling, purely Bayesian or not, substantially enhances predictive accuracy, but not calibration.
Bayesian methods that experience mode collapse, such as VI methods with unimodal variational distributions, may perform worse than non-Bayesian or ensemble methods.
When the data-generating process involves multiple hypotheses, prediction using BMA could result in overfitting/overconfidence. (Consider Bayesian model combination, Bayesian ensembling)
Fully connected networks often underperform compared to those incorporating regularization such as dropout or subnetwork inference.
No single method outperforms in all performance measures!
❓ Computing power and resources are a concern
👉 ML ensembles, such as (extreme) gradient boosting.
❓ Powerful computing resources are available
👉 MCD, SL, MSWAG, and DE (ordered by computing time) are the best for predictive accuracy.
👉 For multiclass tasks, deep learning is preferred with more advanced architectures like ResNet-50 for improved performance.
❓ The data come from different sources or may be contaminated, or overconfidence is a primary concern
👉 KFL or GP provides more reasonable confidence and calibration.
When medical imaging is the primary focus, tasks such as segmentation, registration, and reconstruction play crucial roles in enhancing inference and analysis quality. (Wang et al. 2012; Kumar et al. 2023; Rayed et al. 2024)
The development of BNNs specifically tailored for segmentation or general image pre-processing remains limited.
There are other popular architectures such as VGGNet (Simonyan and Zisserman 2014), GoogLeNet (Szegedy et al. 2015), MobileNet (Howard et al. 2017), and EfficientNet (Tan and Le 2019).
A comprehensive study of how BNNs and associated algorithms perform on different architectures could be a valuable work that benefits Bayesian and deep learning communities.