Distilling Ensembles Improves Uncertainty Estimates

Ensembling
Uncertainty modeling

Zelda Mariet, Rodolphe Jenatton, Florian Wenzel, Dustin Tran

AABI 2021 Symposium

Publication year: 2021

We seek to bridge the performance gap between batch ensembles (ensembles of deep networks with shared parameters) and deep ensembles on tasks that require not only predictions but also uncertainty estimates for those predictions. We obtain negative theoretical results on the possibility of approximating deep ensemble weights by batch ensemble weights, and so turn to distillation. Training a batch ensemble on the outputs of deep ensembles improves accuracy and uncertainty estimates without requiring hyperparameter tuning. This result is specific to the choice of batch ensemble architectures: distilling deep ensembles to a single network is unsuccessful, even though single networks have only marginally fewer parameters than batch ensembles.
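
To make the distillation idea concrete, the following is a minimal sketch of a loss that trains student ensemble members on teacher ensemble outputs. The abstract does not specify the objective, so everything here is an assumption for illustration: the function names (`softmax`, `cross_entropy`, `distillation_loss`), the one-to-one pairing of teacher member k with student member k, and the choice of cross-entropy as the matching criterion are all hypothetical and may differ from the paper's actual procedure.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, student_logits):
    # Cross-entropy H(target, student) for one example.
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s)
                for t, s in zip(target_probs, student_probs) if t > 0)

def distillation_loss(teacher_probs, student_logits):
    # ASSUMPTION: teacher member k is paired with student member k.
    # teacher_probs[k][c]: class probabilities from deep ensemble member k.
    # student_logits[k][c]: logits from batch ensemble member k.
    assert len(teacher_probs) == len(student_logits)
    losses = [cross_entropy(t, s)
              for t, s in zip(teacher_probs, student_logits)]
    return sum(losses) / len(losses)

# Toy example: two ensemble members, three classes, one input.
teacher_probs = [softmax([2.0, 0.5, -1.0]), softmax([1.0, 1.5, -0.5])]
matching_logits = [[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_match = distillation_loss(teacher_probs, matching_logits)
loss_uniform = distillation_loss(teacher_probs, uniform_logits)
```

In this toy setup, a student that reproduces each teacher member's distribution attains a lower loss than a student that predicts uniformly. Matching members individually, rather than only the averaged prediction, is one plausible way to preserve the disagreement between members that uncertainty estimates rely on.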

Zelda Mariet

Bioptimus
