Concept

Inverted Dropout Technique

In modern implementations of dropout (often called inverted dropout), the activation tensor is not only multiplied by a binary mask; the surviving values are also rescaled. If each element is dropped with probability $p$, the surviving elements are divided by $1 - p$. This rescaling is performed only during training, and it preserves the expected value of each activation. An activation $h$ becomes $0$ with probability $p$ and $h / (1 - p)$ with probability $1 - p$, so its expectation is $p \cdot 0 + (1 - p) \cdot \frac{h}{1 - p} = h$. As a result, no scaling adjustment is needed at test time, when dropout is simply switched off.
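The idea can be illustrated with a minimal sketch in plain Python. It assumes activations are a flat list of floats rather than a framework tensor. The function name `inverted_dropout` and its `training` and `rng` parameters are hypothetical illustrations, not D2L's code or any library's API. During training it draws the binary mask and divides survivors by $1 - p$. In evaluation mode it returns the input unchanged.

```python
import random


def inverted_dropout(activations, p, training=True, rng=None):
    """Hypothetical sketch of inverted dropout on a flat list of floats.

    Training: zero each element with probability p and divide survivors
    by (1 - p). Evaluation: return the activations unchanged.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not training or p == 0.0:
        return list(activations)          # no mask and no rescaling at test time
    if p == 1.0:
        return [0.0] * len(activations)   # every element dropped; avoids dividing by zero
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() is uniform on [0, 1), so each element survives with probability 1 - p
    return [x * scale if rng.random() >= p else 0.0 for x in activations]


# Usage: with p = 0.5, each surviving 2.0 becomes 4.0 and each dropped one becomes 0.0
rng = random.Random(0)
x = [2.0] * 100_000
y = inverted_dropout(x, p=0.5, rng=rng)
print(sum(y) / len(y))  # empirical mean, expected to be close to the undropped value 2.0
print(inverted_dropout([1.0, 2.0, 3.0], p=0.5, training=False))  # unchanged in evaluation mode
```

Because the survivors are rescaled during training, the empirical mean stays near the original activation value, and the evaluation path needs no correction factor. PyTorch's `nn.Dropout` applies the same $1/(1 - p)$ training-time scaling to tensors.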

Updated 2026-05-07

Tags

Data Science

D2L

Dive into Deep Learning @ D2L