Learn Before
Concept

Leaky Units and Other Strategies for Multiple Time Scales

As mentioned in the introduction to long-term dependencies, there are three broad strategies for addressing them. One of them, "leaky units and other multi-time-scale strategies", is introduced in this unit.

This idea has led to numerous approaches; three of them are introduced here.

1 - Adding Skip Connections through Time

Adding direct connections from variables in the distant past to current variables is one way to obtain coarse time scales. In an ordinary recurrent network, the gradient may vanish or explode exponentially with the number of time steps. Introducing recurrent connections with a time delay of d alleviates this problem: the gradient now diminishes exponentially as a function of τ/d rather than τ. Since there are both delayed and single-step connections, the gradient may still explode exponentially in τ, but the learning algorithm is able to capture longer dependencies.
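The delayed recurrent connection described above can be sketched in a toy RNN. This is a minimal illustration, not an implementation from the text: the dimensions, the delay d, and the weight scales are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: hidden width, input width, and the skip delay d.
n_hidden, n_input, d = 4, 3, 3

W  = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # ordinary length-1 recurrence
Wd = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # length-d delayed recurrence
U  = rng.normal(scale=0.1, size=(n_hidden, n_input))   # input-to-hidden weights
b  = np.zeros(n_hidden)

def run(xs):
    """Run the RNN over a sequence xs; h[0] is the zero initial state."""
    T = len(xs)
    h = np.zeros((T + 1, n_hidden))
    for t in range(1, T + 1):
        # State from d steps back (zero before the sequence starts).
        h_delayed = h[t - d] if t - d >= 0 else np.zeros(n_hidden)
        # Both the (t-1) state and the (t-d) state feed h[t], so a gradient
        # can travel d time steps through a single connection.
        h[t] = np.tanh(W @ h[t - 1] + Wd @ h_delayed + U @ xs[t - 1] + b)
    return h

states = run(rng.normal(size=(10, n_input)))
```

Backpropagating through `Wd` takes one hop per d time steps, which is why the decay of the gradient depends on τ/d rather than τ.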

2 - Leaky Units and a Spectrum of Different Time Scales

The idea is similar to a moving-average model, or to the shadow variables maintained by TensorFlow's exponential moving average: a running average accumulates information over a long period of time. A leaky unit is a hidden unit with a linear self-connection whose state is such a running average of its input; by choosing the averaging weight, the unit can retain information from the distant past, allowing long-term information to be taken into account.

3 - Removing Connections

Unlike skip connections, which add edges (so a unit receiving a new long edge may still learn to favor its short-term connections), removing connections actively deletes the length-one connections and replaces them with longer ones, forcing units to operate on a coarse time scale.
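One way to realize this is to let a group of "slow" units refresh their state only every d steps, so their effective recurrent connection has length d rather than 1. A minimal sketch under that assumption (the grouping, sizes, and delay are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input, d = 4, 3, 3   # the slow group updates every d steps

W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
U = rng.normal(scale=0.1, size=(n_hidden, n_input))
slow = np.array([True, True, False, False])  # which units run on the coarse scale

def run(xs):
    """Run the network; slow units keep their state except every d-th step."""
    h = np.zeros(n_hidden)
    for t, x in enumerate(xs, start=1):
        h_new = np.tanh(W @ h + U @ x)
        if t % d == 0:
            h = h_new                        # all units update
        else:
            h = np.where(slow, h, h_new)     # slow units hold their old state
    return h

h_final = run(rng.normal(size=(6, n_input)))
```

Because the slow units never take a length-one step, they are forced to summarize the input at a coarse time scale, while the fast units handle fine-grained detail.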


Updated 2021-07-01

Tags

Deep Learning (in Machine learning)

Data Science