LSTM-Based RNN Architecture
In 1997, Hochreiter and Schmidhuber proposed the Long Short-Term Memory (LSTM), which enables Recurrent Neural Networks (RNNs) to retain information across many time steps rather than merely between consecutive ones. LSTMs gained wide recognition through victories in prediction competitions during the mid-2000s and were the dominant architecture for sequence learning from 2011 until the rise of Transformer models, beginning in 2017. Even Transformers owe some of their key ideas to architectural innovations first introduced by the LSTM. An LSTM-based RNN has the same high-level architecture as a basic RNN (whether simple, bidirectional, or deep), but each ordinary recurrent unit is replaced by a specialized LSTM cell.
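The idea that an LSTM-based RNN keeps the basic RNN's high-level architecture and only swaps the cell can be illustrated with a minimal sketch. It assumes scalar inputs and a hidden size of 1 for readability; the names rnn_cell, lstm_cell, run_sequence and all weight values are hypothetical and chosen only for illustration. The gate equations follow the commonly used LSTM formulation (input, forget, and output gates plus a candidate memory), not any particular library's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rnn_cell(x, state, p):
    """Basic RNN step: the state is just the hidden value h."""
    (h,) = state
    h_new = math.tanh(p["w_x"] * x + p["w_h"] * h + p["b"])
    return h_new, (h_new,)


def lstm_cell(x, state, p):
    """LSTM step: the state is the pair (h, c), where c is the memory cell."""
    h, c = state
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h + p["b_i"])  # input gate
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h + p["b_f"])  # forget gate
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h + p["b_o"])  # output gate
    c_tilde = math.tanh(p["w_xc"] * x + p["w_hc"] * h + p["b_c"])  # candidate memory
    c_new = f * c + i * c_tilde      # keep part of the old memory, add new content
    h_new = o * math.tanh(c_new)     # expose a gated view of the memory
    return h_new, (h_new, c_new)


def run_sequence(cell, inputs, state, params):
    """Shared outer loop: identical for both cell types."""
    outputs = []
    for x in inputs:
        out, state = cell(x, state, params)
        outputs.append(out)
    return outputs, state


# Arbitrary illustrative weights (assumptions, not trained values).
rnn_params = {"w_x": 0.5, "w_h": 0.5, "b": 0.0}
lstm_params = {name: 0.5 for name in (
    "w_xi", "w_hi", "w_xf", "w_hf", "w_xo", "w_ho", "w_xc", "w_hc")}
lstm_params.update({"b_i": 0.0, "b_f": 0.0, "b_o": 0.0, "b_c": 0.0})

inputs = [1.0, 0.0, 0.0, 0.0]
rnn_out, rnn_state = run_sequence(rnn_cell, inputs, (0.0,), rnn_params)
lstm_out, lstm_state = run_sequence(lstm_cell, inputs, (0.0, 0.0), lstm_params)
```

The same run_sequence loop drives both models. The only differences are the cell function and the shape of the state it carries: a single hidden value for the basic RNN, and a hidden value plus a memory cell for the LSTM.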
Tags
Data Science
D2L
Dive into Deep Learning @ D2L
Related
Bidirectional RNNs
Stacked RNNs
Gated recurrent unit (GRU)
Neural Turing Machines (NTM)
Neural Turing Machines - Original Paper Reference
LSTM-Based RNN Architecture
LSTM Cell
Applications of Long Short-Term Memory Networks (LSTMs)
Computational Cost of Training Sequence Models
Concise LSTM Implementation