Concept

LSTM-Based RNN Architecture

In 1997, Hochreiter and Schmidhuber proposed the Long Short-Term Memory (LSTM), which lets Recurrent Neural Networks (RNNs) retain information across many time steps rather than only between consecutive ones. LSTMs gained prominence through several victories in prediction competitions in the mid-2000s and were the dominant architecture for sequence learning from 2011 until the rise of Transformer models, starting in 2017. Even Transformers owe some of their key ideas to architectural design innovations introduced by the LSTM. An LSTM-based RNN shares the same high-level architecture as a basic RNN (whether simple, bidirectional, or deep), but replaces each ordinary recurrent hidden unit with a memory cell: a unit that carries an internal cell state alongside the hidden state and uses multiplicative gates to control what is written to, kept in, and read from that state.
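
To make the idea of swapping a recurrent unit for a memory cell concrete, here is a minimal NumPy sketch of one common LSTM formulation (input, forget, and output gates plus a candidate cell value). It is an illustration under stated assumptions, not the original 1997 formulation or any library's implementation: the function names (`init_lstm_params`, `lstm_step`, `run_lstm`), the parameter naming scheme, and the 0.01 initialization scale are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function; squashes gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm_params(num_inputs, num_hiddens, seed=0):
    """Create small random weights for the three gates and the candidate cell.

    Names and the 0.01 scale are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("i", "f", "o", "c"):  # input, forget, output, candidate
        params["W_x" + name] = rng.normal(0.0, 0.01, (num_inputs, num_hiddens))
        params["W_h" + name] = rng.normal(0.0, 0.01, (num_hiddens, num_hiddens))
        params["b_" + name] = np.zeros(num_hiddens)
    return params


def lstm_step(x, h_prev, c_prev, p):
    """Advance the memory cell by one time step.

    x: (batch, num_inputs); h_prev, c_prev: (batch, num_hiddens).
    """
    i = sigmoid(x @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])  # what to write
    f = sigmoid(x @ p["W_xf"] + h_prev @ p["W_hf"] + p["b_f"])  # what to keep
    o = sigmoid(x @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])  # what to expose
    c_tilde = np.tanh(x @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    c = f * c_prev + i * c_tilde   # cell state: gated mix of old and new content
    h = o * np.tanh(c)             # hidden state: gated read-out of the cell
    return h, c


def run_lstm(inputs, p, num_hiddens):
    """Unroll the cell over inputs of shape (num_steps, batch, num_inputs)."""
    batch_size = inputs.shape[1]
    h = np.zeros((batch_size, num_hiddens))
    c = np.zeros((batch_size, num_hiddens))
    outputs = []
    for x in inputs:               # same loop as a basic RNN; only the cell differs
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return np.stack(outputs), (h, c)
```

The outer loop in `run_lstm` is the same as for a basic RNN, and only the per-step update changes. Because the cell state `c` is updated additively and scaled by the forget gate, information can in principle persist across many steps when `f` stays close to 1.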

Updated 2026-05-14

Tags

Data Science

D2L

Dive into Deep Learning @ D2L