Concept

Transformer Models for NLG

Transformer models are based on attention mechanisms that draw global dependencies between the input and output. The transformer uses an encoder-decoder architecture: the encoder is a stack of six identical layers, each containing a self-attention sublayer and a position-wise fully connected feed-forward network. The decoder is likewise a stack of six identical layers with the same components, plus an additional attention layer that helps the decoder focus on relevant parts of the input sentence. Transformer models improved the existing state of the art across a wide range of tasks, including language modeling and NLG.
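
The sketch below is a minimal illustration of this stack using PyTorch's nn.Transformer, whose defaults match the six-layer encoder and decoder configuration described above; the tensor shapes and toy inputs are illustrative assumptions, not part of the original text.

```python
import torch
import torch.nn as nn

# Encoder-decoder transformer: each of the six encoder layers contains
# self-attention plus a position-wise feed-forward network; each decoder
# layer adds an extra attention sublayer over the encoder output.
model = nn.Transformer(
    d_model=512,           # embedding dimension
    nhead=8,               # attention heads per layer
    num_encoder_layers=6,  # stack of six encoder layers
    num_decoder_layers=6,  # stack of six decoder layers
    dim_feedforward=2048,  # width of the feed-forward sublayer
)

# Toy inputs (assumed shapes): (sequence length, batch size, d_model).
src = torch.rand(10, 32, 512)  # source sequence
tgt = torch.rand(20, 32, 512)  # target sequence generated so far

out = model(src, tgt)  # decoder attends to relevant parts of the encoded source
print(out.shape)       # torch.Size([20, 32, 512])
```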

Updated 2022-12-18

Tags

Data Science