Learn Before
Concept
LAMBADA
LAMBADA finetunes GPT-2 on the labeled training data to obtain a label-conditioned generator, then uses it to synthesize candidate examples for each class. A classifier trained on the original training set scores the candidates, and the top-k examples per class that it most confidently assigns to that class are kept as augmentation data.
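The pipeline can be sketched as follows. The generator and classifier here are mock stand-ins (a real implementation would finetune GPT-2 as the label-conditioned generator and train a task classifier on the original data); only the generate-score-filter structure reflects LAMBADA.

```python
import random

def generate_candidates(label, n):
    """Mock label-conditioned generator (stand-in for finetuned GPT-2):
    emits n candidate texts for the given class label."""
    return [f"{label} example {i}" for i in range(n)]

def classifier_confidence(text, label):
    """Mock classifier (stand-in for one trained on the original data):
    returns a pseudo-probability that `text` belongs to `label`."""
    rng = random.Random(hash((text, label)))
    return rng.random()

def lambada_augment(labels, n_candidates=20, top_k=5):
    """For each class: generate candidates, score them with the
    classifier, and keep only the top-k most confident ones."""
    augmented = {}
    for label in labels:
        candidates = generate_candidates(label, n_candidates)
        ranked = sorted(
            candidates,
            key=lambda t: classifier_confidence(t, label),
            reverse=True,
        )
        augmented[label] = ranked[:top_k]
    return augmented

aug = lambada_augment(["positive", "negative"])
```

The classifier-based filtering step is what distinguishes LAMBADA from naive generation: low-confidence (likely mislabeled or off-class) candidates are discarded before augmentation.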
Updated 2022-05-20
Tags
Data Science
Related
BackTranslation
Improving Neural Machine Translation Models with Monolingual Data
Generative Data Augmentation for Commonsense Reasoning
G-DAUG
Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange
SEMANTIC TEXT EXCHANGE (STE)
LAMBADA
Not Enough Data? Deep Learning to the Rescue!
Trade-off of Model-Based Data Augmentation Techniques in NLP