- The baseline model is a logistic regression classifier trained on tf-idf vector representations of text to distinguish WebText articles (online web pages) from text generated by GPT-2 models.

 - A study of GPT-2 models of different sizes indicated that models with more parameters generate text that more closely resembles human writing.

 - Text generated with nucleus sampling is hard for the classifier to detect.

 - Fine-tuning GPT-2 on Amazon product reviews produces text that is hard to distinguish from human-written reviews.
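
For context on the nucleus sampling bullet above: nucleus (top-p) sampling keeps only the smallest set of most probable next tokens whose cumulative probability reaches a threshold p, renormalises, and samples from that set. The following is a minimal sketch of that filtering step, not GPT-2's actual decoding code. The function names, the threshold value, and the toy distribution are all assumptions made for illustration.

```python
import random

def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalise so the kept weights sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}

def sample_token(probs, p=0.9, rng=random):
    """Draw one token from the renormalised nucleus."""
    filtered = nucleus_filter(probs, p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical next-token distribution, invented for illustration only.
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
nucleus = nucleus_filter(next_token_probs, p=0.9)
token = sample_token(next_token_probs, p=0.9)
```

With this toy distribution the low-probability tail token is cut off before sampling.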


San Jose State University

 - Bag-of-words classifier
 - Detecting machine configuration


Classifiers trained from scratch for machine generated text detection

Automatic Detection of Machine Generated Text: A Critical Survey
by Ganesh Jawahar, Muhammad Abdul-Mageed, Laks V.S. Lakshmanan
https://arxiv.org/pdf/2011.01314.pdf



Classic machine learning models, such as logistic regression, are trained from scratch to distinguish between machine-generated and human-written text.
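
A classifier of this kind can be sketched with tf-idf features and logistic regression. The code below is a rough illustration using only the Python standard library, not the setup used in the surveyed work. The whitespace tokenizer, the smoothed idf formula, the learning rate, the epoch count, and the four-sentence toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalised tf-idf vector as a dict of term -> weight."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict_proba(vec, weights, bias):
    """Probability that the document is machine generated (label 1)."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y  # d(log loss)/dz
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
            bias -= lr * err
    return weights, bias

# Toy corpus invented for illustration; 1 = machine generated, 0 = human written.
train_texts = [
    "the the model said the the result is is good",
    "it is is very very nice nice product product",
    "i bought this kettle last week and it leaks",
    "our town council met on tuesday to discuss parking",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_texts)
X = [tfidf(t, idf) for t in train_texts]
weights, bias = train_logreg(X, train_labels)
train_preds = [int(predict_proba(x, weights, bias) >= 0.5) for x in X]
```

In practice a library such as scikit-learn (`TfidfVectorizer` followed by `LogisticRegression`) would likely replace the hand-written pieces, and the model would be evaluated on held-out text, not on its own training set.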

Bag-of-words classifier for machine generated text detection

Comparison of bag-of-words classifiers for detecting text generative models (TGMs)

- Identifying the modelling choice of a TGM from the text it generates.

 - It is easier to identify the modelling choice of a TGM than to distinguish between machine-generated and human-written text.

