Definition

Type-Token Ratio (Lexical Diversity)

The Type-Token Ratio (TTR) is a lexical-diversity metric defined as the number of distinct word types divided by the total number of word tokens in a text: $\mathrm{TTR}=\dfrac{|\text{types}|}{|\text{tokens}|}$. Values lie in $(0,1]$; higher TTR indicates richer vocabulary use, while lower TTR indicates more repetition. Introduced as a vocabulary-diversity index by Johnson (1944) and adopted as a standard measure of child-language development by Templin (1957), TTR is widely used in stylometry as a voice marker. A well-known limitation is its sensitivity to text length: TTR mechanically decreases as token count grows, motivating length-robust variants such as MTLD, vocd-D, and HD-D validated by McCarthy & Jarvis (2010). Reporting TTR deltas with confidence intervals between paired texts of comparable length, as in voice-preservation studies, controls for this length effect.
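As a rough illustration of the definition and of the length effect, the following Python sketch computes TTR for a short text and for a longer text that extends it. The tokenizer (lowercasing plus a simple regular expression) and the names `tokenize` and `type_token_ratio` are assumptions made for this example; the definition above does not prescribe a tokenization scheme, and real studies may lemmatize or handle punctuation differently.

```python
import re


def tokenize(text):
    # Assumption: tokens are lowercase runs of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text):
    tokens = tokenize(text)
    if not tokens:
        # The ratio is undefined when there are no tokens.
        raise ValueError("TTR is undefined for a text with no tokens")
    # Types are the distinct tokens; TTR = |types| / |tokens|.
    return len(set(tokens)) / len(tokens)


short_text = "the cat sat on the mat"
long_text = short_text + " and the dog sat on the rug by the cat"

print(type_token_ratio(short_text))  # 5 types / 6 tokens
print(type_token_ratio(long_text))   # 9 types / 16 tokens
```

The longer text reuses common words such as "the" and "sat", so its ratio is lower even though its style is the same. This is the length sensitivity described above, and it is why comparisons are usually restricted to texts of comparable length.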


Updated 2026-05-16


Tags

Science

Research Paper: Advanced Prompting