Learn Before
Designing a Robust Text Correction Model
A developer is building a language model intended to automatically correct common typos and grammatical errors in user-generated text. They decide to use a pre-training method where the model learns to reconstruct an original, clean sentence from an artificially corrupted version of it. Propose two distinct types of corruption that should be introduced into the training data to best achieve the developer's goal. For each type of corruption, explain precisely how it would help the model learn to handle the intended errors.
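As a point of reference for the setup described above, the following is a minimal sketch of how (corrupted, clean) training pairs might be generated for a denoising objective. The function names, the two noise types (adjacent-character transposition and single-word deletion), and the `char_noise_prob` parameter are illustrative assumptions rather than part of the exercise, and they are not presented as the expected answer.

```python
import random

def swap_adjacent_chars(word, rng):
    """Character-level noise (hypothetical helper): transpose two neighbouring characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def drop_token(tokens, rng):
    """Token-level noise (hypothetical helper): delete one word, returning a new list."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def make_pair(sentence, rng, char_noise_prob=0.2):
    """Return (corrupted_input, clean_target) for one whitespace-tokenised sentence."""
    tokens = sentence.split()
    # Apply character noise to each token independently with the given probability.
    noisy = [
        swap_adjacent_chars(t, rng) if rng.random() < char_noise_prob else t
        for t in tokens
    ]
    # Then remove exactly one token to simulate a dropped word.
    noisy = drop_token(noisy, rng)
    return " ".join(noisy), sentence

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    corrupted, clean = make_pair("the cat sat on the mat", rng)
    print("input :", corrupted)
    print("target:", clean)
```

In this sketch the model would receive the corrupted string as input and be trained to emit the clean string; which noise types actually suit the developer's goal is the question the exercise asks you to reason about.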
Tags
Data Science
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Training Encoder-Decoder Models with a Denoising Autoencoding Objective
A research team is pre-training a language model with the specific goal of making it highly proficient at understanding long-range contextual relationships and the logical flow of arguments within a paragraph. They use a method where the model learns to restore an original, clean text from a deliberately corrupted version. Which of the following corruption strategies applied to the input text would be most effective for achieving the team's specific goal?
Designing a Robust Text Correction Model
Analyzing the Impact of Input Corruption
Example of Span Masking in Denoising Autoencoding
Example of Sentinel Masking in Denoising Autoencoding