Learn Before
A data scientist is preparing a large text corpus scraped from public internet forums to train a general-purpose chatbot. To improve data quality, they apply a filter that automatically deletes any text segment containing words from a predefined list of profanities. Which statement provides the most accurate evaluation of this data cleaning strategy?
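The filter described in the question can be sketched as a token-level blocklist check. This is a minimal illustration, not the data scientist's actual pipeline. It assumes lowercase regex tokenization and a tiny blocklist. `BLOCKLIST`, `contains_blocked_word`, and `filter_segments` are hypothetical names chosen for this example, not part of any library.

```python
import re

# Hypothetical blocklist for illustration only; a real list would be far longer.
BLOCKLIST = {"damn", "hell"}


def contains_blocked_word(segment: str, blocklist: set) -> bool:
    # Lowercase the segment, split it into word-like tokens, and check each token.
    tokens = re.findall(r"[a-z']+", segment.lower())
    return any(tok in blocklist for tok in tokens)


def filter_segments(segments: list, blocklist: set) -> list:
    # Drop the entire segment if any token matches the blocklist.
    return [s for s in segments if not contains_blocked_word(s, blocklist)]


if __name__ == "__main__":
    corpus = [
        "The hike was hell but worth it.",
        "Dante's Inferno describes the circles of hell.",
        "Great answer, thanks!",
    ]
    print(filter_segments(corpus, BLOCKLIST))  # ['Great answer, thanks!']
```

Because the rule deletes whole segments on a single keyword match, the literary reference in the second sentence is discarded along with the casual remark in the first. Only the third segment survives.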
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Refining a Customer Service Chatbot Dataset
You are tasked with creating a data processing pipeline to clean a large, raw text corpus for training a language model. Arrange the following cleaning steps into the most logical and efficient order.