Learn Before
A programmer is using a specific method to break down the sentence "Let's re-evaluate the model's performance." into a list of basic units. The method's rules are: 1) split the text by spaces, and 2) treat each punctuation mark (such as '-', "'", and '.') as a separate unit. Which of the following outputs correctly applies these rules?
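The two rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `tokenize` is hypothetical, and it assumes "punctuation mark" means any character in Python's `string.punctuation`, which the question does not specify.

```python
import string


def tokenize(text):
    """Split on whitespace, then emit every punctuation character as its own unit."""
    tokens = []
    for chunk in text.split():  # rule 1: split the text by spaces
        current = ""
        for ch in chunk:
            if ch in string.punctuation:  # rule 2 (assumed definition of punctuation)
                if current:
                    tokens.append(current)  # flush the letters seen so far
                    current = ""
                tokens.append(ch)  # the punctuation mark is a separate unit
            else:
                current += ch
        if current:
            tokens.append(current)  # flush any trailing letters
    return tokens


print(tokenize("Let's re-evaluate the model's performance."))
```

Under these assumptions, every apostrophe, hyphen, and period becomes its own unit, so a contraction such as "Let's" yields three units rather than one or two.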
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider the sentence: "The model's performance isn't great." This sentence is processed using two different methods for breaking down text into basic units (tokens), resulting in the following outputs:
- Method A: ['The', 'model', "'s", 'performance', 'is', "n't", 'great', '.']
- Method B: ['The', "model's", 'performance', "isn't", 'great', '.']
By analyzing the differences between these two lists of tokens, what can be inferred about the underlying rules of each method?
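One possible reading of the two outputs can be expressed as code. The sketch below is an assumption, not the actual rules behind either method: the hypothetical `method_a` splits the clitics "'s" and "n't" away from their host word, while the hypothetical `method_b` keeps word-internal apostrophes and only separates other punctuation. Other rule sets could produce the same lists for this one sentence.

```python
import re


def method_a(text):
    """Assumed rule: split off the clitics 's and n't, and separate sentence punctuation."""
    text = re.sub(r"(n't|'s)\b", r" \1", text)    # "isn't" -> "is n't", "model's" -> "model 's"
    text = re.sub(r"([.,!?;:])", r" \1 ", text)   # put spaces around sentence punctuation
    return text.split()


def method_b(text):
    """Assumed rule: keep apostrophes inside words; other punctuation stands alone."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


sentence = "The model's performance isn't great."
print(method_a(sentence))
print(method_b(sentence))
```

Both functions separate the final period, so the difference between the two methods lies entirely in how they treat apostrophes inside a word.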
Distinguishing Words from Tokens