
Text-Pair Classification

Text-pair classification extends standard classification methods to process two distinct texts simultaneously. Given a pair of texts composed of tokens $x_1 \dots x_m$ and $y_1 \dots y_n$, the two sequences are concatenated into a single combined sequence. The total length of this unified sequence is $\mathrm{len} = n + m + 2$, where the two extra positions account for necessary special tokens. A classification label is then predicted for the entire sequence by using the aggregated representation vector $\mathbf{h}_{\mathrm{cls}}$. This framework addresses multiple NLP challenges, including semantic equivalence judgement (assessing whether two texts share identical meanings), text entailment judgement (evaluating whether a hypothesis logically follows from a premise), grounded commonsense inference (gauging the likelihood of an event given its context), and question-answering inference (verifying whether an answer correctly matches a question).
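The concatenation step above can be sketched in a few lines of Python. This is a minimal illustration, not a real tokenizer: the special-token names `[CLS]` and `[SEP]` and the function `build_pair_input` are assumptions chosen to match the length formula $\mathrm{len} = n + m + 2$ (two special tokens in total); practical tokenizers may differ, e.g. by appending a trailing separator.

```python
def build_pair_input(x_tokens, y_tokens):
    """Concatenate two token sequences into one classifier input.

    The combined sequence is [CLS] x_1 ... x_m [SEP] y_1 ... y_n,
    so its length is m + n + 2, matching len = n + m + 2 above.
    The representation at the [CLS] position would supply h_cls.
    """
    return ["[CLS]"] + x_tokens + ["[SEP]"] + y_tokens

# Example pair for, say, semantic equivalence judgement:
x = ["the", "cat", "sleeps"]           # m = 3
y = ["a", "cat", "is", "sleeping"]     # n = 4
seq = build_pair_input(x, y)
assert len(seq) == len(x) + len(y) + 2  # 3 + 4 + 2 = 9 tokens
```

A downstream model would encode `seq` and feed the vector at the `[CLS]` position into a classification head to produce the label for the pair.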

Updated 2026-05-02
