Learn Before
  • Applying and Adapting Pre-trained Models to Downstream Tasks

  • Role of Pre-training in Developing Latent Abilities

Fine-Tuning as a Mechanism for Activating Pre-Trained Knowledge

The pre-training and fine-tuning paradigm rests on the principle that LLMs acquire latent abilities for instruction comprehension and response generation during pre-training. However, the instruction-response mappings learned this way may have only a low probability of being generated at inference time. Fine-tuning activates these dormant capabilities: a small set of supervised data is used to slightly adjust the model's parameters, increasing the likelihood that the model generates the desired responses to instructions.
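The mechanics of this activation step can be sketched with a short supervised training loop. The snippet below is a minimal, hypothetical illustration: a tiny feed-forward network stands in for a pre-trained LLM, and random tensors stand in for a small labeled instruction-response dataset. The point is only that a brief pass of supervised updates shifts the model's output distribution toward the desired responses, not that this is how a production fine-tuning pipeline is built.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained model. In practice this would be an LLM
# whose weights already encode latent instruction-following ability.
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 2))

# Small supervised set (hypothetical placeholder data); fine-tuning
# typically uses far fewer examples than pre-training.
x = torch.randn(32, 8)
y = torch.randint(0, 2, (32,))

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Loss before fine-tuning: desired outputs are not yet the likely ones.
loss_before = loss_fn(model(x), y).item()

# A short fine-tuning phase: small parameter adjustments on supervised data.
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Loss after fine-tuning: the desired responses are now more probable.
loss_after = loss_fn(model(x), y).item()
print(loss_after < loss_before)
```

The drop in cross-entropy loss is the toy analogue of the effect described above: the capability (here, the network's representational capacity) was already present, and the short supervised phase only re-weights the parameters so that the target outputs become high-probability.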

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models

Related
  • Transfer knowledge of a PTM to the downstream NLP tasks

  • Fine-Tuning Strategies

  • Applications of PTMs

  • Fine-tuning for Sequence Encoding Models

  • Fine-Tuning Pre-trained Models for Downstream Tasks

  • Freezing Encoder Parameters During Fine-Tuning

  • Discarding the Pre-training Head for Downstream Adaptation

  • Textual Instructions for Task Adaptation

  • Influence of Downstream Task on Model Architecture

  • Broad Applications of Fine-Tuning in LLM Development

  • Scope of Introductory Fine-Tuning Discussion

  • LLM Alignment

  • Pre-train and Fine-tune Paradigm for Encoder Models

  • Necessity of Fine-Tuning for Downstream Task Adaptation

  • Fine-Tuning as a Standard Adaptation Method for LLMs

  • Prompting in Language Models

  • Fine-Tuning as a Mechanism for Activating Pre-Trained Knowledge

  • A startup wants to adapt a large, pre-trained language model to classify customer sentiment (positive, negative, neutral). They have a very small labeled dataset (fewer than 500 examples) and extremely limited access to high-performance computing, making extensive retraining financially unfeasible. Which adaptation approach is most suitable for their situation?

  • Efficiency of LLM Adaptation via Prompting

  • A developer intends to specialize a general-purpose, pre-trained language model for a new text classification task by updating its internal parameters. Arrange the following steps in the correct chronological order to accomplish this adaptation.

  • Selecting an Adaptation Strategy for a Pre-trained Model

  • A research team develops a large language model by training it on a massive corpus of text from the internet. When they give the model the instruction, 'Translate the following English sentence to French,' the model instead continues the sentence in English with a grammatically correct but irrelevant phrase. However, after a second, much shorter training phase using a small, curated dataset of English-to-French sentence pairs, the model correctly performs the translation task. Which of the following statements best explains this change in the model's behavior?

  • Evaluating a Researcher's Conclusion on Model Training

  • The primary purpose of the supervisory phase that follows pre-training is to introduce entirely new capabilities, such as the ability to summarize text, which the model did not acquire in any form during its initial, large-scale training.

Learn After
  • Fine-Tuning Pre-trained Models for Downstream Tasks

  • Instruction Fine-Tuning

  • Superficial Alignment Hypothesis

  • Challenge of Opaque Pre-Training Data in Fine-Tuning

  • A team develops a large language model pre-trained on a massive, diverse corpus of text from the internet. When initially tested on the task of generating concise summaries of legal documents, its performance is poor and unstructured. The team then collects a small, curated dataset of 500 legal documents and their corresponding expert-written summaries. After training the model on this small dataset, its ability to summarize new legal documents improves dramatically. Which statement best analyzes the role of this second training phase?

  • Critiquing a Model Training Hypothesis

  • Implicit Learning of Instruction-Response Mappings During Pre-training

  • Explaining the Impact of Targeted Training