Learn Before
Rescoring and Reranking for Inference-Time Alignment
Rescoring, also known as reranking, is an inference-time alignment technique that evaluates and ranks a model's generated outputs. A scoring system, often a reward model, scores multiple candidate outputs, and the highest-scoring candidate is selected. Reranking has a long history in NLP tasks such as machine translation. It is typically applied when training complex models is prohibitively expensive, because using such a model only to score a small set of candidates is a low-cost way to incorporate its capabilities.
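The generate-score-select loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names `rerank`, `select_best`, and `toy_score` are hypothetical, and `toy_score` is a word-overlap stand-in for a real reward model, chosen only so the example runs without external dependencies.

```python
from typing import Callable, List, Tuple

# A scorer takes (prompt, candidate) and returns a number; higher is better.
# In practice this would wrap a reward model. That is an assumption here.
ScoreFn = Callable[[str, str], float]


def rerank(prompt: str, candidates: List[str], score_fn: ScoreFn) -> List[Tuple[float, str]]:
    """Score every candidate and return (score, candidate) pairs, best first."""
    scored = [(score_fn(prompt, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def select_best(prompt: str, candidates: List[str], score_fn: ScoreFn) -> str:
    """Return the single highest-scoring candidate."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: score_fn(prompt, c))


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: counts words shared with the prompt."""
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    return float(len(prompt_words & response_words))


if __name__ == "__main__":
    prompt = "how do I reset my password"
    # In a real system these would be sampled from the base model.
    candidates = [
        "Try again later.",
        "To reset my password, open settings.",
        "Reset it.",
    ]
    for score, text in rerank(prompt, candidates, toy_score):
        print(score, text)
    print("selected:", select_best(prompt, candidates, toy_score))
```

The base model's parameters are never touched: only the candidate set and the scorer determine the final output, which is why the quality of the result depends on both the scorer and the diversity of the candidates.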
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.5 Inference - Foundations of Large Language Models
Related
Prompting as a Form of Inference-Time Alignment
A company deploys a large, pre-trained language model for its public-facing chatbot. Due to immense computational costs, they cannot alter the model's core programming or retrain it. To ensure the chatbot's responses are consistently helpful and harmless, they implement a new system. This system works by having the original model generate five different potential answers for every user query. A second, much smaller, specialized model then rapidly evaluates these five answers based on safety and helpfulness criteria, and only the highest-scoring answer is displayed to the user. Which principle does this company's strategy best illustrate?
Choosing an LLM Alignment Strategy
System Information in Prompts
LLM Deployment Strategy for a Startup
Learn After
Using Scoring Systems for Inference-Time Rescoring
Best-of-N Sampling (BoN Sampling)
Use of Reranking to Explore Model and Search Errors
The Challenge of Candidate Diversity in Reranking Methods
A development team uses a large, pre-trained language model to generate summaries of news articles. To improve the factual accuracy of the final output, their system first generates five different summary candidates. Then, a separate, specialized scoring model evaluates each of the five summaries for factual consistency with the original article and selects the one with the highest score. Which statement best analyzes the trade-offs of this approach?
Improving Chatbot Responses on a Budget
A system is designed to improve the quality of its generated responses at inference time without altering the base model's parameters. It does this by producing several options and then choosing the best one. Arrange the following actions into the correct operational sequence.