Designing a Self-Supervised Task for Code
You are tasked with pre-training a language model on a massive, unlabeled dataset of computer code. Beyond the common approach of predicting randomly masked parts of the code, propose one distinct self-supervised objective that would be particularly well-suited for this dataset. Briefly justify why your proposed objective would help the model learn meaningful patterns specific to code.
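Before proposing an alternative, it helps to see what the baseline objective actually does. The sketch below is a minimal, illustrative construction of a masked-prediction training pair; the function and variable names are hypothetical, not from any particular framework.

```python
import random

def make_masked_example(tokens, mask_token="<mask>", mask_rate=0.15, seed=0):
    """Build a self-supervised (input, target) pair by masking random tokens.

    The labels are the original tokens themselves, so no human
    annotation is needed -- this is what makes the task self-supervised.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    inputs = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # label = the token that was masked out
        inputs[pos] = mask_token
    return inputs, targets

# Example: a tokenized line of code serves as its own supervision signal.
code = "def add ( a , b ) : return a + b".split()
masked, labels = make_masked_example(code)
```

Any alternative objective you propose (e.g., predicting whether two code fragments come from the same function, or recovering scrambled statement order) should have this same property: the targets are derived mechanically from the unlabeled corpus.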
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Creation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is training a language model on a massive, unlabeled corpus of text from the internet. Their training objective is to randomly mask 15% of the words in each input sentence and require the model to predict the original masked words. Which of the following statements best analyzes why this specific training method is considered 'self-supervised'?
Pre-training Strategy for a Specialized Domain
Training Process for Text-to-Text Models