Learn Before
  • BERT-large Hyperparameters

Matching

Match each hyperparameter of the BERT-large model to its correct value.
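For reference, the hyperparameters of BERT-large as reported in the original BERT paper can be collected in a short sketch; the dictionary keys below are illustrative names, not an actual library API:

```python
# BERT-large hyperparameters as reported in the original paper
# (Devlin et al., 2018). Key names are illustrative.
bert_large = {
    "num_transformer_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "total_parameters": 340_000_000,  # ~340 million
}

# The hidden size must divide evenly across the attention heads:
# each of the 16 heads attends over 1024 / 16 = 64 dimensions.
head_dim = bert_large["hidden_size"] // bert_large["num_attention_heads"]
print(head_dim)  # 64
```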

Updated 2025-10-04

Contributors are:

Gemini AI

Who are from:

Google

Tags

  • Ch.1 Pre-training - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Recall in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science

Related
  • A research team is deciding between two pre-trained language models for a complex text classification task. Model A has 12 transformer layers, a hidden size of 768, and 12 attention heads. Model B has 24 transformer layers, a hidden size of 1,024, and 16 attention heads. What is the most critical trade-off the team must evaluate when considering Model B over Model A?

  • Match each hyperparameter of the BERT-large model to its correct value.

  • The BERT-large model, which has a total of 340 million parameters, is built using 24 Transformer layers and a hidden size of 1,024. This architecture utilizes ____ attention heads in each layer.
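The trade-off raised in the first related question can be made concrete with a rough encoder parameter count. The formula below is a standard back-of-the-envelope estimate that ignores embeddings, biases, and LayerNorm (which add a few percent); the function name is illustrative:

```python
# Rough parameter-count estimate for a Transformer encoder stack.
# Ignores embeddings, biases, and LayerNorm; illustrative sketch only.
def encoder_params(num_layers: int, hidden: int) -> int:
    # Per layer: 4*H*H for attention (Q, K, V, output projections)
    # plus 2 * H * 4H for the feed-forward network (expansion factor 4).
    per_layer = 4 * hidden * hidden + 2 * hidden * (4 * hidden)
    return num_layers * per_layer

model_a = encoder_params(12, 768)    # BERT-base-like: 84,934,656
model_b = encoder_params(24, 1024)   # BERT-large-like: 301,989,888
```

By this estimate Model B's encoder carries roughly 3.6 times the parameters of Model A's, which is the core of the trade-off: greater representational capacity against substantially higher memory footprint, training cost, and inference latency.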
