BERT is applied to the CoNLL-2003 Named Entity Recognition (NER) task. The table shows a gap of only 0.3 F1 between the best-performing feature-based method, which concatenates the token representations from the top four hidden layers of the pre-trained Transformer, and fine-tuning the entire model. BERT is therefore effective for both fine-tuning and feature-based approaches.
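The concatenation step can be illustrated with a minimal sketch. It assumes the encoder's per-layer outputs are already available as nested lists indexed `hidden_states[layer][token]`, with the top layer last. The function name `concat_top_layers`, the toy data, and the bottom-to-top concatenation order are assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Vector = List[float]


def concat_top_layers(hidden_states: List[List[Vector]], num_layers: int = 4) -> List[Vector]:
    """Build one feature vector per token by concatenating its vectors
    from the top `num_layers` layers (the last entries of hidden_states)."""
    top = hidden_states[-num_layers:]
    seq_len = len(top[0])
    features: List[Vector] = []
    for t in range(seq_len):
        token_vec: Vector = []
        for layer in top:
            # Append this layer's vector for token t to the running feature.
            token_vec.extend(layer[t])
        features.append(token_vec)
    return features


# Toy stand-in for encoder output: 6 layers, 3 tokens, hidden size 2.
# Each vector is [layer index, token index] so the result is easy to inspect.
toy_hidden_states = [
    [[float(layer), float(token)] for token in range(3)]
    for layer in range(6)
]

token_features = concat_top_layers(toy_hidden_states)
```

Each token ends up with a vector four times the hidden size, which a separate, cheaper tagger can then consume without updating the pre-trained weights.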

University of Michigan - Ann Arbor

- A task-specific model architecture can be added for tasks that a Transformer encoder architecture alone cannot easily represent.
- Pre-computing the expensive representation of the training data once, then running many experiments with cheaper models on top of it, saves substantial computation.
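The second benefit can be sketched as a small cache: the expensive encoder runs once per training example, and every later experiment reuses the stored features. Everything here is hypothetical and for illustration only. `FeatureCache`, `toy_encoder`, and the linear scorers stand in for a frozen pre-trained model and the cheaper downstream models.

```python
from typing import Callable, Dict, List

Vector = List[float]


class FeatureCache:
    """Stores encoder outputs so each text is encoded at most once."""

    def __init__(self, encoder: Callable[[str], Vector]) -> None:
        self._encoder = encoder
        self._cache: Dict[str, Vector] = {}
        self.encoder_calls = 0  # counts how often the expensive step ran

    def features(self, text: str) -> Vector:
        if text not in self._cache:
            self.encoder_calls += 1
            self._cache[text] = self._encoder(text)
        return self._cache[text]


def toy_encoder(text: str) -> Vector:
    # Hypothetical stand-in for a frozen pre-trained encoder.
    return [float(len(text)), float(len(text.split()))]


training_texts = ["BERT tags entities", "Ann Arbor is a city"]
cache = FeatureCache(toy_encoder)


def run_experiment(weights: Vector) -> List[float]:
    # A cheap downstream model: a linear score over the cached features.
    return [
        sum(w * x for w, x in zip(weights, cache.features(text)))
        for text in training_texts
    ]


scores_a = run_experiment([0.1, 0.5])   # first experiment fills the cache
scores_b = run_experiment([-0.2, 1.0])  # second experiment reuses it
```

After both experiments, `cache.encoder_calls` equals the number of training texts rather than twice that number, which is where the computational saving comes from.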


Benefits of Feature-based Approach

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

https://paperswithcode.com/method/bert
