BERT (language model)

Bidirectional Encoder Representations from Transformers (BERT) is a Transformer-based machine learning technique for natural language processing (NLP) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues from Google.[1][2] As of 2019, Google has been leveraging BERT to better understand user searches.[3]

The original English-language BERT has two models[1]: (1) the BERT_BASE: 12 Encoders with 12 bidirectional self-attention heads, and (2) the BERT_LARGE: 24 Encoders with 24 bidirectional self-attention heads. Both models are pre-trained from unlabeled data extracted from the BooksCorpus[4] with 800M words and English Wikipedia with 2,500M words[5].

Performance

When BERT was published, it achieved state-of-the-art performance on a number of natural language understanding tasks:[1]

GLUE (General Language Understanding Evaluation) task set (consisting of 9 tasks)
SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0
SWAG (Situations With Adversarial Generations)

Analysis

The reasons for BERT's state-of-the-art performance on these natural language understanding tasks are not yet well understood.[6][7] Current research has focused on investigating the relationship behind BERT's output as a result of carefully chosen input sequences,[8][9] analysis of internal vector representations through probing classifiers,[10][11] and the relationships represented by attention weights.[6][7]

History

BERT has its origins from pre-training contextual representations including Semi-supervised Sequence Learning,[12] Generative Pre-Training, ELMo,[13] and ULMFit.[14] Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus. Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, where BERT takes into account the context for each occurrence of a given word. For instance, whereas the vector for "running" will have the same word2vec vector representation for both of its occurrences in the sentences "He is running a company" and "He is running a marathon", BERT will provide a contextualized embedding that will be different according to the sentence.

On October 25, 2019, Google Search announced that they had started applying BERT models for English language search queries within the US.[15] On December 9, 2019, it was reported that BERT had been adopted by Google Search for over 70 languages.[16] In October 2020, almost every single English-based query was processed by BERT.[17]

Recognition

BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).[18]

References

Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
"Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-11-27.
"Understanding searches better than ever before". Google. 2019-10-25. Retrieved 2019-11-27.
Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].
Annamoradnejad, Issa (2020-04-27). "ColBERT: Using BERT Sentence Embedding for Humor Detection". arXiv:2004.12765 [cs.CL].
Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (November 2019). "Revealing the Dark Secrets of BERT". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 4364–4373. doi:10.18653/v1/D19-1445. S2CID 201645145.
Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. (2019). "What Does BERT Look at? An Analysis of BERT's Attention". Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 276–286. doi:10.18653/v1/w19-4828.
Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan (2018). "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 284–294. arXiv:1805.04623. Bibcode:2018arXiv180504623K. doi:10.18653/v1/p18-1027. S2CID 21700944.
Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco (2018). "Colorless Green Recurrent Networks Dream Hierarchically". Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 1195–1205. arXiv:1803.11138. Bibcode:2018arXiv180311138G. doi:10.18653/v1/n18-1108. S2CID 4460159.
Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem (2018). "Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 240–248. arXiv:1808.08079. Bibcode:2018arXiv180808079G. doi:10.18653/v1/w18-5426. S2CID 52090220.
Zhang, Kelly; Bowman, Samuel (2018). "Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 359–361. doi:10.18653/v1/w18-5448.
Dai, Andrew; Le, Quoc (4 November 2015). "Semi-supervised Sequence Learning". arXiv:1511.01432 [cs.LG].
Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Luke, Zettlemoyer (15 February 2018). "Deep contextualized word representations". arXiv:1802.05365v2 [cs.CL].
Howard, Jeremy; Ruder, Sebastian (18 January 2018). "Universal Language Model Fine-tuning for Text Classification". arXiv:1801.06146v5 [cs.CL].
Nayak, Pandu (25 October 2019). "Understanding searches better than ever before". Google Blog. Retrieved 10 December 2019.
Montti, Roger (10 December 2019). "Google's BERT Rolls Out Worldwide". Search Engine Journal. Search Engine Journal. Retrieved 10 December 2019.
"Google: BERT now used on almost every English query". Search Engine Land. 2020-10-15. Retrieved 2020-11-24.
"Best Paper Awards". NAACL. 2019. Retrieved Mar 28, 2020.

External links

Official GitHub repository

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[:0-1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].

[2] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-11-27.

[3] "Understanding searches better than ever before". Google. 2019-10-25. Retrieved 2019-11-27.

[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].

[5] Annamoradnejad, Issa (2020-04-27). "ColBERT: Using BERT Sentence Embedding for Humor Detection". arXiv:2004.12765 [cs.CL].

[:1-6] Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (November 2019). "Revealing the Dark Secrets of BERT". Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 4364–4373. doi:10.18653/v1/D19-1445. S2CID 201645145.

[:2-7] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. (2019). "What Does BERT Look at? An Analysis of BERT's Attention". Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 276–286. doi:10.18653/v1/w19-4828.

[8] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan (2018). "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 284–294. arXiv:1805.04623. Bibcode:2018arXiv180504623K. doi:10.18653/v1/p18-1027. S2CID 21700944.

[9] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco (2018). "Colorless Green Recurrent Networks Dream Hierarchically". Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics: 1195–1205. arXiv:1803.11138. Bibcode:2018arXiv180311138G. doi:10.18653/v1/n18-1108. S2CID 4460159.

[10] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem (2018). "Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 240–248. arXiv:1808.08079. Bibcode:2018arXiv180808079G. doi:10.18653/v1/w18-5426. S2CID 52090220.

[11] Zhang, Kelly; Bowman, Samuel (2018). "Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis". Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Stroudsburg, PA, USA: Association for Computational Linguistics: 359–361. doi:10.18653/v1/w18-5448.

[12] Dai, Andrew; Le, Quoc (4 November 2015). "Semi-supervised Sequence Learning". arXiv:1511.01432 [cs.LG].

[13] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Luke, Zettlemoyer (15 February 2018). "Deep contextualized word representations". arXiv:1802.05365v2 [cs.CL].

[14] Howard, Jeremy; Ruder, Sebastian (18 January 2018). "Universal Language Model Fine-tuning for Text Classification". arXiv:1801.06146v5 [cs.CL].

[15] Nayak, Pandu (25 October 2019). "Understanding searches better than ever before". Google Blog. Retrieved 10 December 2019.

[16] Montti, Roger (10 December 2019). "Google's BERT Rolls Out Worldwide". Search Engine Journal. Search Engine Journal. Retrieved 10 December 2019.

[17] "Google: BERT now used on almost every English query". Search Engine Land. 2020-10-15. Retrieved 2020-11-24.

[18] "Best Paper Awards". NAACL. 2019. Retrieved Mar 28, 2020.

Natural language processing
General terms	AI-complete Bag-of-words n-gram Bigram Trigram Natural language understanding Speech corpus Stopwords Text corpus
Text analysis	Collocation extraction Concept mining Compound term processing Coreference resolution Lemmatisation Named-entity recognition Ontology learning Parsing Part-of-speech tagging Semantic similarity Sentiment analysis Stemming Terminology extraction Text chunking Text segmentation Sentence segmentation Word segmentation Textual entailment Truecasing Word-sense disambiguation
Automatic summarization	Multi-document summarization Sentence extraction Text simplification
Machine translation	Computer-assisted Example-based Rule-based Neural
Automatic identification and data capture	Speech recognition Speech segmentation Speech synthesis Natural language generation Optical character recognition
Topic model	Latent Dirichlet allocation Latent semantic analysis Pachinko allocation
Computer-assisted reviewing	Automated essay scoring Concordancer Grammar checker Predictive text Spell checker Syntax guessing
Natural language user interface	Chatbot Interactive fiction Question answering Virtual assistant Voice user interface