Masked Language Model
A model trained by randomly masking a fraction of input tokens (about 15% in BERT) and predicting them from the surrounding context. BERT is the best-known example. Because the model conditions on context to both the left and right of each masked token, this bidirectional training excels at understanding tasks such as text classification and named entity recognition.
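A minimal sketch of querying a masked language model, assuming the Hugging Face `transformers` library is installed and the `bert-base-uncased` checkpoint is used; the example sentence is illustrative:

```python
from transformers import pipeline

# Load a fill-mask pipeline backed by a pretrained BERT checkpoint.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the hidden token from bidirectional context.
for pred in unmasker("The capital of France is [MASK]."):
    print(f"{pred['token_str']}: {pred['score']:.3f}")
```

Each prediction carries a candidate token and its probability; strong completions (e.g., "paris") reflect what the model learned by filling in masked tokens during pretraining.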