llm.loss
Training loss functions.
BertPretrainingCriterion
Bases: Module
BERT pretraining loss.
Computes the sum of the cross entropy losses of the masked language modeling and (optionally) next sentence prediction tasks. A minimal sketch of this computation is given below, after the parameter list.
Parameters:

- vocab_size (int) – Size of the pretraining vocabulary.
- ignore_index (int, default: -100) – Value to ignore when computing the cross entropy loss. Defaults to -100, which is used by the provided BERT datasets as the value in masked_lm_labels for tokens that are not masked.
Source code in llm/loss.py
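The following is a minimal sketch of the computation described above, not the actual implementation in llm/loss.py. The class name `BertPretrainingCriterionSketch` and the assumption that the next sentence prediction head produces two logits per example are illustrative assumptions.

```python
import torch
from torch import nn


class BertPretrainingCriterionSketch(nn.Module):
    """Illustrative sketch of a BERT pretraining loss (not the library code)."""

    def __init__(self, vocab_size: int, ignore_index: int = -100) -> None:
        super().__init__()
        self.vocab_size = vocab_size
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=ignore_index)

    def forward(
        self,
        prediction_scores: torch.FloatTensor,
        masked_lm_labels: torch.LongTensor,
        seq_relationship_score: torch.FloatTensor | None = None,
        next_sentence_labels: torch.LongTensor | None = None,
    ) -> torch.Tensor:
        # Masked LM loss: flatten logits to (batch * seq_len, vocab_size) and
        # labels to (batch * seq_len,); ignore_index skips unmasked positions.
        loss = self.loss_fn(
            prediction_scores.view(-1, self.vocab_size),
            masked_lm_labels.view(-1),
        )
        # Optionally add the next sentence prediction loss
        # (binary classification, assumed to yield two logits per example).
        if seq_relationship_score is not None and next_sentence_labels is not None:
            loss = loss + self.loss_fn(
                seq_relationship_score.view(-1, 2),
                next_sentence_labels.view(-1),
            )
        # Returns a scalar loss tensor (documented return type is float).
        return loss
```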
forward
forward(
prediction_scores: FloatTensor,
masked_lm_labels: LongTensor,
seq_relationship_score: FloatTensor | None = None,
next_sentence_labels: LongTensor | None = None,
) -> float
Compute the pretraining loss.
Parameters:

- prediction_scores (FloatTensor) – Masked token predictions.
- masked_lm_labels (LongTensor) – True masked token labels.
- seq_relationship_score (FloatTensor | None, default: None) – Predicted sequence relationship score.
- next_sentence_labels (LongTensor | None, default: None) – True next sentence label.
Returns:

- float – Computed loss.
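A hypothetical usage example, assuming the documented constructor and forward signature. The tensor shapes (batch of 2, sequence length of 8, vocabulary of 30522) are placeholders for illustration only.

```python
import torch

from llm.loss import BertPretrainingCriterion

# Toy dimensions chosen for illustration.
criterion = BertPretrainingCriterion(vocab_size=30522)

prediction_scores = torch.randn(2, 8, 30522)                    # MLM logits
masked_lm_labels = torch.full((2, 8), -100, dtype=torch.long)   # -100 = not masked
masked_lm_labels[0, 3] = 1037                                   # one masked position
seq_relationship_score = torch.randn(2, 2)                      # NSP logits
next_sentence_labels = torch.tensor([0, 1])

loss = criterion(
    prediction_scores,
    masked_lm_labels,
    seq_relationship_score=seq_relationship_score,
    next_sentence_labels=next_sentence_labels,
)
```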