llm.loss
Training loss functions.
BertPretrainingCriterion
Bases: Module
BERT pretraining loss.
Computes the sum of the cross entropy losses of the masked language model and (optionally) next sentence prediction tasks.
Parameters:
- vocab_size (int) – Size of the pretraining vocabulary.
- ignore_index (int, default: -100) – Value to ignore when computing the cross entropy loss. Defaults to -100, which is the value used by the provided BERT datasets in masked_lm_labels for tokens that are not masked.
Source code in llm/loss.py
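To make the combined objective concrete, the following is a minimal sketch of how such a criterion can be built from a standard torch.nn.CrossEntropyLoss with a two-class next sentence prediction head. It is an illustrative approximation under those assumptions, not the actual source in llm/loss.py.

import torch
from torch import nn


class BertPretrainingCriterionSketch(nn.Module):
    """Illustrative sketch; the real class lives in llm/loss.py."""

    def __init__(self, vocab_size: int, ignore_index: int = -100) -> None:
        super().__init__()
        self.vocab_size = vocab_size
        # Positions labeled with ignore_index contribute nothing to the loss.
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=ignore_index)

    def forward(
        self,
        prediction_scores: torch.FloatTensor,
        masked_lm_labels: torch.LongTensor,
        seq_relationship_score: torch.FloatTensor | None = None,
        next_sentence_labels: torch.LongTensor | None = None,
    ) -> torch.Tensor:
        # Masked language model loss over the flattened sequence.
        loss = self.loss_fn(
            prediction_scores.view(-1, self.vocab_size),
            masked_lm_labels.view(-1),
        )
        # Optionally add the next sentence prediction loss (binary classification).
        if seq_relationship_score is not None and next_sentence_labels is not None:
            loss = loss + self.loss_fn(
                seq_relationship_score.view(-1, 2),
                next_sentence_labels.view(-1),
            )
        return loss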
forward
forward(
prediction_scores: FloatTensor,
masked_lm_labels: LongTensor,
seq_relationship_score: FloatTensor | None = None,
next_sentence_labels: LongTensor | None = None,
) -> float
Compute the pretraining loss.
Parameters:
- prediction_scores (FloatTensor) – Masked token predictions.
- masked_lm_labels (LongTensor) – True masked token labels.
- seq_relationship_score (FloatTensor | None, default: None) – Predicted sequence relationship score.
- next_sentence_labels (LongTensor | None, default: None) – True next sentence label.
Returns:
- float – Computed loss.
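As a usage sketch, assume a batch of 8 sequences of length 128, the standard BERT vocabulary size of 30522, and hypothetical model outputs with the usual BERT shapes; the tensor construction below is only for illustration.

import torch

from llm.loss import BertPretrainingCriterion

criterion = BertPretrainingCriterion(vocab_size=30522)

batch_size, seq_len, vocab_size = 8, 128, 30522

# Hypothetical model outputs: per-token vocabulary logits and NSP logits.
prediction_scores = torch.randn(batch_size, seq_len, vocab_size)
seq_relationship_score = torch.randn(batch_size, 2)

# Labels: -100 marks unmasked positions so they are ignored by the loss.
masked_lm_labels = torch.full((batch_size, seq_len), -100, dtype=torch.long)
masked_lm_labels[:, 10] = torch.randint(0, vocab_size, (batch_size,))
next_sentence_labels = torch.randint(0, 2, (batch_size,))

loss = criterion(
    prediction_scores,
    masked_lm_labels,
    seq_relationship_score=seq_relationship_score,
    next_sentence_labels=next_sentence_labels,
)
print(loss)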