llm.trainers.bert.data
NvidiaBertDatasetConfig
Configuration for loading an NVIDIA-format BERT pretraining dataset.
RobertaDatasetConfig
Configuration for loading a RoBERTa-style pretraining dataset.
get_dataloader
get_dataloader(
    dataset: DistributedShardedDataset[Sample],
    sampler: Sampler[int],
    batch_size: int,
) -> DataLoader[Sample]
Create a dataloader from a sharded dataset.
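A minimal sketch of wiring `get_dataset` and `get_dataloader` together. The helper `build_pretraining_loader` and the choice of `RandomSampler` are illustrative assumptions, not part of this module's API; any `Sampler[int]` over the rank-local dataset works.

```python
from __future__ import annotations

from torch.utils.data import DataLoader, RandomSampler

from llm.trainers.bert.data import (
    NvidiaBertDatasetConfig,
    RobertaDatasetConfig,
    get_dataloader,
    get_dataset,
)


def build_pretraining_loader(
    config: NvidiaBertDatasetConfig | RobertaDatasetConfig,
    batch_size: int = 32,
) -> DataLoader:
    """Hypothetical helper combining get_dataset and get_dataloader."""
    # Load the rank-local shard of the pretraining corpus.
    dataset = get_dataset(config)

    # Any Sampler[int] is accepted; a RandomSampler over the rank-local
    # shard is shown purely for illustration.
    sampler = RandomSampler(dataset)

    return get_dataloader(dataset, sampler, batch_size)
```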
get_dataset
get_dataset(
    config: NvidiaBertDatasetConfig | RobertaDatasetConfig,
) -> DistributedShardedDataset[Sample]
Load a sharded BERT pretraining dataset.
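A hedged usage sketch, assuming an NVIDIA-format dataset config. The `input_dir` argument is a hypothetical placeholder for whatever fields `NvidiaBertDatasetConfig` actually defines (see the config classes above).

```python
from llm.trainers.bert.data import NvidiaBertDatasetConfig, get_dataset

# Hypothetical config construction: `input_dir` stands in for the real
# options exposed by NvidiaBertDatasetConfig / RobertaDatasetConfig.
config = NvidiaBertDatasetConfig(input_dir="/path/to/bert/shards")

# Each rank sees its own shard of the pretraining corpus; pass the result
# to get_dataloader (above) to batch it for training.
dataset = get_dataset(config)
```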