llm.trainers.bert.data
NvidiaBertDatasetConfig
RobertaDatasetConfig
get_dataloader()

    get_dataloader(
        dataset: DistributedShardedDataset[Sample],
        sampler: torch.utils.data.Sampler[int],
        batch_size: int,
    ) -> torch.utils.data.DataLoader[Sample]
Create a dataloader from a sharded dataset.
Source code in llm/trainers/bert/data.py
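The signature above suggests a thin wrapper over `torch.utils.data.DataLoader` that pairs a sharded dataset with an externally constructed sampler. Below is a minimal, hypothetical sketch of that pattern; the toy dataset stands in for `DistributedShardedDataset[Sample]`, and a `SequentialSampler` stands in for whatever sampler the trainer actually supplies (a distributed sampler in real multi-rank training).

```python
# Hypothetical sketch of the dataset/sampler/batch_size pattern implied by
# get_dataloader(); ToyDataset is a stand-in for DistributedShardedDataset.
import torch
from torch.utils.data import DataLoader, Dataset, SequentialSampler


class ToyDataset(Dataset):
    # Each sample is a dict of tensors, as in masked-language-model
    # pretraining batches (field name here is illustrative).
    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, i: int):
        return {"input_ids": torch.tensor([i, i + 1])}


dataset = ToyDataset(8)
sampler = SequentialSampler(dataset)

# Mirrors the documented signature: dataset, sampler, batch_size.
loader = DataLoader(dataset, sampler=sampler, batch_size=4)

batches = list(loader)
print(len(batches))                    # number of batches produced
print(batches[0]["input_ids"].shape)   # default collation stacks tensors
```

Passing the sampler explicitly (rather than `shuffle=True`) is what lets the caller control sharding and ordering across distributed ranks while the dataloader itself stays rank-agnostic.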
get_dataset()

    get_dataset(
        config: NvidiaBertDatasetConfig | RobertaDatasetConfig,
    ) -> DistributedShardedDataset[Sample]
Load a sharded BERT pretraining dataset.
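The union type in the signature implies that `get_dataset()` dispatches on which config class it receives and loads the corresponding dataset format. The sketch below illustrates that dispatch pattern only; the config field names (`input_dir`, `tokenizer`) and return values are assumptions for illustration, not the library's actual API.

```python
# Hypothetical sketch of config-type dispatch, as suggested by the
# NvidiaBertDatasetConfig | RobertaDatasetConfig union in get_dataset().
from dataclasses import dataclass


@dataclass
class NvidiaBertDatasetConfig:
    input_dir: str  # assumed field name


@dataclass
class RobertaDatasetConfig:
    input_dir: str  # assumed field name
    tokenizer: str  # assumed field name


def get_dataset(config):
    # Dispatch on the concrete config type; each branch would construct
    # the matching DistributedShardedDataset in the real library.
    if isinstance(config, NvidiaBertDatasetConfig):
        return f"nvidia-bert dataset from {config.input_dir}"
    elif isinstance(config, RobertaDatasetConfig):
        return f"roberta dataset from {config.input_dir}"
    raise TypeError(f"unsupported config type: {type(config).__name__}")


print(get_dataset(NvidiaBertDatasetConfig(input_dir="/data/shards")))
```

Keeping one entry point with per-format config classes means the trainer code does not branch on dataset format itself; choosing the config chooses the loader.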