llm.preprocess.download
Pretraining corpus downloader.
cli
cli(
dataset: Literal["wikipedia", "bookcorpus"],
output_dir: str,
log_level: str,
rich: bool,
) -> None
Pretraining text downloader.
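The signature above can be illustrated with a minimal sketch. The function below, `cli_sketch`, is a hypothetical stand-in and not the real `cli`. It only mirrors the documented parameters, checks `dataset` against the two documented literals, and prepares a per-dataset directory under `output_dir`. It does not download anything. The default values, the per-dataset subdirectory, the logger name, and the returned path are all assumptions made for illustration. The role of `rich` (presumably rich-formatted console logging) is also an assumption, so the sketch accepts it and ignores it.

```python
import logging
import os
import tempfile
from typing import Literal, get_args

# The two dataset names come from the documented signature.
Dataset = Literal["wikipedia", "bookcorpus"]


def cli_sketch(
    dataset: str,
    output_dir: str,
    log_level: str = "INFO",  # default is an assumption
    rich: bool = False,       # accepted but unused in this sketch
) -> str:
    """Hypothetical stand-in mirroring the documented parameters."""
    # Reject anything outside the documented Literal values.
    if dataset not in get_args(Dataset):
        raise ValueError(f"unknown dataset: {dataset!r}")

    # getLevelName maps a known level name to its integer value and
    # returns a string for unknown names, which we treat as an error.
    level = logging.getLevelName(log_level.upper())
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {log_level!r}")
    logger = logging.getLogger("download_sketch")
    logger.setLevel(level)

    # Assumed layout: one subdirectory per dataset under output_dir.
    target = os.path.join(output_dir, dataset)
    os.makedirs(target, exist_ok=True)
    logger.info("would download %s into %s", dataset, target)
    return target


if __name__ == "__main__":
    print(cli_sketch("wikipedia", tempfile.mkdtemp(), log_level="debug"))
```

Validating `dataset` up front keeps a typo from creating a stray directory before the error surfaces.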
Source code in llm/preprocess/download.py