llm.utils
HParamT
module-attribute
¶
Supported hyperparameter types (i.e., JSON types).
create_summary_writer
¶
create_summary_writer(
tensorboard_dir: str,
hparam_dict: dict[str, HParamT] | None = None,
metrics: list[str] | None = None,
**writer_kwargs: Any
) -> SummaryWriter
Create a SummaryWriter instance for the run annotated with hyperparams.
https://github.com/pytorch/pytorch/issues/37738#issuecomment-1124497827
Parameters:

- tensorboard_dir (str) – TensorBoard run directory.
- hparam_dict (dict[str, HParamT] | None, default: None) – Optional hyperparameter dictionary to log alongside metrics.
- metrics (list[str] | None, default: None) – Optional list of metric tags that will be used with writer.add_scalar() (e.g., ['train/loss', 'train/lr']). Must be provided if hparam_dict is provided.
- writer_kwargs (Any, default: {}) – Additional keyword arguments to pass to SummaryWriter.

Returns:

- SummaryWriter – Summary writer instance.
Source code in llm/utils.py
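A minimal usage sketch, assuming the package is importable as llm.utils (as the source path above suggests); the run directory, hyperparameter values, and logged scalars are illustrative:

from llm.utils import create_summary_writer

# Illustrative run directory and hyperparameters; the metric tags match
# the tags later passed to writer.add_scalar().
writer = create_summary_writer(
    'runs/example-run',
    hparam_dict={'lr': 3e-4, 'global_batch_size': 512},
    metrics=['train/loss', 'train/lr'],
)

# During training, scalars are logged against the same tags.
writer.add_scalar('train/loss', 2.31, global_step=1)
writer.add_scalar('train/lr', 3e-4, global_step=1)
writer.close()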
get_filepaths
¶
get_filepaths(
directory: Path | str,
extensions: list[str] | None = None,
recursive: bool = False,
) -> list[str]
Get list of filepaths in directory.
Note
Only files (not sub-directories) will be returned, though
sub-directories will be recursed into if recursive=True.
Parameters:

- directory (Path | str) – Pathlike object with the directory to search.
- extensions (list[str] | None, default: None) – Optionally only return files that match these extensions. Each extension should include the dot, e.g., ['.pdf', '.txt']. Matching is case sensitive.
- recursive (bool, default: False) – Recursively search sub-directories.
Returns:

- list[str] – List of matching filepaths.
Source code in llm/utils.py
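A short usage sketch, again assuming the llm.utils import path; the directory and extensions are illustrative:

from llm.utils import get_filepaths

# Illustrative search: extensions include the leading dot and matching is
# case sensitive, so '.PDF' files would not be returned here.
paths = get_filepaths('data/', extensions=['.pdf', '.txt'], recursive=True)
for path in paths:
    print(path)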
gradient_accumulation_steps
¶
gradient_accumulation_steps(
global_batch_size: int,
local_batch_size: int,
world_size: int,
) -> int
Compute the gradient accumulation steps from the configuration.
Parameters:

- global_batch_size (int) – Target global/effective batch size.
- local_batch_size (int) – Per rank batch size.
- world_size (int) – Number of ranks.

Returns:

- int – Gradient accumulation steps needed to achieve the global_batch_size.

Raises:

- ValueError – If the resulting gradient accumulation steps would be fractional.
Source code in llm/utils.py
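The relationship is global_batch_size = local_batch_size * world_size * accumulation_steps, so the sketch below (the helper name is illustrative, not necessarily the exact implementation in llm/utils.py) divides and raises when the result would be fractional:

def _accumulation_steps(global_batch_size: int, local_batch_size: int, world_size: int) -> int:
    # Samples processed per optimizer step without any accumulation.
    per_step = local_batch_size * world_size
    if global_batch_size % per_step != 0:
        raise ValueError('global_batch_size is not divisible by local_batch_size * world_size')
    return global_batch_size // per_step

# Example: a global batch of 512 with 8 samples per rank across 16 ranks
# requires 4 accumulation steps (8 * 16 * 4 == 512).
assert _accumulation_steps(512, 8, 16) == 4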
init_logging
¶
init_logging(
level: int | str = logging.INFO,
logfile: Path | str | None = None,
rich: bool = False,
distributed: bool = False,
) -> None
Configure global logging.
Parameters:

- level (int | str, default: INFO) – Default logging level.
- logfile (Path | str | None, default: None) – Optional path to write logs to.
- rich (bool, default: False) – Use rich for pretty stdout logging.
- distributed (bool, default: False) – Configure distributed formatters and filters.
Source code in llm/utils.py
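A minimal configuration sketch, assuming the llm.utils import path; the level, logfile name, and flags are illustrative:

import logging

from llm.utils import init_logging

# Illustrative setup: rich stdout formatting plus a logfile, with the
# distributed formatters/filters enabled for multi-rank runs.
init_logging(
    level=logging.DEBUG,
    logfile='run.log',
    rich=True,
    distributed=True,
)
logging.getLogger(__name__).info('Logging configured.')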
log_step
¶
log_step(
logger: Logger,
step: int,
*,
fmt_str: str | None = None,
log_level: int = logging.INFO,
ranks: Iterable[int] = (0,),
skip_tensorboard: Iterable[str] = (),
tensorboard_prefix: str = "train",
writer: SummaryWriter | None = None,
**kwargs: Any
) -> None
Log a training step.
Parameters:

- logger (Logger) – Logger instance to log to.
- step (int) – Training step.
- fmt_str (str | None, default: None) – Format string used to format parameters for logging.
- log_level (int, default: INFO) – Level to log the parameters at.
- ranks (Iterable[int], default: (0,)) – Ranks to log on (defaults to rank 0 only).
- skip_tensorboard (Iterable[str], default: ()) – List of parameter names to skip logging to TensorBoard.
- tensorboard_prefix (str, default: 'train') – Prefix for TensorBoard parameters.
- writer (SummaryWriter | None, default: None) – TensorBoard summary writer.
- kwargs (Any, default: {}) – Additional keyword arguments to log.
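A usage sketch combining log_step with a writer from create_summary_writer; the metric names and values are illustrative, and which kwargs are skipped for TensorBoard is an assumption made only for this example:

import logging

from llm.utils import create_summary_writer, log_step

logger = logging.getLogger('train')
writer = create_summary_writer('runs/example-run')

# Illustrative step log: the extra keyword arguments (loss, lr, epoch) are
# logged on rank 0, and all but 'epoch' are also written to TensorBoard
# under the 'train' prefix.
log_step(
    logger,
    step=100,
    writer=writer,
    tensorboard_prefix='train',
    skip_tensorboard=['epoch'],
    loss=2.31,
    lr=3e-4,
    epoch=1,
)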