llm.engine.amp
Utilities for easy automatic mixed precision training.
AMPCriterion
AMPModel
AMPOptimizer
AMPOptimizer(
    model: Module,
    optimizer: Optimizer,
    scaler: GradScaler,
    max_norm: float | None = None,
)
Bases: BaseOptimizer
Wrap an optimizer for AMP training.
Parameters:
- model (Module) – Model being optimized.
- optimizer (Optimizer) – Optimizer to wrap.
- scaler (GradScaler) – Gradient scaler.
- max_norm (float | None, default: None) – Optionally clip gradient norm.
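A minimal construction sketch (the model, optimizer, and learning rate below are illustrative placeholders; the scaler is the standard torch.cuda.amp GradScaler):

import torch
from torch.cuda.amp import GradScaler

from llm.engine.amp import AMPOptimizer

model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Wrap the optimizer with a gradient scaler; max_norm enables gradient clipping.
amp_optimizer = AMPOptimizer(
    model=model,
    optimizer=optimizer,
    scaler=GradScaler(),
    max_norm=1.0,
)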
zero_grad
state_dict
Dictionary containing references to the whole state of the module. Includes the state of the grad_scaler.
load_state_dict
Copy the state into this module.
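A hypothetical checkpointing sketch built on these two methods (the file name and dictionary key are placeholders):

# Save: state_dict() captures the wrapped optimizer state together with the grad_scaler state.
torch.save({"optimizer": amp_optimizer.state_dict()}, "checkpoint.pt")

# Restore: load_state_dict() copies the saved state back into the wrapper.
state = torch.load("checkpoint.pt")
amp_optimizer.load_state_dict(state["optimizer"])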
step
Perform an optimization step using the gradient scaler.
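The actual implementation lives in llm/engine/amp.py; as a rough sketch of what a scaler-driven step typically involves (not the library's code):

def scaled_step(scaler, optimizer, model, max_norm=None):
    if max_norm is not None:
        # Unscale gradients in place so clipping sees their true magnitudes.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    # scaler.step() skips the update when inf/NaN gradients are detected.
    scaler.step(optimizer)
    # Adjust the loss scale for the next iteration.
    scaler.update()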
initialize
initialize(
    model: Module,
    optimizer: Optimizer,
    criterion: Module,
    dtype: dtype = torch.float16,
    max_norm: float | None = None,
    **kwargs: Any
) -> tuple[AMPModel, AMPOptimizer, AMPCriterion]
Initialize AMP training.
Parameters:
- model (Module) – Model being optimized.
- optimizer (Optimizer) – Optimizer to wrap.
- criterion (Module) – Loss function to wrap.
- dtype (dtype, default: torch.float16) – Data type to perform mixed precision in. Typically torch.float16 or torch.bfloat16.
- max_norm (float | None, default: None) – Optionally clip gradient norm.
- kwargs (Any, default: {}) – Additional keyword arguments to pass to the GradScaler.
Returns:
- tuple[AMPModel, AMPOptimizer, AMPCriterion] – A tuple of the wrapped model, optimizer, and loss.
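A minimal end-to-end sketch using initialize. The model, data loader, and shapes are placeholders, and it assumes the returned wrappers are drop-in replacements for the underlying model, optimizer, and loss; the exact backward call may differ depending on how AMPCriterion applies loss scaling:

import torch
from llm.engine.amp import initialize

model = torch.nn.Linear(16, 4).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

model, optimizer, criterion = initialize(
    model, optimizer, criterion, dtype=torch.float16, max_norm=1.0
)

for inputs, targets in loader:  # `loader` is a placeholder DataLoader
    optimizer.zero_grad()
    outputs = model(inputs)             # forward pass through the wrapped model
    loss = criterion(outputs, targets)  # wrapped loss
    loss.backward()
    optimizer.step()                    # steps through the gradient scaler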