llm.engine.accumulation
Utilities for easy gradient accumulation training.
GradientAccumulationOptimizer ¶
GradientAccumulationOptimizer(
optimizer: BaseOptimizer,
model: Module,
accumulation_steps: int,
)
Bases: BaseOptimizer
Optimizer wrapper for enabling gradient accumulation.
This wrapper will skip calls to BaseOptimizer.step() until accumulation_steps forward/backward passes have been performed.
Parameters:
- optimizer (BaseOptimizer) – Optimizer to wrap.
- model (Module) – Model being optimized.
- accumulation_steps (int) – Number of iterations between optimization steps.
Source code in llm/engine/accumulation.py
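A minimal usage sketch (not taken from the module's own documentation). It assumes a plain torch.optim optimizer is accepted, or has already been wrapped, wherever a BaseOptimizer is expected, and it uses a toy model with random data:

```python
import torch
import torch.nn.functional as F

from llm.engine.accumulation import GradientAccumulationOptimizer

model = torch.nn.Linear(16, 4)
# Assumption: a torch.optim optimizer works (or has been wrapped) where a
# BaseOptimizer is expected; adapt to your BaseOptimizer construction.
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer = GradientAccumulationOptimizer(
    optimizer=base_optimizer,
    model=model,
    accumulation_steps=4,
)

for _ in range(8):
    x = torch.randn(32, 16)
    y = torch.randint(0, 4, (32,))
    loss = F.cross_entropy(model(x), y)
    optimizer.backward(loss)  # accumulate gradients through the wrapper
    optimizer.step()          # applies a real update only every 4th call
    optimizer.zero_grad()
```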
accumulation_boundary ¶
accumulation_boundary() -> bool
Return whether the current step is an accumulation boundary, i.e., the last call to step() resulted in an optimization step and no accumulation for the next step has started.
Source code in llm/engine/accumulation.py
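As an illustration (a sketch continuing the loop body of the example above, not from the module's docs), the boundary check can gate per-step bookkeeping; log_metrics is a hypothetical placeholder for your own logging code:

```python
optimizer.backward(loss)
optimizer.step()
optimizer.zero_grad()

if optimizer.accumulation_boundary():
    # The last step() performed a real optimization step, so per-step
    # bookkeeping (logging, checkpointing, etc.) can run here.
    log_metrics(loss.item())  # hypothetical logging helper
```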
backward ¶
backward(loss: Tensor) -> None
Perform a backward pass.
Note
If model is a DistributedDataParallel instance, backward passes will be performed with no_sync() during gradient accumulation steps.
Parameters:
- loss (Tensor) – Loss to compute gradients with respect to.
Source code in llm/engine/accumulation.py
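The wrapper applies this behaviour internally when you call backward(loss); the pattern described in the note looks roughly like the following sketch (illustrative only, not the module's actual implementation):

```python
import contextlib

import torch


def backward_with_optional_sync(
    model: torch.nn.Module, loss: torch.Tensor, sync: bool
) -> None:
    # Skip the gradient all-reduce on accumulation steps by entering DDP's
    # no_sync() context; otherwise run a normal synchronized backward pass.
    context = contextlib.nullcontext()
    if not sync and isinstance(
        model, torch.nn.parallel.DistributedDataParallel
    ):
        context = model.no_sync()
    with context:
        loss.backward()
```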
step ¶
Perform an optimization step.
This method is a no-op unless accumulation_steps forward/backward passes have occurred.
Source code in llm/engine/accumulation.py
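For intuition, the skipping behaviour amounts to counting step() calls and forwarding only every accumulation_steps-th one to the wrapped optimizer. A hedged sketch (class and attribute names here are hypothetical, not the module's internals):

```python
class CountingStepSkipper:
    """Hypothetical illustration of step-skipping; not the real wrapper."""

    def __init__(self, optimizer, accumulation_steps: int) -> None:
        self._optimizer = optimizer
        self._accumulation_steps = accumulation_steps
        self._calls = 0

    def step(self) -> None:
        self._calls += 1
        if self._calls % self._accumulation_steps == 0:
            # Only every accumulation_steps-th call reaches the wrapped
            # optimizer; all other calls are no-ops.
            self._optimizer.step()
```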
zero_grad ¶
GradientAccumulationLRScheduler ¶
GradientAccumulationLRScheduler(
scheduler: _LRScheduler, accumulation_steps: int
)
Bases: _LRScheduler
LR scheduler wrapper that accounts for gradient accumulation.
This wrapper allows you to call scheduler.step() after every forward/backward pass and will correctly skip the call if it happens during a gradient accumulation period.
Parameters:
- scheduler (_LRScheduler) – LR scheduler to wrap.
- accumulation_steps (int) – Number of iterations between optimization steps.
Source code in llm/engine/accumulation.py
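A minimal wrapping sketch (not from the module's docs); StepLR stands in for any _LRScheduler subclass, and a plain torch.optim optimizer is assumed to be acceptable here:

```python
import torch
from torch.optim.lr_scheduler import StepLR

from llm.engine.accumulation import GradientAccumulationLRScheduler

model = torch.nn.Linear(16, 4)
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
base_scheduler = StepLR(base_optimizer, step_size=10)

scheduler = GradientAccumulationLRScheduler(
    scheduler=base_scheduler,
    accumulation_steps=4,
)

# scheduler.step() can now be called after every forward/backward pass; calls
# that fall inside a gradient accumulation period are skipped by the wrapper.
```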
initialize ¶
initialize(
model: Module,
optimizer: BaseOptimizer,
scheduler: _LRScheduler,
accumulation_steps: int = 1,
) -> tuple[
GradientAccumulationOptimizer,
GradientAccumulationLRScheduler,
]
Initialize gradient accumulation training.
Parameters:
- model (Module) – Model being optimized.
- optimizer (BaseOptimizer) – Optimizer to wrap.
- scheduler (_LRScheduler) – LR scheduler to wrap.
- accumulation_steps (int, default: 1) – Number of iterations between optimization steps.
Returns:
- tuple[GradientAccumulationOptimizer, GradientAccumulationLRScheduler] – The wrapped optimizer and LR scheduler.
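An end-to-end sketch using initialize() (not from the module's docs; it again assumes a plain torch.optim optimizer and scheduler are acceptable where BaseOptimizer and _LRScheduler are expected):

```python
import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import StepLR

from llm.engine.accumulation import initialize

model = torch.nn.Linear(16, 4)
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
base_scheduler = StepLR(base_optimizer, step_size=10)

optimizer, scheduler = initialize(
    model, base_optimizer, base_scheduler, accumulation_steps=4
)

for _ in range(8):
    x = torch.randn(32, 16)
    y = torch.randint(0, 4, (32,))
    loss = F.cross_entropy(model(x), y)
    optimizer.backward(loss)
    optimizer.step()       # no-op on accumulation iterations
    scheduler.step()       # skipped on accumulation iterations
    optimizer.zero_grad()
```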