Learning Rate
A hyperparameter that controls how much model weights adjust in each gradient update. Too high causes unstable training; too low makes it slow. Schedules typically start higher and decrease over the course of training.
A hyperparameter that controls how much model weights adjust in each gradient update. Too high causes unstable training; too low makes it slow. Schedules typically start higher and decrease over the course of training.