Batch Size
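The number of training examples processed together in one forward and backward pass, after which the model's parameters are typically updated once. Larger batch sizes give lower-variance gradient estimates and better GPU utilization, at the cost of more memory per step; smaller batch sizes produce noisier gradients, which can improve generalization. Batch size is a key hyperparameter and is usually tuned together with the learning rate.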
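
As a rough illustration of the stability claim, the sketch below splits a toy dataset into batches and measures how much the per-batch gradient varies at different batch sizes. Everything in it is an assumption made for the example: the helper names (iterate_batches, batch_gradient, gradient_spread), the one-parameter linear model, the squared-error loss, and the synthetic data. It is not tied to any particular framework.

```python
import random
import statistics


def iterate_batches(examples, batch_size):
    """Yield consecutive slices of `examples`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def batch_gradient(w, batch):
    """Gradient of the loss 0.5 * (w * x - y) ** 2 with respect to w, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def gradient_spread(examples, batch_size, w=0.0):
    """Population standard deviation of the per-batch gradients over one pass of the data."""
    grads = [batch_gradient(w, batch) for batch in iterate_batches(examples, batch_size)]
    return statistics.pstdev(grads)


# Hypothetical toy data: y = 3x plus Gaussian noise, with a fixed seed for repeatability.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

for batch_size in (1, 32, 256):
    # Larger batches average over more examples, so their gradients vary less.
    print(batch_size, gradient_spread(data, batch_size))
```

On this toy problem the spread shrinks as the batch size grows, which is the "more stable gradient estimates" effect described above. The sketch says nothing about generalization or hardware utilization; those depend on the model, the data, and the device.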