Training Arguments
Learn about the training arguments available in PyTorch Image Models (timm).
The training script of PyTorch Image Models accepts over 100 arguments.
These parameters can be organized into the following categories:
- Dataset
- Model
- Optimizer
- Learning rate
- Augmentation and regularization
- Batch normalization
- Model exponential moving average
- Miscellaneous
Dataset
The training script accepts the following arguments that are related to the datasets:
- data_dir: This is the path to the dataset.
- dataset: This is the dataset type. If it’s not specified, it defaults to ImageFolder/ImageTar.
- train-split: This is the name of the dataset split to use for training.
- val-split: This is the name of the dataset split to use for validation.
- dataset-download: This allows us to download datasets for supported torch and TFDS datasets.
- class-map: This is the path to the class-to-idx mapping file.
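As a rough illustration of how the dataset arguments combine on a command line, the sketch below assembles an argument list in Python. The script path train.py, the data directory, the dataset string torch/cifar10, and the split names are assumptions made for the example, and build_train_command is a hypothetical helper rather than part of the library.

```python
import shlex


def build_train_command(script, data_dir, **options):
    """Assemble an argument list for a training script.

    Keyword names use underscores and are turned into --dashed-flags.
    A value of True produces a bare flag (for on/off switches);
    None and False are skipped.
    """
    command = ["python", script, data_dir]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            command.append(flag)
        elif value is not None and value is not False:
            command.extend([flag, str(value)])
    return command


cmd = build_train_command(
    "train.py",               # hypothetical script path
    "/data/my-dataset",       # hypothetical data_dir
    dataset="torch/cifar10",  # assumed dataset-type string
    dataset_download=True,
    train_split="train",      # assumed split name
    val_split="validation",   # assumed split name
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, or printed and pasted into a shell.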
Model
We can specify the following arguments to configure our model:
- model: This is the name of the model to train (default is resnet50).
- pretrained: This specifies whether to start with a pretrained version of the specified network if available.
- initial-checkpoint: We use this checkpoint to initialize the model.
- resume: This specifies whether to resume the full model and optimizer state from a checkpoint.
- no-resume-opt: This prevents the resumption of the optimizer state when resuming a model.
- num-classes: This is the total number of label classes.
- gp: This is the type of global pool. It accepts fast, avg, max, avgmax, or avgmaxc.
- img-size: This is the patch size of the image.
- input-size: These are the dimensions of the input image (D H W, where D is the number of channels). For example, we can use --input-size 3 224 224 for input of 224 x 224 RGB images.
- crop-pct: This is the center crop percentage of the input image (used only for validation).
- mean: This is the mean pixel value of the dataset (it overrides the default mean).
- std: This is the standard deviation of the dataset (it overrides the default standard deviation).
- interpolation: This is the type of the resize interpolation.
- b: This is the training input batch size (default is 128).
- vb: This is the validation input batch size (default is None).
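To make the shape of these arguments concrete, here is a minimal argparse sketch that mirrors a few of them. It is an illustrative parser written for this lesson, not the training script's actual parser; only the argument names and the defaults quoted above are taken from the list.

```python
import argparse

# Illustrative parser that mimics a handful of the model arguments.
parser = argparse.ArgumentParser(description="Illustrative parser only")
parser.add_argument("--model", type=str, default="resnet50")
parser.add_argument("--pretrained", action="store_true")
parser.add_argument("--num-classes", type=int, default=None)
# Three integers: channels, height, width.
parser.add_argument("--input-size", type=int, nargs=3, default=None,
                    metavar=("D", "H", "W"))
parser.add_argument("-b", dest="batch_size", type=int, default=128)

args = parser.parse_args(
    ["--pretrained", "--num-classes", "10", "--input-size", "3", "224", "224"]
)
channels, height, width = args.input_size
print(args.model, args.pretrained, args.num_classes,
      (channels, height, width), args.batch_size)
```

Because --model and -b are not passed here, they fall back to the defaults listed above (resnet50 and 128).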
Optimizer
The arguments for optimizer are as follows:
- opt: This is the optimizer (default is sgd).
- opt-eps: This is the epsilon of the optimizer.
- opt-betas: These are the betas of the optimizer.
- momentum: This is the momentum of the optimizer (default is 0.9).
- weight-decay: This is the weight decay (default is 2e-5).
- clip-grad: This is the gradient clipping threshold (default is None, which indicates that no clipping will occur).
- clip-mode: This is the gradient clipping mode. It accepts norm, value, or agc.
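The difference between the norm and value clipping modes can be sketched in plain Python. This is a simplified illustration on a flat list of gradient values, not the library's implementation, and it leaves out the agc mode.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient vector so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


def clip_by_value(grads, limit):
    """Clamp each gradient component independently to [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


grads = [3.0, -4.0]  # L2 norm is 5.0
by_norm = clip_by_norm(grads, 1.0)    # direction kept, length shrunk to 1
by_value = clip_by_value(grads, 1.0)  # each component clamped separately
print(by_norm, by_value)
```

Clipping by norm preserves the direction of the gradient, while clipping by value can change it because each component is clamped on its own.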
Learning rate
The following arguments are useful to configure the learning rate of our model:
- sched: This is the learning rate scheduler (default is cosine).
- lr: This is the learning rate (default is 0.05).
- lr-noise: This is the learning rate noise on/off epoch percentages.
- lr-noise-pct: This is the learning rate’s noise limit percentage (default is 0.67).
- lr-noise-std: This is the learning rate’s noise standard deviation (default is 1.0).
- lr-cycle-mul: This is the learning rate cycle length multiplier (default is 1.0).
- lr-cycle-decay: This is the amount to decay each learning rate cycle by (default is 0.5).
- lr-cycle-limit: This is the learning rate cycle limit (default is 1).
- lr-k-decay: This is the learning rate k-decay for the cosine and poly schedulers (default is 1.0).
- warmup-lr: This is the warmup learning rate (default is 0.0001).
- min-lr: This is the lower learning rate bound for cyclic schedulers that hit 0 (default is 1e-6).
- epochs: This is the number of epochs to train (default is 300).
- epoch-repeats: This is the epoch repeat multiplier (the number of times to repeat the dataset epoch per training epoch).
- start-epoch: This sets the epoch number manually. It’s useful on restarts.
- decay-epochs: This is the epoch interval to decay the learning rate.
- warmup-epochs: This is the number of epochs to warm up the learning rate (applicable only if the scheduler supports it).
- cooldown-epochs: This is the number of epochs to cool down the learning rate at min_lr (after a cyclic schedule has ended).
- patience-epochs: This is the number of patience epochs for the Plateau learning rate scheduler (default is 10).
- decay-rate: This is the decay rate of the learning rate (default is 0.1).
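To see how lr, warmup-lr, min-lr, warmup-epochs, and epochs interact, here is a simplified sketch of a cosine schedule with linear warmup. It is an approximation for intuition only: the library's scheduler differs in details, and the warmup_epochs value of 3 is an assumption made for the example.

```python
import math


def lr_at_epoch(epoch, lr=0.05, min_lr=1e-6, warmup_lr=1e-4,
                warmup_epochs=3, epochs=300):
    """Linear warmup from warmup_lr to lr, then cosine decay from lr to min_lr."""
    if epoch < warmup_epochs:
        # Warmup phase: interpolate linearly between warmup_lr and lr.
        return warmup_lr + (lr - warmup_lr) * epoch / warmup_epochs
    # Cosine phase: progress runs from 0 to 1 over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 1, 3, 150, 300):
    print(epoch, lr_at_epoch(epoch))
```

The rate starts at warmup-lr, reaches lr when warmup ends, and then decays smoothly toward min-lr by the final epoch.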
Augmentation and regularization
The training script also accepts the following arguments:
- no-aug: This specifies whether to disable all training augmentations.
- scale: This is the random resize scale (default is from 0.08 to 1.0).
- ratio: This is the random resize aspect ratio (default is [0.75, 1.33]).
- hflip: This is the horizontal flip training augmentation probability (default is 0.5).
- vflip: This is the vertical flip training augmentation probability.
- color-jitter: This is the color jitter factor (default is 0.4).
- aa: This enables the AutoAugment policy. It accepts v0 or original.
- aug-repeats: This is the number of augmentation repetitions (default is 0). This is only for distributed training.
- aug-splits: This is the number of augmentation splits (default is 0; the value must be either 0 or at least 2).
- jsd-loss: This enables the Jensen-Shannon divergence plus cross-entropy loss. We can use it with --aug-splits.
- bce-loss: This enables the BCE loss. We can combine it with mixup or cutmix augmentations.
- bce-target-thresh: This is the binarization threshold for softened BCE targets.
- reprob: This is the Random Erase probability (default is 0).
- remode: This is the Random Erase mode (default is pixel).
- recount: This is the Random Erase count (default is 1).
- resplit: This disables Random Erase on the first (clean) augmentation split.
- mixup: This is the Mixup alpha (default is 0; Mixup is enabled if the value is greater than 0).
- cutmix: This is the CutMix alpha (default is 0; CutMix is enabled if the value is greater than 0).
- cutmix-minmax: This is the CutMix minimum and maximum ratio (default is None; if set, it overrides alpha and enables CutMix).
- mixup-prob: This is the probability of performing Mixup or CutMix when either or both are enabled.
- mixup-switch-prob: This is the probability of switching to CutMix when both Mixup and CutMix are enabled.
- mixup-mode: This is the Mixup or CutMix method. It accepts batch, pair, or elem.
- mixup-off-epoch: This is the epoch after which Mixup is turned off (default is 0, which never turns it off).
- smoothing: This is the label smoothing (default is 0.1).
- train-interpolation: This is the training interpolation mode. It accepts random (the default), bilinear, or bicubic.
- drop: This is the dropout rate (default is 0).
- drop-path: This is the drop path rate.
- drop-block: This is the drop block rate.
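The roles of the mixup alpha and the smoothing value can be sketched with plain Python lists. This is a minimal illustration of the general techniques (label smoothing and Mixup on a single pair of samples), not the library's implementation; the function names and the toy inputs are made up for the example.

```python
import random


def smooth_one_hot(label, num_classes, smoothing=0.1):
    """One-hot target with `smoothing` mass spread uniformly over all classes."""
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if i == label else off_value for i in range(num_classes)]


def mixup_pair(x_a, y_a, x_b, y_b, alpha, rng=random):
    """Blend two samples and their targets with lambda ~ Beta(alpha, alpha)."""
    # alpha == 0 means Mixup is disabled: keep the first sample unchanged.
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(42)  # fixed seed so the run is repeatable
cat = smooth_one_hot(0, num_classes=3)
dog = smooth_one_hot(1, num_classes=3)
pixels, target, lam = mixup_pair([0.0, 1.0], cat, [1.0, 0.0], dog,
                                 alpha=0.8, rng=rng)
print(lam, pixels, target)
```

The mixed target is still a valid probability distribution, which is why soft-target losses are typically paired with these augmentations.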
Batch normalization
Currently, the following batch normalization arguments only work with gen_efficientnet-based models:
- bn-momentum: This is the batch normalization momentum.
- bn-eps: This is the batch normalization epsilon.
- sync-bn: This enables synchronized batch normalization with NVIDIA Apex or Torch.
- dist-bn: This is the method to distribute batch normalization stats between nodes after each epoch. It accepts broadcast, reduce (the default), or an empty string.
- split-bn: This enables separate batch normalization layers per augmentation split.
Model exponential moving average
The arguments for an exponential moving average are as follows:
- model-ema: This specifies whether to track the moving average of model weights.
- model-ema-force-cpu: This forces the exponential moving average to be tracked on the CPU (only on the rank = 0 node).
- model-ema-decay: This is the decay factor for the moving average of model weights (default is 0.9998).
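The effect of the decay factor is easiest to see in the update rule itself. The sketch below applies the standard exponential moving average update to a flat list of weights; it illustrates the general technique and is not the library's EMA class. A decay of 0.99 is used in the loop only so that the effect is visible within a few hundred steps.

```python
def ema_update(ema_weights, model_weights, decay=0.9998):
    """Return the new moving average: decay * old + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


ema = [0.0, 0.0]
for step in range(1000):
    weights = [1.0, -2.0]  # stand-in for the weights after an optimizer step
    ema = ema_update(ema, weights, decay=0.99)

print(ema)  # close to the current weights after many steps
```

A decay closer to 1 makes the average respond more slowly to new weights, so higher values suit longer training runs.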
Miscellaneous
There are several arguments available under miscellaneous. The most useful arguments are as follows:
- seed: This is the random seed (default is 42).
- checkpoint-hist: This is the number of checkpoints to keep (default is 10).
- amp: This specifies whether to use NVIDIA Apex AMP or Native AMP for mixed-precision training.
- apex-amp: This uses NVIDIA Apex AMP mixed-precision training.
- native-amp: This uses Native Torch AMP mixed-precision training.
- output: This is the path to the output folder.
- torchscript: This specifies whether to convert the model to TorchScript for inference.