Exercise Solution
Walk through an in-depth solution for implementing the Darknet-19 architecture.
The Darknet-19 model is a deep convolutional neural network architecture introduced in YOLOv2. It consists of 19 convolutional layers and 5 max-pooling layers.
import torch
import torch.nn as nn

class Darknet19(nn.Module):
    def __init__(self, num_classes=1000):
        super(Darknet19, self).__init__()

        # Helper function to add a convolutional block
        def conv_block(in_channels, out_channels, kernel_size, stride=1, max_pool=False):
            layers = [
                nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                          padding=(kernel_size - 1) // 2, bias=False),  # Convolutional layer
                nn.BatchNorm2d(out_channels),  # Batch normalization
                nn.LeakyReLU(0.1, inplace=True)  # Leaky ReLU activation
            ]
            if max_pool:
                layers.append(nn.MaxPool2d(2, 2))  # Max pooling
            return nn.Sequential(*layers)

        # Darknet-19 architecture: 18 feature-extraction convolutions
        self.model = nn.Sequential(
            # Layer 1
            conv_block(3, 32, 3, max_pool=True),
            # Layer 2
            conv_block(32, 64, 3, max_pool=True),
            # Layers 3-5
            conv_block(64, 128, 3),
            conv_block(128, 64, 1),
            conv_block(64, 128, 3, max_pool=True),
            # Layers 6-8
            conv_block(128, 256, 3),
            conv_block(256, 128, 1),
            conv_block(128, 256, 3, max_pool=True),
            # Layers 9-13
            conv_block(256, 512, 3),
            conv_block(512, 256, 1),
            conv_block(256, 512, 3),
            conv_block(512, 256, 1),
            conv_block(256, 512, 3, max_pool=True),
            # Layers 14-18
            conv_block(512, 1024, 3),
            conv_block(1024, 512, 1),
            conv_block(512, 1024, 3),
            conv_block(1024, 512, 1),
            conv_block(512, 1024, 3)
        )

        # Final layers
        # Layer 19: 1x1 convolution that acts as the classifier
        # (no batch norm or activation, so it outputs raw class scores)
        self.fc = nn.Conv2d(1024, num_classes, kernel_size=1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # Global average pooling

    def forward(self, x):
        x = self.model(x)  # Pass through the feature-extraction layers
        x = self.fc(x)  # Classification layer (needs the 4D feature map, so it runs before pooling)
        x = self.avgpool(x)  # Global average pooling
        x = torch.flatten(x, 1)  # Flatten to (batch_size, num_classes)
        return x

# Create the Darknet-19 model and print its architecture
model = Darknet19()
print(model)
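The `padding=(kernel_size-1)//2` argument in `conv_block` is what keeps the height and width unchanged through each stride-1 convolution, so only the max-pooling layers downsample. A minimal sketch of that arithmetic, using the standard convolution output-size formula, is shown here. The `conv_out_size` helper is a hypothetical name introduced for illustration and is not part of the solution code.

```python
def conv_out_size(size, kernel_size, stride=1):
    """Output height/width of a convolution padded with (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1

# Both kernel sizes used in Darknet-19 preserve a 224-pixel side at stride 1.
print(conv_out_size(224, 3))  # 224
print(conv_out_size(224, 1))  # 224
```

For odd kernel sizes such as 1 and 3, this padding choice behaves like "same" padding, which is why the 1x1 bottleneck layers and the 3x3 layers can be stacked freely without shape mismatches.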
Explanation
Darknet-19 is a convolutional neural network that YOLOv2 uses as its feature-extraction backbone for object detection. It consists of 19 convolutional layers, interspersed with 5 max-pooling layers that each halve the spatial resolution.
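As a quick sanity check on the layer counts, the sketch below encodes the layer layout as a plain Python list and tallies it, without needing PyTorch. The `LAYOUT` list and `summarize` helper are hypothetical names introduced here. The layout follows the Darknet-19 table in the YOLOv2 paper, and the 224×224 input size is an assumption (a common ImageNet classification resolution).

```python
# Each entry is ("conv", out_channels, kernel_size) or ("pool",).
# LAYOUT and summarize are illustrative names, not part of the lesson's code.
LAYOUT = [
    ("conv", 32, 3), ("pool",),
    ("conv", 64, 3), ("pool",),
    ("conv", 128, 3), ("conv", 64, 1), ("conv", 128, 3), ("pool",),
    ("conv", 256, 3), ("conv", 128, 1), ("conv", 256, 3), ("pool",),
    ("conv", 512, 3), ("conv", 256, 1), ("conv", 512, 3),
    ("conv", 256, 1), ("conv", 512, 3), ("pool",),
    ("conv", 1024, 3), ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 1000, 1),  # 1x1 classifier convolution
]

def summarize(layout, input_size=224):
    """Count conv and pool layers and track the spatial size through the network."""
    convs = sum(1 for layer in layout if layer[0] == "conv")
    pools = sum(1 for layer in layout if layer[0] == "pool")
    size = input_size
    for layer in layout:
        if layer[0] == "pool":
            size //= 2  # 2x2 max pooling with stride 2 halves height and width
        # Convolutions are assumed to use "same" padding, so they leave size unchanged.
    return convs, pools, size

convs, pools, size = summarize(LAYOUT)
print(convs, pools, size)  # 19 5 7
```

With five halvings, a 224-pixel input side shrinks by a factor of 32 to 7, which is the feature map that global average pooling then reduces to a single value per class.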
Lines 1–4: We import torch and ...