Exercise Solution

Walk through a detailed solution that implements the Darknet-19 architecture.

The Darknet-19 model is a deep convolutional neural network architecture introduced with YOLOv2. It consists of 19 convolutional layers, which alternate 3×3 filters with 1×1 bottleneck filters, and 5 max-pooling layers.
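The layer counts above can be checked with a small table. The following is a minimal sketch in plain Python, not part of the original solution: the names `DARKNET19_SPEC` and `count_layers` are hypothetical, and the table assumes the commonly published Darknet-19 layout with a 1000-class output layer.

```python
# Assumed layer table for Darknet-19 (hypothetical helper, for illustration only).
# Each entry is ("conv", out_channels, kernel_size) or ("pool",).
DARKNET19_SPEC = [
    ("conv", 32, 3), ("pool",),
    ("conv", 64, 3), ("pool",),
    ("conv", 128, 3), ("conv", 64, 1), ("conv", 128, 3), ("pool",),
    ("conv", 256, 3), ("conv", 128, 1), ("conv", 256, 3), ("pool",),
    ("conv", 512, 3), ("conv", 256, 1), ("conv", 512, 3),
    ("conv", 256, 1), ("conv", 512, 3), ("pool",),
    ("conv", 1024, 3), ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 512, 1), ("conv", 1024, 3),
    ("conv", 1000, 1),  # 1x1 classification layer (assumes 1000 classes)
]


def count_layers(spec):
    """Return (number of conv layers, number of max-pool layers)."""
    n_conv = sum(1 for layer in spec if layer[0] == "conv")
    n_pool = sum(1 for layer in spec if layer[0] == "pool")
    return n_conv, n_pool


n_conv, n_pool = count_layers(DARKNET19_SPEC)
print(f"conv layers: {n_conv}, max-pool layers: {n_pool}")
```

Keeping the layout in a table like this makes it easy to confirm that an implementation has not dropped or duplicated a layer.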

import torch
import torch.nn as nn


class Darknet19(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()

        # Helper function to build a convolutional block
        def conv_block(in_channels, out_channels, kernel_size, stride=1, max_pool=False):
            layers = [
                # Padding keeps the spatial size for odd kernels; no bias because BatchNorm follows
                nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                          padding=(kernel_size - 1) // 2, bias=False),
                nn.BatchNorm2d(out_channels),  # Batch normalization
                nn.LeakyReLU(0.1, inplace=True),  # Leaky ReLU activation
            ]
            if max_pool:
                layers.append(nn.MaxPool2d(2, 2))  # Max pooling halves height and width
            return nn.Sequential(*layers)

        # Darknet-19 feature extractor (convolutional layers 1-18)
        self.model = nn.Sequential(
            # Layer 1
            conv_block(3, 32, 3, max_pool=True),
            # Layer 2
            conv_block(32, 64, 3, max_pool=True),
            # Layers 3-5
            conv_block(64, 128, 3),
            conv_block(128, 64, 1),
            conv_block(64, 128, 3, max_pool=True),
            # Layers 6-8
            conv_block(128, 256, 3),
            conv_block(256, 128, 1),
            conv_block(128, 256, 3, max_pool=True),
            # Layers 9-13
            conv_block(256, 512, 3),
            conv_block(512, 256, 1),
            conv_block(256, 512, 3),
            conv_block(512, 256, 1),
            conv_block(256, 512, 3, max_pool=True),
            # Layers 14-18
            conv_block(512, 1024, 3),
            conv_block(1024, 512, 1),
            conv_block(512, 1024, 3),
            conv_block(1024, 512, 1),
            conv_block(512, 1024, 3),
        )

        # Final layers
        # Layer 19: a 1x1 convolution acts as the classifier (no batch norm or activation on the logits)
        self.fc = nn.Conv2d(1024, num_classes, 1)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # Global average pooling

    def forward(self, x):
        x = self.model(x)        # Feature extractor: (N, 1024, H/32, W/32)
        x = self.fc(x)           # Class scores per location: (N, num_classes, H/32, W/32)
        x = self.avgpool(x)      # Global average pooling: (N, num_classes, 1, 1)
        x = torch.flatten(x, 1)  # Flatten to (N, num_classes)
        return x


# Create the Darknet-19 model and print its architecture
model = Darknet19()
print(model)

Explanation

Darknet-19 is a classification network that YOLOv2 uses as the feature-extraction backbone for object detection. Its 19 convolutional layers are interspersed with 5 max-pooling layers, each of which halves the spatial resolution of the feature map.
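To make the spatial downsampling concrete, here is a rough sketch in plain Python that traces the feature-map side length through the five pooling stages. The helper names (`conv_out_size`, `pool_out_size`, `trace_sizes`) and the 224-pixel input are assumptions for illustration. The convolution formula uses the same padding rule as the solution code, `(kernel_size - 1) // 2`, which is why only the pooling layers shrink the feature map.

```python
def conv_out_size(size, kernel_size, stride=1):
    """Output side length of a convolution padded with (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out_size(size, kernel_size=2, stride=2):
    """Output side length of an unpadded max-pooling layer."""
    return (size - kernel_size) // stride + 1


def trace_sizes(input_size, num_pools=5):
    """Side length of the feature map after each max-pooling stage."""
    sizes = [input_size]
    size = input_size
    for _ in range(num_pools):
        size = conv_out_size(size, 3)  # stride-1 conv with this padding keeps the size
        size = pool_out_size(size)     # 2x2 pooling with stride 2 halves it
        sizes.append(size)
    return sizes


print(trace_sizes(224))  # [224, 112, 56, 28, 14, 7]
```

Five halvings give an overall stride of 32, so under these assumptions a 416-pixel input would end at a 13×13 feature map.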

  • Lines 1–4: We import torch and ...