General structure

AlexNet is the image classification architecture that won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition in 2012. Its general structure is as follows:

  • AlexNet contains eight trainable layers: five convolutional layers at the beginning and three fully connected layers at the end.

  • Three max pooling layers are spread between the convolutional layers.

  • The ReLU activation function is used after each trainable layer except for the last one.

  • For the last layer, a softmax activation function is used to obtain predictions as probabilities.

  • Dropout with a rate of 0.5 is applied in the first two fully connected layers.

  • To initialize the weights, a zero-mean Gaussian distribution (also called a normal distribution) with a standard deviation of 0.01 is used; see the initialization sketch after this list.

  • The biases are initialized with a constant value of 1.

  • The learning rate is initialized at 0.01 and divided by 10 whenever the validation error rate stops improving.

  • Stochastic gradient descent with momentum is used, with momentum = 0.9 and a batch size of 128.

  • L2 regularization (weight decay) is used; the optimizer and learning-rate schedule are sketched after this list.

  • It’s a model created for 227×227 RGB images and the 1,000 classes of the ImageNet dataset, and it contains ~60 million parameters. A simple view of the architecture is sketched in the code below.
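The following is a minimal, single-branch PyTorch sketch of the architecture described above. The original model split its channels across two GPUs, so this merged variant is an approximation of the layer sizes rather than a line-for-line reproduction; the class name and structure are illustrative.

```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            # Conv 1: 227x227x3 -> 55x55x96, max pooled to 27x27x96
            nn.Conv2d(3, 96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv 2: 27x27x96 -> 27x27x256, max pooled to 13x13x256
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # Conv 3-5: three 3x3 convolutions at 13x13 resolution
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Third max pooling layer: 13x13x256 -> 6x6x256
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            # Dropout (rate 0.5) before each of the first two FC layers
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            # Last layer outputs raw scores; softmax is applied by the
            # loss function during training (or explicitly at inference)
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)          # 227x227x3 -> 6x6x256
        x = torch.flatten(x, 1)       # 6x6x256 -> 9216
        return self.classifier(x)     # 9216 -> num_classes
```

Summing the parameter counts of this merged variant gives roughly 62 million, consistent with the ~60 million quoted above:

```python
model = AlexNet()
print(sum(p.numel() for p in model.parameters()))  # ~62 million
```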
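The weight and bias initialization can be sketched as follows, assuming the `AlexNet` class above; `init_weights` is an illustrative helper name. (For reference, the original paper set the constant-1 bias only in some layers and used 0 elsewhere; the lesson's uniform scheme is applied here.)

```python
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Apply the initialization scheme to every conv and linear layer."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        # Zero-mean Gaussian (normal) weights, standard deviation 0.01
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        # Biases initialized with the constant 1
        nn.init.constant_(module.bias, 1.0)


model = AlexNet()
model.apply(init_weights)  # recursively visits every submodule
```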
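A minimal sketch of the corresponding training setup, again assuming the model from the first snippet. The `weight_decay` argument implements the L2 regularization (the 5e-4 coefficient follows the original paper), and `ReduceLROnPlateau` automates the divide-by-10 schedule; the `patience` value is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn

# SGD with momentum = 0.9; weight_decay adds the L2 penalty
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # initial learning rate
    momentum=0.9,
    weight_decay=5e-4,  # L2 coefficient from the original paper
)

# Divide the learning rate by 10 (factor=0.1) whenever the monitored
# validation error stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=2
)

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

# The training DataLoader would use batch_size=128 to match the lesson;
# after each epoch's validation pass, call:
# scheduler.step(validation_error)
```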
