AlexNet: The Architecture that Revolutionized Computer Vision
Propelling Artificial Intelligence Forward with AlexNet
Introduction
AlexNet is the deep convolutional neural network (CNN) that revolutionized image classification: it won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a wide margin.
AlexNet has multiple deep layers and roughly 60 million parameters, making it one of the largest neural networks of its time. It was designed for image classification on high-resolution images, and it was optimized to take advantage of GPUs.
What makes it better than LeNet?
AlexNet is deeper than LeNet, containing 5 convolutional layers, whereas LeNet has only 2 convolutional layers.
AlexNet uses a better activation function, ReLU, which avoids the vanishing gradients that slow training with sigmoid or tanh.
LeNet is limited to grayscale images, whereas AlexNet works on color images, i.e., RGB channels.
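To see why the activation function matters, here is a minimal NumPy sketch (my own illustration, not from the original paper): sigmoid saturates for large inputs, so its gradient vanishes in deep networks, while ReLU keeps a constant gradient for positive inputs and is cheap to compute.

import numpy as np

def relu(x):
    # ReLU passes positive values through unchanged, so its gradient there is 1
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid squashes everything into (0, 1); its gradient is at most 0.25
    return 1 / (1 + np.exp(-x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(x))     # [0. 0. 0. 1. 5.]
print(sigmoid(x))  # values saturate near 0 and 1, where gradients vanish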
Architecture
Let's dive in.
Note: if a layer uses padding or a non-default stride, I mention it explicitly; if I don't mention it, assume the defaults (stride 1, no padding).
Input:
- The input size is 227x227x3: 227 pixels in height and width, with 3 channels (RGB). (The original paper lists 224x224, but 227x227 is the size that makes the layer arithmetic below work out.)
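In practice, an arbitrary image has to be resized to this shape before it can be fed to the network. Here is a minimal preprocessing sketch with TensorFlow (my own illustration; the file name is a placeholder):

import tensorflow as tf

# Load an image file (path is a placeholder) and resize it to 227x227x3
raw = tf.io.read_file('example.jpg')
img = tf.image.decode_jpeg(raw, channels=3)
img = tf.image.resize(img, (227, 227)) / 255.0  # scale pixel values to [0, 1]
print(img.shape)  # (227, 227, 3)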
Layer 1 (Convolutional Layer 1):
Input: 227x227x3
Number of Filters: 96
Filter Size: 11x11
Strides: 4
Activation Function: ReLU
Output Size (Feature map size): 55x55x96
Layer 2 (Max Pooling Layer 1):
Input size: 55x55x96
Pool size: 3x3
Stride: 2
Output Size: 27x27x96
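Both output sizes so far follow the standard convolution arithmetic: output = (W - F + 2P) / S + 1, where W is the input size, F the filter (or pool) size, P the padding, and S the stride. A small helper, added here purely for illustration, verifies Layers 1 and 2:

def output_size(w, f, p, s):
    # Standard formula: floor((W - F + 2P) / S) + 1
    return (w - f + 2 * p) // s + 1

# Layer 1: 227x227 input, 11x11 filter, no padding, stride 4
print(output_size(227, 11, 0, 4))  # 55

# Layer 2: 55x55 input, 3x3 pool, no padding, stride 2
print(output_size(55, 3, 0, 2))    # 27

The same formula accounts for every feature-map size in the layers that follow.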
Layer 3 (Convolutional Layer 2):
Input: 27x27x96
Number of Filters: 256
Padding: 2 ('same')
Filter Size: 5x5
Activation Function: ReLU
Output Size: 27x27x256
Layer 4 (Max Pooling Layer 2):
Input size: 27x27x256
Pool size: 3x3
Stride: 2
Output Size: 13x13x256
Layer 5 (Convolutional Layer 3):
Input: 13x13x256
Number of Filters: 384
Filter Size: 3x3
Padding: 1
Activation Function: ReLU
Output Size (Feature map size): 13x13x384
Layer 6 (Convolutional Layer 4):
Input: 13x13x384
Number of Filters: 384
Filter Size: 3x3
Padding: 1
Activation Function: ReLU
Output Size (Feature map size): 13x13x384
Layer 7 (Convolutional Layer 5):
Input: 13x13x384
Number of Filters: 256
Filter Size: 3x3
Padding: 1
Activation Function: ReLU
Output Size (Feature map size): 13x13x256
Layer 8 (Max Pooling Layer 3):
Input size: 13x13x256
Pool size: 3x3
Stride: 2
Output Size: 6x6x256
Layer 9 (Flatten):
Input size: 6x6x256
Output size: 9216
Layer 10 (Fully Connected Layer 1):
Input size: 9216
Nodes: 4096
Activation function: ReLU
Output size: 4096
Layer 11 (Fully Connected Layer 2):
Input size: 4096
Nodes: 4096
Activation function: ReLU
Output size: 4096
Layer 12 (Fully Connected Layer 3) [Output Layer]:
Input size: 4096
Nodes: 1000
Activation function: Softmax
Output size: 1000
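As a sanity check on the 60-million-parameter figure, we can total the weights and biases layer by layer (conv layer: F*F*C_in*C_out + C_out; dense layer: N_in*N_out + N_out). This calculation is my own addition; the single-stream variant described here comes to about 62 million, slightly above the paper's number because the original network split several layers across two GPUs, which reduces the parameter count.

def conv_params(f, c_in, c_out):
    # f*f*c_in weights per filter, plus one bias per filter
    return f * f * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output
    return n_in * n_out + n_out

total = (conv_params(11, 3, 96)       # Conv1
         + conv_params(5, 96, 256)    # Conv2
         + conv_params(3, 256, 384)   # Conv3
         + conv_params(3, 384, 384)   # Conv4
         + conv_params(3, 384, 256)   # Conv5
         + dense_params(9216, 4096)   # FC1
         + dense_params(4096, 4096)   # FC2
         + dense_params(4096, 1000))  # FC3
print(f"{total:,}")  # 62,378,344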
Code
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Define the AlexNet model
def alexnet():
    model = Sequential()
    # Conv1: 96 filters of 11x11, stride 4 -> 55x55x96
    model.add(Conv2D(96, (11, 11), strides=(4, 4), activation='relu', input_shape=(227, 227, 3)))
    # MaxPool1: 3x3 window, stride 2 -> 27x27x96
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    # Conv2: 256 filters of 5x5, 'same' padding -> 27x27x256
    model.add(Conv2D(256, (5, 5), padding='same', activation='relu'))
    # MaxPool2: 3x3 window, stride 2 -> 13x13x256
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    # Conv3-Conv5: three 3x3 convolutions with 'same' padding
    model.add(Conv2D(384, (3, 3), padding='same', activation='relu'))  # -> 13x13x384
    model.add(Conv2D(384, (3, 3), padding='same', activation='relu'))  # -> 13x13x384
    model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))  # -> 13x13x256
    # MaxPool3: 3x3 window, stride 2 -> 6x6x256
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    # Flatten: 6*6*256 = 9216
    model.add(Flatten())
    # Two fully connected layers of 4096 units, each followed by dropout
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    # Output layer: 1000 classes with softmax
    model.add(Dense(1000, activation='softmax'))
    return model

# Create an instance of the AlexNet model
model = alexnet()

# Print the model summary
model.summary()
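As a quick smoke test (my own addition, assuming TensorFlow 2.x), we can push a random image through the model and confirm that the output is a 1000-way probability distribution:

import numpy as np

# A single random 227x227 RGB "image", batched
dummy = np.random.rand(1, 227, 227, 3).astype('float32')
probs = model.predict(dummy)
print(probs.shape)  # (1, 1000)
print(probs.sum())  # ~1.0, since softmax produces a probability distribution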
Let's Connect!
If you enjoyed this article, a few claps 👏 would mean the world and motivate me to create more!
Feel free to reach out and explore more of my work:
LinkedIn: Pranay Rishith
GitHub: pranayrishith16
Hashnode Blog: Beyond Blog
Medium: Pranay Blog
Looking forward to connecting with you!