To limit the size of the data sent to the next layer, it is often necessary to squeeze it a bit.
Example:
Assume you have 10,000 tomatoes delivered in a 100x100-slot grid-like container from your “tomato-processing machine #1”.
Further, assume you also have a “tomato-processing machine #2” that takes its input from the first machine. But machine #2 requires its input tomatoes to be supplied in a 50x50-slot grid, not a 100x100 grid.
Now you’ve got a problem: you’ll have to press 2x2 = 4 tomatoes into each slot:

<aside> 💡
Note: I have no intention of drawing 100x100 + 50x50 tomatoes, so the picture instead shows 4x4 → 2x2, but the idea is exactly the same.
</aside>
That’s going to be slightly messy, and some finer details of each tomato may get lost in the process, but all 4 tomatoes will contribute to the tomato-like mush in the corresponding slot of the 50x50 grid.
The mushiness is simply the “price” we have to pay for insisting on a machine that expects a 50x50 grid as input. And there may actually be benefits to the mushiness as well.
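In CNN terms, this squeezing is what a pooling layer does. Here is a minimal NumPy sketch of 2x2 average pooling on a 4x4 grid (matching the 4x4 → 2x2 picture); average pooling is just one choice of "mushing", max pooling being another common one:

```python
import numpy as np

# A 4x4 "grid of tomatoes" (the values stand in for tomatoes).
grid = np.arange(16, dtype=float).reshape(4, 4)

# 2x2 average pooling: reshape so each 2x2 block of slots becomes
# its own pair of axes, then average over those axes. Each block of
# 4 values is "mushed" into one slot of the smaller grid.
pooled = grid.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print(grid.shape)    # (4, 4)
print(pooled.shape)  # (2, 2)
```

Every value in `grid` contributes to exactly one slot of `pooled`, which is the analogy's point: detail is lost, but nothing is thrown away entirely.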
This is an explanation of how to understand the exact tensor dimensions involved when data flows through two consecutive conv2d (convolutional) layers. Let’s use the MNIST dataset dimensions in this example: it consists of grayscale (i.e. 1-channel) images, 28x28 pixels each. For simplicity, here are some limitations of this example:
After applying this layer:
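The dimension arithmetic for a conv2d layer follows a standard formula. Here is a small sketch; the kernel size, stride, and padding values below are assumptions for illustration, not taken from the text:

```python
def conv2d_out_size(in_size, kernel, stride=1, padding=0):
    """Standard conv2d output-size formula, per spatial dimension:
    out = floor((in + 2*padding - kernel) / stride) + 1
    """
    return (in_size + 2 * padding - kernel) // stride + 1

# MNIST image: 28x28 pixels, 1 channel.
h = w = 28

# Assumed 3x3 kernel, no padding, stride 1 (illustrative values):
print(conv2d_out_size(h, kernel=3))             # 26 -> 26x26 feature map

# With padding=1 the spatial size is preserved:
print(conv2d_out_size(h, kernel=3, padding=1))  # 28 -> 28x28 feature map
```

The same formula applies independently to height and width, so chaining two conv2d layers just means applying it twice with each layer's own kernel/stride/padding.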