Why Are Convolutional Neural Networks Great For Images?

Universal Approximation Theorem states {that a} neural community with a single hidden layer and a nonlinear activation perform can approximate any steady perform.

Sensible points apart, such that the variety of neurons on this hidden layer would develop enormously giant, we don’t want different community architectures. A easy feed-forward neural community may do the trick.

It’s difficult to estimate what number of community architectures have been developed.

Whenever you open the favored AI mannequin platform Hugging Face immediately, you’ll discover a couple of million pretrained fashions. Relying on the duty, you’ll use completely different architectures, for instance transformers for pure language processing and convolutional networks for picture classification.

So, why can we want so many Neural Network architectures?

On this put up, I would like supply a solution to this query from a physics perspective. It’s the construction within the information that evokes novel neural community architectures.

Symmetry and invariance

Physicists love symmetry. The elemental legal guidelines of physics make use of symmetries, reminiscent of the truth that the movement of a particle will be described by the identical equations, no matter the place it finds itself in time and area.

Symmetry all the time implies invariance with respect to some transformation. These ice crystals are an instance of translational invariance. The smaller constructions look the identical, no matter the place they seem within the bigger context.

By Photograph by PtrQs, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=127396876

Exploiting symmetries: convolutional neural community

In the event you already know {that a} sure symmetry persists in your information, you’ll be able to exploit this reality to simplify your neural community structure.

Let’s clarify this with the instance of picture classification. The panel exhibits three scenes together with a goldfish. It could present up in any location inside the picture, however the picture ought to all the time be labeled as goldfish.

Three panels showing the same goldfish in different locations — Photos created by the creator utilizing Midjourney.

A feed-forward neural community may definitely obtain this, given enough coaching information.

This community structure requires a flattened enter picture. Weights are then assigned between every enter layer neuron (representing one pixel within the picture) and every hidden layer neuron. Additionally, weights are assigned between every neuron within the hidden and the output layer.

Together with this structure, the panel exhibits a “flattened” model of the three goldfish pictures from above. Do they nonetheless look alike to you?

Schematic explaining the flattening of an image and the resulting network architecture. — Picture created by the creator. Utilizing pictures created by the creator with Midjourney and ANN structure created with https://alexlenail.me/NN-SVG/.

By flattening the picture, now we have incurred two issues:

Photos that include an analogous object don’t look alike as soon as they’re flattened,
For prime-resolution pictures, we might want to prepare loads of weights connecting the enter layer and the hidden layer.

Convolutional networks, then again, work with kernels. Kernel sizes usually vary between 3 and seven pixels, and the kernel parameters are learnable in coaching.

The kernel is utilized like a raster to the picture. A convolutional layer can have a couple of kernel, permitting every kernel to concentrate on completely different points of the picture.

For instance, one kernel may choose up on horizontal strains within the picture, whereas one other may choose up on convex curves.

Convolutional neural networks protect the order of pixels and are nice to study localized constructions. The convolutional layers will be nested to create deep layers. Together with pooling layers, high-level options will be discovered.

The ensuing networks are significantly smaller than when you would use a fully-connected neural community. A convolutional layer solely requires kernel_size x kernel_size x n_kernel trainable parameters.

It can save you reminiscence and computational funds, all by exploiting the truth that your object could also be situated wherever inside your picture!

Extra superior deep studying architectures that exploit symmetries are Graph Neural Networks and physics-informed neural networks.

Abstract

Convolutional neural networks work nice with pictures as a result of they protect the native info in your picture. As a substitute of flattening all of the pixels, rendering the picture meaningless, kernels with learnable parameters choose up on native options.

Additional studying

Source link

How AI Agents “Talk” to Each Other

Stop Building AI Platforms | Towards Data Science

What If I had AI in 2018: Rent the Runway Fulfillment Center Optimization

Introduction to Machine Learning. Exploring Types, Applications, and… | by Maryamansari | Feb, 2025

5 Industries Using Real-Time Data Visualization

Meta AI Lead: Humans Will Be the Boss of Superintelligent AI

AI and Automation: The Perfect Pairing for Smart Businesses

Lenovo Unveils AI Inferencing Server

Most Popular

From AI to Machine Learning, Deep Learning, and Generative AI: Evolution of Intelligent Machines | by Zeeshan Saghir | Mar, 2025

3 Questions: Visualizing research in the age of AI | MIT News

Why AI Makes Your Brand Voice More Valuable Than Ever

Our Picks

🚀 5 Powerful Open Source Projects Backed by Big Tech Companies — and Changing the World of Development | by TechTales | Jun, 2025

Top 9 Tungsten Automation (Kofax) alternatives

Smash Your Way to Success with an iSmash Rage Room Franchise

Why Are Convolutional Neural Networks Great For Images?

Symmetry and invariance

Exploiting symmetries: convolutional neural community

Abstract

Additional studying

Related Posts