When you train a machine learning model, you minimize a loss function. But have you ever wondered why we use the ones we do? Why is Mean Squared Error (MSE) so common in regression? Why does Cross-Entropy Loss dominate classification? Are loss functions just arbitrary choices, or do they have deeper mathematical roots?
It turns out that many loss functions aren't simply invented: they emerge naturally from probability theory. But not all of them. Some loss functions defy probabilistic intuition and are designed purely for optimization.
Let's start with a simple example. Suppose we're predicting house prices with a regression model. The most common way to measure error is the Mean Squared Error (MSE) loss:

MSE = (1/n) Σ_i (y_i − ŷ_i)^2

At first glance, this just looks like a mathematical way to measure how far our predictions are from reality. But why squared error? Why not absolute error? Why not cubed error?
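As a quick illustration, here is a minimal NumPy sketch of MSE; the house-price arrays are made up for the example:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative house prices (in $1000s)
y_true = [300.0, 450.0, 500.0]
y_pred = [310.0, 440.0, 520.0]
print(mse(y_true, y_pred))  # (100 + 100 + 400) / 3 = 200.0
```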
Probability Density Function (PDF)
If we assume that the errors in our model follow a normal distribution:

x ~ N(μ, σ^2)

Then the probability density function (PDF) is:

p(x) = (1 / √(2πσ^2)) · exp(−(x − μ)^2 / (2σ^2))
Likelihood Function
If we observe multiple independent data points x1, x2, …, xn, their joint probability (the likelihood function) is:

L(μ, σ^2) = ∏_i p(x_i) = ∏_i (1 / √(2πσ^2)) · exp(−(x_i − μ)^2 / (2σ^2))

Since we typically work with log-likelihoods for easier optimization:

log L(μ, σ^2) = −(n/2) log(2πσ^2) − (1 / (2σ^2)) Σ_i (x_i − μ)^2
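The switch to logs is not just algebraic convenience; multiplying many densities underflows floating point quickly. A small sketch (assuming standard-normal data generated with NumPy) makes this concrete:

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Gaussian density, evaluated elementwise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1000)

# Each density value is < 0.4, so the product of 1000 of them
# underflows to exactly 0.0 in double precision...
likelihood = np.prod(normal_pdf(x, 0.0, 1.0))

# ...while the sum of logs stays a perfectly ordinary finite number.
log_likelihood = np.sum(np.log(normal_pdf(x, 0.0, 1.0)))
print(likelihood, log_likelihood)
```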
Deriving the Loss Function
Now, to turn this into a loss function, we negate the log-likelihood (since optimizers minimize loss rather than maximize likelihood):

−log L(μ, σ^2) = (n/2) log(2πσ^2) + (1 / (2σ^2)) Σ_i (x_i − μ)^2

If we assume σ^2 is constant, the loss function simplifies (up to additive and multiplicative constants) to:

L(μ) = (1/n) Σ_i (x_i − μ)^2

which is just Mean Squared Error (MSE).
MSE isn't just an arbitrary choice: it's the result of assuming normally distributed errors. This means we implicitly assume a Gaussian distribution every time we minimize MSE.
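We can verify numerically that, with fixed variance, the Gaussian negative log-likelihood and the sum of squared errors share the same minimizer. A sketch, assuming synthetic data drawn from N(5, 4):

```python
import numpy as np

def gaussian_nll(x, mu, sigma2):
    """Negative log-likelihood of data x under N(mu, sigma2)."""
    n = len(x)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=1000)

# With sigma2 fixed, NLL(mu) and the sum of squared errors differ only
# by a constant scale and offset, so both are minimized at the same mu.
mus = np.linspace(0, 10, 1001)
nll = [gaussian_nll(x, m, sigma2=4.0) for m in mus]
sse = [np.sum((x - m) ** 2) for m in mus]
print(mus[np.argmin(nll)], mus[np.argmin(sse)], x.mean())
```

Both grid searches land on the same value, which is the sample mean to within the grid spacing, exactly as the derivation predicts.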
If we don't assume a fixed variance, we get a slightly different loss function:

L(μ, σ^2) = (1 / (2σ^2)) Σ_i (x_i − μ)^2 + (n/2) log σ^2

The extra term, log σ^2, means that the optimal parameters μ and σ^2 are learned jointly, rather than assuming a fixed variance.
If we treat σ^2 as unknown, we move toward heteroscedastic models, which allow different levels of uncertainty across different predictions.
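A minimal sketch of such a heteroscedastic loss, where each prediction comes with its own variance (the function name and the choice to parameterize by log σ^2 are illustrative, not a standard API):

```python
import numpy as np

def heteroscedastic_nll(y, mu, log_sigma2):
    """Per-sample Gaussian NLL with a predicted variance for each point.

    Parameterizing by log(sigma^2) keeps the variance positive without
    constraints; the constant (1/2) log(2*pi) term is dropped.
    """
    sigma2 = np.exp(log_sigma2)
    return np.mean(0.5 * log_sigma2 + (y - mu) ** 2 / (2 * sigma2))
```

In practice a network would output both μ(x) and log σ^2(x) for each input; points the model flags as high-variance then contribute less to the squared-error term, at the price of the log σ^2 penalty.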
Cross-Entropy Loss
For classification problems, we often minimize Cross-Entropy Loss, which comes from the Bernoulli or Categorical likelihood function.
For binary classification:

L = −(1/n) Σ_i [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]

This arises naturally from the likelihood of data drawn from a Bernoulli distribution.
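To see the correspondence, here is a sketch computing the same quantity two ways: once as the usual cross-entropy formula, once directly through the Bernoulli pmf p^y (1−p)^(1−y). The labels and probabilities are made up:

```python
import numpy as np

def bce(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, clipped away from log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bernoulli_nll(y_true, p_pred, eps=1e-12):
    """Negative mean log-likelihood via the Bernoulli pmf."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(p_pred, eps, 1 - eps)
    pmf = p ** y * (1 - p) ** (1 - y)
    return -np.mean(np.log(pmf))

y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.7, 0.6]
print(np.isclose(bce(y, p), bernoulli_nll(y, p)))  # True
```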
So far, we've seen that many loss functions arise naturally from likelihood functions. But not all of them. Some are designed for optimization efficiency, robustness, or task-specific needs.
Hinge Loss (SVMs)
Most classification loss functions, like cross-entropy loss, come from a probabilistic framework. But Hinge Loss, the core loss function in Support Vector Machines (SVMs), is different.
Instead of modeling a likelihood, it focuses on maximizing the margin between classes.
If we have labels y ∈ {−1, +1} and a model making predictions f(x), hinge loss is:

L(y, f(x)) = max(0, 1 − y·f(x))
If y·f(x) ≥ 1 → no loss (correct classification, with margin).
If y·f(x) < 1 → the loss increases linearly (incorrect, or correct but too close to the boundary).
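The two cases above can be sketched in a few lines; the labels and scores are illustrative:

```python
import numpy as np

def hinge_loss(y, f_x):
    """Hinge loss: zero once y*f(x) >= 1, linear in the margin otherwise."""
    y = np.asarray(y, dtype=float)
    f_x = np.asarray(f_x, dtype=float)
    return np.maximum(0.0, 1.0 - y * f_x)

y   = np.array([+1, +1, -1, -1])   # true labels
f_x = np.array([2.0, 0.5, -0.3, 1.0])  # raw model scores

# Margins y*f(x) are [2.0, 0.5, 0.3, -1.0]:
# the first point is safely classified, the next two are inside the
# margin, and the last is misclassified outright.
print(hinge_loss(y, f_x))
```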