So far, the model spits out a probability distribution, so our loss function (also called the cost function) needs to reflect that, hence the categorical cross-entropy function, aka Log Loss. It measures the difference, or loss, between the actual ‘y’ and the predicted distribution, ‘y-hat’.
The general form of the categorical cross-entropy loss:

L = −Σᵢ yᵢ * log(pᵢ), summed over the C classes

where:
- C = number of classes (e.g., 3 if you have red, blue, green)
- yᵢ = 1 if class i is the true class, 0 otherwise (from the one-hot target vector)
- pᵢ (y-hat) = predicted probability for class i (after softmax).
If our softmax output is [0.7, 0.1, 0.2], the one-hot encoding for this would be [1, 0, 0]. We have 0.7 for the true class (class 1), and the other two entries would be 0 in the one-hot encoding. Let’s plug some numbers into the formula:
−(1 * log(0.7) + 0 * log(0.1) + 0 * log(0.2)) = −(−0.3567) = 0.3567
With all the craziness going on around the world right now, it’s good to know some things haven’t changed, like multiplying by 0 still equals 0, so we can simplify the formula to:
L = −log(0.7) = 0.3567
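To make the arithmetic concrete, here is a minimal sketch in Python using the example numbers from above (the variable names are my own); it computes the full sum and then the simplified single-term version:

```python
import math

# Example values from above: one-hot target and softmax output
y_true = [1, 0, 0]
y_pred = [0.7, 0.1, 0.2]

# Full categorical cross-entropy sum over all classes
loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(loss)  # 0.35667494393873245

# Simplified: only the true-class term is non-zero
print(-math.log(0.7))  # same value
```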
The log used is the natural log, base e. The higher a model’s confidence in its prediction, the lower the loss, which makes sense since the loss is the difference between actual vs. predicted values. If you’re 100% confident that any number * 0 = 0, your loss would be 0.0. Your confidence about holding the next winning lotto ticket is rather low (appropriately), so that difference would be a very large number.
# Example
import math

print(math.log(1.0))       # 100% confident
print(math.log(0.5))       # 50% confident
print(math.log(0.000001))  # Extremely low confidence
0.0
-0.6931471805599453
-13.815510557964274
This curvature should probably be a bit more extreme, with more of a ‘hockey-stick’ look to it, but hey, I’m trying. The plot above shows how the cross-entropy loss 𝐿(𝑝) = −ln(𝑝) behaves as the model’s predicted confidence 𝑝 (for the true class) varies from 0 to 1:
– As 𝑝→1: the loss drops toward 0, meaning high confidence in the correct class yields almost no penalty.
– As 𝑝→0: the loss shoots toward +∞, heavily penalizing predictions that assign near-zero probability to the true class. It “amplifies” the penalty on confidently wrong predictions, pushing the optimizer to correct them aggressively.
– Rapid decrease: most of the loss change happens for 𝑝 in the low range (0–0.5). Gaining a little confidence starting from very low 𝑝 yields a large reduction in loss.
**This** curvature is what drives gradient updates.
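If you want to reproduce a curve like this yourself, here is a rough sketch, assuming numpy and matplotlib are installed (the variable names are my own), that plots 𝐿(𝑝) = −ln(𝑝) over the range of confidences:

```python
import numpy as np
import matplotlib.pyplot as plt

# Confidence values for the true class, starting just above 0 where log blows up
p = np.linspace(1e-4, 1.0, 500)
loss = -np.log(p)

plt.plot(p, loss)
plt.xlabel("predicted confidence p for the true class")
plt.ylabel("cross-entropy loss -ln(p)")
plt.title("Cross-entropy loss vs. confidence")
plt.show()
```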
Recall that this is only the first pass through the network with randomly initialized weights, so this first calculation could be off by a wide margin. You compute the softmax and get something like [0.7, 0.1, 0.2], then compute the loss and back-propagate to update the weights. On the next forward pass, with those updated weights, you’ll get a new output distribution, maybe [0.2, 0.1, 0.7] or something else entirely. Over many such passes (epochs), gradient descent nudges the weights so that eventually the network’s outputs align more closely with the true one-hot targets. But we’re not getting into back-propagation just yet.
Since I mentioned multiplying by 0, dividing by 0, or in our case log(0), also needs to be mentioned. Regardless, it’s still undefined, despite what some elementary school teacher and principal said (yes, a teacher claimed dividing by 0 = 0). The model might output 0, so we need to handle that contingency: with log(p) and p = 0, you get −∞. We also don’t want 1 as an output either, so we’ll clip both ends to make the numbers close to, but not equal to, 0 and 1.
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
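As a quick sketch of why the clipping matters (the arrays here are made up, and the 1e-7 matches the clip value above), compare a prediction that contains an exact 0 with the earlier [0.7, 0.1, 0.2] example:

```python
import numpy as np

# Hypothetical batch of two predictions; the first contains exact 0s and a 1
y_pred = np.array([[0.0, 1.0, 0.0],
                   [0.7, 0.1, 0.2]])
y_true = np.array([[1, 0, 0],
                   [1, 0, 0]])

# Without clipping, log(0) would give -inf and poison the loss
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

# Categorical cross-entropy per sample
losses = -np.sum(y_true * np.log(y_pred_clipped), axis=1)
print(losses)  # roughly [16.12, 0.36] -- capped instead of infinite
```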