Weight Initializations: Never Take It For Granted | by Ashwathsreeram | Apr, 2025

By FinanceStarGate · April 19, 2025 · 2 Mins Read


To understand the connection between weight initialization and the activation function, let us work through an example that illustrates the Vanishing Gradient Problem.

We have a single-layer neural network with a Tanh activation function applied at the end. Ideally, you would usually have another linear layer to produce your continuous value, which you would then use as logits for classification or as the final prediction for regression; but for the sake of simplicity, let us stick with this setup.

Now, the equation form of the setup is as follows, with z as the layer output and ŷ as the prediction:

    z = mx + b
    ŷ = tanh(z)

Equation 1: Single-Layer Network with Tanh Activation

Now, when we take the derivative of the loss function with respect to m, which is the weight of the single layer, the chain rule gives us:

    ∂L/∂m = (∂L/∂ŷ) · (∂ŷ/∂z) · (∂z/∂m)

Equation 2: Chain Rule

The first term of the chain rule is the derivative of the loss function with respect to the activation output; the second term is the derivative of the activation function with respect to the layer output; and the third term is the derivative of the layer output with respect to the weight of the layer. The most important term to focus on is the middle one, and let me explain why.

If our loss function is Mean Squared Error, our first term will look something like this:

    ∂L/∂ŷ = 2(ŷ − y)

Equation 3: Chain Rule Term 1

Onto our second term, the derivative of tanh with respect to the layer output:

    ∂ŷ/∂z = 1 − tanh²(z)

The point to note is the value of tanh in our derivative. According to the chain rule, shown in Equation 2, all the derivatives are multiplied together, which means that if the value of tanh is close to 1 or −1, this term becomes close to 0, and the whole gradient shrinks with it. When this happens, we get what is called Vanishing Gradients.
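To see this numerically, here is a small sketch in plain Python (no framework; the loss is the MSE from Equation 3, and the helper name and test values are my own) that evaluates the three chain-rule terms for a small and a large weight initialization. The larger weight saturates tanh, which drives the middle term, and with it the full gradient, toward zero:

```python
import math

def gradient_wrt_m(x, y, m, b):
    """Compute dL/dm for L = (tanh(m*x + b) - y)**2 via the chain rule."""
    z = m * x + b
    y_hat = math.tanh(z)
    dL_dyhat = 2 * (y_hat - y)   # term 1: derivative of MSE w.r.t. the activation
    dyhat_dz = 1 - y_hat ** 2    # term 2: derivative of tanh, 1 - tanh^2(z)
    dz_dm = x                    # term 3: derivative of z = m*x + b w.r.t. m
    return dL_dyhat * dyhat_dz * dz_dm

x, y, b = 1.0, 0.0, 0.0
small = gradient_wrt_m(x, y, m=0.5, b=b)   # tanh far from saturation
large = gradient_wrt_m(x, y, m=10.0, b=b)  # tanh saturated near 1
print(small, large)  # the saturated gradient is orders of magnitude smaller
```

With m = 0.5 the gradient stays at a healthy magnitude, while with m = 10 the term 1 − tanh²(z) collapses to roughly 10⁻⁸, so the weight barely updates no matter how large the loss is. This is why initialization schemes aim to keep pre-activations in the non-saturated range of the activation function.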

