Sparse AutoEncoder: from Superposition to Interpretable Features

by Shuyang Xiang | Feb, 2025



Disentangling features in complex neural networks with superposition

    Towards Data Science

Complex neural networks, such as Large Language Models (LLMs), very often suffer from interpretability challenges. One of the most important causes of this difficulty is superposition: a phenomenon in which the neural network has fewer dimensions than the number of features it has to represent. For example, a toy LLM with 2 neurons has to represent 6 different language features. As a result, we often observe that a single neuron has to activate for multiple features. For a more detailed explanation and definition of superposition, please refer to my previous blog post: "Superposition: What Makes it Difficult to Explain Neural Networks".

In this blog post, we take one step further: let us try to disentangle some superposed features. I will introduce a method called Sparse Autoencoder to decompose a complex neural network, especially an LLM, into interpretable features, with a toy example of language features.

A Sparse Autoencoder, by definition, is an autoencoder with sparsity introduced on purpose in the activations of its hidden layers. With a fairly simple structure and a lightweight training process, it aims to decompose a complex neural network and uncover its features in a way that is more interpretable and more understandable to humans.

Let us imagine that you have a trained neural network. The autoencoder is not part of the training process of the model itself but is instead a post-hoc analysis tool. The original model has its own activations, and these activations are collected afterwards and then used as input data for the sparse autoencoder.

For example, suppose that your original model is a neural network with one hidden layer of 5 neurons. Besides, you have a training dataset of 5000 samples. You collect all the values of the 5-dimensional activations of the hidden layer for all your 5000 training samples, and they are now the input for your sparse autoencoder.

Image by author: Autoencoder to analyse an LLM
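To make the collection step concrete, here is a minimal sketch of how hidden-layer activations could be gathered with a PyTorch forward hook. The model, the layer name model.hidden, and the dataloader below are hypothetical placeholders for your own trained network, not part of the original article.

import torch

activations = []

def save_activation(module, inputs, output):
    # Store a copy of the hidden-layer activations for each batch
    activations.append(output.detach().cpu())

# 'model.hidden' stands in for whichever layer you want to probe
hook = model.hidden.register_forward_hook(save_activation)

with torch.no_grad():
    for batch in dataloader:   # e.g. 5000 training samples in total
        model(batch)

hook.remove()

# Shape (5000, 5): the input matrix for the sparse autoencoder
sae_inputs = torch.cat(activations, dim=0)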

The autoencoder then learns a new, sparse representation from these activations. The encoder maps the original MLP activations into a new vector space with a higher representation dimension. Looking back at my earlier 5-neuron example, we might consider mapping it into a vector space with 20 features. Hopefully, we will obtain a sparse autoencoder that effectively decomposes the original MLP activations into a representation that is easier to interpret and analyze.

Sparsity is important in the autoencoder because it is what allows the autoencoder to "disentangle" features, with more "freedom" than in a dense, overlapping space. Without sparsity, the autoencoder would probably just learn a trivial compression without forming any meaningful features.

Language model

Let us now build our toy model. I urge readers to note that this model is not realistic, and even a bit silly in practice, but it is sufficient to showcase how we build a sparse autoencoder and capture some features.

    Suppose now we’ve constructed a language mannequin which has one explicit hidden layer whose activation has three dimensions. Allow us to suppose additionally that we’ve the next tokens: “cat,” “glad cat,” “canine,” “energetic canine,” “not cat,” “not canine,” “robotic,” and “AI assistant” within the coaching dataset they usually have the next activation values.

import torch

data = torch.tensor([
    # Cat categories
    [0.8, 0.3, 0.1, 0.05],     # "cat"
    [0.82, 0.32, 0.12, 0.06],  # "happy cat" (similar to "cat")

    # Dog categories
    [0.7, 0.2, 0.05, 0.2],     # "dog"
    [0.75, 0.3, 0.1, 0.25],    # "loyal dog" (similar to "dog")

    # "Not animal" categories
    [0.05, 0.9, 0.4, 0.4],     # "not cat"
    [0.15, 0.85, 0.35, 0.5],   # "not dog"

    # Robot and AI assistant (more distinct in 4D space)
    [0.0, 0.7, 0.9, 0.8],      # "robot"
    [0.1, 0.6, 0.85, 0.75]     # "AI assistant"
], dtype=torch.float32)

Construction of the autoencoder

We now build the autoencoder with the following code:

import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(SparseAutoencoder, self).__init__()
        # Encoder: a single linear layer followed by a ReLU activation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU()
        )
        # Decoder: a single linear layer, intentionally without ReLU
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, input_dim)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded

According to the code above, the encoder has only one fully connected linear layer, mapping the input to a hidden representation of size hidden_dim, which is then passed through a ReLU activation. The decoder uses just one linear layer to reconstruct the input. Note that the absence of a ReLU activation in the decoder is intentional for our particular reconstruction case, because the reconstruction might contain real-valued and potentially negative data. A ReLU would, on the contrary, force the output to stay non-negative, which is not desirable for our reconstruction.

We train the model using the code below. Here, the loss function has two parts: the reconstruction loss, measuring the accuracy of the autoencoder's reconstruction of the input data, and a sparsity loss (with a weight), which encourages sparsity in the encoder's output.
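The training loop below assumes that the model, loss function, and optimizer have already been created. Here is a minimal sketch of that setup; the hidden dimension, learning rate, number of epochs, and sparsity weight are illustrative values of mine rather than the exact ones from the original notebook.

import torch.optim as optim

input_dim = 4      # dimension of the collected activations (our toy data)
hidden_dim = 20    # larger than input_dim so features have room to disentangle

model = SparseAutoencoder(input_dim, hidden_dim)
criterion = nn.MSELoss()                              # reconstruction loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)

num_epochs = 1000       # illustrative
sparsity_weight = 0.1   # weight of the L1 sparsity penalty, illustrative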

# Training loop
for epoch in range(num_epochs):
    optimizer.zero_grad()

    # Forward pass
    encoded, decoded = model(data)

    # Reconstruction loss
    reconstruction_loss = criterion(decoded, data)

    # Sparsity penalty (L1 regularization on the encoded features)
    sparsity_loss = torch.mean(torch.abs(encoded))

    # Total loss
    loss = reconstruction_loss + sparsity_weight * sparsity_loss

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

Now we can take a look at the result. We have plotted the encoder's output values for each activation of the original model. Recall that the input tokens are "cat," "happy cat," "dog," "energetic dog," "not cat," "not dog," "robot," and "AI assistant".
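As a rough sketch, a plot of this kind could be produced along the following lines; the token labels and plotting choices here are my own illustration, not the article's exact code.

import matplotlib.pyplot as plt

tokens = ["cat", "happy cat", "dog", "loyal dog",
          "not cat", "not dog", "robot", "AI assistant"]

with torch.no_grad():
    encoded, _ = model(data)          # shape: (8, hidden_dim)

plt.imshow(encoded.numpy(), aspect="auto", cmap="viridis")
plt.yticks(range(len(tokens)), tokens)
plt.xlabel("Encoded feature index")
plt.colorbar(label="Activation")
plt.tight_layout()
plt.show()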

Image by author: features learned by the encoder

Even though the original model was designed with a very simple architecture without any deep consideration, the autoencoder has still captured meaningful features of this trivial model. According to the plot above, we can observe at least four features that appear to be learned by the encoder.

Let us first consider Feature 1. This feature has large activation values on the following four tokens: "cat", "happy cat", "dog", and "energetic dog". The result suggests that Feature 1 is probably something related to "animals" or "pets". Feature 2 is also an interesting example, activating on the two tokens "robot" and "AI assistant". We guess, therefore, that this feature has something to do with "artificial and robotic" concepts, indicating the model's understanding of technological contexts. Feature 3 has activation on four tokens: "not cat", "not dog", "robot" and "AI assistant", and this is probably a "not an animal" feature.
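Continuing from the plotting sketch above, one rough way to read such features off programmatically is to list, for each encoded feature, the tokens whose activation exceeds some threshold; the threshold below is an arbitrary illustrative value.

# List the tokens that activate each feature above an illustrative threshold
threshold = 0.1
for feature_idx in range(encoded.shape[1]):
    active = [tokens[i] for i in range(len(tokens))
              if encoded[i, feature_idx] > threshold]
    if active:
        print(f"Feature {feature_idx}: {active}")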

Unfortunately, the original model is not a real model trained on real-world text, but was rather artificially designed with the assumption that similar tokens have some similarity in the activation vector space. However, the results still provide interesting insights: the sparse autoencoder succeeded in revealing some meaningful, human-friendly features or real-world concepts.

The simple result in this blog post suggests that a sparse autoencoder can effectively help to extract high-level, interpretable features from complex neural networks such as LLMs.

For readers interested in a real-world implementation of sparse autoencoders, I recommend this article, where an autoencoder was trained to interpret a real large language model with 512 neurons. That study provides a real application of sparse autoencoders in the context of LLM interpretability.

Finally, I provide here this Google Colab notebook with the detailed implementation mentioned in this article.


