    Behind the Magic: How Tensors Drive Transformers

    By FinanceStarGate · April 25, 2025 · 5 Mins Read


    Transformers have changed the way artificial intelligence works, especially in understanding language and learning from data. At the core of these models are tensors (a generalized kind of mathematical matrix that helps process information). As data moves through the different parts of a Transformer, these tensors undergo different transformations that help the model make sense of things like sentences or images. Understanding how tensors work inside Transformers can help you see how today's smartest AI systems actually work and think.

    What This Article Covers and What It Doesn’t

    ✅ This Article IS About:

    • The flow of tensors from input to output inside a Transformer model.
    • How dimensional coherence is maintained throughout the computational process.
    • The step-by-step transformations that tensors undergo in the different Transformer layers.

    ❌ This Article IS NOT About:

    • A general introduction to Transformers or deep learning.
    • The detailed architecture of Transformer models.
    • The training process or hyper-parameter tuning of Transformers.

    How Tensors Act Inside Transformers

    A Transformer consists of two main components:

    • Encoder: Processes the input data, capturing contextual relationships to create meaningful representations.
    • Decoder: Uses these representations to generate coherent output, predicting each element sequentially.

    Tensors are the fundamental data structures that flow through these components, undergoing multiple transformations that ensure dimensional coherence and a correct flow of information.

    Image from research paper: Transformer general architecture

    Input Embedding Layer

    Before entering the Transformer, raw input tokens (words, subwords, or characters) are converted into dense vector representations by the embedding layer. This layer functions as a lookup table that maps each token to a dense vector, capturing semantic relationships with other words.

    Image by author: Tensors passing through the embedding layer

    For a batch of 5 sentences, each with a sequence length of 12 tokens and an embedding dimension of 768, the tensor shape is:

    • Tensor form: [batch_size, seq_len, embedding_dim] → [5, 12, 768]

    After embedding, positional encoding is added, ensuring that order information is preserved without altering the tensor shape.
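    A minimal sketch of this step, assuming PyTorch; the vocabulary size and the random positional encoding are illustrative stand-ins, not values from the article.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, vocab_size = 5, 12, 768, 30000

token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))   # [5, 12]
embedding = nn.Embedding(vocab_size, embedding_dim)

x = embedding(token_ids)                                          # [5, 12, 768]

# Positional encoding is added element-wise, so the shape does not change
positional_encoding = torch.randn(seq_len, embedding_dim)         # stand-in for sinusoidal encodings
x = x + positional_encoding                                       # still [5, 12, 768]
print(x.shape)                                                    # torch.Size([5, 12, 768])
```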

    Modified image from research paper: current stage in the workflow

    Multi-Head Attention Mechanism

    One of the most important components of the Transformer is the Multi-Head Attention (MHA) mechanism. It operates on three matrices derived from the input embeddings:

    • Query (Q)
    • Key (K)
    • Value (V)

    These matrices are generated using learnable weight matrices (see the sketch below):

    • Wq, Wk, Wv of shape [embedding_dim, d_model] (e.g., [768, 512]).
    • The resulting Q, K, V matrices have dimensions [batch_size, seq_len, d_model].
    Image by author: Table showing the shapes/dimensions of the embedding, Q, K, V tensors
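    A minimal sketch of these projections, assuming PyTorch; bias terms are omitted so each layer is exactly the linear map [embedding_dim → d_model] described above.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, d_model = 5, 12, 768, 512
x = torch.randn(batch_size, seq_len, embedding_dim)    # output of the embedding layer

# Learnable projections playing the role of Wq, Wk, Wv (embedding_dim -> d_model)
W_q = nn.Linear(embedding_dim, d_model, bias=False)
W_k = nn.Linear(embedding_dim, d_model, bias=False)
W_v = nn.Linear(embedding_dim, d_model, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)                        # each [5, 12, 512]
print(Q.shape, K.shape, V.shape)
```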

    Splitting Q, K, V into Multiple Heads

    For effective parallelization and improved learning, MHA splits Q, K, and V into multiple heads. Suppose we have 8 attention heads (a sketch of the reshaping follows this list):

    • Each head operates on a subspace of size d_model / head_count.
    Image by author: Multi-head attention
    • The reshaped tensor dimensions are [batch_size, seq_len, head_count, d_model / head_count].
    • Example: [5, 12, 8, 64] → rearranged to [5, 8, 12, 64] so that each head receives a separate sequence slice.
    Image by author: Reshaping the tensors
    • Each head thus gets its own share of the tensors: Qi, Ki, Vi.
    Image by author: Each Qi, Ki, Vi sent to a different head
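    A minimal sketch of the reshaping, assuming PyTorch; the same view/transpose is applied to K and V.

```python
import torch

batch_size, seq_len, d_model, head_count = 5, 12, 512, 8
head_dim = d_model // head_count                       # 64

Q = torch.randn(batch_size, seq_len, d_model)          # K and V are reshaped the same way

# [5, 12, 512] -> [5, 12, 8, 64] -> [5, 8, 12, 64]
Q_heads = Q.view(batch_size, seq_len, head_count, head_dim).transpose(1, 2)
print(Q_heads.shape)                                   # torch.Size([5, 8, 12, 64])
```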

    Attention Calculation

    Each head computes attention using the formula:

    Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

    Once attention has been computed for all heads, the outputs are concatenated and passed through a linear transformation, restoring the initial tensor shape (sketched below).

    Image by author: Concatenating the output of all heads
    Modified image from research paper: current stage in the workflow
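    A minimal sketch of the per-head attention and the final concatenation, assuming PyTorch; here the output projection maps back to the embedding dimension (768) so the residual connection described next lines up with the embedding tensor.

```python
import math
import torch
import torch.nn as nn

batch_size, head_count, seq_len, head_dim = 5, 8, 12, 64
d_model, embedding_dim = head_count * head_dim, 768               # 512 and 768

Q = torch.randn(batch_size, head_count, seq_len, head_dim)
K = torch.randn(batch_size, head_count, seq_len, head_dim)
V = torch.randn(batch_size, head_count, seq_len, head_dim)

# softmax(Q Kᵀ / sqrt(d_k)) V, computed for all heads at once
scores = Q @ K.transpose(-2, -1) / math.sqrt(head_dim)            # [5, 8, 12, 12]
weights = torch.softmax(scores, dim=-1)
head_outputs = weights @ V                                        # [5, 8, 12, 64]

# Concatenate the heads, then apply the final linear projection
concat = head_outputs.transpose(1, 2).reshape(batch_size, seq_len, d_model)  # [5, 12, 512]
W_o = nn.Linear(d_model, embedding_dim)
output = W_o(concat)                                              # [5, 12, 768]
```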

    Residual Connection and Normalization

    After the multi-head attention mechanism, a residual connection is added, followed by layer normalization (sketched below):

    • Residual connection: Output = Embedding Tensor + Multi-Head Attention Output
    • Normalization: (Output − μ) / σ to stabilize training
    • Tensor shape remains [batch_size, seq_len, embedding_dim]
    Image by author: Residual connection
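    A minimal sketch of this step, assuming PyTorch; nn.LayerNorm implements the (x − μ) / σ normalization (plus a learnable scale and shift).

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim = 5, 12, 768
embedding_tensor = torch.randn(batch_size, seq_len, embedding_dim)
attention_output = torch.randn(batch_size, seq_len, embedding_dim)   # same shape as the embeddings

residual = embedding_tensor + attention_output    # element-wise addition
layer_norm = nn.LayerNorm(embedding_dim)
out = layer_norm(residual)                        # shape stays [5, 12, 768]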

    Feed-Forward Network (FFN)

    After attention and normalization, each position is passed through a position-wise feed-forward network: two linear layers with a non-linear activation in between (sketched below). The tensor is expanded to a larger hidden dimension and then projected back, so its shape remains [batch_size, seq_len, embedding_dim].
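    A minimal sketch, assuming PyTorch; the hidden size d_ff = 2048 is illustrative and not taken from the article.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, d_ff = 5, 12, 768, 2048
x = torch.randn(batch_size, seq_len, embedding_dim)

ffn = nn.Sequential(
    nn.Linear(embedding_dim, d_ff),   # expand: [5, 12, 768] -> [5, 12, 2048]
    nn.ReLU(),
    nn.Linear(d_ff, embedding_dim),   # project back: [5, 12, 2048] -> [5, 12, 768]
)
out = ffn(x)                          # shape unchanged: [5, 12, 768]
```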

    Masked Multi-Head Attention

    In the decoder, Masked Multi-Head Attention ensures that each token attends only to previous tokens, preventing any leakage of future information.

    Modified image from research paper: Masked Multi-Head Attention

    This is achieved using a lower-triangular mask of shape [seq_len, seq_len], with -inf values in the upper triangle. Applying this mask ensures that the softmax function nullifies the future positions (see the sketch below).

    Image by author: Mask matrix
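    A minimal sketch of building and applying such a mask, assuming PyTorch.

```python
import torch

seq_len = 12

# Lower-triangular matrix: 1 where attention is allowed, 0 for future positions
allowed = torch.tril(torch.ones(seq_len, seq_len))

# Put -inf in the upper triangle so softmax gives those positions zero weight
mask = torch.zeros(seq_len, seq_len).masked_fill(allowed == 0, float("-inf"))

scores = torch.randn(seq_len, seq_len)          # raw attention scores for a single head
weights = torch.softmax(scores + mask, dim=-1)  # future positions receive weight 0
```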

    Cross-Attention in Decoding

    Since the decoder does not fully understand the input sentence, it uses cross-attention to refine its predictions (see the sketch after this list). Here:

    • The decoder generates queries (Qd) from its input ([batch_size, target_seq_len, embedding_dim]).
    • The encoder output serves as keys (Ke) and values (Ve).
    • The decoder computes attention between Qd and Ke, extracting relevant context from the encoder's output.
    Modified image from research paper: Cross-head attention
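    A minimal sketch of the cross-attention shapes, assuming PyTorch; target_seq_len = 10 is an arbitrary illustrative value, and the multi-head split is omitted for brevity.

```python
import torch

batch_size, src_seq_len, target_seq_len, d_model = 5, 12, 10, 512

Q_d = torch.randn(batch_size, target_seq_len, d_model)   # queries from the decoder input
K_e = torch.randn(batch_size, src_seq_len, d_model)      # keys from the encoder output
V_e = torch.randn(batch_size, src_seq_len, d_model)      # values from the encoder output

scores = Q_d @ K_e.transpose(-2, -1) / d_model ** 0.5    # [5, 10, 12]: each target token scores every source token
weights = torch.softmax(scores, dim=-1)
context = weights @ V_e                                  # [5, 10, 512], aligned with the decoder sequence
```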

    Conclusion

    Transformers rely on tensors to learn and make smart decisions. As data moves through the network, these tensors go through different steps: being turned into numbers the model can understand (embedding), focusing on the important parts (attention), staying balanced (normalization), and passing through layers that learn patterns (feed-forward). These transformations keep the data in the right shape the whole time. By understanding how tensors move and change, we can get a better idea of how AI models work and how they can understand and create human-like language.


