    Sentence Transformers, Bi-Encoders And Cross-Encoders | by Shaza Elmorshidy | Mar, 2025



    A sentence transformer [Bi-Encoder] is a neural network model designed to generate high-quality vector representations (embeddings) for sentences or text fragments. It is based on transformer architectures such as BERT or RoBERTa, but optimized for tasks like semantic similarity, clustering, or retrieval. Unlike traditional transformers, which focus on token-level outputs, sentence transformers produce a fixed-size dense vector for an entire sentence, capturing its semantic meaning.
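    To make this concrete, here is a minimal sketch using the sentence-transformers library (the all-MiniLM-L6-v2 checkpoint mentioned later is assumed; any bi-encoder checkpoint would work the same way):

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained bi-encoder; all-MiniLM-L6-v2 maps sentences to 384-dim vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat.",
    "A feline is resting on a rug.",
    "Stock markets fell sharply today.",
]

# Each sentence becomes one fixed-size dense vector, independent of its length.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between embeddings reflects semantic closeness.
print(util.cos_sim(embeddings, embeddings))
```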

    Cross-encoders, on the other hand, take two text inputs (e.g., a query and a candidate response) and process them together through a single model to compute a score, typically indicating their relevance or similarity. They achieve higher accuracy because the model can attend to the contextual interactions between the two inputs, but they are computationally expensive since scoring requires processing every pair from scratch.

    Cross-encoders are therefore often used to re-rank the top-k results returned by a sentence transformer (bi-encoder) model, as sketched below.
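    A hedged sketch of this retrieve-then-re-rank pattern (the checkpoint names are example choices, and the toy corpus is purely illustrative):

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = [
    "SBERT produces sentence embeddings with a siamese network.",
    "Cross-encoders jointly encode a query and a candidate text.",
    "The weather in Cairo is hot in the summer.",
]
query = "How are sentence embeddings generated?"

# Step 1: cheap bi-encoder retrieval over precomputed corpus embeddings.
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

# Step 2: expensive cross-encoder scoring, but only on the top-k candidates.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)

for hit, score in sorted(zip(hits, scores), key=lambda x: x[1], reverse=True):
    print(round(float(score), 3), corpus[hit["corpus_id"]])
```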

    A practical solution came in 2019 with Nils Reimers and Iryna Gurevych’s SBERT (Sentence-BERT), and since SBERT, numerous sentence transformer models have been developed and optimized.

    SBERT Architecture

    SBERT (Sentence-BERT) builds on BERT by using a siamese architecture, in which two identical BERT networks process two separate sentences independently. This produces an embedding for each sentence, pooled using strategies such as mean pooling. These sentence embeddings, u and v, are then combined into a single vector that captures their relationship. The simplest combination method is (u, v, |u − v|), where |u − v| is the element-wise absolute difference.
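    As a rough illustration of the pooling and combination step (a sketch with toy tensors, not the actual SBERT implementation):

```python
import torch

# Toy token-level outputs for two sentences: (batch, tokens, hidden dim).
tokens_a = torch.randn(1, 5, 768)
tokens_b = torch.randn(1, 7, 768)

# Mean pooling collapses token-level outputs into one fixed-size sentence vector.
u = tokens_a.mean(dim=1)   # shape: (1, 768)
v = tokens_b.mean(dim=1)   # shape: (1, 768)

# The simplest combination: concatenate u, v, and their element-wise absolute difference.
combined = torch.cat([u, v, torch.abs(u - v)], dim=1)  # shape: (1, 2304)
print(combined.shape)
```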

    Training Process

    SBERT is fine-tuned on tasks like Natural Language Inference (NLI), which involves determining whether one sentence entails, contradicts, or is neutral with respect to another. The training process consists of the following steps (sketched in code after the list):

    1. Sentence Embedding: Each sentence in a pair is processed to generate its own embedding.
    2. Concatenation: The embeddings u and v are combined into a single vector (u, v, |u − v|).
    3. Feedforward Neural Network (FFNN): The concatenated vector is passed through a feedforward network to generate raw output logits.
    4. Softmax Layer: The logits are normalized into probabilities corresponding to the NLI labels (entailment, contradiction, or neutral).
    5. Cross-Entropy Loss: The predicted probabilities are compared with the actual labels using the cross-entropy loss function, which penalizes incorrect predictions.
    6. Optimization: The loss is minimized through backpropagation, adjusting the model’s parameters to improve accuracy on the training task.
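    A hedged PyTorch sketch of this classification objective (the linear head, dimensions, and random batch below are illustrative assumptions; a real run would compute u and v with the shared BERT encoder):

```python
import torch
import torch.nn as nn

hidden = 768
num_labels = 3  # entailment, contradiction, neutral

classifier = nn.Linear(3 * hidden, num_labels)  # head over (u, v, |u - v|)
loss_fn = nn.CrossEntropyLoss()                 # applies softmax + cross-entropy together

def training_step(u, v, labels):
    # Steps 1-2: sentence embeddings are concatenated with their absolute difference.
    features = torch.cat([u, v, torch.abs(u - v)], dim=1)
    # Steps 3-4: the feedforward head yields logits; normalization happens inside the loss.
    logits = classifier(features)
    # Step 5: cross-entropy compares predictions with the gold NLI labels.
    loss = loss_fn(logits, labels)
    # Step 6: backpropagation computes gradients for the optimizer to apply.
    loss.backward()
    return loss.item()

# Toy batch of 4 sentence pairs, with random embeddings standing in for BERT outputs.
u = torch.randn(4, hidden, requires_grad=True)
v = torch.randn(4, hidden, requires_grad=True)
labels = torch.randint(0, num_labels, (4,))
print(training_step(u, v, labels))
```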

    Pretrained models and their evaluations can be found here: Pretrained Models — Sentence Transformers documentation

    • General-Purpose Models: These include variants of BERT, RoBERTa, DistilBERT, and XLM-R that are fine-tuned for sentence-level tasks. Examples:
      – The all-* models were trained on all available training data (more than 1 billion training pairs) and are designed as general-purpose models. The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality.
    • Multilingual Models: These models support multiple languages, making them ideal for multilingual and cross-lingual tasks. Examples:
      distiluse-base-multilingual-cased-v2
      xlm-r-100langs-bert-base-nli-stsb
    • Domain-Specific Models: Models fine-tuned on specific domains or datasets, such as biomedical text, financial documents, or legal text. Examples:
      biobert-sentence-transformer: Specialized for biomedical literature.
      – Custom fine-tuned models available through Hugging Face or Sentence Transformers for niche domains.
    • Multimodal Models: These models can handle inputs beyond text, such as images combined with text, making them useful for applications like image captioning, visual question answering, and cross-modal retrieval. Examples:
      clip-ViT-B-32: Integrates visual and textual inputs for tasks that involve both modalities, such as finding images based on textual queries.
      mage-text-matching: A specialized model for matching text descriptions with relevant images.
    • Task-Specific Models: Pre-trained for tasks like semantic search, clustering, and classification. Examples:
      msmarco-MiniLM-L12-v2: Optimized for information retrieval and search tasks.
      nli-roberta-base-v2: Designed for natural language inference.
    • Custom Fine-Tuned Models: Users can train their own models on specific datasets using the Sentence Transformers training utilities, which allows adaptation to highly specialized use cases (a fine-tuning sketch follows this list).
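    For that last case, a minimal fine-tuning sketch using the library’s training utilities (the sentence pairs and similarity labels below are toy data, and the checkpoint and output path are arbitrary choices):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a general-purpose checkpoint and adapt it to domain data.
model = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "The stock market crashed."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Cosine-similarity loss pulls similar pairs together and pushes dissimilar ones apart.
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
model.save("my-domain-model")
```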

    References:

    What is a Sentence Transformer?

    Index of /docs/sentence_transformer


