Introduction to Sequence Modeling with Transformers
by Joni Kamarainen | February 28, 2025


The Seq2SeqTransformer can learn all of the above sequences without trouble. The final step is to teach all eight sequence-to-sequence translations to a single transformer model:

    • 0,0,0,0 → 0,0,0,0
    • 1,1,1,1 → 1,1,1,1
    • 1,1,1 → 0
    • 0,0,0 → 1
    • 0 → 1,1,1
    • 1 → 0,0,0
    • 0,1,0,1 → 0,1,0,1
    • 1,0,1,0 → 1,0,1,0

The requirement the current model cannot handle is that the sequences are of varying lengths. There are two options: each sequence can be trained individually, which is inefficient, or dummy PAD tokens are added to the end of the sequences that are shorter than the maximum length. If all sequences are roughly the same length, then the latter is the more efficient solution.
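As a quick illustration (my own toy example, not part of the article's model code), padding simply fills the shorter sequences with a dummy PAD token up to the longest length in the batch:

PAD = 4  # an assumed token id reserved for padding
batch = [[2, 0, 3], [2, 1, 1, 1, 3]]  # two sequences of different lengths (SOS=2, EOS=3)
max_len = max(len(s) for s in batch)
padded = [s + [PAD] * (max_len - len(s)) for s in batch]
print(padded)  # [[2, 0, 3, 4, 4], [2, 1, 1, 1, 3]]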

Padding masks

A further complication with padding is that, similar to the masking of future tokens, the PAD tokens must be masked during training. There are three torch.nn.Transformer.forward() parameters through which the mask tensors must be provided:

    • Input masking (src_key_padding_mask)
    • Output (target) masking (tgt_key_padding_mask)
    • Decoder memory masking (memory_key_padding_mask)

In most cases the decoder memory mask is the same as the input mask, i.e. it prevents the decoder from seeing PAD tokens in its 'memory'.
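To make the convention concrete, here is a small sketch (mine, not from the article): each *_key_padding_mask is a boolean tensor of shape (batch, seq_len) with True at the PAD positions, and the attention layers then ignore those positions.

import torch

PAD_IDX = 4  # the padding token id used later in this article
# Two padded source sequences in the default (seq_len, batch) layout of nn.Transformer
src = torch.tensor([[2, 2],
                    [0, 1],
                    [3, 1],
                    [4, 1],   # PAD
                    [4, 3]])  # PAD
src_key_padding_mask = (src == PAD_IDX).transpose(0, 1)  # shape (batch, seq_len)
print(src_key_padding_mask)
# tensor([[False, False, False,  True,  True],
#         [False, False, False, False, False]])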

The padding masks also affect the embedding, as an extra token must be added:

# Token embedding layer - this takes care of converting integers to vectors
self.embedding = nn.Embedding(num_tokens+1, d_model, padding_idx=self.padding_idx)
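A side note (my own observation, not from the article): giving padding_idx to nn.Embedding also pins the PAD embedding to all zeros and excludes that row from gradient updates, so the padding token never learns a representation of its own:

import torch
import torch.nn as nn

emb = nn.Embedding(5, 8, padding_idx=4)  # 4 tokens + 1 PAD, d_model = 8 as used below
print(emb.weight[4])                     # the PAD row is initialized to zeros
loss = emb(torch.tensor([0, 4])).sum()
loss.backward()
print(emb.weight.grad[4])                # zeros: the PAD row receives no gradient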

Another consideration is the loss function, as it should ignore gradients with respect to the PAD tokens.

    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)
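A minimal check (my own sketch) of what ignore_index does: positions whose target equals PAD_IDX contribute nothing to the loss, so the mean is taken only over the real tokens:

import torch

PAD_IDX = 4
loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)
logits = torch.randn(3, 5)               # 3 positions, 5 classes (4 tokens + PAD)
targets = torch.tensor([1, 0, PAD_IDX])  # the last position is padding
with_pad = loss_fn(logits, targets)
without_pad = torch.nn.CrossEntropyLoss()(logits[:2], targets[:2])
print(torch.allclose(with_pad, without_pad))  # True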

Let's put it all together.

Final run

Generate the data:

def generate_data5(n):
    SOS_token = np.array([2])
    EOS_token = np.array([3])

    data = []
    seq_len = []

    # 0,0,0,0 -> 0,0,0,0
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [0, 0, 0, 0], EOS_token))
        y = np.concatenate((SOS_token, [0, 0, 0, 0], EOS_token))
        data.append([X, y])
        seq_len.append([4+2, 4+2])

    # 1,1,1,1 -> 1,1,1,1
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [1, 1, 1, 1], EOS_token))
        y = np.concatenate((SOS_token, [1, 1, 1, 1], EOS_token))
        data.append([X, y])
        seq_len.append([4+2, 4+2])

    # 0,0,0 -> 1
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [0, 0, 0], EOS_token))
        y = np.concatenate((SOS_token, [1], EOS_token))
        data.append([X, y])
        seq_len.append([3+2, 1+2])

    # 1,1,1 -> 0
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [1, 1, 1], EOS_token))
        y = np.concatenate((SOS_token, [0], EOS_token))
        data.append([X, y])
        seq_len.append([3+2, 1+2])

    # 1 -> 0,0,0
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [1], EOS_token))
        y = np.concatenate((SOS_token, [0, 0, 0], EOS_token))
        data.append([X, y])
        seq_len.append([1+2, 3+2])

    # 0 -> 1,1,1
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [0], EOS_token))
        y = np.concatenate((SOS_token, [1, 1, 1], EOS_token))
        data.append([X, y])
        seq_len.append([1+2, 3+2])

    # 0,1,0,1 -> 0,1,0,1
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [0, 1, 0, 1], EOS_token))
        y = np.concatenate((SOS_token, [0, 1, 0, 1], EOS_token))
        data.append([X, y])
        seq_len.append([4+2, 4+2])

    # 1,0,1,0 -> 1,0,1,0
    for i in range(n // 8):
        X = np.concatenate((SOS_token, [1, 0, 1, 0], EOS_token))
        y = np.concatenate((SOS_token, [1, 0, 1, 0], EOS_token))
        data.append([X, y])
        seq_len.append([4+2, 4+2])

    temp = list(zip(data, seq_len))  # Pair the elements
    random.shuffle(temp)             # Shuffle the pairs
    data, seq_len = zip(*temp)       # Unzip into separate lists

    return data, seq_len
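For reference, each entry returned by generate_data5 is an [X, y] pair of SOS/EOS-wrapped NumPy arrays, together with its [len(X), len(y)] record (the exact first pair varies because the pairs are shuffled):

data, seq_len = generate_data5(8)
print(data[0])     # e.g. [array([2, 1, 3]), array([2, 0, 0, 0, 3])]
print(seq_len[0])  # e.g. [3, 5]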

Construct the training data and add PAD tokens to the sequences shorter than the maximum length:

# Generate data and the length of each sequence
tr_data, tr_seq_len = generate_data5(200)

# Add the pad tokens
PAD_IDX = 4
max_len_X = max([foo[0] for foo in tr_seq_len])
max_len_Y = max([foo[1] for foo in tr_seq_len])
print(max_len_X)
print(max_len_Y)

X_tr = PAD_IDX*torch.ones((max_len_X, len(tr_data)))
Y_tr = PAD_IDX*torch.ones((max_len_Y, len(tr_data)))
for ids, s in enumerate(tr_data):
    X_tr[:tr_seq_len[ids][0], ids] = torch.from_numpy(s[0])
    Y_tr[:tr_seq_len[ids][1], ids] = torch.from_numpy(s[1])

# Construct logical pad masks (True is PAD)
src_padding_mask = (X_tr == PAD_IDX).transpose(0, 1)
tgt_padding_mask = (Y_tr == PAD_IDX).transpose(0, 1)

Re-define Seq2SeqTransformer, this time with padding support (extra parameters are added to the transformer call in the forward() function):

class Seq2SeqTransformer(nn.Module):
    # Constructor
    def __init__(
            self,
            num_tokens,
            d_model,
            nhead,
            num_encoder_layers,
            num_decoder_layers,
            dim_feedforward,
            dropout_p,
            layer_norm_eps,
            padding_idx=None
    ):
        super().__init__()

        self.d_model = d_model
        self.padding_idx = padding_idx

        if padding_idx is not None:
            # Token embedding layer - this takes care of converting integers to vectors
            self.embedding = nn.Embedding(num_tokens+1, d_model, padding_idx=self.padding_idx)
        else:
            # Token embedding layer - this takes care of converting integers to vectors
            self.embedding = nn.Embedding(num_tokens, d_model)

        # Token "unembedding" to one-hot token vector
        self.unembedding = nn.Linear(d_model, num_tokens)

        # Positional encoding
        self.positional_encoder = PositionalEncoding(d_model=d_model, dropout=dropout_p)

        # nn.Transformer that does the magic
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
            dim_feedforward=dim_feedforward,
            dropout=dropout_p,
            layer_norm_eps=layer_norm_eps,
            norm_first=True
        )

    def forward(
            self,
            src,
            tgt,
            tgt_mask=None,
            src_key_padding_mask=None,
            tgt_key_padding_mask=None
    ):
        # Note: src & tgt default size is (seq_length, batch_num, feat_dim)

        # Token embedding
        src = self.embedding(src) * math.sqrt(self.d_model)
        tgt = self.embedding(tgt) * math.sqrt(self.d_model)

        # Positional encoding - this is sensitive: data _must_ be seq len x batch num x feat dim
        # Inference often misses the batch num dimension
        if src.dim() == 2:  # seq len x feat dim
            src = torch.unsqueeze(src, 1)
        src = self.positional_encoder(src)
        if tgt.dim() == 2:  # seq len x feat dim
            tgt = torch.unsqueeze(tgt, 1)
        tgt = self.positional_encoder(tgt)

        # Transformer output
        out = self.transformer(src, tgt, tgt_mask=tgt_mask,
                               src_key_padding_mask=src_key_padding_mask,
                               tgt_key_padding_mask=tgt_key_padding_mask,
                               memory_key_padding_mask=src_key_padding_mask)
        out = self.unembedding(out)

        return out

Construct the model and train it:

model = Seq2SeqTransformer(num_tokens=4, d_model=8, nhead=1, num_encoder_layers=1,
                           num_decoder_layers=1, dim_feedforward=8, dropout_p=0.1,
                           layer_norm_eps=1e-05, padding_idx=PAD_IDX)

num_of_epochs = 2000
loss_fn = torch.nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[1000], gamma=0.1)
model.train()
for n in range(num_of_epochs):
    running_loss = 0.0
    X_in = X_tr.long()
    Y_in = Y_tr[:-1, :].long()
    Y_out = Y_tr[1:, :].long()
    tgt_padding_mask_in = tgt_padding_mask[:, :-1]

    # Get mask to mask out the next tokens
    sequence_length = Y_in.size(0)
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(sequence_length)

    Y_pred = model(X_in, Y_in, tgt_mask=tgt_mask, src_key_padding_mask=src_padding_mask,
                   tgt_key_padding_mask=tgt_padding_mask_in)

    # seq len x num samples => num samples x seq len
    Y_out = Y_out.permute(1, 0)
    # seq len x num samples x token one-hot => num samples x token one-hot x seq len
    Y_pred = Y_pred.permute(1, 2, 0)
    #print(Y_pred.shape)
    loss = loss_fn(Y_pred, Y_out)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    scheduler.step()
    if n % 100 == 0:
        print(f' Epoch {n} training loss {running_loss} (lr={optimizer.param_groups[0]["lr"]})')
print(f'Final: Epoch {n} training loss {running_loss} (lr={optimizer.param_groups[0]["lr"]})')

Epoch 0 training loss 1.3525161743164062 (lr=0.01)
Epoch 100 training loss 0.5397025942802429 (lr=0.01)
Epoch 200 training loss 0.3458441495895386 (lr=0.01)
Epoch 300 training loss 0.2927950918674469 (lr=0.01)
Epoch 400 training loss 0.24350842833518982 (lr=0.01)
Epoch 500 training loss 0.2193882018327713 (lr=0.01)
Epoch 600 training loss 0.1860966682434082 (lr=0.01)
Epoch 700 training loss 0.15263524651527405 (lr=0.01)
Epoch 800 training loss 0.1544901579618454 (lr=0.01)
Epoch 900 training loss 0.16662688553333282 (lr=0.01)
Epoch 1000 training loss 0.13945193588733673 (lr=0.001)
Epoch 1100 training loss 0.1130961999297142 (lr=0.001)
Epoch 1200 training loss 0.12732738256454468 (lr=0.001)
Epoch 1300 training loss 0.12633047997951508 (lr=0.001)
Epoch 1400 training loss 0.12585079669952393 (lr=0.001)
Epoch 1500 training loss 0.13260918855667114 (lr=0.001)
Epoch 1600 training loss 0.09995909780263901 (lr=0.001)
Epoch 1700 training loss 0.09377395361661911 (lr=0.001)
Epoch 1800 training loss 0.12214040011167526 (lr=0.001)
Epoch 1900 training loss 0.09379428625106812 (lr=0.001)
Final: Epoch 1999 training loss 0.11169883608818054 (lr=0.001)
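The test loop below uses the predict() helper defined in the earlier part of this article; for completeness, here is a minimal greedy-decoding sketch consistent with the model above (my reconstruction, not necessarily the original code):

def predict(model, input_sequence, max_length=8, SOS_token=2, EOS_token=3):
    model.eval()
    y_input = torch.tensor([SOS_token], dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_length):
            # Causal mask for the tokens generated so far
            tgt_mask = nn.Transformer.generate_square_subsequent_mask(y_input.size(0))
            pred = model(input_sequence, y_input, tgt_mask=tgt_mask)
            next_token = pred[-1, 0, :].argmax().item()  # greedy pick at the last position
            y_input = torch.cat((y_input, torch.tensor([next_token], dtype=torch.long)))
            if next_token == EOS_token:
                break
    return y_input.tolist()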

Test the model with all sequences:

# Here we test some examples to see how the model predicts
examples = [
    torch.tensor([2, 0, 0, 0, 0, 3], dtype=torch.long),
    torch.tensor([2, 1, 1, 1, 1, 3], dtype=torch.long),
    torch.tensor([2, 1, 1, 1, 3], dtype=torch.long),
    torch.tensor([2, 0, 0, 0, 3], dtype=torch.long),
    torch.tensor([2, 0, 3], dtype=torch.long),
    torch.tensor([2, 1, 3], dtype=torch.long),
    torch.tensor([2, 0, 1, 0, 1, 3], dtype=torch.long),
    torch.tensor([2, 1, 0, 1, 0, 3], dtype=torch.long),
]

for idx, example in enumerate(examples):
    result = predict(model, example)
    print(f"Example {idx}")
    print(f"Input sequence: {example.view(-1).tolist()[1:-1]}")
    print(f"Output (predicted) sequence: {result[1:-1]}")
    print()

Example 0
Input sequence: [0, 0, 0, 0]
Output (predicted) sequence: [0, 0, 0, 0]

Example 1
Input sequence: [1, 1, 1, 1]
Output (predicted) sequence: [1, 1, 1, 1]

Example 2
Input sequence: [1, 1, 1]
Output (predicted) sequence: [0]

Example 3
Input sequence: [0, 0, 0]
Output (predicted) sequence: [1]

Example 4
Input sequence: [0]
Output (predicted) sequence: [1, 1, 1]

Example 5
Input sequence: [1]
Output (predicted) sequence: [0, 0, 0]

Example 6
Input sequence: [0, 1, 0, 1]
Output (predicted) sequence: [0, 1, 0, 1]

Example 7
Input sequence: [1, 0, 1, 0]
Output (predicted) sequence: [1, 0, 1, 0]

It works as we expected. The complete model can now be used for any Seq2Seq problem.


