    Pause Your ML Pipelines for Human Review Using AWS Step Functions + Slack

    By FinanceStarGate | May 13, 2025 | 7 Mins Read

    Have you ever wanted to pause an automated workflow to wait for a human decision?

    Maybe you need approval before provisioning cloud resources, promoting a machine learning model to production, or charging a customer’s credit card.

    In many data science and machine learning workflows, automation gets you 90% of the way, but that critical last step often needs human judgment.

    Especially in production environments, model retraining, anomaly overrides, or large data movements require careful human review to avoid expensive mistakes.

    In my case, I needed to manually review situations where my system flagged more than 6% of customer data for anomalies, often due to accidental pushes by customers.

    Before I implemented a proper workflow, this was handled informally: developers would directly update production databases (!), which was risky, error-prone, and unscalable.

    To solve this, I built a scalable manual approval system using AWS Step Functions, Slack, Lambda, and SNS: a cloud-native, low-cost architecture that cleanly pauses workflows for human approvals without spinning up idle compute.

    In this post, I’ll walk you through the full design, the AWS resources involved, and how you can apply it to your own critical workflows.

    Let’s get into it 👇

    The Solution

    My application is deployed in the AWS ecosystem, so we’ll use AWS Step Functions to build a state machine that:

    1. Executes business logic
    2. Invokes a Lambda with WaitForTaskToken to pause until approval
    3. Sends a Slack message requesting approval (this could also be an email)
    4. Waits for a human to click “Approve” or “Reject”
    5. Resumes automatically from the same point
    The Step Function flow

    Here is a YouTube video showing the demo and the actual application in action:

    I’ve also hosted the live demo app here →
    👉 https://v0-manual-review-app-fwtjca.vercel.app
    All code is hosted here with the right set of IAM permissions.


    Step-by-Step Implementation

    1. Now we’ll create the Step Function with a manual review flow step. Here is the Step Function definition:
    Step Function flow with definition

    The flow above generates a dataset, uploads it to AWS S3 and, if a review is required, invokes the Manual Review Lambda. At the manual review step, we use a Task state that invokes the Lambda with WaitForTaskToken, which pauses execution until it is resumed. The Lambda reads the token this way:

    def lambda_handler(event, context):

        config = event["Payload"]["config"]
        task_token = event["Payload"]["taskToken"]  # Step Functions auto-generates this

        # ManualReview is the author's helper class; it publishes the review request
        reviewer = ManualReview(config, task_token)
        reviewer.send_notification()

        return config

    This Lambda sends a Slack message that includes the task token, so the approval endpoint knows which execution to resume.
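For the handler above to find `event["Payload"]["taskToken"]`, the task state has to inject the token into the Lambda input. The real definition lives in the author's repository; the following is only a minimal sketch of how such a state could look in Amazon States Language, built here as a Python dict. The function name `build_manual_review_state` and the example ARN are hypothetical, and the nested `Payload` key is an assumption made so that the handler's lookups resolve.

```python
import json


def build_manual_review_state(function_arn: str) -> dict:
    """Return a Task state that pauses until the task token is sent back."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix tells Step Functions to pause this state
        # until SendTaskSuccess or SendTaskFailure is called with the token.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_arn,
            # With the lambda:invoke integration, the contents of this outer
            # "Payload" become the Lambda event. The inner "Payload" key is an
            # assumption that mirrors event["Payload"][...] in the handler.
            "Payload": {
                "Payload": {
                    "config.$": "$.config",
                    # $$.Task.Token reads the token from the context object.
                    "taskToken.$": "$$.Task.Token",
                }
            },
        },
        "End": True,
    }


if __name__ == "__main__":
    # Hypothetical ARN, for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:manual-review"
    print(json.dumps(build_manual_review_state(arn), indent=2))
```

Building the state as a dict and serializing it with `json.dumps` keeps the definition easy to template per environment, which is one possible design choice rather than a requirement.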

    2. Before we send out the Slack notification, we need to set up:

    1. an SNS topic that receives review messages from the Lambda
    2. a Slack workflow with a webhook subscribed to the SNS topic, and a confirmed subscription
    3. an HTTPS API Gateway with approval and rejection endpoints
    4. a Lambda function that processes the API Gateway requests: code

    I followed the YouTube video here for my setup.
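The first two setup items can also be scripted. Below is a minimal sketch, assuming a boto3 SNS client is passed in; `create_review_topic`, the topic name, and the webhook URL are hypothetical names chosen for illustration, and the author's own setup followed the video rather than this code.

```python
def create_review_topic(sns_client, topic_name: str, webhook_url: str) -> str:
    """Create (or reuse) an SNS topic and subscribe an HTTPS webhook to it."""
    # create_topic returns the existing ARN if a topic with this name
    # already exists with the same attributes.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The endpoint must confirm the subscription before SNS delivers
    # any review messages to it.
    sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="https",
        Endpoint=webhook_url,
    )
    return topic_arn
```

In real use the client would come from `boto3.client("sns")`. Passing it in as a parameter keeps the function testable without AWS credentials.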

    3. Once the above is set up, add the variables to the webhook step of the Slack workflow:

    And use the variables with a helpful note in the following step:

    The final workflow will look like this:

    4. Send a Slack notification by publishing to the SNS topic with the job parameters (you can alternatively use slack-sdk). Here is what the message will look like:

    def publish_message(self, bucket_name: str, s3_file: str, subject: str = "Manual Review") -> dict:

        # Requires `import json` and `import logging` at module level
        presigned_url = S3.generate_presigned_url(bucket_name, s3_file, expiration=86400)  # 1 day expiration

        message = {
            "approval_link": self.approve_link,
            "rejection_link": self.reject_link,
            "s3_file": presigned_url if presigned_url else s3_file  # fall back to the raw key
        }

        logging.info(f"Publishing message to {self.topic_arn}, with subject: {subject}, message: {message}")

        response = self.client.publish(
            TopicArn=self.topic_arn,
            Message=json.dumps(message),
            Subject=subject
        )

        logging.info(f"Response: {response}")
        return response
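`publish_message` relies on a helper, `S3.generate_presigned_url`, whose body is not shown. Here is a minimal sketch of what such a wrapper might look like around the boto3 S3 client method of the same name. The function signature and the choice to return `None` on failure are assumptions, picked so that the `presigned_url if presigned_url else s3_file` fallback above has something to fall back from.

```python
from typing import Optional


def generate_presigned_url(s3_client, bucket_name: str, key: str,
                           expiration: int = 86400) -> Optional[str]:
    """Return a time-limited GET URL for an S3 object, or None on failure."""
    try:
        return s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket_name, "Key": key},
            ExpiresIn=expiration,  # seconds; 86400 is one day
        )
    except Exception:
        # A broad catch keeps this sketch dependency-free; returning None
        # lets the caller fall back to the plain S3 key.
        return None
```

The client would normally be `boto3.client("s3")`. Signing happens locally, so a URL is returned even when the object does not exist; the error only surfaces when a reviewer opens the link.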

    The Manual Review Lambda publishes this message from send_notification, and the task token travels with it so the approval endpoint knows which execution to resume.

    def send_notification(self):

        # As soon as this message is sent out, the Step Functions task that invoked this
        # Lambda stays paused until SendTaskSuccess or SendTaskFailure is called with the token.

        # If you don't want the execution to wait until the Step Functions timeout, make sure
        # you set an explicit timeout on this step
        self.sns.publish_message(self.s3_bucket_name, self.s3_key)

    def lambda_handler(event, context):

        config = event["Payload"]["config"]
        task_token = event["Payload"]["taskToken"]  # Step Functions auto-generates this

        reviewer = ManualReview(config, task_token)
        reviewer.send_notification()
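The message above carries `approve_link` and `reject_link`, but how they are built is not shown. One way such links could embed the task token is as a URL-encoded query parameter. This is a sketch under assumptions: the helper names, the `taskToken` parameter name, and the base URL are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_review_links(api_base_url: str, task_token: str) -> dict:
    """Build approve/reject URLs that carry the task token as a query parameter."""
    # urlencode percent-encodes characters such as '+', '/' and '=' that a
    # token may contain, so the token survives the round trip unchanged.
    query = urlencode({"taskToken": task_token})
    base = api_base_url.rstrip("/")
    return {
        "approval_link": f"{base}/approve?{query}",
        "rejection_link": f"{base}/reject?{query}",
    }


def read_task_token(url: str) -> str:
    """Recover the task token from a link built by build_review_links."""
    return parse_qs(urlsplit(url).query)["taskToken"][0]
```

Encoding matters here because an unescaped `+` in a query string is decoded as a space, which would silently corrupt the token and make the resume call fail.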

    5. Once a review notification is received in Slack, the user can approve or reject it. The Step Function goes into a wait state until it receives a user response; however, the task token is set to expire in 24 hours, so inactivity will time out the Step Function.

    Based on whether the user approves or rejects the review request, the rawPath gets set and can be parsed here: code

    action = event.get("rawPath", "").strip("/").lower()
    # Extracts 'approve' or 'reject'

    The receiving API Gateway + Lambda combo:

    • Parses the Slack payload
    • Extracts the taskToken + decision
    • Uses StepFunctions.send_task_success() or send_task_failure()

    Example:

    match action:
        case "approve":
            output_dict["is_manually_approved"] = True
            response_message = "Approval processed successfully."
        case "reject":
            output_dict["is_manually_rejected"] = True
            response_message = "Rejection processed successfully."
        case _:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "Invalid action. Use '/approve' or '/reject' in URL."})
            }

    ...

    # output must be a JSON string
    sfn_client.send_task_success(
        taskToken=task_token,
        output=output
    )

    Note: A task configured with WaitForTaskToken must wait. If you never send the token back, your workflow just stalls until it times out.
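Putting the fragments of step 5 together, a complete handler might look roughly like the sketch below. It assumes an API Gateway HTTP API event (payload format 2.0, default stage) with the token in a `taskToken` query parameter; the function name `handle_decision` and that parameter name are hypothetical, and the Step Functions client is passed in rather than created inside the function.

```python
import json


def handle_decision(event: dict, sfn_client) -> dict:
    """Resume a paused execution based on an /approve or /reject request."""
    action = event.get("rawPath", "").strip("/").lower()
    # queryStringParameters is absent when the request has no query string.
    token = (event.get("queryStringParameters") or {}).get("taskToken")

    if action not in ("approve", "reject") or not token:
        return {
            "statusCode": 400,
            "body": json.dumps(
                {"error": "Expected /approve or /reject with a taskToken."}
            ),
        }

    # Both decisions resume the execution; a later state can branch on these
    # flags. The output argument must be a JSON string.
    output = {
        "is_manually_approved": action == "approve",
        "is_manually_rejected": action == "reject",
    }
    sfn_client.send_task_success(taskToken=token, output=json.dumps(output))
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"{action} processed."}),
    }
```

An alternative design is to call `send_task_failure` on rejection, which fails the task directly instead of letting a downstream state inspect the flags.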

    Bonus: If you need email or SMS alerts, use SNS to notify a broader group.
    Just call sns.publish() from within your Lambda or Step Function.

    Testing

    Once the manual approval system was wired up, it was time to kick the tires. Here’s how I tested it:

    • Right after publishing the Slack workflow, I confirmed the SNS subscription, since messages are only forwarded once it is confirmed. Don’t skip this step.
    • Then, I triggered the Step Function manually with a fake payload simulating a data flagging event.
    • When the workflow hit the manual approval step, it sent a Slack message with Approve/Reject buttons.
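The manual trigger in the second bullet can be scripted as well. This is a sketch only: the payload fields (`flagged_ratio`, `source`) are hypothetical stand-ins for whatever the real state machine expects, and the Step Functions client is passed in.

```python
import json
import uuid


def start_test_execution(sfn_client, state_machine_arn: str,
                         flagged_ratio: float) -> str:
    """Start an execution with a fake payload and return its ARN."""
    # Hypothetical input shape, simulating a data flagging event.
    payload = {"config": {"flagged_ratio": flagged_ratio, "source": "manual-test"}}
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # A random suffix avoids clashing with an earlier execution name.
        name=f"manual-review-test-{uuid.uuid4().hex[:8]}",
        input=json.dumps(payload),
    )
    return response["executionArn"]
```

With a real client from `boto3.client("stepfunctions")`, the returned ARN can be used to watch the execution while clicking Approve or Reject in Slack.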

    I tested all major paths:

    • Approve: Clicked Approve and saw the Step Function resume and complete successfully.
    • Reject: Clicked Reject and the Step Function moved cleanly into a failure state.
    • Timeout: Ignored the Slack message. The Step Function waited for the configured timeout and then gracefully timed out without hanging.

    Behind the scenes, I also verified that:

    • The Lambda receiving Slack responses was correctly parsing action payloads.
    • No rogue task tokens were left hanging.
    • Step Functions metrics and Slack error logs were clean.

    I highly recommend testing not just happy paths, but also “what if nobody clicks?” and “what if Slack glitches?” Catching these edge cases early saved me headaches later.


    Lessons Learned

    • Always use timeouts: Set a timeout both on the WaitForTaskToken step and on the whole Step Function. Without it, workflows can get stuck indefinitely if no one responds.
    • Pass necessary context: If your Step Function needs certain files, paths, or config settings after resuming, make sure you encode and send them along in the SNS notification.
      Step Functions don’t automatically retain previous in-memory context when resuming from a task token.
    • Manage Slack noise: Be careful about spamming a Slack channel with too many review requests. I recommend creating separate channels for development, UAT, and production flows to keep things clean.
    • Lock down permissions early: Make sure all your AWS resources (Lambda functions, API Gateway, S3 buckets, SNS topics) have correct and minimal permissions following the principle of least privilege. Where I needed to customize beyond AWS’s defaults, I wrote and posted inline IAM policies as JSON. (You’ll find examples in the GitHub repo.)
    • Pre-sign and shorten URLs: If you’re sending links (e.g., to S3 files) in Slack messages, pre-sign the URLs for secure access, and shorten them for a cleaner Slack UI. Here’s a quick example I used:
    # params= URL-encodes the pre-signed link, which itself contains '&' and '='
    shorten_url = requests.get("http://tinyurl.com/api-create.php", params={"url": presigned_url}).text
    default_links[key] = shorten_url if shorten_url else presigned_url
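To make the first lesson above concrete, here is a minimal sketch of a definition that sets a timeout on both the review task and the whole state machine, and catches the timeout so the execution ends in a named failure. State names, the default durations, and the function name `build_definition` are hypothetical; the author's actual definition is in the repository.

```python
def build_definition(function_arn: str, review_timeout_s: int = 86400,
                     workflow_timeout_s: int = 90000) -> dict:
    """Return a state machine definition with task-level and global timeouts."""
    return {
        "StartAt": "ManualReview",
        # Upper bound for the entire execution.
        "TimeoutSeconds": workflow_timeout_s,
        "States": {
            "ManualReview": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": function_arn,
                    "Payload": {"taskToken.$": "$$.Task.Token"},
                },
                # Upper bound for how long this state waits for the token.
                "TimeoutSeconds": review_timeout_s,
                # Route an unanswered review to an explicit failure state.
                "Catch": [
                    {"ErrorEquals": ["States.Timeout"], "Next": "ReviewTimedOut"}
                ],
                "Next": "Done",
            },
            "ReviewTimedOut": {
                "Type": "Fail",
                "Error": "ReviewTimedOut",
                "Cause": "No reviewer responded in time.",
            },
            "Done": {"Type": "Succeed"},
        },
    }
```

Keeping the workflow timeout slightly longer than the review timeout lets the task-level `Catch` fire first, so the failure reason names the missed review rather than a generic execution timeout.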

    Wrapping Up

    Adding human-in-the-loop logic doesn’t have to mean duct tape and cron jobs. With Step Functions + Slack, you can build reviewable, traceable, and production-safe approval flows.

    If this helped, or you’re trying something similar, drop a note in the comments! Let’s build better workflows.

    Note: All images in this article were created by the author.


