
    I Won $10,000 in a Machine Learning Competition — Here’s My Complete Strategy

    By FinanceStarGate | June 17, 2025 | 7 min read


    I won $10,000 in my first ML competition and honestly, I’m still a bit shocked.

    I’ve worked as a data scientist in FinTech for six years. When I saw that Spectral Finance was running a credit scoring challenge for Web3 wallets, I decided to give it a try despite having zero blockchain experience.

    Here were my limitations:

    • I used my laptop, which has no GPU
    • I only had a weekend (~10 hours) to work on it
    • I had never touched Web3 or blockchain data before
    • I had never built a neural network for credit scoring

    The competition goal was simple: predict which Web3 wallets were likely to default on loans using their transaction history. Essentially, traditional credit scoring but with DeFi data instead of bank statements.

    To my surprise, I came second and won $10k in USD Coin! Unfortunately, Spectral Finance has since taken the competition website and leaderboard down, but here’s a screenshot from when I won:

    My username was Ds-clau, second place with a score of 83.66 (image by author)

    This experience taught me that understanding the business problem really matters. In this post, I’ll show you exactly how I did it with detailed explanations and Python code snippets, so you can replicate this approach for your next machine learning project or competition.

    Getting Started: You Don’t Need Expensive Hardware

    Let me make this clear: you don’t necessarily need an expensive cloud computing setup to win ML competitions (unless the dataset is too big to fit locally).

    The dataset for this competition contained 77 features and 443k rows, which isn’t small by any means. The data came as a .parquet file that I downloaded using duckdb.

    I used my personal laptop, a MacBook Pro with 16GB RAM and no GPU. The entire dataset fit locally on my laptop, though I must admit the training process was a bit slow.

    Insight: Clever sampling techniques get you 90% of the insights without the high computational costs. Many people get intimidated by large datasets and think they need big cloud instances. You can start a project locally by sampling a portion of the dataset and analyzing the sample first.
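
    As a rough illustration of the sampling idea, here is a minimal sketch that draws a stratified 10% sample so the class balance of the full data is preserved. The DataFrame, the column names, and the `stratified_sample` helper are hypothetical stand-ins built on synthetic data, not the competition dataset or the original code.

```python
import numpy as np
import pandas as pd


def stratified_sample(df: pd.DataFrame, target_col: str, frac: float, seed: int = 42) -> pd.DataFrame:
    """Sample `frac` of the rows within each class so class proportions are preserved."""
    return df.groupby(target_col, group_keys=False).sample(frac=frac, random_state=seed)


# Synthetic stand-in for the full dataset (hypothetical columns).
rng = np.random.default_rng(0)
n_rows = 10_000
full_df = pd.DataFrame({
    "feature_a": rng.normal(size=n_rows),
    "feature_b": rng.exponential(size=n_rows),
    "target": (rng.random(n_rows) < 0.38).astype(int),  # roughly 38% positives
})

sample_df = stratified_sample(full_df, "target", frac=0.1)

print("Rows in sample:", len(sample_df))
print("Default rate (full):  ", round(full_df["target"].mean(), 3))
print("Default rate (sample):", round(sample_df["target"].mean(), 3))
```

    Exploring the sample first keeps iteration fast; the full dataset is only needed once the approach is settled.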

    EDA: Know Your Data

    Here’s where my fintech background became my superpower: I approached this like any other credit risk problem.

    First question in credit scoring: What’s the class distribution?
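
    A class distribution check can be as short as the sketch below. The labels are synthetic and constructed to mirror a 62/38 split; the series name `target` and the 0/1 encoding (1 = defaulted) are assumptions for illustration.

```python
import pandas as pd

# Synthetic labels: 0 = repaid, 1 = defaulted (hypothetical encoding).
labels = pd.Series([0] * 62 + [1] * 38, name="target")

# normalize=True returns shares instead of raw counts.
class_share = labels.value_counts(normalize=True).sort_index()
print(class_share)
```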

    Seeing the 62/38 split made me shiver… 38% is a very high default rate from a business perspective, but luckily, the competition wasn’t about pricing this product.

    Next, I wanted to see which features actually mattered:
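
    One simple way to rank features is by their Pearson correlation with the target. The sketch below does this on synthetic data; the feature names echo the ones discussed in this post, but the values are made up and `noise_feature` is a hypothetical unrelated column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000
target = (rng.random(n) < 0.38).astype(int)

# Synthetic features: one positively related, one negatively related, one unrelated.
df = pd.DataFrame({
    "risk_factor": target + rng.normal(scale=1.0, size=n),
    "time_since_last_liquidated": -target + rng.normal(scale=1.5, size=n),
    "noise_feature": rng.normal(size=n),
    "target": target,
})

# corrwith uses Pearson correlation by default; sort by absolute strength.
correlations = (
    df.drop(columns="target")
    .corrwith(df["target"])
    .sort_values(key=np.abs, ascending=False)
)
print(correlations)
```

    Sorting by absolute value keeps strong negative correlations near the top, where they would otherwise be easy to miss.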

    This is where I got excited. The patterns were exactly what I’d expect from credit data:

    • risk_factor was the strongest predictor and showed > 0.4 correlation with the target variable (higher risk factor = more likely to default)
    • time_since_last_liquidated showed a strong negative correlation, so the more recently a wallet was last liquidated, the riskier it was. This lines up as expected, since high velocity is usually a high risk signal (recent liquidation = risky)
    • liquidation_count_sum_eth suggested that borrowers with higher liquidation counts in ETH were risk flags (more liquidations = riskier behaviour)

    Insight: Looking at Pearson correlation is a simple yet intuitive way to understand linear relationships between features and the target variable. It’s a great way to gain intuition on which features should and shouldn’t be included in your final model.

    Feature Selection: Less is More

    Here’s something that always puzzles executives when I explain this to them:

    More features don’t always mean better performance.

    In fact, too many features usually mean worse performance and slower training, because extra features add noise. Every irrelevant feature makes your model a little bit worse at finding the real patterns.

    So, feature selection is a crucial step that I never skip. I used recursive feature elimination to find the optimal number of features. Let me walk you through my exact process:
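
    A minimal sketch of recursive feature elimination with cross-validated AUC, using scikit-learn’s `RFECV`. The post does not say which estimator or cross-validation setup was used, so the logistic regression, the 3-fold split, and the synthetic 20-feature dataset here are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 20 features, only 5 of which carry signal.
X, y = make_classification(
    n_samples=2_000, n_features=20, n_informative=5, n_redundant=0,
    weights=[0.62, 0.38], random_state=0,
)
# Scaled up front for brevity; a stricter setup would scale inside each CV fold.
X = StandardScaler().fit_transform(X)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1_000),  # assumed estimator
    step=1,                                        # drop one feature per round
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    scoring="roc_auc",                             # pick the feature count by AUC
    min_features_to_select=1,
)
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)
X_selected = selector.transform(X)  # keep only the selected columns
```

    `selector.support_` is a boolean mask over the original columns, which is handy for mapping the selection back to feature names.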

    The sweet spot was 34 features. After this point, the model performance as measured by the AUC score didn’t improve with additional features. So, I ended up using fewer than half of the given features to train my model, going from 77 features down to 34.

    Insight: This reduction in features eliminated noise while preserving signal from the important features, leading to a model that was both faster to train and more predictive.

    Building the Neural Network: Simple Yet Powerful Architecture

    Before defining the model architecture, I had to define the dataset properly:

    1. Split into training and validation sets (for verifying results after model training)
    2. Scale features because neural networks are very sensitive to feature scale and outliers
    3. Convert datasets to PyTorch tensors for efficient computation

    Here’s my exact data preprocessing pipeline:
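
    The three steps can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the original pipeline: the 80/20 stratified split, the `StandardScaler`, and the variable names are assumptions.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1,000 rows with 34 features and roughly 38% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 34)).astype(np.float32)
y = (rng.random(1_000) < 0.38).astype(np.float32)

# 1. Stratified split keeps the class balance identical in both sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Fit the scaler on the training set only to avoid leaking validation statistics.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# 3. Convert to tensors; targets get shape (n, 1) to match a single-output network.
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)

print(X_train_t.shape, y_train_t.shape, X_val_t.shape, y_val_t.shape)
```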

    Now comes the fun part: building the actual neural network model.

    Important context: Spectral Finance (the competition organizer) restricted model deployments to only neural networks and logistic regression because of their zero-knowledge proof system.

    ZK proofs require mathematical circuits that can cryptographically verify computations without revealing underlying data, and neural networks and logistic regression can be efficiently converted into ZK circuits.

    Since it was my first time building a neural network for credit scoring, I wanted to keep things simple but effective. Here’s my model architecture:
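
    A sketch of a network matching the description in this section: five hidden layers of 64 units with ReLU and dropout of 0.2, and a sigmoid output. The class name `CreditScoringNet` and the exact layer ordering (dropout after every hidden layer) are assumptions, not the original code.

```python
import torch
from torch import nn


class CreditScoringNet(nn.Module):
    """Feed-forward binary classifier: n_hidden x (Linear -> ReLU -> Dropout) -> Linear -> Sigmoid."""

    def __init__(self, n_features: int, hidden_size: int = 64,
                 n_hidden: int = 5, dropout: float = 0.2):
        super().__init__()
        layers = []
        in_size = n_features
        for _ in range(n_hidden):
            layers += [nn.Linear(in_size, hidden_size), nn.ReLU(), nn.Dropout(dropout)]
            in_size = hidden_size
        layers += [nn.Linear(in_size, 1), nn.Sigmoid()]  # probability of default
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = CreditScoringNet(n_features=34)

# Quick shape check on a random batch; eval() disables dropout for inference.
model.eval()
with torch.no_grad():
    probs = model(torch.randn(8, 34))
print(probs.shape)
```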

    Let’s walk through my architecture choices in detail:

    • 5 hidden layers: Deep enough to capture complex patterns, shallow enough to avoid overfitting
    • 64 neurons per layer: Good balance between capacity and computational efficiency
    • ReLU activation: Standard choice for hidden layers, helps mitigate vanishing gradients
    • Dropout (0.2): Reduces overfitting by randomly zeroing 20% of activations during training
    • Sigmoid output: Ideal for binary classification, outputs probabilities between 0 and 1

    Training the Model: Where the Magic Happens

    Now for the training loop that kicks off the model learning process:
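
    A minimal training loop with SGD plus momentum, validation monitoring, and early stopping might look like the sketch below. It uses a small stand-in network, full-batch updates, and synthetic data to stay short; the learning rate, momentum, and patience values are assumptions rather than the settings used in the competition.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Synthetic binary classification data (hypothetical, 10 features).
X = torch.randn(600, 10)
y = (X[:, :3].sum(dim=1, keepdim=True) > 0).float()
X_train, y_train, X_val, y_val = X[:480], y[:480], X[480:], y[480:]

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
criterion = nn.BCELoss()  # binary cross-entropy on sigmoid outputs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

max_epochs, patience = 200, 10
best_val_loss, best_state, epochs_without_improvement = float("inf"), None, 0
history = []  # (train_loss, val_loss) per epoch

for epoch in range(max_epochs):
    # Training step (full batch for brevity; mini-batches are typical in practice).
    model.train()
    optimizer.zero_grad()
    train_loss = criterion(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    # Validation monitoring without gradient tracking.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()
    history.append((train_loss.item(), val_loss))

    # Early stopping: keep the best weights, stop after `patience` epochs without improvement.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break

model.load_state_dict(best_state)  # restore the best checkpoint
print(f"Stopped after {len(history)} epochs, best validation loss {best_val_loss:.4f}")
```

    Restoring the best checkpoint matters: without it, the final model is the one from `patience` epochs after validation loss stopped improving.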

    Here are some details on the model training process:

    • Early stopping: Prevents overfitting by stopping when validation performance stops improving
    • SGD with momentum: Simple but effective optimizer choice
    • Validation monitoring: Essential for tracking real performance, not just training loss

    The training curves showed steady improvements without overfitting during the training process. This is exactly what I wanted to see.

    Model training loss curves (image by author)

    The Secret Weapon: Threshold Optimization

    Here’s where I probably outperformed others with more complicated models in the competition: I bet most people submitted predictions with the default 0.5 threshold.

    But because of the class imbalance (~38% of loans defaulted), I knew that the default threshold would be suboptimal. So, I used precision-recall analysis to pick a better cutoff.

    I ended up maximizing the F1 score, which is the harmonic mean of precision and recall. The optimal threshold based on the highest F1 score was 0.35 instead of 0.5. This single change improved my competition score by several percentage points, likely the difference between placing and winning.
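
    Picking the F1-maximizing threshold from a precision-recall curve can be sketched as follows. The labels and scores are synthetic, so the resulting threshold will not match the 0.35 found in the competition; the point is the mechanics of the search.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

# Synthetic validation labels (~38% positives) and model scores (hypothetical).
rng = np.random.default_rng(0)
n = 5_000
y_true = (rng.random(n) < 0.38).astype(int)
y_prob = np.clip(0.25 + 0.3 * y_true + rng.normal(scale=0.15, size=n), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# precision/recall have one more entry than thresholds; drop the final (1, 0) point.
# F1 = 2 * P * R / (P + R); the small epsilon avoids division by zero.
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = float(thresholds[int(np.argmax(f1))])

f1_default = f1_score(y_true, (y_prob >= 0.5).astype(int))
f1_best = f1_score(y_true, (y_prob >= best_threshold).astype(int))

print(f"Best threshold: {best_threshold:.3f}")
print(f"F1 at 0.5: {f1_default:.3f} | F1 at best threshold: {f1_best:.3f}")
```

    The threshold should be tuned on validation data only; tuning it on the test set would overstate the gain.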

    Insight: In the real world, different types of errors have different costs. Missing a default loses you money, while rejecting a good customer just loses you potential profit. The threshold should reflect this reality and shouldn’t be set arbitrarily at 0.5.
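
    To make the cost argument concrete, here is a small sketch that picks the threshold minimizing total misclassification cost. The 5:1 cost ratio, the `expected_cost` helper, and the synthetic scores are purely hypothetical; real costs would come from the lending business itself.

```python
import numpy as np


def expected_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total cost of missed defaults (false negatives) and rejected good customers (false positives)."""
    y_pred = y_prob >= threshold
    false_negatives = np.sum((y_true == 1) & ~y_pred)
    false_positives = np.sum((y_true == 0) & y_pred)
    return cost_fn * false_negatives + cost_fp * false_positives


# Synthetic labels and scores (hypothetical).
rng = np.random.default_rng(1)
y_true = (rng.random(2_000) < 0.38).astype(int)
y_prob = np.clip(0.25 + 0.3 * y_true + rng.normal(scale=0.15, size=2_000), 0, 1)

COST_MISSED_DEFAULT = 5.0  # hypothetical: a missed default costs 5x...
COST_REJECTED_GOOD = 1.0   # ...the lost profit from rejecting a good customer

candidates = np.linspace(0.05, 0.95, 19)
costs = [
    expected_cost(y_true, y_prob, t, COST_MISSED_DEFAULT, COST_REJECTED_GOOD)
    for t in candidates
]
best_threshold = float(candidates[int(np.argmin(costs))])
print(f"Cost-minimizing threshold: {best_threshold:.2f}")
```

    When missed defaults are the more expensive error, the cost-minimizing threshold moves below 0.5, flagging more wallets as risky.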

    Conclusion

    This competition reinforced something I’ve known for a while:

    Success in machine learning isn’t about having the fanciest tools or the most complex algorithms.

    It’s about understanding your problem, applying solid fundamentals, and focusing on what actually moves the needle.

    You don’t need a PhD to be a data scientist or win an ML competition.

    You don’t need to implement the latest research papers.

    You also don’t need expensive cloud resources.

    What you do need is domain knowledge, solid fundamentals, and attention to details that others might overlook (like threshold optimization).


    Want to build your AI skills?

    👉🏻 I run the AI Weekender, which features fun weekend AI projects and quick, practical tips to help you build with AI.


