
    A Beginner’s Approach to Building a Regression Model (End-to-End Project) | by The Data Learner | Jun, 2025

    By FinanceStarGate | June 16, 2025 | 6 Mins Read


    Hi there! If you are learning Machine Learning, one of the very first models you'll probably come across is the Linear Regression model. It's simple and easy to work with, whether it's simple linear regression or multiple linear regression. You may already have learned about it: its advantages, disadvantages, and assumptions.

    This blog is for you if you have done all the theory and you are ready to fit the model, but you don't know how. I mean, knowing the theory is one thing, but implementing it is another, right?

    So, how do we approach a multiple linear regression problem?
    Let's be honest, we rarely know from the beginning what kind of relationship exists in the data or which model will work best. So here's what we do (or at least, what I do):

    We define our problem, get the data (I'm pretty sure you know how to do that), and then perform data preprocessing and EDA. This is an important stage because EDA helps with feature engineering, if required. At this stage, we figure out which features need to be scaled or removed, what needs encoding, and what needs cleaning. (If different features in our dataset require different treatments, it's good practice to use a column transformer.)
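    To make the column transformer idea concrete, here is a minimal sketch using scikit-learn's ColumnTransformer. The column names follow the dataset described later in this article, but the four rows are invented for illustration, and the choice of which columns to scale is an assumption, not the notebook's actual setup.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample; column names follow the article, values are invented.
df = pd.DataFrame({
    "Domain": ["Web Dev", "Content Writing", "Web Dev", "Design"],
    "Experience_Years": [0.5, 3.0, 7.5, 2.0],
    "Projects_Completed": [2, 25, 80, 14],
    "Premium_Certified": [0, 1, 1, 0],
})

preprocessor = ColumnTransformer(
    transformers=[
        # One-hot encode the categorical column; ignore unseen categories at predict time.
        ("domain", OneHotEncoder(handle_unknown="ignore"), ["Domain"]),
        # Standardize the continuous columns (assumed choice of columns).
        ("numeric", StandardScaler(), ["Experience_Years", "Projects_Completed"]),
    ],
    remainder="passthrough",  # leave the already-binary column untouched
)

X = preprocessor.fit_transform(df)
print(X.shape)  # 3 one-hot columns + 2 scaled columns + 1 passthrough column
```

    Each transformer only sees the columns listed for it, so different treatments stay in one object that can later be placed in front of a model.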

    Once that's done, we move on to model fitting, perform cross-validation to check whether our model is overfitting or generalizing well, and finally do some hyperparameter tuning. After that, we can export the model and deploy it.

    Well, that's our end-to-end ML project in a nutshell.

    Now, let us go through it with a non-scary example.

    1. Data

    Sample Data

    This is how the data looks. I was looking for a fresh dataset on the internet but couldn't find one, so instead I generated synthetic data using AI. (I'll later update this project with real scraped data.)

    You can either use this, generate your own, scrape, or pick any data available online.

    The data is about various characteristics of a freelancer, like experience, domain, average rating by past clients, whether they're new or experienced, and their hourly rate. The hourly rate is the target variable we aim to predict for freelancers.

    Here's what each column means:

    • Domain: Type of work (Web Dev, Content Writing, etc.)
    • Experience_Years: 0.1–10 years of work experience
    • Projects_Completed: Number of past projects
    • Avg_Rating: Freelancer rating
    • Retention_Rate: % of clients who returned
    • Premium_Certified: Boolean (1 if certified)
    • Portfolio_Pieces_Count: Number of projects in their portfolio
    • Learning_Hours_Per_Week: Weekly study time
    • Is_New: Whether the freelancer is new (Yes-1/No-0)
    • Hourly_Rate_USD: Target variable

    2. Data Preprocessing

    Now this part wasn't very interesting here. That's the downside of synthetic data: we usually don't get to work on messy data, which is not the case with real-world data. Even though I asked for some noise, I still got clean rows, no missing values, and no duplicates. So, I just checked the basic shape, summary statistics, and data types.
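    Those basic checks can be sketched with pandas as follows. The three-row DataFrame is a made-up stand-in (the real dataset is linked under Resources), so only the calls matter here, not the numbers.

```python
import pandas as pd

# Invented stand-in for the freelancer dataset.
df = pd.DataFrame({
    "Domain": ["Web Dev", "Content Writing", "Web Dev"],
    "Experience_Years": [0.5, 3.0, 7.5],
    "Hourly_Rate_USD": [12.0, 30.0, 65.0],
})

print(df.shape)       # (rows, columns)
print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for numeric columns

missing_per_column = df.isna().sum()         # count of missing values per column
duplicate_rows = int(df.duplicated().sum())  # number of fully duplicated rows
print(missing_per_column)
print(duplicate_rows)
```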

    3. Data Visualization

    I plotted some graphs to understand the data better. But how do we know what to plot?

    We don't. At least, I don't. I just apply univariate analysis, then multivariate, and then keep the important plots. (One can also use Pandas Profiling for EDA.)

    The correlation matrix was super interesting. Some variables were strongly correlated:

    • Experience and Projects Completed were strongly linked to a higher Hourly Rate (~0.8 correlation)
    • Is_New was highly negatively correlated with Avg_Rating (-0.95) and Retention_Rate (-0.92)

    Which makes sense, right? Experienced freelancers would've completed more projects and likely have better ratings.
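    A correlation matrix like the one described above can be computed with DataFrame.corr(). The sketch below uses randomly generated data that only mimics the direction of those relationships; the coefficients and noise levels are assumptions, so the printed values will not match the article's figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0.1, 10, n)

# Invented data: rate and projects grow with experience, Is_New marks < 1 year.
df = pd.DataFrame({
    "Experience_Years": experience,
    "Projects_Completed": (experience * 8 + rng.normal(0, 5, n)).clip(min=0),
    "Is_New": (experience < 1).astype(int),
    "Hourly_Rate_USD": 10 + 5 * experience + rng.normal(0, 4, n),
})

corr = df.corr()  # pairwise Pearson correlations

# Correlation of each feature with the target, strongest (by magnitude) first.
target_corr = (
    corr["Hourly_Rate_USD"]
    .drop("Hourly_Rate_USD")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr.round(2))
```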

    A few features, like Learning Hours and Portfolio Pieces, showed outliers in boxplots. But after looking closely, they weren't actually outliers. New freelancers had higher learning hours and fewer portfolio pieces, which is normal. So, I didn't treat them as outliers.
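    One way to do that closer look is to flag points with the same 1.5 × IQR rule a boxplot uses and then see which group they belong to. This is a sketch on invented data (10 new freelancers with high learning hours, 90 experienced ones with low hours), not the check from the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Invented data: new freelancers study far more hours per week.
is_new = np.array([1] * 10 + [0] * 90)
learning_hours = np.where(is_new == 1, rng.normal(20, 2, 100), rng.normal(5, 2, 100))
df = pd.DataFrame({"Is_New": is_new, "Learning_Hours_Per_Week": learning_hours})

# Boxplot-style fences: 1.5 * IQR beyond the quartiles.
col = df["Learning_Hours_Per_Week"]
q1, q3 = col.quantile([0.25, 0.75])
iqr = q3 - q1
flagged = df[(col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)]

# If the flagged rows are mostly one group, they are a subpopulation, not errors.
print(flagged["Is_New"].value_counts())
```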

    After a few more visualizations here and there, I finally wrapped it up.

    4. Model Fitting

    Then I proceeded to fit a Linear Regression model to the data. (That's because I had generated the data with the intention of fitting Linear Regression. On real-world data we can try several models and check which one performs well using the metrics. Here, if I pick the R² score and it comes out very poor, I would know that the relationship might not be linear and would use a different model.)

    I built a pipeline:

    (i) Encoding the Domain feature

    (ii) Fitting the LR model

    (iii) Calculating the R² Score and MAE

    • R² Score tells us how much of the variance in the target variable the features explain. It is at most 1 (the closer to 1, the better). On the training data of a linear model with an intercept it lies between 0 and 1, but on held-out data it can go negative if the model does worse than always predicting the mean.
    • MAE (Mean Absolute Error) tells us, on average, how far off the model's predictions are from the actual values.
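    The three pipeline steps can be sketched with scikit-learn as below. The data is randomly generated with a linear relationship (the coefficients and the per-domain bonus are invented), and the 80/20 split is an assumption, so treat this as an illustration rather than the notebook's code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(42)
n = 300

# Invented data: hourly rate depends linearly on experience plus a domain bonus.
df = pd.DataFrame({
    "Domain": rng.choice(["Web Dev", "Content Writing", "Design"], n),
    "Experience_Years": rng.uniform(0.1, 10, n),
})
domain_bonus = df["Domain"].map({"Web Dev": 10, "Content Writing": 0, "Design": 5})
df["Hourly_Rate_USD"] = 15 + 4 * df["Experience_Years"] + domain_bonus + rng.normal(0, 3, n)

X = df.drop(columns="Hourly_Rate_USD")
y = df["Hourly_Rate_USD"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    # (i) Encode the Domain feature, pass the numeric column through unchanged.
    ("encode", ColumnTransformer(
        [("domain", OneHotEncoder(handle_unknown="ignore"), ["Domain"])],
        remainder="passthrough",
    )),
    # (ii) Fit the linear regression model.
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# (iii) Score on the held-out split.
y_pred = pipeline.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R2: {r2:.3f}, MAE: {mae:.2f}")
```

    Keeping the encoder inside the pipeline means it is fitted only on the training split, which avoids leaking information from the test rows.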

    Next, I did cross-validation. The model had pretty consistent scores across folds, so there was no sign of overfitting.
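    A minimal sketch of that check, assuming 5-fold cross-validation scored by R² (the article does not say which settings were used). It uses cross_validate with return_train_score=True so the training and validation scores can be compared directly; a large gap between them is the usual sign of overfitting. The data is invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

# Invented linear data with three numeric features.
X = rng.uniform(0, 10, size=(200, 3))
y = 5 + X @ np.array([4.0, 2.0, -1.0]) + rng.normal(0, 2, 200)

cv = cross_validate(
    LinearRegression(), X, y, cv=5, scoring="r2", return_train_score=True
)
train_r2 = cv["train_score"]
val_r2 = cv["test_score"]

print("validation R2 per fold:", val_r2.round(3))
print("mean train R2:", train_r2.mean().round(3))
print("mean validation R2:", val_r2.mean().round(3))
```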

    5. Assumptions

    Now that I had the LR model, I checked its assumptions:

    (i) Linearity

    I plotted a pairplot; most relationships looked roughly linear to me.

    (ii) Multicollinearity

    I calculated the VIF (Variance Inflation Factor), which tells us how much the variance of a regression coefficient is inflated because of multicollinearity.

    As a rule of thumb, VIF > 5 signals a multicollinearity problem.
    I had high VIF values for Avg_Rating and Retention_Rate.

    So here I had two choices:
    either remove one of them or try Ridge Regression.
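    For reference, the VIF of feature j is 1 / (1 − R²_j), where R²_j comes from regressing feature j on all the other features. The sketch below implements that definition with plain NumPy; the vif helper is a hypothetical name, and the data is invented so that a rating column and a retention column are strongly tied together.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (shape: n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        design = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
n = 500
avg_rating = rng.normal(4.0, 0.5, n)
retention_rate = 20 * avg_rating + rng.normal(0, 2, n)  # strongly tied to rating
experience = rng.uniform(0.1, 10, n)                    # independent of both

values = vif(np.column_stack([avg_rating, retention_rate, experience]))
for name, value in zip(["Avg_Rating", "Retention_Rate", "Experience_Years"], values):
    print(f"{name}: {value:.1f}")
```

    The two correlated columns come out well above 5 while the independent one stays near 1, which is the pattern the rule of thumb is meant to catch.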

    (iii) Normality of residuals

    I plotted the distribution of residuals (the difference between actual and predicted values). It looked roughly normal.

    (iv) Homoscedasticity

    This means the residuals have equal variance. I plotted a scatter plot of residuals vs. predictions; they were randomly scattered, so the assumption held.

    (v) Autocorrelation of Residuals

    I plotted residuals over the index. The plot had a random zig-zag with no obvious pattern, so there was no autocorrelation problem either.
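    The checks above were done visually. As a numeric companion to the residuals-over-index plot, one option is the Durbin-Watson statistic, which is close to 2 when residuals show no first-order autocorrelation. This is a swapped-in technique, not something the notebook is stated to use, and the data below is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Invented linear data with independent noise.
x = rng.uniform(0, 10, n)
y = 3 + 2 * x + rng.normal(0, 1, n)

# Ordinary least squares fit with an intercept.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
predictions = design @ coef
residuals = y - predictions  # actual minus predicted

# Durbin-Watson: sum of squared successive differences over sum of squares.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(f"Durbin-Watson: {dw:.2f}")
```

    Values drifting toward 0 suggest positive autocorrelation and values toward 4 suggest negative autocorrelation; the same residuals and predictions arrays can be fed into a histogram or scatter plot for the normality and homoscedasticity checks.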

    (Checking the assumptions of LR validates my model. It also tells me what I can do better with the model. Here, for example, it pointed out multicollinearity, so I used regularisation.)

    6. Best Model and Deployment

    Now I had to deal with multicollinearity, since I knew LR was otherwise fine with this data. So, I fit a Ridge regression to handle it. It gave results similar to the linear model. I also went ahead and removed one of the multicollinear features (Retention_Rate). I did this because I thought average rating is something the user can easily calculate, and it is also easily accessible. Then I used hyperparameter tuning to get the best alpha value for Ridge.
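    Tuning alpha for Ridge can be sketched with GridSearchCV as follows. The alpha grid, the 5-fold setting, and the scaling step are assumptions (the article does not list them), and the data is invented with two deliberately correlated columns.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Invented data: rating and retention are strongly correlated.
avg_rating = rng.normal(4.0, 0.5, n)
retention_rate = 20 * avg_rating + rng.normal(0, 2, n)
experience = rng.uniform(0.1, 10, n)
X = np.column_stack([avg_rating, retention_rate, experience])
y = 10 + 5 * experience + 3 * avg_rating + rng.normal(0, 3, n)

# Scale first so the penalty treats all coefficients on the same footing.
model = make_pipeline(StandardScaler(), Ridge())

# make_pipeline names the step "ridge", hence the "ridge__alpha" key.
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(model, param_grid, cv=5, scoring="r2")
search.fit(X, y)

best_alpha = search.best_params_["ridge__alpha"]
print("best alpha:", best_alpha)
print("best CV R2:", round(search.best_score_, 3))
```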

    Finally, I exported the model using pickle.dump() and deployed it using Streamlit.
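    The export step can be sketched as a pickle round trip: dump the fitted model to a file, load it back (as the deployed app would), and confirm the predictions match. The file name and the toy model are hypothetical; the Streamlit side is not shown here.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Invented data and a small fitted model to export.
X = rng.uniform(0, 10, size=(50, 2))
y = X @ np.array([3.0, 1.0]) + rng.normal(0, 0.5, 50)
model = Ridge(alpha=1.0).fit(X, y)

# Hypothetical file name, written to the system temp directory.
model_path = Path(tempfile.gettempdir()) / "hourly_rate_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Loading side: this is what an app would do at startup.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

same = np.allclose(model.predict(X), loaded.predict(X))
print("predictions match after reload:", same)
```

    Only unpickle files you created or trust, since loading a pickle can execute arbitrary code, and load with the same scikit-learn version that saved the model where possible.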

    You can access the live app from here: Freelancer Hourly Rate Estimator WebApp

    7. Resources

    Jupyter Notebook: Notebook

    Dataset: Freelancer Hourly Rate Estimator Dataset

    Conclusion

    I hope this project explanation was useful for you! This is one of my initial ML projects. I might have missed a few things in between, but I'm following this article up with ML Interview Questions that will be relevant to this entire project and also in general, so stay tuned! I would really appreciate any feedback or suggestions you have.

    Let's build valuable models!

    (Used ChatGPT for sentence correction and fixing grammatical errors)


