    Understanding Skewness in Machine Learning: A Beginner’s Guide with Python Example | by Codes With Pankaj | Mar, 2025



Machine learning models often perform better when the input data is symmetric or close to a normal distribution. Here’s why skewness can be a problem:

    • Biased Predictions: Skewed data can lead models to focus too heavily on the “tail” values, skewing predictions.
    • Assumption Violation: Algorithms like linear regression assume normality for optimal results.
    • Outliers: Skewed distributions often contain outliers, which can confuse models.

To fix this, we preprocess the data by reducing skewness, commonly using transformations such as logarithms, square roots, or power transformations. Don’t worry if that sounds complex; we’ll see it in action soon!

Let’s get hands-on! We’ll use Python to calculate skewness and visualize it. For this tutorial, you’ll need the following libraries:

    • numpy : For numerical operations.
    • pandas : For data handling.
    • scipy : To calculate skewness.
    • matplotlib and seaborn : For plotting.

If you don’t have them installed, run this in your terminal:

pip install numpy pandas scipy matplotlib seaborn

Let’s create a positively skewed dataset (simulating income) and analyze it.

    # Import libraries
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy.stats import skew

    # Set random seed for reproducibility
    np.random.seed(42)

# Generate a positively skewed dataset (income-like)
data = np.random.exponential(scale=1000, size=1000)  # Exponential distribution is naturally skewed

# Convert to a Pandas Series for easier handling
data_series = pd.Series(data)

    # Calculate skewness
    skewness = skew(data_series)
    print(f"Skewness of the dataset: {skewness:.3f}")

# Plot the distribution
plt.figure(figsize=(10, 6))
sns.histplot(data_series, kde=True, color='blue')
plt.title('Distribution of Synthetic Income Data (Positive Skew)', fontsize=14)
plt.xlabel('Income', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.show()

Output Explanation:

    • The skewness value will be positive (e.g., around 2.0), confirming a right-skewed distribution.
    • The histogram will show a long tail on the right, typical of income data.
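
If you are curious what the skew() call actually reports: by default, scipy.stats.skew computes the (biased) Fisher-Pearson coefficient, the third central moment divided by the second central moment raised to the power 3/2. The short sketch below is an addition to the tutorial (not part of the original code) that reproduces that value by hand.

# Minimal sketch: reproduce scipy.stats.skew's default (Fisher-Pearson, bias=True) by hand
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
x = rng.exponential(scale=1000, size=1000)   # a right-skewed sample, like the one above

m2 = np.mean((x - x.mean()) ** 2)            # second central moment
m3 = np.mean((x - x.mean()) ** 3)            # third central moment
g1 = m3 / m2 ** 1.5                          # Fisher-Pearson coefficient of skewness

print(f"Manual g1:    {g1:.3f}")
print(f"scipy skew(): {skew(x):.3f}")        # the two values should match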

One common way to reduce skewness is to apply a log transformation. This compresses large values and spreads out smaller ones, making the distribution more symmetric. Let’s try it!

# Apply log transformation (add 1 to avoid log(0) errors)
    log_data = np.log1p(data_series)

    # Calculate new skewness
    log_skewness = skew(log_data)
    print(f"Skewness after log transformation: {log_skewness:.3f}")

# Plot the transformed distribution
plt.figure(figsize=(10, 6))
sns.histplot(log_data, kde=True, color='green')
plt.title('Distribution After Log Transformation (Reduced Skew)', fontsize=14)
plt.xlabel('Log(Income)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.show()

Output Explanation:

    • The skewness value will drop significantly (closer to 0), indicating a more symmetric distribution.
    • The histogram will look more bell-shaped, closer to a normal distribution.

Imagine you’re building a model to predict house prices. The “price” column in your dataset is often positively skewed because a few houses are extremely expensive. If you feed this skewed data directly into a linear regression model, the predictions may be off. By applying a log transformation (as we did above), you can make the data far more symmetric, improving the model’s accuracy.
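
As a rough illustration of that idea (my own sketch, not code from the tutorial; the "area" feature and the price formula are invented stand-ins for real house data), you could train a linear model on the log of the skewed target and invert the transform when predicting:

# Sketch: fit a linear regression on a log1p-transformed, right-skewed target
# The feature and the price-generating formula below are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
area = rng.uniform(50, 300, size=1000)                              # hypothetical house area
price = 500 * area * rng.lognormal(mean=0.0, sigma=0.4, size=1000)  # right-skewed prices

X_train, X_test, y_train, y_test = train_test_split(
    area.reshape(-1, 1), price, random_state=42
)

model = LinearRegression()
model.fit(X_train, np.log1p(y_train))               # train against log1p(price) to tame the skew

predicted_price = np.expm1(model.predict(X_test))   # invert the transform for real-scale predictions
print(predicted_price[:5])

Note that the same log1p/expm1 pair from the examples above is used here, only applied to the target instead of a feature, so predictions come back in the original price units.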

Here’s a quick checklist for dealing with skewness:

    1. Check Skewness: Use skew() to measure it.
    2. Visualize: Plot histograms or KDEs to confirm.
    3. Transform: Apply log, square-root, or Box-Cox transformations based on the type of skew (see the sketch after this list).
    4. Validate: Re-check skewness and distribution after transformation.

    Key takeaways:

    • Skewness measures the asymmetry of your data.
    • Positive skew has a long right tail; negative skew has a long left tail.
    • Many ML models prefer symmetric data, so reducing skewness is a key preprocessing step.
    • Python libraries like scipy and seaborn make it easy to analyze and visualize skewness.
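
The checklist mentions square-root and Box-Cox transforms; here is a minimal sketch of what that could look like, assuming the same data_series generated earlier (SciPy's boxcox requires strictly positive values, which holds for our synthetic income data):

# Sketch: compare square-root and Box-Cox transforms on the synthetic income data
import numpy as np
from scipy.stats import skew, boxcox

sqrt_data = np.sqrt(data_series)              # square root: data must be non-negative
bc_data, fitted_lambda = boxcox(data_series)  # Box-Cox: data must be strictly positive

print(f"Original skewness:  {skew(data_series):.3f}")
print(f"After square root:  {skew(sqrt_data):.3f}")
print(f"After Box-Cox (lambda = {fitted_lambda:.2f}): {skew(bc_data):.3f}")

Pick whichever transform brings the skewness closest to zero while staying interpretable for your problem, and remember to apply the same transform (with the same fitted lambda, for Box-Cox) to any new data at prediction time.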


Congratulations on making it through this tutorial from Codes With Pankaj Chouhan! Now that you understand skewness, try experimenting with other datasets (e.g., from Kaggle) and transformations like square root or Box-Cox. In the next tutorial on www.codeswithpankaj.com, we’ll explore how to handle missing data in machine learning, another essential skill for beginners.

Have questions or feedback? Drop a comment below or connect with me on my website. Happy coding!

    Pankaj Chouhan


