How can a decision tree choose a film? Gini Index and Entropy | by Michael Reppion | May, 2025

Machine Learning · May 9, 2025 · 5 min read


It’s a familiar scene: I sit down to pick a film from one of the many streaming services such as Netflix or Amazon. Despite the huge selection and sophisticated recommendation algorithms, I still find myself endlessly scrolling through the apps in search of a hidden gem.

For more than 10 years I have been rating and reviewing films, and I have amassed around 1,800 ratings on Letterboxd. This should be a reasonable sample size to gain insight into what kinds of films I prefer.

Choosing a film to watch can be treated as a classification problem, one that could be solved using features of past ratings. Let’s say the target metric is whether the predicted film rating is High (>3) or Low (≤3), and the features informing this prediction are things like year of release, genre or runtime. (Note that other people may use ratings differently and might class a 2 or 3 as worth watching; this comes down to personal preference.)
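To make the target concrete, here is a minimal Python sketch of how each past rating could be binarised. The `label_rating` helper and the example ratings are illustrative, not taken from the original analysis:

```python
def label_rating(stars: float) -> str:
    """Map a Letterboxd star rating (0.5-5) to the binary target class."""
    return "High" if stars > 3 else "Low"

# A few illustrative ratings
ratings = [4.5, 3.0, 2.5, 5.0]
labels = [label_rating(r) for r in ratings]
print(labels)  # ['High', 'Low', 'Low', 'High']
```

Note that a rating of exactly 3 falls on the Low side, matching the ≤3 definition above.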

Distribution of my Letterboxd film ratings

One method that can be used for classification (and regression) problems is a decision tree. A decision tree is a machine-learning method that creates binary splits on features, mapping observations to the regions defined by those splits.

The main advantage of a decision tree is that it is a fast and flexible way to model data, and it can make the effect of each feature easy to interpret. Another advantage is that although each individual split is linear, the tree can capture non-linear relationships, because many splits can be combined. One downside is that it can very easily overfit, which makes it unreliable when applied to new films the model has not seen before.

Visualising the decision tree, we start from the root node, where we split on a feature, pass through other internal nodes, and end at a terminal node, called a leaf node. A leaf predicts the class held by the majority of observations in that node.

Let’s look at release year as an example. A hypothesis could be that, on average, I prefer newer films to older ones.

There is no clear cut-off year before which there are only good or bad films. However, newer films do seem more likely to have a higher rating.

If we create a decision tree to predict the rating using just one split, with training and test data, we get:

A decision tree with a single split

The accuracy of this model with just one split is 60%. Although poor, this is an improvement over randomly choosing Low or High (the proportion of High ratings is 54%).
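The fitted tree above reduces to a single threshold rule. A hand-rolled sketch of that rule, assuming a split after 2012 (the threshold discussed next) and assuming the newer side predicts High:

```python
SPLIT_YEAR = 2012  # threshold chosen by the fitted single-split tree

def predict_rating_class(release_year: int) -> str:
    """Depth-1 decision tree: a single split on release year."""
    return "High" if release_year > SPLIT_YEAR else "Low"

print(predict_rating_class(2020))  # High
print(predict_rating_class(1999))  # Low
```

This is exactly what a depth-1 tree from a library such as scikit-learn collapses to once fitted: one feature, one threshold, one class per side.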

How has the decision tree decided that the best possible split is after 2012? In classification, the main goal is to create a model that accurately predicts the class, minimising the error of getting these predictions wrong.

One approach is to use the misclassification error rate as the criterion for each split. This is defined as:

    Incorrect Predictions / Complete Predictions

We can look at the test data in a confusion matrix, which compares the predicted labels against the true labels, shown below.

Confusion matrix for the single-split decision tree

In this case, our misclassification error rate is:

(39 + 147) / (114 + 147 + 39 + 166)

= 186 / 466 = 0.399

We can also get the accuracy, which is 1 − misclassification rate, so 0.601.
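The same arithmetic in code, using the four counts from the confusion matrix above (only the split into correct and incorrect predictions is needed here):

```python
# Counts from the single-split tree's test-set confusion matrix
incorrect = 39 + 147          # off-diagonal cells: wrong predictions
total = 114 + 147 + 39 + 166  # all test predictions

error_rate = incorrect / total
accuracy = 1 - error_rate

print(round(error_rate, 3))  # 0.399
print(round(accuracy, 3))    # 0.601
```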

The model could keep splitting the data to find the lowest possible misclassification rate. However, a major drawback of this criterion is that it does not distinguish between splits that produce nodes containing observations from only one class and splits that produce mixed nodes. This property is known as the purity of a node.

There are two main criteria commonly used for deciding where to split the chosen feature: the Gini Index and Entropy. Both give a better measure of how pure a node is. Another reason not to use the misclassification rate is that it is not differentiable, unlike Gini and Entropy.

The Gini Index is calculated from the proportion of each class in the node:

Gini Index = 1 − Σᵢ pᵢ², where pᵢ is the proportion of class i in the node

In the decision tree above, the Low-class leaf node gives:

1 − (423 / 917)² − (494 / 917)² = 0.497

The Gini Index is also known as Gini impurity and can be interpreted as the probability of misclassifying a randomly drawn sample from the node, so we want to minimise it.

An index of 0.5 means the classes are equally mixed in the node. For the Low leaf node this is not a good outcome, as we are predicting from a very impure node.
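A small helper makes the calculation reusable; the counts 423 and 494 are the two class counts in the Low leaf from the tree above:

```python
def gini_index(counts: list[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(round(gini_index([423, 494]), 3))  # 0.497
print(gini_index([10, 0]))               # 0.0 -> a perfectly pure node
```

A pure node (all observations in one class) scores 0, and a 50/50 two-class node scores the maximum of 0.5.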

Similarly, Entropy is defined as:

Entropy = − Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of class i in the node

Calculating this again for the Low-class leaf node gives approximately 0.996.

This entropy can be thought of as the amount of uncertainty in the node, and it is linked to the information gained for a class.
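The same calculation as a reusable helper; recomputed at full precision, the Low leaf's entropy rounds to 0.996 bits:

```python
import math

def entropy(counts: list[int]) -> float:
    """Shannon entropy of a node's class distribution, in bits."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(round(entropy([423, 494]), 3))  # 0.996
print(entropy([10, 0]))               # 0.0 -> a perfectly pure node
```

As with Gini, a pure node scores 0; a 50/50 two-class node scores the maximum of 1 bit, so a value near 1 confirms this leaf is very impure.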

What each of these measures tells us is that the model is bad: there is only one split and one feature in use. Trying further splits on year alone did not prove useful, so perhaps gathering more information and features about the films rated will produce a better model.

We explored how a decision tree tried to predict whether to watch a film, and how the tree chooses its splits based on either the Gini Index or Entropy.

For more reading about decision trees, I recommend The Elements of Statistical Learning.


