    Naive Bayes Multi-Classifiers for Mixed Data Types | by Kuriko | May, 2025

    Let's explore the multi-classifier approach.

    Objective

    • Predict the price range (0 to 3, with 3 being high-end) from the given smartphone specs

    Method

    1. Analyze the distribution types of the input features
    2. Choose classifiers
    3. Combine and finalize the results
    4. Evaluate the results

    Dataset

    Mobile Price Classification, Kaggle

    • 3,000 records
    • 21 columns:
      ‘battery_power’, ‘blue’, ‘clock_speed’, ‘dual_sim’, ‘fc’, ‘four_g’, ‘int_memory’, ‘m_dep’, ‘mobile_wt’, ‘n_cores’, ‘pc’, ‘px_height’, ‘px_width’, ‘ram’, ‘sc_h’, ‘sc_w’, ‘talk_time’, ‘three_g’, ‘touch_screen’, ‘wifi’, ‘price_range’, ‘id’

    Visualizing the data

    After removing the unnecessary column (`id`), we plot frequency histograms and quantile-quantile (Q-Q) plots against the normal distribution for each input feature:

    Frequency histogram and Q-Q plots by input feature
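
    A minimal sketch of how such plots can be generated (the layout and bin count are illustrative, not from the original post):

    import matplotlib.pyplot as plt
    from scipy import stats

    df = df.drop(columns=['id'], errors='ignore')   # drop the id column mentioned above, if present
    features = [c for c in df.columns if c != 'price_range']
    fig, axes = plt.subplots(len(features), 2, figsize=(10, 3 * len(features)))
    for i, col in enumerate(features):
        axes[i, 0].hist(df[col], bins=30)                       # frequency histogram
        axes[i, 0].set_title(f'{col} histogram')
        stats.probplot(df[col], dist='norm', plot=axes[i, 1])   # Q-Q plot against the normal distribution
        axes[i, 1].set_title(f'{col} Q-Q plot')
    plt.tight_layout()
    plt.show()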

    After resampling, we obtained 250K data points per class:
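
    The resampling code itself isn't shown; one possible sketch, assuming plain upsampling with replacement to 250,000 rows per class:

    import pandas as pd
    from sklearn.utils import resample

    TARGET_PER_CLASS = 250_000   # assumed per-class target size
    df = pd.concat(
        [resample(group, replace=True, n_samples=TARGET_PER_CLASS, random_state=42)
         for _, group in df.groupby('price_range')],
        ignore_index=True,
    )
    print(df['price_range'].value_counts())   # 250K rows per class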

    Creating train/test data

    from sklearn.model_selection import train_test_split

    X = df.drop('price_range', axis=1)
    y = df['price_range']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
    print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
    # (898186, 20) (898186,) (99799, 20) (99799,)

    We'll train the following models based on our (x|y) distributions:

    • GaussianNB
    • BernoulliNB
    • MultinomialNB

    First, we classify the input features into binary, multinomial, and Gaussian distributions:

    target = ['price_range']
    binary = [
        'blue',
        'dual_sim',
        'four_g',
        'three_g',
        'touch_screen',
        'wifi'
    ]
    multinomial = [
        'fc',
        'pc',
        'sc_h',
        'sc_w'
    ]
    # everything that is neither the target, binary, nor multinomial is treated as (approximately) Gaussian
    gaussian = df.drop(columns=[*target, *binary, *multinomial]).columns

    Train each NB model with its corresponding input features

    import numpy as np
    from sklearn.metrics import accuracy_score, classification_report
    from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
    from sklearn.preprocessing import MinMaxScaler

    def train_nb_classifier(model, X_train, X_test, model_name):
        # fit the given NB variant on its feature subset and report test accuracy
        model.fit(X_train, y_train)
        probabilities = model.predict_proba(X_test)
        y_pred = np.argmax(probabilities, axis=1)
        accuracy = accuracy_score(y_test, y_pred)
        print(f'--------- {model_name} ---------')
        print(f"Accuracy: {accuracy:.4f}")
        print(classification_report(y_test, y_pred))
        return y_pred, probabilities, model

    # gaussian: scale only the (approximately) Gaussian features
    scaler = MinMaxScaler()
    X_train_gaussian_scaled = scaler.fit_transform(X_train[gaussian])
    X_test_gaussian_scaled = scaler.transform(X_test[gaussian])
    y_pred_gnb, prob_gnb, gnb = train_nb_classifier(model=GaussianNB(), X_train=X_train_gaussian_scaled, X_test=X_test_gaussian_scaled, model_name='Gaussian')
    # bernoulli: binary features
    y_pred_bnb, prob_bnb, bnb = train_nb_classifier(model=BernoulliNB(), X_train=X_train[binary], X_test=X_test[binary], model_name='Bernoulli')
    # multinomial: count-like features
    y_pred_mnb, prob_mnb, mnb = train_nb_classifier(model=MultinomialNB(), X_train=X_train[multinomial], X_test=X_test[multinomial], model_name='Multinomial')

    Note that we only scaled the Gaussian features, to avoid skewing the other data types.

    Combining the results

    We combine the results using a simple average and a weighted average:

    # combined (simple average)
    prob_averaged = (prob_gnb + prob_bnb + prob_mnb) / 3
    y_pred_averaged = np.argmax(prob_averaged, axis=1)
    accuracy = accuracy_score(y_test, y_pred_averaged)
    print('--------- Average ---------')
    print(f"Averaged Probability Ensemble Accuracy: {accuracy:.4f}")
    print(classification_report(y_test, y_pred_averaged))

    # combined (weighted average)
    weight_gnb = 0.9   # higher weight
    weight_bnb = 0.05
    weight_mnb = 0.05
    prob_weighted_average = (weight_gnb * prob_gnb + weight_bnb * prob_bnb + weight_mnb * prob_mnb)
    y_pred_weighted = np.argmax(prob_weighted_average, axis=1)
    accuracy_weighted = accuracy_score(y_test, y_pred_weighted)
    print('--------- Weighted Average ---------')
    print(f"Weighted Average Ensemble Accuracy: {accuracy_weighted:.4f}")
    print(classification_report(y_test, y_pred_weighted))
    Accuracy reports (average, weighted average)
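
    The weights above (0.9 / 0.05 / 0.05) are set by hand; as an optional sketch, they could instead be tuned with a simple grid search, ideally on a separate validation split rather than the test set (the grid step and the use of accuracy as the criterion are assumptions):

    import itertools
    import numpy as np
    from sklearn.metrics import accuracy_score

    best_acc, best_weights = 0.0, None
    grid = np.linspace(0.0, 1.0, 21)          # weight candidates in steps of 0.05
    for w_g, w_b in itertools.product(grid, grid):
        w_m = round(1.0 - w_g - w_b, 2)
        if w_m < 0:
            continue                           # keep the three weights summing to 1
        prob = w_g * prob_gnb + w_b * prob_bnb + w_m * prob_mnb
        acc = accuracy_score(y_test, np.argmax(prob, axis=1))
        if acc > best_acc:
            best_acc, best_weights = acc, (w_g, w_b, w_m)
    print(best_weights, best_acc)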

    Stacking

    Optionally, we can stack the results with logistic regression as a meta-learner.

    Logistic regression is one of the common meta-learner choices because of its simplicity, interpretability, effectiveness with probability inputs from base classifiers, and built-in regularization.

    from sklearn.linear_model import LogisticRegression

    # meta-features: per-class probabilities from each base model
    prob_train_gnb = gnb.predict_proba(X_train_gaussian_scaled)
    prob_train_bnb = bnb.predict_proba(X_train[binary])
    prob_train_mnb = mnb.predict_proba(X_train[multinomial])
    X_meta_train = np.hstack((prob_train_gnb, prob_train_bnb, prob_train_mnb))
    X_meta_test = np.hstack((prob_gnb, prob_bnb, prob_mnb))

    meta_learner = LogisticRegression(random_state=42, solver='liblinear')
    meta_learner.fit(X_meta_train, y_train)
    y_pred_stacked = meta_learner.predict(X_meta_test)
    prob_stacked = meta_learner.predict_proba(X_meta_test)
    accuracy_stacked = accuracy_score(y_test, y_pred_stacked)
    print('--------- Meta-learner (logistic regression) ---------')
    print(f"Stacked Ensemble Accuracy: {accuracy_stacked:.4f}")
    print(classification_report(y_test, y_pred_stacked))
    Accuracy Report — Stacking

    Stacking performs the best, while the Multinomial and Bernoulli models on their own were not efficient predictors of the final outcome.

    This is mainly due to the argmax operation, where each model chooses the single class with the highest probability as its final decision.

    In the process, the underlying probability distributions from the Multinomial and Bernoulli models are discarded; hence these individual models are not "efficient" enough on their own to produce one highly confident prediction.

    Yet, when we combined the results with the meta-learner, the stacking ensemble exploited the extra information in those distributions from the Multinomial and Bernoulli models.
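
    One rough way to see this is to inspect the meta-learner's coefficients, which indicate how strongly it relies on each base model's per-class probabilities (a diagnostic sketch, not part of the original post; the feature naming is an assumption based on the hstack order above):

    import numpy as np

    # the 12 meta-features follow the hstack order: 4 class probabilities from GNB, then BNB, then MNB
    feature_names = [f'{m}_p{c}' for m in ('gnb', 'bnb', 'mnb') for c in range(4)]
    for cls, coefs in zip(meta_learner.classes_, meta_learner.coef_):
        top = np.argsort(np.abs(coefs))[::-1][:3]   # three most influential meta-features per class
        print(f'class {cls}:', [(feature_names[i], round(coefs[i], 3)) for i in top])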


