Scikit-learn 1.5+ introduced a super handy way to tune decision thresholds directly using TunedThresholdClassifierCV.
Let’s walk through an example using the popular breast cancer dataset.
In medical applications like breast cancer detection, false negatives (missed positive cases) are far more dangerous than false positives. That’s why we’ll use the F2 score, which weights recall higher than precision, making it ideal for this scenario.
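To see why F2 fits this use case, here is a tiny sketch (with made-up labels, not the dataset below) showing that at beta=2 a single false negative hurts the score far more than a single false positive:

```python
from sklearn.metrics import fbeta_score

# Toy ground truth: 4 positives, 4 negatives
y_true     = [1, 1, 1, 1, 0, 0, 0, 0]
y_miss_fn  = [1, 1, 1, 0, 0, 0, 0, 0]  # one false negative (missed positive)
y_extra_fp = [1, 1, 1, 1, 1, 0, 0, 0]  # one false positive (false alarm)

# With beta=2, recall errors are weighted more heavily than precision errors
print(fbeta_score(y_true, y_miss_fn, beta=2))   # ~0.79: the missed positive is costly
print(fbeta_score(y_true, y_extra_fp, beta=2))  # ~0.95: the false alarm is cheaper
```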
Let’s dive into the code:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import TunedThresholdClassifierCV, train_test_split
from sklearn.metrics import make_scorer, fbeta_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import warnings
warnings.filterwarnings('ignore')
RANDOM_STATE = 0
data = load_breast_cancer()
X, y = data.data, data.target
# Define an F2 scorer to emphasize recall (beta=2)
f2_scorer = make_scorer(fbeta_score, beta=2)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, stratify=y, random_state=RANDOM_STATE
)
# Standardize the feature data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Set up a base classifier
base_clf = LogisticRegression()
base_clf.fit(X_train, y_train)
base_pred = base_clf.predict(X_test)
print("F2 score on test (unseen) set:", fbeta_score(y_test, base_pred, beta=2))
F2 score on test (unseen) set: 0.9644194756554307
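For context, this baseline predict() is simply cutting the positive-class probability at the default 0.5. A minimal, self-contained sketch of that equivalence (it rebuilds a similar split and model; the seed here is an assumption, so the score may differ slightly from the one above):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
clf = LogisticRegression().fit(X_train, y_train)

# predict() for a probabilistic binary classifier is equivalent to
# thresholding the positive-class probability at 0.5
proba = clf.predict_proba(X_test)[:, 1]
manual = (proba > 0.5).astype(int)
print("matches predict():", np.array_equal(manual, clf.predict(X_test)))
```

TunedThresholdClassifierCV’s whole job is to replace that fixed 0.5 with a cut-off chosen by cross-validation.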
# Configure the TunedThresholdClassifierCV to optimize the F2 score
tuned_clf = TunedThresholdClassifierCV(
estimator=base_clf,
scoring=f2_scorer,
cv=5,
refit=True,
store_cv_results=True,
random_state=RANDOM_STATE
)
tuned_clf.fit(X_train, y_train)
tuned_preds = tuned_clf.predict(X_test)
# Print the optimal threshold determined by cross-validation
print("Optimal threshold:", tuned_clf.best_threshold_)
print("F2 score on test (unseen) set:", fbeta_score(y_test, tuned_preds, beta=2))
Optimal threshold: 0.40404039630244054
F2 score on test (unseen) set: 0.9776536312849162
Given the clinical implications, missing a cancer diagnosis carries far higher consequences than a false positive, making the F2 score an ideal evaluation metric due to its focus on recall. Moving the threshold boosted the F2 score from 0.9644 to 0.9777 on the test set, validating the value of threshold tuning.
For more detailed information about API usage, refer to scikit-learn’s docs.