    Bayesian Optimization for Hyperparameter Tuning of Deep Learning Models

    By FinanceStarGate | May 27, 2025 | 13 min read


    This article explores how Bayesian Optimization can be used to tune the hyperparameters of deep learning models (a Keras Sequential model), compared with a traditional approach, Grid Search.

    Bayesian Optimization

    Bayesian Optimization is a sequential design strategy for global optimization of black-box functions.

    It is particularly well-suited to functions that are expensive to evaluate, lack an analytical form, or have unknown derivatives.
    In the context of hyperparameter optimization, the unknown function can be:

    • an objective function,
    • the accuracy on a training or validation set,
    • the loss on a training or validation set,
    • entropy gained or lost,
    • the AUC of a ROC curve,
    • A/B test results,
    • computation cost per epoch,
    • model size,
    • the reward in reinforcement learning, and more.

    Unlike traditional optimization methods that rely purely on direct function evaluations, Bayesian Optimization builds and refines a probabilistic model of the objective function and uses this model to choose the next evaluation point.

    The core idea revolves around two key components:

    1. Surrogate Model (Probabilistic Model)

    The unknown objective function f(x) is approximated by a surrogate model such as a Gaussian Process (GP).

    A GP is a non-parametric Bayesian model that defines a distribution over functions. It provides:

    • a prediction of the function value at a given point, μ(x), and
    • a measure of uncertainty around that prediction, σ(x), often represented as a confidence interval.

    Mathematically, for a Gaussian Process, the prediction at an unobserved point x∗, given observed data (X, y), is normally distributed:

    $$f(x_*) \mid X, y \sim \mathcal{N}\big(\mu(x_*),\ \sigma^2(x_*)\big)$$

    where

    • μ(x∗): the mean prediction, and
    • σ²(x∗): the predictive variance.
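To make μ(x∗) and σ²(x∗) concrete, here is a minimal NumPy sketch of the textbook zero-mean GP posterior, μ(x∗) = k∗ᵀ(K + σₙ²I)⁻¹y and σ²(x∗) = k(x∗, x∗) − k∗ᵀ(K + σₙ²I)⁻¹k∗, where K is the kernel matrix of the observed points and k∗ holds the kernel values between x∗ and those points. The squared-exponential kernel, its default parameters, the noise term noise_var, and the function names are illustrative assumptions, not what any particular tuning library uses internally.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) kernel between the rows of a and b."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(X, y, X_star, noise_var=1e-6, **kernel_kwargs):
    """Return the predictive mean mu(x*) and variance sigma^2(x*) of a zero-mean GP."""
    K = rbf_kernel(X, X, **kernel_kwargs) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X, X_star, **kernel_kwargs)        # shape (n_obs, n_new)
    K_ss = rbf_kernel(X_star, X_star, **kernel_kwargs)  # shape (n_new, n_new)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise I)^-1 y
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mu, np.maximum(var, 0.0)                      # clip tiny negative round-off


X = np.array([[0.0], [1.0], [2.0]])   # observed points
y = np.sin(X).ravel()                  # observed values
mu, var = gp_posterior(X, y, np.array([[0.0], [5.0]]))
# At an observed point the variance is close to zero; far from the data it
# approaches the prior variance (signal_var = 1.0 here).
print(mu, var)
```

The Cholesky factorization is used instead of an explicit matrix inverse because it is cheaper and numerically more stable.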

    2. Acquisition Function

    The acquisition function determines the next point x_{t+1} to evaluate by quantifying how “promising” a candidate point is for improving the objective function, balancing:

    • Exploration (high variance): sampling in regions with high uncertainty to discover new promising areas, and
    • Exploitation (high mean): sampling in regions where the surrogate model predicts high objective values.

    Common acquisition functions include:

    Probability of Improvement (PI)
    PI selects the point that has the highest probability of improving upon the current best observed value f(x⁺):

    $$\mathrm{PI}(x) = \Phi\left(\frac{\mu(x) - f(x^+) - \xi}{\sigma(x)}\right)$$

    where

    • Φ: the cumulative distribution function (CDF) of the standard normal distribution, and
    • ξ ≥ 0: a trade-off parameter (exploration vs. exploitation).

    A larger ξ raises the margin by which a candidate must beat f(x⁺), which encourages more exploration.
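A direct translation of the PI formula into Python might look like this, assuming a maximization problem and that μ(x) and σ(x) have already been obtained from the surrogate. The function name and the default xi=0.01 are illustrative choices; scipy.stats.norm supplies Φ.

```python
import numpy as np
from scipy.stats import norm


def probability_of_improvement(mu, sigma, f_best, xi=0.01):
    """PI(x) = Phi((mu(x) - f(x+) - xi) / sigma(x)) for a maximization problem."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (mu - f_best - xi) / sigma
    # Where sigma == 0 the outcome is certain: PI is 1 if mu clears the bar, else 0.
    return np.where(sigma > 0, norm.cdf(z), (mu - f_best - xi > 0).astype(float))


# A point slightly above the incumbent loses PI as xi grows.
print(probability_of_improvement(0.9, 0.1, f_best=0.85, xi=0.0))  # z is about 0.5
print(probability_of_improvement(0.9, 0.1, f_best=0.85, xi=0.1))  # z is about -0.5
```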

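    Expected Improvement (EI)
    EI quantifies the expected amount of improvement over the current best observed value:

    $$\mathrm{EI}(x) = \mathbb{E}\big[\max\big(f(x) - f(x^+) - \xi,\ 0\big)\big]$$

    Assuming a Gaussian Process surrogate, EI has the closed form:

    $$\mathrm{EI}(x) = \big(\mu(x) - f(x^+) - \xi\big)\,\Phi(Z) + \sigma(x)\,\phi(Z), \qquad Z = \frac{\mu(x) - f(x^+) - \xi}{\sigma(x)}$$

    where ϕ is the probability density function (PDF) of the standard normal distribution and ξ plays the same role as in PI. When σ(x) = 0, EI reduces to max(μ(x) − f(x⁺) − ξ, 0).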

    EI is one of the most widely used acquisition functions. Unlike PI, EI also accounts for the magnitude of the improvement, not just its probability.
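The closed form of EI translates directly into code. The sketch below (function name and default xi are illustrative assumptions) also shows what "considers the magnitude" means in practice: two candidates with the same Z, and therefore the same PI, can have very different EI when one of them promises a larger gain.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Closed-form EI for maximization under a Gaussian predictive distribution."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    improvement = mu - f_best - xi
    with np.errstate(divide="ignore", invalid="ignore"):
        z = improvement / sigma
        ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    # With no uncertainty, EI reduces to max(mu - f_best - xi, 0).
    return np.where(sigma > 0, ei, np.maximum(improvement, 0.0))


# Both candidates sit one standard deviation above the incumbent (Z = 1),
# so their PI is identical, but candidate B promises a 10x larger gain.
ei_a = expected_improvement(mu=0.01, sigma=0.01, f_best=0.0, xi=0.0)
ei_b = expected_improvement(mu=0.10, sigma=0.10, f_best=0.0, xi=0.0)
print(ei_a, ei_b)
```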

    Upper Confidence Bound (UCB)
    UCB balances exploitation (high mean) and exploration (high variance), favoring points that have both a high predicted mean and high uncertainty:

    $$\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x)$$

    where κ ≥ 0 is a tuning parameter that controls the balance between exploration and exploitation.

    A larger κ puts more emphasis on exploring uncertain regions.
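UCB is the simplest of the three to compute. In this sketch (illustrative names and numbers), changing κ flips which of two hypothetical candidates is preferred: a small κ picks the confident, higher-mean candidate, while a large κ picks the uncertain one.

```python
import numpy as np


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB(x) = mu(x) + kappa * sigma(x); a larger kappa favors uncertain points."""
    return np.asarray(mu, dtype=float) + kappa * np.asarray(sigma, dtype=float)


mu = np.array([0.80, 0.70])     # candidate 0: higher predicted mean
sigma = np.array([0.02, 0.10])  # candidate 1: more uncertain
print(np.argmax(upper_confidence_bound(mu, sigma, kappa=0.5)))  # 0 (exploit)
print(np.argmax(upper_confidence_bound(mu, sigma, kappa=3.0)))  # 1 (explore)
```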

    Bayesian Optimization Process (Iterative Process)

    Bayesian Optimization iteratively updates the surrogate model and optimizes the acquisition function.

    It guides the search toward optimal regions while minimizing the number of expensive objective function evaluations.

    Now, let us walk through the process with code snippets using KerasTuner for a fraud detection task (binary classification where y=1, fraud, costs us the most).

    Step 1. Initialization

    The process starts by sampling the hyperparameter space randomly or with a low-discrepancy sequence (usually 5 to 10 points) to get a first picture of the objective function.

    These initial observations are used to build the first version of the surrogate model.
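As an illustration of the two initialization options, the sketch below draws 8 initial configurations either uniformly at random or from a scrambled Sobol sequence (a low-discrepancy sequence) using scipy.stats.qmc. The two-dimensional search space and its ranges are a hypothetical simplification, not the article's exact search space.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical 2-D search space: learning rate on a log10 scale and a dropout rate.
lower = np.array([-4.0, 0.0])   # log10(1e-4), dropout 0.0
upper = np.array([-2.0, 0.5])   # log10(1e-2), dropout 0.5
n_init = 8                      # within the usual 5 to 10; a power of 2 suits Sobol

rng = np.random.default_rng(42)
unit_random = rng.random((n_init, 2))                      # plain uniform sampling
unit_sobol = qmc.Sobol(d=2, scramble=True).random(n_init)  # low-discrepancy sequence

# Lower discrepancy means the points cover the unit square more evenly.
print("discrepancy (random):", qmc.discrepancy(unit_random))
print("discrepancy (Sobol): ", qmc.discrepancy(unit_sobol))

# Map the unit-square points onto the actual hyperparameter ranges.
initial_points = qmc.scale(unit_sobol, lower, upper)
for log_lr, dropout in initial_points:
    print(f"learning_rate={10 ** log_lr:.2e}, dropout_rate={dropout:.2f}")
```

Sampling the learning rate on a log scale spreads the initial points evenly across orders of magnitude rather than crowding them near the upper bound.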

    Since we are building a Keras Sequential model, we first define and compile the model, then define the BayesianOptimization tuner with the number of initial points to evaluate.

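    import keras_tuner as kt
    import tensorflow as tf
    from tensorflow import keras
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Input
    
    
    class CustomHyperModel(kt.HyperModel):
        def __init__(self, input_shape, initial_bias_value, class_weights_dict):
            super().__init__()
            self.input_shape = input_shape                # number of input features
            self.initial_bias_value = initial_bias_value  # initial bias of the output unit
            self.class_weights_dict = class_weights_dict  # used when training (Step 4)
    
        def build(self, hp):
            # initialize a Keras Sequential model; layer sizes and dropout
            # rates are sampled from the search space through `hp`
            model = Sequential([
                Input(shape=(self.input_shape,)),
                Dense(
                    units=hp.Int('neurons1', min_value=20, max_value=60, step=10),
                    activation='relu'
                ),
                Dropout(
                    hp.Float('dropout_rate1', min_value=0.0, max_value=0.5, step=0.1)
                ),
                Dense(
                    units=hp.Int('neurons2', min_value=20, max_value=60, step=10),
                    activation='relu'
                ),
                Dropout(
                    hp.Float('dropout_rate2', min_value=0.0, max_value=0.5, step=0.1)
                ),
                Dense(
                    1, activation='sigmoid',
                    bias_initializer=keras.initializers.Constant(
                        self.initial_bias_value
                    )
                )
            ])
    
            # The optimizer search (optimizer_name, learning_rate, Lion betas)
            # is not shown in the original; this Adam optimizer with a
            # hypothetical log-scale learning-rate range is a placeholder.
            learning_rate = hp.Float(
                'learning_rate', min_value=1e-4, max_value=1e-2, sampling='log'
            )
            optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    
            # compile the model
            model.compile(
                optimizer=optimizer,
                loss='binary_crossentropy',
                metrics=[
                    'accuracy',
                    keras.metrics.Precision(name='precision'),
                    keras.metrics.Recall(name='recall'),
                    keras.metrics.AUC(name='auc')
                ]
            )
            return model
    
    
    # define a tuner with the initial points; custom_hypermodel, max_trials,
    # executions_per_trial, directory, project_name and num_initial_points
    # are defined elsewhere in the project
    tuner = kt.BayesianOptimization(
        hypermodel=custom_hypermodel,
        objective=kt.Objective("val_recall", direction="max"),
        max_trials=max_trials,
        executions_per_trial=executions_per_trial,
        directory=directory,
        project_name=project_name,
        num_initial_points=num_initial_points,
        overwrite=True,
    )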

    num_initial_points defines how many initial, randomly selected hyperparameter configurations are evaluated before the algorithm starts to guide the search.

    If it is not given, KerasTuner uses a default of 3 × the number of dimensions of the hyperparameter space.

    Step 2. Surrogate Model Training

    Build and train the probabilistic surrogate model (often a Gaussian Process or a Tree-structured Parzen Estimator) using all available observed data points (input values and their corresponding output values) to approximate the true function.

    The surrogate model provides a mean prediction μ(x) and an uncertainty σ(x) for any unobserved point.

    KerasTuner uses an internal Gaussian Process surrogate to model the relationship between hyperparameters and the objective function's performance.

    After each objective function evaluation (a training run), the observed data points (hyperparameters and validation metrics) are used to update the internal surrogate model.
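A GP surrogate needs numeric inputs, so each observed configuration has to be turned into a vector before it is added to the surrogate's training data. The sketch below shows one simple, hypothetical encoding onto [0, 1] for four of this article's hyperparameters: linear scaling for integers and floats, log scaling for the learning rate, and an evenly spaced index for the categorical batch size. The ranges (in particular the learning-rate range) and the SPACE structure are assumptions for illustration, and KerasTuner's internal encoding may differ.

```python
import math

import numpy as np

# Hypothetical search-space description: (kind, bounds or choices).
SPACE = {
    "neurons1": ("int", 20, 60),
    "dropout_rate1": ("float", 0.0, 0.5),
    "learning_rate": ("log", 1e-4, 1e-2),
    "batch_size": ("choice", [16, 32, 64]),
}


def encode(config):
    """Map one hyperparameter configuration to a vector in [0, 1]^d."""
    vec = []
    for name, spec in SPACE.items():
        kind, value = spec[0], config[name]
        if kind in ("int", "float"):          # linear scaling
            lo, hi = spec[1], spec[2]
            vec.append((value - lo) / (hi - lo))
        elif kind == "log":                   # scale in log space
            lo, hi = math.log(spec[1]), math.log(spec[2])
            vec.append((math.log(value) - lo) / (hi - lo))
        else:                                 # categorical: evenly spaced index
            choices = spec[1]
            vec.append(choices.index(value) / (len(choices) - 1))
    return np.array(vec)


config = {"neurons1": 40, "dropout_rate1": 0.25, "learning_rate": 1e-3, "batch_size": 32}
best_from_article = {"neurons1": 40, "dropout_rate1": 0.0,
                     "learning_rate": 0.004019639999963362, "batch_size": 64}
X_obs = np.vstack([encode(config), encode(best_from_article)])  # one row per trial
print(X_obs)
```

Treating a categorical choice as an ordered index is a simplification; one-hot encoding avoids implying an order between choices.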

    Step 3. Acquisition Function Optimization

    Use an optimization algorithm (often a cheap, local optimizer such as L-BFGS, or even random search) to find the next point x_{t+1} that maximizes the chosen acquisition function.

    This step is crucial because it identifies the most promising next candidate for evaluation by balancing exploration (trying new, uncertain areas of the hyperparameter space) and exploitation (refining promising areas).

    In KerasTuner, the BayesianOptimization tuner uses an upper-confidence-bound acquisition function; its beta argument plays the role of κ, and larger values make the search more explorative.
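The sketch below shows one common pattern for this step: maximize UCB with SciPy's L-BFGS-B from several random starting points inside the bounds and keep the best result. surrogate_mean and surrogate_std are hypothetical closed-form stand-ins for a fitted surrogate's μ(x) and σ(x) on [0, 1]², chosen so the answer can be checked by hand: with kappa=2 the optimum is at x = 1.04 / 1.2 ≈ 0.867 in each coordinate.

```python
import numpy as np
from scipy.optimize import minimize


def surrogate_mean(x):
    # Hypothetical stand-in for mu(x): peaks at (0.6, 0.6).
    return -np.sum((x - 0.6) ** 2)


def surrogate_std(x):
    # Hypothetical stand-in for sigma(x): grows away from an explored point (0.2, 0.2).
    return 0.1 + 0.2 * np.sum((x - 0.2) ** 2)


def negative_ucb(x, kappa=2.0):
    # minimize() minimizes, so negate UCB(x) = mu(x) + kappa * sigma(x).
    return -(surrogate_mean(x) + kappa * surrogate_std(x))


bounds = [(0.0, 1.0), (0.0, 1.0)]
rng = np.random.default_rng(0)
starts = rng.uniform(0.0, 1.0, size=(10, 2))   # random restarts within the bounds

best_x, best_value = None, np.inf
for x0 in starts:
    result = minimize(negative_ucb, x0=x0, method="L-BFGS-B", bounds=bounds)
    if result.fun < best_value:
        best_x, best_value = result.x, result.fun

print("next point to evaluate:", best_x, "UCB:", -best_value)
```

Random restarts matter because a real acquisition surface is usually multi-modal; a single local run could stop at a poor local maximum.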

    Step 4. Objective Function Evaluation

    Evaluate the true, expensive objective function f(x) at the new candidate point x_{t+1}.

    The Keras model is trained on the training data and evaluated on the validation data; the resulting val_recall is the observed objective value.

    # method of the custom kt.HyperModel subclass: tunes batch size and epochs,
    # then trains with class weights to counter the fraud / non-fraud imbalance
    def fit(self, hp, model=None, *args, **kwargs):
        model = self.build(hp) if model is None else model
        batch_size = hp.Choice('batch_size', values=[16, 32, 64])
        epochs = hp.Int('epochs', min_value=50, max_value=200, step=50)
    
        return model.fit(
            *args,
            batch_size=batch_size,
            epochs=epochs,
            class_weight=self.class_weights_dict,
            **kwargs
        )

    Step 5. Data Update

    Add the newly observed data point (x_{t+1}, f(x_{t+1})) to the set of observations.

    Step 6. Iteration

    Repeat Steps 2 to 5 until a stopping criterion is met.

    In practice, the tuner.search() method orchestrates the entire Bayesian Optimization process from Step 2 to Step 5:

    tuner.search(
        X_train, y_train,
        validation_data=(X_val, y_val),
        callbacks=[early_stopping_callback]
    )
    
    best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
    best_keras_model_from_tuner = tuner.get_best_models(num_models=1)[0]

    The method repeats these steps until the max_trials limit is reached. The early_stopping_callback does not end the search itself; it stops an individual trial's training run once the monitored metric stops improving.

    Here, recall is our key metric because false negatives (missed fraud) cost us the most in this fraud detection case.

    Learn more: KerasTuner source code

    Results

    The Bayesian Optimization process aimed to improve the model's performance, primarily by maximizing recall.

    The tuning yielded a trade-off across key metrics: compared with the initial state, recall improved significantly at the expense of some precision and overall accuracy:

    • Recall: 0.9055 (0.6595 -> 0.6450) — 0.8400
    • Precision: 0.6831 (0.8338 -> 0.8113) — 0.6747
    • Accuracy: 0.7427 (0.7640 -> 0.7475) — 0.7175
      (From development (training / validation combined) to test phase)
    (Figure: History of Learning Rate in the Gaussian Optimization Process)

    Best-performing hyperparameter set:

    • neurons1: 40
    • dropout_rate1: 0.0
    • neurons2: 20
    • dropout_rate2: 0.4
    • optimizer_name: lion
    • learning_rate: 0.004019639999963362
    • batch_size: 64
    • epochs: 200
    • beta_1_lion: 0.9
    • beta_2_lion: 0.99

    Optimal Neural Network Summary:

    (Figure: Optimal Neural Network Summary (Bayesian Optimization))

    Key Performance Metrics:

    • Recall: The model showed a significant improvement in recall, increasing from an initial value of roughly 0.66 (or 0.645) to 0.8400. The optimized model is notably better at identifying positive (fraud) cases.
    • Precision: At the same time, precision decreased, from around 0.83 (or 0.81) to 0.6747 after optimization. While more positive cases are identified, a higher proportion of those identifications are false positives.
    • Accuracy: Overall accuracy also declined, from an initial 0.7640 (or 0.7475) to 0.7175. This is consistent with the trade-off between recall and precision, where optimizing for one often affects the others.
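The direction of these shifts follows from the definitions of the three metrics. The sketch below computes them from confusion-matrix counts; the counts for the "conservative" and "aggressive" models are hypothetical (1,000 transactions, 100 of them fraud), chosen only to show how catching more fraud can lower precision and accuracy at the same time.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (recall, precision, accuracy) from confusion-matrix counts."""
    recall = tp / (tp + fn)                      # share of actual fraud that was flagged
    precision = tp / (tp + fp)                   # share of flags that were actual fraud
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that were right
    return recall, precision, accuracy


# Hypothetical counts: both models see 100 fraud and 900 legitimate transactions.
conservative = classification_metrics(tp=70, fp=15, fn=30, tn=885)
aggressive = classification_metrics(tp=85, fp=60, fn=15, tn=840)

for name, (r, p, a) in [("conservative", conservative), ("aggressive", aggressive)]:
    print(f"{name}: recall={r:.3f}, precision={p:.3f}, accuracy={a:.3f}")
```

The aggressive model catches 15 more frauds but raises 45 more false alarms, so recall rises while precision and accuracy fall.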

    Comparing with Grid Search

    For comparison, we tuned a Keras Sequential model with Grid Search, using the Adam optimizer:

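    import numpy as np
    import tensorflow as tf
    from tensorflow import keras
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Input
    from sklearn.model_selection import GridSearchCV
    from sklearn.utils import class_weight
    from scikeras.wrappers import KerasClassifier
    
    
    def create_model(input_shape, initial_bias_value, neurons1=20, neurons2=20,
                     dropout_rate1=0.1, dropout_rate2=0.1, learning_rate=0.001):
        # create_model is not shown in the original; this sketch mirrors the
        # architecture tuned above, compiled with Adam at the given learning rate
        model = Sequential([
            Input(shape=(input_shape,)),
            Dense(neurons1, activation='relu'),
            Dropout(dropout_rate1),
            Dense(neurons2, activation='relu'),
            Dropout(dropout_rate2),
            Dense(1, activation='sigmoid',
                  bias_initializer=keras.initializers.Constant(initial_bias_value)),
        ])
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss='binary_crossentropy',
        )
        return model
    
    
    # model__* keys are routed by scikeras to create_model's arguments
    param_grid = {
        'model__learning_rate': [0.001, 0.0005, 0.0001],
        'model__neurons1': [20, 30, 40],
        'model__neurons2': [20, 30, 40],
        'model__dropout_rate1': [0.1, 0.15, 0.2],
        'model__dropout_rate2': [0.1, 0.15, 0.2],
        'batch_size': [16, 32, 64],
        'epochs': [50, 100],
    }
    
    input_shape = X_train.shape[1]
    # initial output bias = log(#positives / #negatives), so the untrained
    # model already predicts the base fraud rate
    initial_bias = float(np.log(np.sum(y_train == 1) / np.sum(y_train == 0)))
    class_weights = class_weight.compute_class_weight(
        class_weight='balanced',
        classes=np.unique(y_train),
        y=y_train
    )
    class_weights_dict = dict(zip(np.unique(y_train), class_weights))
    
    keras_classifier = KerasClassifier(
        model=create_model,
        model__input_shape=input_shape,
        model__initial_bias_value=initial_bias,
        loss='binary_crossentropy',
        metrics=[
            'accuracy',
            keras.metrics.Precision(name='precision'),
            keras.metrics.Recall(name='recall'),
            keras.metrics.AUC(name='auc')
        ]
    )
    
    grid_search = GridSearchCV(
        estimator=keras_classifier,
        param_grid=param_grid,
        scoring='recall',
        cv=3,
        n_jobs=-1,
        error_score='raise'
    )
    
    grid_result = grid_search.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        callbacks=[early_stopping_callback],
        class_weight=class_weights_dict
    )
    
    optimal_params = grid_result.best_params_
    best_keras_classifier = grid_result.best_estimator_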

    Results

    Grid Search tuning produced a model with strong precision and good overall accuracy, though with lower recall than the Bayesian Optimization approach:

    • Recall: 0.8214 (0.7735 -> 0.7150) — 0.7100
    • Precision: 0.7884 (0.8331 -> 0.8034) — 0.8304
    • Accuracy: 0.8005 (0.8092 -> 0.7700) — 0.7825

    Best-performing hyperparameter set:

    • neurons1: 40
    • dropout_rate1: 0.15
    • neurons2: 40
    • dropout_rate2: 0.1
    • learning_rate: 0.001
    • batch_size: 16
    • epochs: 100

    Optimal Neural Network Summary:

    (Figure: Optimal Neural Network Summary (GridSearchCV))
    (Figure: Evaluation During Training (Grid Search Tuning))
    (Figure: Evaluation During Validation (Grid Search Tuning))
    (Figure: Evaluation During Test (Grid Search Tuning))

    Grid Search Performance:

    • Recall: Reached 0.7100, slightly below its initial range (0.7735–0.7150).
    • Precision: Held up well at 0.8304, within its initial range (0.8331–0.8034) and close to its upper end.
    • Accuracy: Settled at 0.7825, maintaining solid overall predictive capability, within its initial range (0.8092–0.7700).

    Comparison with Bayesian Optimization:

    • Recall: Bayesian Optimization (0.8400) significantly outperformed Grid Search (0.7100) at identifying positive cases.
    • Precision: Grid Search (0.8304) achieved much higher precision than Bayesian Optimization (0.6747), indicating fewer false positives.
    • Accuracy: Grid Search's accuracy (0.7825) was notably higher than Bayesian Optimization's (0.7175).
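One way to put the two results on a single scale that reflects how much recall matters is the F-beta score, F_β = (1 + β²)·P·R / (β²·P + R), where β > 1 weights recall more heavily than precision. This is not part of the original experiment; the sketch below simply applies the formula to the test-set precision and recall reported above.

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


results = {  # (precision, recall) on the test set, as reported above
    "Bayesian Optimization": (0.6747, 0.8400),
    "Grid Search": (0.8304, 0.7100),
}

for name, (p, r) in results.items():
    print(f"{name}: F1={f_beta(p, r, 1):.4f}, F2={f_beta(p, r, 2):.4f}")
```

With β = 1 (F1), which weights precision and recall equally, the Grid Search model scores slightly higher (about 0.77 vs. 0.75); with β = 2 (F2), the Bayesian Optimization model comes out ahead (about 0.80 vs. 0.73).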

    General Comparison with Grid Search

    1. Approaching the Search Space

    Bayesian Optimization

    • Intelligent/Adaptive: Bayesian Optimization builds a probabilistic model (often a Gaussian Process) of the objective function (e.g., model performance as a function of hyperparameters). It uses this model to predict which hyperparameter combinations are most likely to yield better results.
    • Informed: It learns from previous evaluations. After each trial, the probabilistic model is updated, guiding the search toward more promising regions of the hyperparameter space. This lets it make informed choices about where to sample next, balancing exploration (trying new, unknown regions) and exploitation (focusing on regions that have shown good results).
    • Sequential: It typically operates sequentially, evaluating one point at a time and updating its model before selecting the next.

    Grid Search

    • Exhaustive/Brute-force: Grid Search systematically tries every possible combination of hyperparameter values from a predefined set of values for each hyperparameter. You specify a “grid” of values, and it evaluates every point on that grid.
    • Uninformed: It does not use the results of previous evaluations to inform the choice of the next set of hyperparameters to try. Each combination is evaluated independently.
    • Deterministic: Given the same grid, it always explores the same combinations in the same order.
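The exhaustive, uninformed, deterministic behavior can be seen in a bare-bones grid search built on itertools.product. The toy_objective below is a hypothetical stand-in for a training run (it is not the article's model), and the grid reuses two of the value lists from the param_grid above.

```python
import itertools


def grid_search(objective, grid):
    """Evaluate every combination in `grid` and return the best one."""
    names = list(grid)
    history = []
    # itertools.product yields combinations in a fixed order, so repeated
    # runs over the same grid visit the same points in the same sequence.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        # Each evaluation ignores every earlier result (the search is uninformed).
        history.append((params, objective(params)))
    best_params, best_score = max(history, key=lambda item: item[1])
    return best_params, best_score, history


def toy_objective(params):
    # Hypothetical score surface that peaks at learning_rate=0.0005, neurons1=30.
    return (-((params["learning_rate"] - 0.0005) * 1e3) ** 2
            - ((params["neurons1"] - 30) / 10) ** 2)


grid = {"learning_rate": [0.001, 0.0005, 0.0001], "neurons1": [20, 30, 40]}
best, score, history = grid_search(toy_objective, grid)
print(best, len(history))  # {'learning_rate': 0.0005, 'neurons1': 30} 9
```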

    2. Computational Cost

    Bayesian Optimization

    • More Efficient: Designed to find good hyperparameters with significantly fewer evaluations than Grid Search. This makes it particularly effective when evaluating the objective function (e.g., training a Machine Learning model) is computationally expensive or time-consuming.
    • Scalability: Generally scales better to higher-dimensional hyperparameter spaces than Grid Search, though it can still become computationally intensive in very high dimensions because of the overhead of maintaining and updating the probabilistic model.

    Grid Search

    • Computationally Expensive: The number of combinations is the product of the number of values per hyperparameter, so it grows exponentially with the number of hyperparameters. This leads to very long run times and high computational cost, making Grid Search impractical for large search spaces. This is often referred to as the “curse of dimensionality.”
    • Scalability: Does not scale well to high-dimensional hyperparameter spaces.
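To see how quickly the grid grows, the snippet below counts the combinations in the param_grid used for the Grid Search comparison above, and the number of model fits GridSearchCV performs with cv=3: one fit per combination per fold, plus one final refit on the best combination (GridSearchCV's default refit=True).

```python
from math import prod

# The value lists from the Grid Search comparison's param_grid.
param_grid = {
    "model__learning_rate": [0.001, 0.0005, 0.0001],
    "model__neurons1": [20, 30, 40],
    "model__neurons2": [20, 30, 40],
    "model__dropout_rate1": [0.1, 0.15, 0.2],
    "model__dropout_rate2": [0.1, 0.15, 0.2],
    "batch_size": [16, 32, 64],
    "epochs": [50, 100],
}

n_combinations = prod(len(values) for values in param_grid.values())
n_fits = n_combinations * 3 + 1  # 3 CV folds per combination, plus the final refit
print(n_combinations, n_fits)    # 1458 4375
```

Each of those fits trains a network for 50 or 100 epochs, and adding just one more hyperparameter with three candidate values would triple the count.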

    3. Guarantees and Exploration

    Bayesian Optimization

    • Probabilistic guarantee: It aims to find the global optimum efficiently, but unlike Grid Search it offers no hard guarantee of finding the very best point within a discrete set. Instead, it converges probabilistically toward the optimum.
    • Smarter exploration: Its balance of exploration and exploitation helps it avoid getting stuck in local optima and find good values more effectively.

    Grid Search

    • Guaranteed to find the best in the grid: If the optimal hyperparameters lie within the defined grid, Grid Search is guaranteed to find them because it tries every combination.
    • Limited exploration: It can miss optimal values that fall between the discrete points defined in the grid.

    4. When to Use Which

    Bayesian Optimization

    • Large, high-dimensional hyperparameter spaces: When evaluating models is expensive and you have many hyperparameters to tune.
    • When efficiency is paramount: To find good hyperparameters quickly, especially with limited computational resources or time.
    • Black-box optimization problems: When the objective function is complex, non-linear, and has no known analytical form.

    Grid Search

    • Small, low-dimensional hyperparameter spaces: When you have only a few hyperparameters and a limited number of values for each, Grid Search can be a simple and effective choice.
    • When exhaustiveness is critical: When you absolutely need to evaluate every defined combination.

    Conclusion

    The experiment demonstrated the distinct strengths of Bayesian Optimization and Grid Search in hyperparameter tuning.
    Bayesian Optimization, by design, proved highly effective at navigating the search space while prioritizing a specific objective, in this case maximizing recall.

    It achieved a higher recall (0.8400) than Grid Search (0.7100), indicating its ability to find more positive instances.
    This capability comes with an inherent trade-off: lower precision and overall accuracy.

    Such an outcome is highly valuable in applications where minimizing false negatives is critical (e.g., medical diagnosis, fraud detection).
    Its efficiency, which stems from probabilistic modeling that guides the search toward promising regions, makes it a preferred method for optimizing experiments or simulations where each evaluation is expensive.

    In contrast, Grid Search, while exhaustive, yielded a more balanced model with superior precision (0.8304) and overall accuracy (0.7825).

    This suggests Grid Search produced a more conservative model, resulting in fewer false positives.

    In summary, while Grid Search offers a straightforward and exhaustive approach, Bayesian Optimization stands out as a more sophisticated and efficient method, capable of reaching strong results with fewer evaluations, particularly when optimizing for a specific, often complex, objective such as maximizing recall in a high-dimensional space.

    The optimal choice of tuning method ultimately depends on the specific performance priorities and resource constraints of the application.


    Author: Kuriko IWAI
    Portfolio / LinkedIn / Github
    May 26, 2025


    All images, unless otherwise noted, are by the author.
    The article uses synthetic data, licensed under Apache 2.0 for commercial use.


