A beginner-friendly guide to training a car price prediction model using Python, pandas, and scikit-learn, explained clearly.
Buying or selling a used car can be tricky. With so many makes, models, fuel types, and conditions, estimating a fair price isn't easy, even for experienced car dealers. That's where machine learning comes in.
In this post, I'll walk you through how I trained a machine learning model to predict the price of a car based on details like make, model, year, mileage, fuel type, and more. If you're new to ML, don't worry: I'll keep things simple and explain everything clearly.
- Basic Python knowledge
- Python installed (ideally with pip and a virtual environment)
- Jupyter Notebook or any IDE (like VS Code)
- Some data (we'll use a CSV file with car listings)
- Libraries: `pandas`, `scikit-learn`, and `pickle` (the last one is bundled with Python)
Run the following commands in your terminal:

```bash
mkdir car-price-predictor
cd car-price-predictor
mkdir ml_model
mkdir backend
```
Save the following content as `ml_model/car_data.csv`:
```csv
make,model,year,mileage,fuelType,transmission,ownerCount,price
Toyota,Corolla,2015,70000,Petrol,Manual,1,350000
Hyundai,i20,2018,45000,Petrol,Automatic,1,400000
Honda,Civic,2017,60000,Diesel,Manual,2,420000
Maruti,Swift,2019,30000,Petrol,Manual,1,380000
Ford,Ecosport,2016,85000,Diesel,Manual,2,360000
Volkswagen,Polo,2017,40000,Petrol,Manual,1,390000
Mahindra,Scorpio,2015,90000,Diesel,Manual,3,410000
Renault,Kwid,2020,15000,Petrol,Manual,1,300000
Tata,Nexon,2021,10000,Petrol,Automatic,1,520000
Kia,Seltos,2022,5000,Diesel,Manual,1,800000
```
Or, if you want to train on more data, use a larger CSV that contains the same car detail columns, and move the file to `car_data.csv`.
Train your model in `ml_model/train_models.py`.
Here's the full code, followed by detailed explanations of each step.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import pickle

# Load data
data = pd.read_csv("car_data.csv")
X = data.drop(columns=["price"])
y = data["price"]

# Define preprocessing
categorical_cols = ["make", "model", "fuelType", "transmission"]
numeric_cols = ["year", "mileage", "ownerCount"]

preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols)
], remainder="passthrough")

# Define models
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "bagging": BaggingRegressor(n_estimators=100, random_state=42),
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1)
}

# Train and save each model
for name, model in models.items():
    pipeline = Pipeline([
        ("pre", preprocessor),
        ("regressor", model)
    ])
    pipeline.fit(X, y)
    with open(f"{name}_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)

print("✅ Models trained and saved.")
```
I've created a dictionary with five different regression models:
- RandomForestRegressor: an ensemble of decision trees that usually performs very well.
- BaggingRegressor: another ensemble method that combines predictions from many models to improve stability.
- LinearRegression: the simplest regression; it fits a straight line.
- Ridge: linear regression with regularization to avoid overfitting.
- Lasso: similar to Ridge, but it can shrink some feature coefficients to zero (performs feature selection).
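With five competing regressors, you'll want to know which one actually predicts best. The script imports `train_test_split` but never uses it; every model is fit on all the rows. A minimal comparison sketch (appended to `train_models.py` so it can reuse its imports, `X`, `y`, `preprocessor`, and `models`) holds out a test set and scores each pipeline:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for name, model in models.items():
    pipeline = Pipeline([("pre", preprocessor), ("regressor", model)])
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    print(f"{name}: MAE={mean_absolute_error(y_test, preds):,.0f}, "
          f"R2={r2_score(y_test, preds):.2f}")
```

With only the ten sample rows this split leaves just two test cars, so the scores will be noisy; the comparison becomes meaningful once you train on a real dataset.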
Now let's walk through the train-and-save loop step by step:

```python
# Train and save each model
for name, model in models.items():
    pipeline = Pipeline([
        ("pre", preprocessor),
        ("regressor", model)
    ])
    pipeline.fit(X, y)
    with open(f"{name}_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)
```
- Loop through each model in the dictionary.
- Create a pipeline that first preprocesses the data (`preprocessor`) and then fits the regression model (`regressor`).
- Train the pipeline on our car data with `.fit(X, y)`.
- Save the trained pipeline (preprocessing + model together) to a file named `{model_name}_model.pkl` (e.g., `random_forest_model.pkl`).
- This way, we don't have to repeat the preprocessing when making predictions later.
After the training script is ready, run:

```bash
cd ml_model
pip install pandas scikit-learn
python train_models.py
```
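Once the `.pkl` files exist, you can sanity-check one before wiring up an API. Here's a minimal sketch (the car below is just an example; any values under the same column names work):

```python
import pickle
import pandas as pd

# Load one saved pipeline: preprocessing and model travel together
with open("random_forest_model.pkl", "rb") as f:
    pipeline = pickle.load(f)

# One car as a one-row DataFrame; columns must match the training CSV
car = pd.DataFrame([{
    "make": "Toyota", "model": "Corolla", "year": 2016,
    "mileage": 65000, "fuelType": "Petrol",
    "transmission": "Manual", "ownerCount": 1,
}])
print(f"Predicted price: {pipeline.predict(car)[0]:,.0f}")
```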
Create a file `ml_model/app.py`:
```python
from flask import Flask, request, jsonify
import pickle
import pandas as pd

app = Flask(__name__)

# Load models
models = {}
for name in ["random_forest", "bagging", "linear", "ridge", "lasso"]:
    with open(f"{name}_model.pkl", "rb") as f:
        models[name] = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    df = pd.DataFrame([data])
    predictions = {name: round(model.predict(df)[0], 2) for name, model in models.items()}
    return jsonify(predictions)

if __name__ == "__main__":
    app.run(port=5000)
```
Install Flask if you haven't already (`pip install flask`), then run the API server:

```bash
python app.py
```
- Open Postman.
- Make a POST request to `http://127.0.0.1:5000/predict`.
- Select Body > raw > JSON.
- Paste this JSON:

```json
{
  "make": "Kia",
  "model": "Seltos",
  "year": 2020,
  "mileage": 50425,
  "fuelType": "CNG",
  "transmission": "Manual",
  "ownerCount": 3
}
```

- Hit Send.
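If you'd rather skip Postman, the same request works from the terminal with curl:

```bash
curl -X POST http://127.0.0.1:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"make": "Kia", "model": "Seltos", "year": 2020, "mileage": 50425, "fuelType": "CNG", "transmission": "Manual", "ownerCount": 3}'
```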
You should see a JSON response with one predicted price per model.
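The exact figures depend on your training data; the response just has this shape (the numbers below are placeholders, not real output):

```json
{
  "bagging": 512000.0,
  "lasso": 498500.25,
  "linear": 501200.5,
  "random_forest": 520000.0,
  "ridge": 500900.75
}
```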
Training a machine learning model to predict car prices is a great way to get hands-on experience with data preprocessing, different regression algorithms, and model evaluation. It's a solid stepping stone if you want to dive deeper into building real-world ML applications.
Whether you're a student just starting out, a developer curious about machine learning, or someone who loves exploring new tech, I hope this guide has given you a clear, easy-to-follow path for training and using a model with your own data.