A beginner-friendly guide to training a car price prediction model using Python, pandas, and scikit-learn, explained clearly.
Buying or selling a used car can be tricky. With so many makes, models, fuel types, and conditions, estimating a fair price isn't easy, even for experienced car dealers. That's where machine learning comes in.
In this post, I'll walk you through how I trained a machine learning model to predict the price of a car based on details like make, model, year, mileage, fuel type, and more. If you're new to ML, don't worry: I'll keep things simple and explain everything clearly.
- Basic Python knowledge
- Python installed (ideally with pip and a virtual environment)
- Jupyter Notebook or any IDE (like VS Code)
- Some data (we'll use a CSV file with car listings)
- Libraries: `pandas`, `scikit-learn`, and `pickle` (the last one is bundled with Python)
Run the following commands in your terminal:

```bash
mkdir car-price-predictor
cd car-price-predictor
mkdir ml_model
mkdir backend
```
Save the following content as `ml_model/car_data.csv`:
```csv
make,model,year,mileage,fuelType,transmission,ownerCount,price
Toyota,Corolla,2015,70000,Petrol,Manual,1,350000
Hyundai,i20,2018,45000,Petrol,Automatic,1,400000
Honda,Civic,2017,60000,Diesel,Manual,2,420000
Maruti,Swift,2019,30000,Petrol,Manual,1,380000
Ford,Ecosport,2016,85000,Diesel,Manual,2,360000
Volkswagen,Polo,2017,40000,Petrol,Manual,1,390000
Mahindra,Scorpio,2015,90000,Diesel,Manual,3,410000
Renault,Kwid,2020,15000,Petrol,Manual,1,300000
Tata,Nexon,2021,10000,Petrol,Automatic,1,520000
Kia,Seltos,2022,5000,Diesel,Manual,1,800000
```
Or, if you want to train on more data, use a larger CSV that contains the same car detail columns, and move the file to `car_data.csv`.
Train your model in `ml_model/train_models.py`.
Here's the full code, followed by detailed explanations of each step.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import pickle

# Load data
data = pd.read_csv("car_data.csv")
X = data.drop(columns=["price"])
y = data["price"]

# Define preprocessing
categorical_cols = ["make", "model", "fuelType", "transmission"]
numeric_cols = ["year", "mileage", "ownerCount"]

preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols)
], remainder="passthrough")

# Define models
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "bagging": BaggingRegressor(n_estimators=100, random_state=42),
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1)
}

# Train and save each model
for name, model in models.items():
    pipeline = Pipeline([
        ("pre", preprocessor),
        ("regressor", model)
    ])
    pipeline.fit(X, y)
    with open(f"{name}_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)

print("✅ Models trained and saved.")
```
I've created a dictionary with five different regression models:
- RandomForestRegressor: an ensemble of decision trees that usually performs very well.
- BaggingRegressor: another ensemble method that combines predictions from many models to improve stability.
- LinearRegression: the simplest regression; it fits a straight line.
- Ridge: linear regression with regularization to avoid overfitting.
- Lasso: similar to Ridge, but it can shrink some feature coefficients to zero (performs feature selection).
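With five competing regressors, you'll want to know which one actually predicts best. The script imports `train_test_split` but never uses it; every model is fit on all the rows. A minimal comparison sketch (appended to `train_models.py` so it can reuse its imports, `X`, `y`, `preprocessor`, and `models`) holds out a test set and scores each pipeline:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for name, model in models.items():
    pipeline = Pipeline([("pre", preprocessor), ("regressor", model)])
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    print(f"{name}: MAE={mean_absolute_error(y_test, preds):,.0f}, "
          f"R2={r2_score(y_test, preds):.2f}")
```

With only the ten sample rows this split leaves just two test cars, so the scores will be noisy; the comparison becomes meaningful once you train on a real dataset.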
Now let's walk through the train-and-save loop step by step:

```python
# Train and save each model
for name, model in models.items():
    pipeline = Pipeline([
        ("pre", preprocessor),
        ("regressor", model)
    ])
    pipeline.fit(X, y)
    with open(f"{name}_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)
```
- Loop through each model in the dictionary.
- Create a pipeline that first preprocesses the data (`preprocessor`) and then fits the regression model (`regressor`).
- Train the pipeline on our car data with `.fit(X, y)`.
- Save the trained pipeline (preprocessing + model together) to a file named `{model_name}_model.pkl` (e.g., `random_forest_model.pkl`).
- This way, we don't have to repeat the preprocessing when making predictions later.
After the training script is ready, run:

```bash
cd ml_model
pip install pandas scikit-learn
python train_models.py
```
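Once the `.pkl` files exist, you can sanity-check one before wiring up an API. Here's a minimal sketch (the car below is just an example; any values under the same column names work):

```python
import pickle
import pandas as pd

# Load one saved pipeline: preprocessing and model travel together
with open("random_forest_model.pkl", "rb") as f:
    pipeline = pickle.load(f)

# One car as a one-row DataFrame; columns must match the training CSV
car = pd.DataFrame([{
    "make": "Toyota", "model": "Corolla", "year": 2016,
    "mileage": 65000, "fuelType": "Petrol",
    "transmission": "Manual", "ownerCount": 1,
}])
print(f"Predicted price: {pipeline.predict(car)[0]:,.0f}")
```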
Create a file `ml_model/app.py`:
```python
from flask import Flask, request, jsonify
import pickle
import pandas as pd

app = Flask(__name__)

# Load models
models = {}
for name in ["random_forest", "bagging", "linear", "ridge", "lasso"]:
    with open(f"{name}_model.pkl", "rb") as f:
        models[name] = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    df = pd.DataFrame([data])
    predictions = {name: round(model.predict(df)[0], 2) for name, model in models.items()}
    return jsonify(predictions)

if __name__ == "__main__":
    app.run(port=5000)
```
Install Flask if you haven't already (`pip install flask`), then run the API server:

```bash
python app.py
```
- Open Postman.
- Make a POST request to `http://127.0.0.1:5000/predict`.
- Select Body > raw > JSON.
- Paste this JSON:

```json
{
  "make": "Kia",
  "model": "Seltos",
  "year": 2020,
  "mileage": 50425,
  "fuelType": "CNG",
  "transmission": "Manual",
  "ownerCount": 3
}
```

- Hit Send.
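If you'd rather skip Postman, the same request works from the terminal with curl:

```bash
curl -X POST http://127.0.0.1:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"make": "Kia", "model": "Seltos", "year": 2020, "mileage": 50425, "fuelType": "CNG", "transmission": "Manual", "ownerCount": 3}'
```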
You should see a JSON response with one predicted price per model.
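The exact figures depend on your training data; the response just has this shape (the numbers below are placeholders, not real output):

```json
{
  "bagging": 512000.0,
  "lasso": 498500.25,
  "linear": 501200.5,
  "random_forest": 520000.0,
  "ridge": 500900.75
}
```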
Training a machine learning model to predict car prices is a great way to get hands-on experience with data preprocessing, different regression algorithms, and model evaluation. It's a solid stepping stone if you want to dive deeper into building real-world ML applications.
Whether you're a student just starting out, a developer curious about machine learning, or someone who loves exploring new tech, I hope this guide has given you a clear, easy-to-follow path for training and using a model with your own data.