We read the data from a CSV file. This dataset is semi-structured and contains order information from a pizza restaurant, including Order Quantity, ORDER type, ID, ORDERTYPE, and PROMOTIONLIST.
As you can see, all the data types are object, which can't be used directly for analysis. Therefore, we convert the data types using Label Encoding.
from sklearn.preprocessing import LabelEncoder
# create an instance of LabelEncoder
labelencoder = LabelEncoder()
df['ORDER'] = labelencoder.fit_transform(df['ORDER'])
df['ORDERTYPE'] = labelencoder.fit_transform(df['ORDERTYPE'])
df['PROMOTIONLIST'] = labelencoder.fit_transform(df['PROMOTIONLIST'])
To handle this, we use Label Encoding to convert categorical data into integer values. For example:
ORDER type is encoded as:
- Home = 2
- LineMan (LM) = 3
- Grab = 1
- Dine-in = 0
- Pizza Delivery = 4
ORDERTYPE is encoded as:
- Promotion = 1
- A LA CARTE = 0
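The integer codes above follow from how scikit-learn's LabelEncoder works: it sorts the distinct values and numbers them from 0. The following is a minimal pure-Python sketch of that rule, not the library's implementation; the helper name `label_encode` and the sample list are assumptions made for illustration.

```python
def label_encode(values):
    """Number the sorted distinct values from 0 and encode each entry."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical sample of ORDERTYPE values (not rows from the real dataset)
order_types = ["Promotion", "A LA CARTE", "A LA CARTE", "Promotion"]

codes, mapping = label_encode(order_types)
print(mapping)  # "A LA CARTE" sorts before "Promotion", so it gets 0
print(codes)
```

Because the codes come from sort order alone, they carry no ranking or distance between categories.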
As shown in Fig. 3, the time format must be converted so that it is compatible with the function that classifies each order into its time period. We do this with the following code:
df['TIME'] = pd.to_datetime(df['TIME'], format='%I:%M %p', errors='coerce')
df['Hour'] = df['TIME'].dt.hour
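To see what the `'%I:%M %p'` format does, here is a small standard-library sketch that parses 12-hour strings and extracts the 24-hour value. The helper `parse_hour` and the sample strings are assumptions for illustration. Returning `None` for unparseable input only loosely mirrors `errors='coerce'`, which yields `NaT` in pandas.

```python
from datetime import datetime


def parse_hour(text):
    """Return the 24-hour value of a 12-hour time string, or None if it does not parse."""
    try:
        return datetime.strptime(text, "%I:%M %p").hour
    except (ValueError, TypeError):
        return None


# Hypothetical sample strings; the last one is not in 12-hour format
samples = ["09:15 AM", "12:30 PM", "07:45 PM", "12:05 AM", "19:45"]
hours = [parse_hour(s) for s in samples]
print(hours)
```

Values that do not match the format become missing rather than raising an error, so it is worth checking how many rows end up without an hour.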
Next, we create a function that converts the hour into a class for each time period, so we can identify when the orders were placed for further analysis.
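def get_time_period(hour):
    # Assumed half-open ranges [start, end); the 21:00 end of dinner is an assumption
    if 5 <= hour < 11:
        return 'Morning'
    elif 11 <= hour < 14:
        return 'Lunch'
    elif 14 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Dinner'
    else:
        return 'Night'

df['TimePeriod'] = df['Hour'].apply(get_time_period)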
df.columns
df.dtypes
df.head(10)
The result:
Furthermore, we convert the time format to align with our classification function. We classify the different time periods as follows:
- Dinner = 1
- Lunch = 2
- Morning = 3
- Afternoon = 0
- Night = 4
df['TimePeriod'] = labelencoder.fit_transform(df['TimePeriod'])
df.columns
df.dtypes
df.head(10)
The result:
Next, we drop the columns that are no longer needed, such as ID, TIME, and Hour, using this code:
df.drop(['ID','TIME','Hour'], axis=1, inplace=True)
df.columns
df.dtypes
df.head(10)
The result:
K-Means requires us to specify the number of clusters (K). Choosing the right K is crucial, as an inappropriate value may lead to poor clustering results.
To determine the optimal K, we use the Elbow Method, which plots the Sum of Squared Errors (SSE) against different values of K. The elbow point, where the curve bends, is usually the best choice, as increasing K beyond this point yields diminishing returns.
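To make the SSE curve concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) that returns the SSE for a given K. The toy points, the helper names, and the deterministic farthest-first initialization are all assumptions made for illustration; they are not the article's data or code. With scikit-learn, the same quantity is available as the `inertia_` attribute of a fitted `KMeans` model.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first_init(points, k):
    """Deterministic init (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy data: three well-separated groups of four points (illustrative only)
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 8), (8, 9), (9, 8), (9, 9),
    (1, 8), (1, 9), (2, 8), (2, 9),
]

sse = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, value in sse.items():
    print(k, round(value, 2))
```

On this toy data the SSE falls sharply up to K = 3 and only slightly afterwards, which is the bend the Elbow Method looks for.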
To refine our selection, we consider a range of possible values for K, typically ±2 around the elbow point. After identifying this range, we further analyze the SSE values to ensure we select the best K for our dataset.
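One possible way to carry out this refinement is sketched below: list the candidate K values around the elbow, then compare how much each extra cluster reduces the SSE. The function names, the lower limit of K = 2, and the SSE numbers are all hypothetical; the numbers are made up for illustration and are not results from the pizza dataset.

```python
def candidate_ks(elbow_k, spread=2, k_min=2):
    """Candidate K values within ±spread of the elbow, never below k_min."""
    return list(range(max(k_min, elbow_k - spread), elbow_k + spread + 1))


def sse_drops(sse_by_k):
    """Relative SSE reduction gained by moving from K-1 to K."""
    drops = {}
    for k in sorted(sse_by_k):
        if k - 1 in sse_by_k:
            prev = sse_by_k[k - 1]
            drops[k] = (prev - sse_by_k[k]) / prev
    return drops


# Made-up SSE values, purely illustrative
example_sse = {2: 800.0, 3: 400.0, 4: 200.0, 5: 180.0, 6: 171.0}

candidates = candidate_ks(4)
drops = sse_drops(example_sse)
print(candidates)
for k, drop in drops.items():
    print(k, f"{drop:.0%}")
```

In this made-up series the relative drop shrinks sharply after K = 4, so values above 4 would add clusters without much gain in fit.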