Machine Learning 🧠. A machine that learns. Sounds cool and… | by Isbilya Nathifa | Mar, 2025

First things first: open your Google Colab, because it’s the medium we’ll be using. Next, like any other data analysis project, you need to load or create a data set that can be read by the machine. In this introduction, we’re going to use a pre-made data set that consists of the salaries of different people and the factors (variables) that may or may not affect their salary.

import pandas as pd
# Colab stores uploaded files under /content
information = pd.read_csv('/content/Salary_Data.csv')
information.head()  # show the first 5 rows of the data set
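If you don’t have the CSV at hand, you can also make a small data set yourself. Here is a minimal sketch of that option. The column names and numbers are made up for illustration, and the real Salary_Data.csv may look different.

```python
import pandas as pd

# Hypothetical toy data; the real Salary_Data.csv may use other columns.
toy = pd.DataFrame({
    "Age": [25, 32, 41, 29],
    "Education Level": ["Bachelor's", "Master's", "PhD", "Bachelor's"],
    "Years of Experience": [2.0, 7.0, 15.0, 4.0],
    "Salary": [45000.0, 70000.0, 120000.0, 52000.0],
})

print(toy.head())  # head() returns at most the first 5 rows
print(toy.shape)   # (number of rows, number of columns)
```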

Now, let’s take a look at what type of data each variable is. (*A variable is every piece of data that isn’t what we are trying to predict. In this case, the variables are all the columns except the salary.)

information.dtypes
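To get a feel for what this check reports, here is a small sketch on a made-up frame (the column names are assumptions, not the real file). It prints the type of each column and then lists the columns that are not numeric, which is another way to spot the ones that will need attention.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "Age": [25, 32, 41],
    "Gender": ["Female", "Male", "Female"],
    "Salary": [45000.0, 70000.0, 120000.0],
})

print(toy.dtypes)  # one type per column

# Columns that are not numeric are the ones that still need encoding
non_numeric = list(toy.select_dtypes(exclude="number").columns)
print(non_numeric)
```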

Notice how some of the data are in the form of object. This is a problem we need to solve. Why? Because a machine can’t fit a linear regression if the data stored are not numbers (this includes float and integer). That’s why our next step is to change every column that is not a number (object) into numbers (float/integer).

from sklearn.preprocessing import LabelEncoder

# Columns stored as text (dtype 'object') must be turned into numbers
categorical_column = information.select_dtypes(include=['object']).columns
label_encoders = {}
for col in categorical_column:
    # Keep one encoder per column so the codes can be mapped back later
    label_encoders[col] = LabelEncoder()
    information[col] = label_encoders[col].fit_transform(information[col])
information.head()  # show the first 5 rows of the encoded set
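If you’re curious what LabelEncoder actually does to a column, here is a minimal sketch on a short list of made-up education levels (the values are assumptions for illustration). Each distinct label gets an integer code, and the encoder remembers the mapping so you can go back.

```python
from sklearn.preprocessing import LabelEncoder

levels = ["Bachelor's", "PhD", "Master's", "Bachelor's"]  # made-up values

encoder = LabelEncoder()
codes = encoder.fit_transform(levels)

print(list(encoder.classes_))  # the distinct labels, in sorted order
print(list(codes))             # each label replaced by its position in classes_
print(list(encoder.inverse_transform(codes)))  # back to the original labels
```

Keep in mind that the codes follow sorted order, not any real ranking, so a linear model will read them as if a bigger code meant "more". For categories with no natural order, one-hot encoding (for example with `pd.get_dummies`) is a common alternative.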

Okay, now that that’s done, you probably think that we can go straight to linear regression, right? Well, the bad news is… not yet. There’s one more thing we have to check, which is the NaN values. “Why on earth do we need that?” NaN means there is no value in the data. No ‘0’, no ‘1’, no nothing. This is bad for the machine because it indicates missing data. To prevent the machine from getting confused by an incomplete data set, we need to check how many NaN values are in the set using this simple code.

information.isna().sum()
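Here is a small sketch of how that count works, using a made-up frame with a couple of holes in it (the names and numbers are assumptions). `isna()` marks every missing cell as True, and `sum()` adds those up per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing cells.
toy = pd.DataFrame({
    "Age": [25.0, np.nan, 41.0],
    "Salary": [45000.0, 70000.0, np.nan],
    "Years": [2.0, 7.0, 15.0],
})

missing = toy.isna().sum()  # number of NaN values in each column
print(missing)
```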

As you can see, there are some NaN, or missing, values in some of the columns. To fix that, we need to fill them in with values that can reasonably represent the data. There are many options. You can use the median, mean, or mode. For now, we’re going to go with the mean of each variable.

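# Fill the NaN values in each numeric column with that column's mean
information.fillna(information.mean(numeric_only=True), inplace=True)
information.isna().sum()  # the numeric columns should now report 0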
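The choice between mean and median matters more than it may seem. Here is a minimal sketch with made-up salaries, one of them much higher than the rest, to show how the two fills differ.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one missing value and one large outlier.
toy = pd.DataFrame({"Salary": [40000.0, 50000.0, np.nan, 150000.0]})

mean_filled = toy["Salary"].fillna(toy["Salary"].mean())
median_filled = toy["Salary"].fillna(toy["Salary"].median())

print(mean_filled.iloc[2])    # 80000.0, pulled up by the outlier
print(median_filled.iloc[2])  # 50000.0, the middle of the known values
```

The mean is fine when the values are spread fairly evenly. When a few extreme values are present, the median is usually the safer stand-in.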

Okay, after some checking and cleaning, the data set is now ready to be used to predict salary with linear regression.
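As a preview of where this is heading, here is a minimal sketch of a linear regression with scikit-learn. It is not the article’s own code: the numbers are made up and follow an exact line (salary = 30000 + 5000 × years), so you can check that the model recovers it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear data: salary = 30000 + 5000 * years
years = np.array([[1.0], [3.0], [5.0], [10.0]])  # features must be 2-D
salary = np.array([35000.0, 45000.0, 55000.0, 80000.0])

model = LinearRegression()
model.fit(years, salary)

print(model.coef_[0], model.intercept_)  # roughly 5000 and 30000

predicted = model.predict(np.array([[7.0]]))[0]
print(predicted)  # roughly 65000
```

With the real data set you would pass every column except the salary as the features and the salary column as the target.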
