Let’s build a simple AI agent using Reinforcement Learning (RL). RL is a type of machine learning in which an agent learns to perform tasks by interacting with an environment and receiving rewards or penalties.
Example 1: The CartPole Problem
The CartPole problem is a classic RL task in which the goal is to balance a pole on a moving cart. We’ll use the OpenAI Gym library to simulate this environment.
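If you don’t already have Gym installed, it is available from PyPI (note that OpenAI Gym has since been succeeded by the Gymnasium project; this tutorial assumes the classic gym package and its pre-0.26 API):

pip install gym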
Step 1: Import Libraries
import gym
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
Step 2: Initialize the Environment
env = gym.make('CartPole-v1')
state_size = env.observation_space.shape[0]
action_size = env.action_space.n
print(f"State Size: {state_size}, Action Size: {action_size}")
Here, state_size represents the number of variables describing the environment (e.g., cart position, velocity), and action_size represents the number of possible actions (e.g., move left or right).
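To see what a state actually looks like, reset the environment and print the initial observation. For CartPole it is a 4-element vector: cart position, cart velocity, pole angle, and pole angular velocity (the sample values below are illustrative):

state = env.reset()
print(state)  # e.g. [ 0.031 -0.012  0.042  0.025 ]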
Step 3: Define the Agent
We’ll create a simple Q-learning agent. Q-learning is a model-free RL algorithm that learns the value of each action in each state.
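Concretely, after every step the agent nudges its estimate Q(s, a) toward the observed reward plus the discounted value of the best action in the next state:

Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max_a′ Q(s′, a′))

where α is the learning rate and γ is the discount factor. This is exactly the update implemented in the learn method below.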
class QLearningAgent:
    def __init__(self, n_states, action_size):
        self.n_states = n_states      # number of discrete states (after binning), not the observation dimension
        self.action_size = action_size
        self.q_table = np.zeros((n_states, action_size))
        self.learning_rate = 0.1
        self.discount_factor = 0.95
        self.epsilon = 1.0            # exploration rate
        self.epsilon_min = 0.01       # floor so some exploration always remains
        self.epsilon_decay = 0.995    # per-episode decay, applied in the training loop

    def choose_action(self, state):
        if np.random.rand() < self.epsilon:
            return env.action_space.sample()       # explore: random action
        return np.argmax(self.q_table[state])      # exploit: best known action

    def learn(self, state, action, reward, next_state):
        old_value = self.q_table[state, action]
        next_max = np.max(self.q_table[next_state])
        new_value = (1 - self.learning_rate) * old_value + \
                    self.learning_rate * (reward + self.discount_factor * next_max)
        self.q_table[state, action] = new_value
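One catch: CartPole’s observations are continuous, so they cannot index a Q-table directly. A standard workaround, used in the training loop below, is to clip each observation component to a finite range, bin it into a small number of buckets, and flatten the bucket indices into a single integer. The helper below is a minimal sketch; the bin count and bounds are illustrative choices, not part of the Gym API:

N_BINS = 6  # buckets per observation component (illustrative)

# Approximate ranges for cart position, cart velocity, pole angle,
# and pole angular velocity; the unbounded velocities are clipped.
STATE_BOUNDS = [(-4.8, 4.8), (-3.0, 3.0), (-0.42, 0.42), (-3.5, 3.5)]

def discretize(observation):
    # Map a continuous observation to a single integer in [0, N_BINS**4).
    indices = []
    for value, (low, high) in zip(observation, STATE_BOUNDS):
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        indices.append(min(int(fraction * N_BINS), N_BINS - 1))
    return int(np.ravel_multi_index(indices, [N_BINS] * len(indices)))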
Step 4: Train the Agent
agent = QLearningAgent(N_BINS ** state_size, action_size)  # 6**4 = 1296 discrete states
episodes = 1000
scores = deque(maxlen=100)   # rolling window for the average score
history = []                 # full score history for the plot in Step 5
for episode in range(episodes):
    state = discretize(env.reset())
    total_reward = 0
    done = False
    while not done:
        action = agent.choose_action(state)
        next_state, reward, done, _ = env.step(action)
        next_state = discretize(next_state)
        agent.learn(state, action, reward, next_state)
        state = next_state
        total_reward += reward
    # Decay exploration so the agent gradually shifts from exploring to exploiting.
    agent.epsilon = max(agent.epsilon_min, agent.epsilon * agent.epsilon_decay)
    scores.append(total_reward)
    history.append(total_reward)
    avg_score = np.mean(scores)
    if episode % 100 == 0:
        print(f"Episode: {episode}, Average Score: {avg_score:.1f}")
Step 5: Visualize Results
plt.plot(history)
plt.xlabel("Episode")
plt.ylabel("Score")
plt.title("Training Progress")
plt.show()
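Once training finishes, you can sanity-check the learned policy by running one episode greedily, i.e., with exploration turned off. A minimal sketch using the pieces defined above:

# Run a single greedy episode with the learned Q-table.
agent.epsilon = 0.0          # disable exploration
state = discretize(env.reset())
done = False
score = 0
while not done:
    action = agent.choose_action(state)
    next_state, reward, done, _ = env.step(action)
    state = discretize(next_state)
    score += reward
print(f"Greedy episode score: {score}")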