In the world of automation testing and UI interaction, typical approaches rely on hardcoded selectors and DOM-based interactions. But what if we could interact with a webpage the same way a human does, by looking at it?
In this project, I explore a novel approach: training a YOLOv8 object detection model to visually detect web elements, and using Microsoft Playwright to perform actions based on those detections. The result is a fusion of computer vision and browser automation that opens up exciting possibilities in test automation and accessibility.
Traditional UI automation tools like Selenium and Playwright rely heavily on XPath, CSS selectors, or element IDs, which can be brittle when the UI changes. I wanted to find a way to visually identify and interact with elements, just as a human tester would.
- Model: YOLOv8, a fast and powerful object detection model.
- Dataset Annotation: Created using Roboflow.
- Automation: Browser control via Playwright.
- Goal: Detect elements like buttons, input fields, or checkboxes visually and interact with them (click, type, etc.) without relying on selectors.
- Python 3.x
- YOLOv8 (Ultralytics)
- Roboflow (for image annotation and dataset generation)
- Playwright (for browser automation)
- OpenCV (for image processing)
For an AI model to perform accurately, it must be trained on a substantial, high-quality dataset. For this project, I used ChatGPT to generate a list of login-page links and used the following Python code to open each one and capture a screenshot.
import os
import asyncio
from playwright.async_api import async_playwright

login_urls = [
# Email Services
"https://accounts.google.com/signin",
"https://outlook.live.com/owa/",
"https://login.yahoo.com/",
"https://mail.protonmail.com/login",
"https://accounts.zoho.com/signin",
"https://login.aol.com/",
"https://www.gmx.com/login/",
"https://www.mail.com/int/",
"https://www.icloud.com/mail",
"https://www.fastmail.com/login/",
# Social Media
"https://www.facebook.com/login/",
"https://www.instagram.com/accounts/login/",
"https://twitter.com/login",
"https://www.linkedin.com/login",
"https://accounts.snapchat.com/accounts/login",
"https://www.tiktok.com/login",
"https://www.pinterest.com/login/",
"https://www.reddit.com/login/",
"https://www.tumblr.com/login",
"https://www.quora.com/login",
# Productivity
"https://workspace.google.com/",
"https://www.office.com/",
"https://slack.com/signin",
"https://zoom.us/signin",
"https://trello.com/login",
"https://www.notion.so/login",
"https://app.asana.com/",
"https://launchpad.37signals.com/",
"https://auth.monday.com/auth/login",
"https://app.clickup.com/login",
# Finance
"https://www.paypal.com/signin",
"https://dashboard.stripe.com/login",
"https://dashboard.razorpay.com/signin",
"https://pay.google.com/gp/w/u/0/home/signup",
"https://squareup.com/login",
"https://account.venmo.com/login",
"https://cash.app/login",
"https://login.payoneer.com/",
"https://wise.com/login",
"https://www.xoom.com/signin",
# Cloud Storage
"https://drive.google.com/",
"https://www.dropbox.com/login",
"https://onedrive.live.com/",
"https://account.box.com/login",
"https://www.icloud.com/",
"https://my.pcloud.com/#page=login",
"https://mega.nz/login",
"https://app.sync.com/",
"https://web.tresorit.com/login",
"https://www.mediafire.com/login/",
# Developer Platforms
"https://github.com/login",
"https://gitlab.com/users/sign_in",
"https://bitbucket.org/account/signin/",
"https://stackoverflow.com/users/login",
"https://id.heroku.com/login",
"https://cloud.digitalocean.com/login",
"https://signin.aws.amazon.com/signin",
"https://portal.azure.com/",
"https://console.cloud.google.com/",
"https://app.netlify.com/login",
# E-commerce
"https://www.amazon.com/ap/signin",
"https://www.flipkart.com/account/login",
"https://signin.ebay.com/",
"https://www.etsy.com/signin",
"https://accounts.shopify.com/store-login",
"https://www.walmart.com/account/login",
"https://login.aliexpress.com/",
"https://www.target.com/login",
"https://www.bestbuy.com/identity/global/signin",
"https://www.myntra.com/login",
# Education
"https://www.coursera.org/?authMode=login",
"https://courses.edx.org/login",
"https://www.udemy.com/join/login-popup/",
"https://www.khanacademy.org/login",
"https://www.duolingo.com/log-in",
"https://auth.udacity.com/sign-in",
"https://www.skillshare.com/login",
"https://www.linkedin.com/learning-login/",
"https://www.futurelearn.com/sign-in",
"https://app.pluralsight.com/id/",
# Healthcare
"https://mychart.com/login",
"https://member.webmd.com/login",
"https://www.healthline.com/login",
"https://www.zocdoc.com/login",
"https://www.practo.com/login",
"https://www.1mg.com/login",
"https://www.apollo247.com/login",
"https://www.netmeds.com/customer/account/login",
"https://pharmeasy.in/login",
"https://account.docusign.com/",
# Gaming
"https://store.steampowered.com/login/",
"https://www.epicgames.com/id/login",
"https://www.origin.com/login",
"https://us.battle.net/login/en/",
"https://login.live.com/",
"https://www.playstation.com/en-in/sign-in/",
"https://accounts.nintendo.com/login",
"https://www.gog.com/account/login",
"https://www.twitch.tv/login",
"https://discord.com/login",
]
output_folder = "screenshots"
os.makedirs(output_folder, exist_ok=True)


async def capture_screenshots():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # Fixed viewport so every screenshot has the same dimensions
        context = await browser.new_context(viewport={"width": 1280, "height": 800})
        for index, url in enumerate(login_urls):
            page = await context.new_page()
            try:
                await page.goto(url, timeout=60000)
                # Build a file name from the host, e.g. 001_accounts_google_com.png
                domain = url.split("//")[1].split("/")[0].replace(".", "_")
                filename = f"{index+1:03d}_{domain}.png"
                path = os.path.join(output_folder, filename)
                await page.screenshot(path=path)
                print(f"Captured: {path}")
            except Exception as e:
                print(f"Failed to capture {url}: {e}")
            finally:
                # Close the tab even when navigation or the screenshot fails
                await page.close()
        await browser.close()


asyncio.run(capture_screenshots())
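The script above derives each file name by splitting the URL string by hand. As a minimal sketch of an alternative, the standard library's `urllib.parse.urlparse` can extract the host instead. The helper name `screenshot_name` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def screenshot_name(index: int, url: str) -> str:
    """Build a file name such as '001_github_com.png' from a URL.

    Hypothetical helper for illustration; the original script uses
    str.split() for the same purpose.
    """
    host = urlparse(url).netloc  # e.g. "github.com"
    # Replace characters that are awkward in file names
    safe_host = host.replace(".", "_").replace(":", "_")
    return f"{index:03d}_{safe_host}.png"
```

For example, `screenshot_name(1, "https://github.com/login")` yields `001_github_com.png`. Parsing the URL also handles a port number in the host, which the manual split would leave in the file name.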
After obtaining the images from various web applications, we upload them to Roboflow for annotation. Using the annotation tool, we draw rectangular boxes around the buttons visible in each screenshot. This process must be repeated for every image that contains buttons.
Examples of annotated images
After annotating all the images, we export them in YOLOv8 format, including both training and validation datasets along with their corresponding images and labels.
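For reference, a YOLO-format label file holds one line per box: the class index followed by the box center, width, and height, each normalized to the image size. Roboflow generates these files on export, so the sketch below is only meant to show what the numbers mean. The function name `to_yolo_label` is hypothetical.

```python
def to_yolo_label(class_id: int, x1: float, y1: float, x2: float, y2: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel box (top-left, bottom-right) into a YOLO label line."""
    # Box center, normalized to the 0..1 range
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    # Box width and height, normalized to the 0..1 range
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 200x40 px button centered in a 1280x800 screenshot, class 0 ("buttons")
print(to_yolo_label(0, 540, 380, 740, 420, 1280, 800))
```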
In this project, we focus on training the model specifically to identify buttons on web pages. The same approach can be used to train the model to recognize other web elements such as links, dropdowns, checkboxes, and more.
from ultralytics import YOLO

# Start from the pretrained YOLOv8 nano weights
model = YOLO("yolov8n.pt")
model.train(
    data="data.yaml",  # path to your data config
    epochs=50,
    imgsz=640,
    batch=16,
    device="cpu",  # set to 0 to train on the first GPU
)
Sample data.yaml file
path: dataset
train: train
val: val
nc: 1
names: ['buttons']
When the script runs, the model is trained and the results are saved in the runs directory.
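Ultralytics normally writes the best checkpoint of each run to a `weights/best.pt` file inside that run's folder under `runs`. As a rough sketch, assuming that default layout, the most recent checkpoint can be located like this. The helper name `latest_best_weights` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_best_weights(runs_dir: str = "runs") -> Optional[Path]:
    """Return the most recently modified weights/best.pt under runs_dir.

    Assumes the default Ultralytics output layout; returns None when no
    checkpoint is found.
    """
    candidates = list(Path(runs_dir).glob("**/weights/best.pt"))
    if not candidates:
        return None
    # Pick the checkpoint with the newest modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be copied next to the inference script or passed directly to `YOLO(...)`.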
Confusion Matrix
F1 Curve
Overall Results
import time

from playwright.sync_api import sync_playwright
from ultralytics import YOLO

# Load the weights produced by training
model = YOLO("best.pt")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://app.sprybe.ai")
    # page.mouse.click() uses viewport coordinates, so this assumes the
    # target is visible without scrolling
    page.screenshot(path="page.png", full_page=True)
    results = model("page.png")
    boxes = results[0].boxes.xyxy
    if len(boxes) > 0:
        # Click the center of the first detected button
        x1, y1, x2, y2 = boxes[0]
        x_center = float((x1 + x2) / 2)
        y_center = float((y1 + y2) / 2)
        page.mouse.click(x_center, y_center)
    else:
        print("No button detected")
    time.sleep(10)  # keep the browser open to observe the result
    browser.close()
boxes.xyxy gives the bounding boxes of the detected objects in the format [x1, y1, x2, y2], where:
- (x1, y1) is the top-left corner of the box.
- (x2, y2) is the bottom-right corner.
Our trained model predicts that this element is a button with 95% confidence.
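When a page contains several candidates, the confidence score can be used to decide which box to act on. The sketch below works on plain Python lists so it stays independent of any library. The name `best_detection` and the 0.5 threshold are assumptions for illustration, not part of the original script.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def best_detection(boxes: Sequence[Box], scores: Sequence[float],
                   min_conf: float = 0.5) -> Optional[Tuple[Box, float]]:
    """Return the (box, score) pair with the highest score at or above min_conf."""
    kept = [(box, score) for box, score in zip(boxes, scores) if score >= min_conf]
    if not kept:
        return None  # nothing confident enough to click
    return max(kept, key=lambda pair: pair[1])
```

With Ultralytics, the two inputs could be taken from `results[0].boxes.xyxy.tolist()` and `results[0].boxes.conf.tolist()`.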
x_center = float((x1 + x2) / 2)
y_center = float((y1 + y2) / 2)
The lines above calculate the center coordinates of the bounding box, i.e., the middle of the detected element.
page.mouse.click(x_center, y_center)
The line above uses Playwright (a web automation library) to simulate a mouse click at the center of the detected object on the current page.
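One detail to keep in mind: `page.mouse.click` expects coordinates in CSS pixels relative to the viewport, while the detector returns coordinates in screenshot pixels. The two can differ when the screenshot covers the full page and the page is scrolled, or when the browser context uses a device scale factor other than 1. The following is a minimal sketch of the conversion, with a hypothetical helper name `click_point` and hypothetical parameter names.

```python
from typing import Sequence, Tuple


def click_point(box: Sequence[float],
                device_scale_factor: float = 1.0,
                scroll_x: float = 0.0,
                scroll_y: float = 0.0) -> Tuple[float, float]:
    """Map a detected box in screenshot pixels to a viewport click point.

    scroll_x and scroll_y only apply to full-page screenshots; leave them
    at 0 for a viewport-only screenshot.
    """
    x1, y1, x2, y2 = box
    # Center of the box in screenshot pixels
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    # Screenshot pixels -> CSS pixels, then page -> viewport coordinates
    return cx / device_scale_factor - scroll_x, cy / device_scale_factor - scroll_y
```

The scroll offsets could be read with `page.evaluate("window.scrollX")` and `page.evaluate("window.scrollY")`. With the defaults, the function reduces to the plain center calculation shown earlier.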
- Selector-free UI testing
- AI-powered browser bots
I tested the model on unseen web pages. Detection accuracy was consistently high, and clicks issued through Playwright landed on the intended UI targets, even on dynamic layouts.
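One common way to quantify how well a predicted box matches a hand-labeled one is intersection over union (IoU). The sketch below is a generic illustration and not necessarily how the results above were measured. The function name `iou` is hypothetical.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap. A threshold of 0.5 is often used to count a detection as correct.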
Execution Link: https://youtu.be/yJOHjlVUCmE
- Add OCR to read element labels
- Package as a no-code tool for testers
- Train with multilingual UI datasets
- Use NLP to convert manual test cases into scriptless automation
This project bridges a custom-built YOLO model and Playwright, offering a human-like way to test and interact with web pages, with no DOM selectors required.