In this article, I'll demonstrate how to use the recently released state-of-the-art model YOLOv11 with ONNX Runtime.
ONNX Runtime is a cross-platform engine for running ML models in the ONNX format. To convert your pretrained or custom YOLOv11 model into the ONNX format, we'll use the Ultralytics library, which simplifies this process and lets us export models with just a few lines of code.
from ultralytics import YOLO

model = YOLO("path/to/your/model.pt")  # path to your pretrained or custom YOLO11 model
model.export(format="onnx")
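Before writing any inference code, it can be worth confirming what the exported graph actually expects. Here is a minimal sketch (the filename yolo11n.onnx is an assumption; use whatever path export() printed for you) that lists the input and output names, shapes, and types with ONNX Runtime:

import onnxruntime as ort

session = ort.InferenceSession("yolo11n.onnx")  # assumed filename, adjust to your export

# the official 640x640 export typically reports one input named "images" of shape (1, 3, 640, 640)
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# and one output of shape (1, 84, 8400) for the 80-class COCO model
for out in session.get_outputs():
    print(out.name, out.shape, out.type)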
Once the conversion is done successfully, all we need are three libraries (ONNX Runtime, OpenCV, and NumPy), our ONNX model, and a Python script to perform inference.
Let's proceed and install the above-mentioned libraries, except NumPy, which comes bundled with OpenCV:
pip install opencv-python onnxruntime
Moving on to the inference script, we start by importing the required modules, loading our YOLOv11 ONNX model with ONNX Runtime, and defining a list of all the classes our model can detect. I'm using the official YOLOv11 model, which is trained on the COCO dataset containing 80 classes. If you're using your own trained model, the classes will be different in your case.
import onnxruntime as ort
import cv2
import numpy as np

model_path = "yolo11n.onnx"
onnx_model = ort.InferenceSession(model_path)

# load the class names from the file
with open('coco-classes.txt') as file:
    content = file.read()
    classes = content.split('\n')
Before running inference, we load the image and perform some required preprocessing on it.
image = cv2.imread("images/img1.jpg")
img_w, img_h = image.shape[1], image.shape[0]

img = cv2.resize(image, (640, 640))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.transpose(2, 0, 1)
img = img.reshape(1, 3, 640, 640)
In the code snippet above, I resize the image to 640×640 pixels because the model was trained on images of this size. Our YOLOv11 ONNX model also expects an RGB image with the shape (1, 3, 640, 640), but OpenCV reads images in BGR format by default, with the channel information on the 2nd axis instead of the 0th. Therefore, we convert the image to RGB and bring it to the required input shape of (1, 3, 640, 640). Here,

- 1 indicates the batch size (we will feed the model one image at a time),
- 3 represents the color channels (RGB),
- 640, 640 are the spatial dimensions of the image.
Proceeding further, we normalize the pixel values, bringing their range from [0, 255] down to [0, 1]. The image is then converted to float32 to match the model's expected input data type.
# Normalize pixel values to the range [0, 1]
img = img / 255.0

# Convert the image to float32
img = img.astype(np.float32)
Finally, we can feed our model with this image.
outputs = onnx_model.run(None, {"images": img})
At the end of inference, we get a matrix of shape (1, 84, 8400) as output, indicating 8400 detections, each with 84 parameters. This is because the official YOLOv11 model is designed to always predict 8400 candidate objects per image, regardless of how many objects are actually present; we will remove the detections with low confidence scores shortly. The 84 in the shape is the number of parameters per detection: the bounding box coordinates in center format (cx, cy, w, h) followed by confidence scores for the 80 classes the model was trained on. Note that this structure may differ for custom models, since the number of confidence scores always depends on the number of classes your model is trained on. For example, if you train a YOLOv11 object detection model to detect 1 class, there will be 5 parameters instead of 84: the first 4 will again be the bounding box values, and the last one will be the confidence score.
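If you want to verify this for your own export before going further, printing the shape of the raw output is enough (the exact numbers will differ for custom models):

print(outputs[0].shape)  # (1, 84, 8400) for the official 80-class COCO model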
Now, just for convenience, we reshape this output matrix to get the shape (8400, 84).
results = outputs[0][0]        # (84, 8400)
results = results.transpose()  # (8400, 84)
Our next step is to determine the most likely class for each detection and filter out low-confidence predictions. We do this by selecting the class with the highest confidence score for each detection, and we discard detections where all confidence scores are below a given threshold (0.5 in our case).
def filter_Detections(results, thresh=0.5):
    # if the model is trained on 1 class only
    if len(results[0]) == 5:
        # keep only the detections with confidence > thresh
        considerable_detections = [detection for detection in results if detection[4] > thresh]
        considerable_detections = np.array(considerable_detections)
        return considerable_detections

    # if the model is trained on multiple classes
    else:
        A = []
        for detection in results:
            class_id = detection[4:].argmax()
            confidence_score = detection[4:].max()

            new_detection = np.append(detection[:4], [class_id, confidence_score])
            A.append(new_detection)

        A = np.array(A)

        # keep only the detections with confidence > thresh
        considerable_detections = [detection for detection in A if detection[-1] > thresh]
        considerable_detections = np.array(considerable_detections)
        return considerable_detections
results = filter_Detections(results)
Once the useless detections are excluded from the output matrix, we can print its shape to better understand our results.
print(results.shape)
(22, 6)
The result above shows that 22 detections are now left, each with 6 parameters: the bounding box values in center format (cx, cy, w, h), the class id, and the confidence value. However, there may still be some unnecessary detections, because several of them can actually point to the same object. This is addressed with an algorithm called Non-Maximum Suppression (NMS), which selects the best detection among those potentially referring to the same object. It does this by considering two key metrics: the confidence value and the Intersection over Union (IoU). Additionally, we'll need to rescale the remaining detections back to their original scale, because the model produced detections for an image of size 640×640, which is not the size of our original image.
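As a quick aside, here is a small, self-contained sketch of how the IoU of two boxes in (x1, y1, x2, y2) format is computed; it mirrors the vectorized computation inside the NMS function below, and the two example boxes are made up purely for illustration:

def iou(box_a, box_b):
    # boxes are [x1, y1, x2, y2]
    xx1 = max(box_a[0], box_b[0])
    yy1 = max(box_a[1], box_b[1])
    xx2 = min(box_a[2], box_b[2])
    yy2 = min(box_a[3], box_b[3])

    # overlap area (0 if the boxes do not intersect)
    inter = max(xx2 - xx1, 0) * max(yy2 - yy1, 0)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou([100, 100, 200, 200], [150, 150, 250, 250]))  # ~0.14 for two partially overlapping boxes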
def NMS(boxes, conf_scores, iou_thresh=0.55):
    # boxes: [[x1, y1, x2, y2], [x1, y1, x2, y2], ...]
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2 - x1) * (y2 - y1)

    # indices sorted by confidence, lowest first
    order = conf_scores.argsort()

    keep = []
    keep_confidences = []

    while len(order) > 0:
        # pick the remaining box with the highest confidence
        idx = order[-1]
        A = boxes[idx]
        conf = conf_scores[idx]

        order = order[:-1]

        xx1 = np.take(x1, indices=order)
        yy1 = np.take(y1, indices=order)
        xx2 = np.take(x2, indices=order)
        yy2 = np.take(y2, indices=order)

        keep.append(A)
        keep_confidences.append(conf)

        # iou = intersection / union
        xx1 = np.maximum(x1[idx], xx1)
        yy1 = np.maximum(y1[idx], yy1)
        xx2 = np.minimum(x2[idx], xx2)
        yy2 = np.minimum(y2[idx], yy2)

        w = np.maximum(xx2 - xx1, 0)
        h = np.maximum(yy2 - yy1, 0)

        intersection = w * h

        # union = areaA + other_areas - intersection
        other_areas = np.take(areas, indices=order)
        union = areas[idx] + other_areas - intersection

        iou = intersection / union

        # drop the boxes that overlap too much with the one we just kept
        booleans = iou < iou_thresh
        order = order[booleans]
        # e.g. order = [2, 0, 1], booleans = [True, False, True]
        # ->   order = [2, 1]

    return keep, keep_confidences
# function to rescale the bounding boxes back to the original image size
def rescale_back(results, img_w, img_h):
    cx, cy, w, h, class_id, confidence = results[:, 0], results[:, 1], results[:, 2], results[:, 3], results[:, 4], results[:, -1]
    cx = cx / 640.0 * img_w
    cy = cy / 640.0 * img_h
    w = w / 640.0 * img_w
    h = h / 640.0 * img_h
    # convert from center format to corner format
    x1 = cx - w / 2
    y1 = cy - h / 2
    x2 = cx + w / 2
    y2 = cy + h / 2

    boxes = np.column_stack((x1, y1, x2, y2, class_id))
    keep, keep_confidences = NMS(boxes, confidence)
    print(np.array(keep).shape)  # (number of final detections, 5)
    return keep, keep_confidences
rescaled_results, confidences = rescale_back(results, img_w, img_h)
Here, rescaled_results contains the bounding box (x1, y1, x2, y2) and class id for each kept detection, while confidences stores the corresponding confidence scores.
Finally, we can visualize these results on our image.
for res, conf in zip(rescaled_results, confidences):
    x1, y1, x2, y2, cls_id = res
    cls_id = int(cls_id)
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
    conf = "{:.2f}".format(conf)

    # draw the bounding box
    cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 1)
    # write the class name and confidence above the box
    cv2.putText(image, classes[cls_id] + ' ' + conf, (x1, y1 - 17),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 0, 0), 1)

cv2.imwrite("Output.jpg", image)
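If you prefer to preview the result in a window instead of (or in addition to) saving it to disk, a minimal sketch (the window title is arbitrary):

cv2.imshow("YOLOv11 detections", image)
cv2.waitKey(0)           # wait for any key press
cv2.destroyAllWindows()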