To determine the best model to use on the backend of whereisthisphoto.com, I analysed the performance of various OpenAI models at identifying photos taken all over the world.
The dataset was built from a mix of my own travel photos and photos downloaded from the r/whereintheworld subreddit. All photos had their metadata removed to ensure the models were using only computer vision to determine the location.
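As a rough illustration of the metadata-removal step, here is a minimal sketch that drops the segments of a JPEG file where location data usually lives (APP1 for EXIF/XMP, APP13 for IPTC, and COM for comments). The function name `strip_jpeg_metadata` and the choice of segments are assumptions for illustration, not the tool actually used to prepare this dataset, and the sketch does not handle every JPEG variant.

```python
import struct

# Segment markers that commonly carry metadata (assumed list for this sketch):
# 0xE1 = APP1 (EXIF, XMP), 0xED = APP13 (IPTC/Photoshop), 0xFE = COM (comment)
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its metadata segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in METADATA_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

A typical use would be to read a file in binary mode, pass the bytes through `strip_jpeg_metadata`, and write the result to a new file before sending it to a model.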
The models were tested on the following criteria:
- Accuracy of country identification
- Accuracy of the predicted coordinates
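The two criteria above could be checked per photo roughly as follows. This is a sketch under stated assumptions: the great-circle (haversine) formula is one common way to measure the distance between two coordinates, and the names `haversine_m` and `evaluate_prediction`, along with the dictionary keys, are hypothetical rather than taken from the actual test harness.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def evaluate_prediction(pred, truth):
    """Compare one model prediction with the known location of a photo.

    Both arguments are dicts with hypothetical keys "country", "lat", "lon".
    """
    same_country = (pred["country"].strip().lower()
                    == truth["country"].strip().lower())
    return {
        "country_correct": same_country,
        "distance_m": haversine_m(pred["lat"], pred["lon"],
                                  truth["lat"], truth["lon"]),
    }
```

Comparing country names as lowercase strings is a simplification; matching on ISO country codes would be more robust to spelling differences.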
As well as an overview of the results, I’ve included a detailed analysis of each image in the test dataset in the full blog here, showing how different models performed in identifying their locations.
The chart below shows how each model performed across the test dataset. The score is derived from a combination of the country prediction accuracy and the average distance from the actual location.
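The exact scoring formula is not spelled out here, so the following is only one plausible way to combine the two measures. It assumes a linear distance score capped at 20,000 km (roughly the farthest apart two points on Earth can be) and an equal weighting of country accuracy and distance; `MAX_DISTANCE_M`, `country_weight`, and the function names are all hypothetical.

```python
MAX_DISTANCE_M = 20_000_000  # assumed cap, roughly half the Earth's circumference


def distance_score(distance_m, max_distance_m=MAX_DISTANCE_M):
    """Map a distance to [0, 1]; closer predictions score higher."""
    return max(0.0, 1.0 - distance_m / max_distance_m)


def model_score(results, country_weight=0.5):
    """Combine country accuracy and average distance into one score in [0, 1].

    `results` is a list of dicts with hypothetical keys
    "country_correct" (bool) and "distance_m" (float), one per photo.
    """
    if not results:
        raise ValueError("results must not be empty")
    country_accuracy = sum(r["country_correct"] for r in results) / len(results)
    mean_distance_m = sum(r["distance_m"] for r in results) / len(results)
    return (country_weight * country_accuracy
            + (1 - country_weight) * distance_score(mean_distance_m))
```

A linear scale barely distinguishes a 100-meter miss from a 10-kilometer one, so a logarithmic or exponentially decaying distance score may be a better fit if street-level precision matters.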
The testing methodology included:
- A diverse set of images from personal travels and the r/whereintheworld subreddit
- Evaluation based on distance in meters from the actual location and correct country identification
- A normalized scoring system where closer predictions received higher scores
The key findings were:
- o3, the most recently released model in our testing, performed the best overall, demonstrating OpenAI’s continued improvement in image location analysis capabilities.
- o3 and gpt-4.1 achieved the same country prediction accuracy, but gpt-4.1 is significantly more cost-effective at $2 per million tokens compared to o3’s $10 per million tokens.
- Among the mini models tested, o4-mini showed the strongest performance, suggesting promising potential for the upcoming full o4 model release.
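To put the price gap in concrete terms, the sketch below estimates the cost of a batch of lookups at the two per-million-token prices quoted above. The tokens-per-request figure and the request count are made-up illustrative values, not measurements, and the sketch treats the quoted price as a flat rate for all tokens.

```python
# Prices per million tokens, as quoted above
PRICE_PER_MILLION_USD = {"gpt-4.1": 2.0, "o3": 10.0}


def estimate_cost_usd(model, requests, tokens_per_request):
    """Estimated cost in USD of `requests` lookups at a flat per-token price."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


# Hypothetical workload: 10,000 photo lookups at 1,500 tokens each
for name in PRICE_PER_MILLION_USD:
    cost = estimate_cost_usd(name, requests=10_000, tokens_per_request=1_500)
    print(f"{name}: ${cost:.2f}")
```

Whatever the real token counts are, the ratio stays the same: at these prices o3 costs five times as much as gpt-4.1 for the same workload.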
Well-known Landmarks
For well-known landmarks like the Eiffel Tower or the Colosseum, most models performed exceptionally well, with accuracy often within 100 meters. Even smaller models could recognize these iconic structures with high confidence.
City Environments
In urban settings with distinctive architecture or signage, o3 and GPT-4.1 demonstrated remarkable precision, often identifying not just the city but the specific street or neighborhood.
Natural Landscapes
Natural landscapes proved more challenging, with accuracy varying widely. The most advanced models could often identify general regions correctly, but precision depended heavily on the distinctiveness of the landscape features.
Suburban Areas
Suburban locations presented a challenge for all models, with even the top performers sometimes struggling to provide precise locations. In these cases, country-level identification remained relatively accurate, but street-level precision was rare.