AWS, Google and Microsoft Vision API Comparison

By Dileepa Jayakody - 11th March 2017

Over the past year, major software vendors including Google, Microsoft, IBM and Amazon have invested heavily in image analysis APIs and applications, introducing suites of APIs that cover a wide range of image analysis features. With a computer vision API, a user can submit an image of a busy city and receive semantic tags such as city, buildings and people, along with whether the photo was taken during the day or at night. Some of the common features offered by popular vision API providers are discussed below.

Object Detection

Detecting real-world objects or entities in an image is a fundamental feature of every vision API. The detection results are generally returned as a set of labels with confidence scores. API providers perform object detection using their pre-trained object models, so generic objects such as people, trees, buildings, cats and dogs can be detected out of the box. Some vision API providers also offer the ability to train the system with custom data models for specific domains.
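To illustrate how an application might consume labels with confidence scores, here is a minimal Python sketch. The `Label` type, the `confident_labels` helper, the 0.8 threshold and the sample values are all assumptions made up for illustration; they do not reflect the response format of any particular vendor.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Hypothetical label entry; real vendors use their own field names."""
    name: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def confident_labels(labels: list[Label], threshold: float = 0.8) -> list[str]:
    """Return label names at or above the threshold, highest confidence first."""
    kept = [label for label in labels if label.confidence >= threshold]
    kept.sort(key=lambda label: label.confidence, reverse=True)
    return [label.name for label in kept]


# Made-up analysis result for a photo of a busy city street.
response = [
    Label("building", 0.93),
    Label("city", 0.97),
    Label("dog", 0.41),
    Label("person", 0.88),
]

print(confident_labels(response))
```

The right threshold depends on the application: a search index may tolerate low-confidence tags, while automated decisions usually call for a stricter cut-off.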

Concept and scene detection goes a step further: vision APIs can detect concepts such as trademarks and logos, social concepts such as a party or a busy city, and scenes such as an evening sunset or a rainy day. Furthermore, many vision APIs can detect inappropriate content such as adult or violent imagery. These features make it practical to build applications such as content filtering for the broadcasting and publishing industries.
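A content filter built on such results could be as simple as the following Python sketch. It assumes the API returns one score per moderation category; the category names, the 0.5 limit and the `is_publishable` helper are hypothetical and not taken from any vendor's API.

```python
def is_publishable(moderation_scores: dict[str, float], limit: float = 0.5) -> bool:
    """Return True only if every moderation category scores below the limit.

    The dictionary shape is an assumption: category name mapped to a score
    between 0.0 and 1.0.
    """
    return all(score < limit for score in moderation_scores.values())


# Made-up moderation results for two images.
safe_image = {"adult": 0.02, "violence": 0.10}
flagged_image = {"adult": 0.91, "violence": 0.05}

print(is_publishable(safe_image))     # True
print(is_publishable(flagged_image))  # False
```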

Many vendors include a bounding box in the image analysis response for each identified object, face or piece of text. The bounding box gives the position and dimensions of the identified item, which the application can use for further analysis.
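As a rough sketch of the kind of further analysis a bounding box enables, the Python example below derives an area, a crop region and a center point. It assumes a box given as pixel values for left, top, width and height; actual formats vary between vendors (pixel values, ratios of the image size, or corner vertices), so the `BoundingBox` class here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical pixel-based box; not any vendor's actual schema."""
    left: int
    top: int
    width: int
    height: int

    @property
    def area(self) -> int:
        """Area of the box in square pixels."""
        return self.width * self.height

    def crop_region(self) -> tuple[int, int, int, int]:
        """Return (left, top, right, bottom), a common shape for crop calls."""
        return (self.left, self.top, self.left + self.width, self.top + self.height)

    def center(self) -> tuple[float, float]:
        """Return the (x, y) center point of the box."""
        return (self.left + self.width / 2, self.top + self.height / 2)


# Made-up box around a detected object.
box = BoundingBox(left=40, top=60, width=200, height=100)
print(box.area)           # 20000
print(box.crop_region())  # (40, 60, 240, 160)
print(box.center())       # (140.0, 110.0)
```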


Face Detection

Face detection is another common feature provided by many vendors. Some vendors also detect facial sentiments such as happiness, sadness and excitement, which adds value for end users building applications on top of the API. Some vendors provide facial recognition that can identify people such as celebrities. Many providers also return facial landmarks and facial expression results.
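Where a vendor reports facial sentiments as per-emotion scores, an application might reduce them to a single dominant emotion. The Python sketch below assumes a dictionary of emotion names to scores; that shape, the emotion names and the `dominant_emotion` helper are illustrative assumptions only.

```python
def dominant_emotion(emotion_scores: dict[str, float]) -> str:
    """Return the emotion with the highest score for one detected face."""
    if not emotion_scores:
        raise ValueError("no emotion scores were returned for this face")
    return max(emotion_scores, key=emotion_scores.get)


# Made-up sentiment scores for a single detected face.
face = {"happiness": 0.82, "sadness": 0.03, "surprise": 0.15}
print(dominant_emotion(face))  # happiness
```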

Text Detection

Optical character recognition (OCR) is another image analysis feature offered by some API vendors, and it is highly useful for applications that need text detection. Beyond these, vision API vendors provide a long list of image analysis features that are useful for a wide range of applications, including cross-media search, e-commerce and digital marketing.

Comparison Of The Vision APIs

When it comes to selecting a vision API, there are a few more things to consider besides the features the API offers. The developer tools and SDKs, the ability to add our own training models for custom datasets, the average cost of API usage and scalability are a few of the factors that determine the choice of vision API.

In the table below, we compare three market-leading vision API vendors in terms of their features.

Table

 


 

About the author: Dileepa Jayakody
Dileepa Jayakody is a Technical Lead at Zaizi. He currently works on the Sensefy project and related R&D projects in the domains of enterprise search and enterprise content management. He is passionate about information retrieval, semantic web and machine learning technologies. Dileepa is also a contributor to many open-source projects including Apache Stanbol and Apache ManifoldCF.