2019 Examples to Compare Amazon Textract/Rekognition vs Microsoft Cognitive Services vs Google Vision API

Introduction

We're building a note app that will surface images and documents in full-text search, so it needs to do OCR as well as possible, preferably at a low price. We hoped to find a good, modern comparison of the major OCR services, but as of July 2019 there wasn't one, so we wrote one. The main result Google kept sending us to was OK, but its review concluded more than a year ago, and these services are evolving very quickly. Most have launched completely new versions over the past year.


So we researched the current OCR providers. Since we had to compile that research into a note anyway, we figured we might as well share it with anyone else who needs it.


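This article will compare four services: Amazon Rekognition, Amazon Textract, Google Cloud Vision OCR, and Microsoft Cognitive Services (Read API). Textract was previewed in late 2018 and launched to GA in May 2019, focusing on scanned and structured documents (e.g. forms).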


Since our use case is full-text search, we're not seeking to extract any structural data, just a set of words as a user might transcribe the image. Some of these products have a strong focus on specific use cases - like form data extraction - which we're not evaluating. Both Microsoft and Google have additional OCR services that focus on that use case.
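To make "just a set of words" concrete, here is a minimal Ruby sketch of how raw OCR output could be reduced to searchable terms. The helper name `searchable_words` and the tokenizing rule are assumptions for illustration, not part of any of the services or of our app's actual indexing code.

```ruby
require "set"

# Hypothetical helper: reduce raw OCR text to the set of lowercase words
# a full-text index might store. Layout, word order, and punctuation are
# discarded; apostrophes inside a word (e.g. "don't") are kept.
def searchable_words(ocr_text)
  ocr_text.downcase.scan(/[[:alnum:]]+(?:'[[:alnum:]]+)*/).to_set
end

words = searchable_words("Meeting notes:\nDon't forget the OCR demo, 3pm!")
puts words.include?("ocr")    # prints true
puts words.include?("don't")  # prints true
```

Under this view, a service that returns the right words in the wrong order or layout still serves the search use case well.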


In addition to providing transcriptions of sample images, we'll also touch on the current price of each service (with links to pricing pages so you can confirm the estimates are up-to-date), in case that is a factor in your consideration.




OCR Image Processing Results

We started with three image samples, representing archetypes we expect to see from our users. Our samples included a hand-written letter, webpage text, and text written on a whiteboard. The selection of these particular images wasn't scientific, but we figure that if the OCR solution can get these right, it's state-of-the-art for the moment.


For the tl;dr types, here's how each service performed on our non-scientific test:



See also: methodology notes.


Pricing: Amazon Rekognition, Amazon Textract, Google, Microsoft.

We don't really care which one you use, but Microsoft did best by our sample data. Textract was a very close second if you only need its headline feature: extracting text from digital documents. If someone wants to email bill -at- amplenote.com with comparable data for other images/services, I can try to work those into this post as time allows. 😎


Image 1: Hand-written note

See also: the result as interpreted by me.


Amazon Rekognition


See also: the result as text.


Amazon Textract


See also: the result as text.


Google Cloud Vision OCR


See also: the result as text.


Microsoft Cognitive Services (Read API)


See also: the result as text.


Ruby used to compare these: data, and method.
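The linked comparison script is not reproduced here. As a rough sketch of one way such a comparison could be scored, the Ruby below computes word-level recall: the fraction of words in a human transcription that also appear in a service's output, counting duplicates. The function names and the sample strings are made up for illustration, and this metric is an assumption rather than the exact method behind the results in this post.

```ruby
# Hypothetical scoring helpers; not the comparison script linked above.

# Split text into lowercase word tokens, ignoring punctuation.
def tokens(text)
  text.downcase.scan(/[[:alnum:]']+/)
end

# Count how many times each word occurs.
def word_counts(text)
  tokens(text).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Fraction of reference words that the OCR output also contains.
# A word repeated n times in the reference is credited at most n times.
def word_recall(reference, ocr_output)
  ref = word_counts(reference)
  got = word_counts(ocr_output)
  total = ref.values.sum
  return 0.0 if total.zero?

  matched = ref.sum { |word, n| [n, got[word]].min }
  matched.to_f / total
end

reference = "Dear Sam, thanks for the note"   # made-up human transcription
ocr       = "Dear 5am thanks for the the note" # made-up OCR output
puts word_recall(reference, ocr).round(2)      # prints 0.83
```

Recall ignores extra words the service hallucinates, which suits a search index where a missing word hurts more than a spurious one; a stricter comparison would also penalize insertions.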


Image 2: Webpage text

See also: result as interpreted by me.


Amazon Rekognition


See also: result as text.


Amazon Textract


See also: result as text.


Google Cloud Vision OCR


See also: result as text.


Microsoft Cognitive Services (Read API)


See also: result as text.


Ruby used to compare these results: data, and method.


Image 3: Handwriting on whiteboard

This one was a toughie. My interpretation.


Amazon Rekognition


See also: result as text.


Amazon Textract


See also: result as text.


Google Cloud Vision OCR



See also: result as text.


Microsoft Cognitive Services (Read API)


See also: result as text.


Ruby used to compare these results: data, and method.


Thanks to Jordan for deriving the data and pasting the screenshots!

🍻🍻🍻