

Tapi regularly need to process a large volume and variety of maintenance invoices on their customers' behalf. Because speed and accuracy are central to a positive customer experience, they were looking to use AI technology to make invoice processing more accurate and efficient. Working with Arcanum, Tapi hoped to improve the accuracy of the data extracted from invoices and increase the speed at which invoices can be processed.

Arcanum implemented an automated invoice processing workflow powered by AWS Textract that can intelligently identify data fields across a wide range of invoice formats and extract the data quickly from batches of invoices.

AWS Solution Architecture

The invoice extraction workflow utilises AWS Textract, a machine learning service that automatically extracts text, handwriting and data from document sources. For our invoice extraction solution, we optimised the Arcanum ML service architecture, which also serves as our general MLOps architecture. This architecture can be re-used for our other ML services and is auto-scalable, which was important for Tapi as they process a large volume of invoices. We are continually upgrading our solution architecture, and every improvement will be rolled out to our existing customers.

Combining queries with the out-of-the-box capabilities of the AnalyzeExpense API, we were able to build our own post-processors on top of Textract to create a robust, accurate, and generalised invoice extraction service for Tapi. We also utilised Route 53, VPC, Internet Gateway, Load Balancer, NAT Gateway, ECS, ECR, and EC2 to secure and scale our infrastructure for enterprise customers.


For Tapi, we used the queries feature provided by Textract to handle a wide range of invoice formats dynamically. Queries let us overcome the constraints posed by changing templates and vendor invoice styles. This was made possible by the integration of natural language processing (NLP) and computer vision (CV) techniques, along with the ability to assign multiple queries to a single alias. For example, we broadened our search for the invoice number by using several queries, such as "What is the invoice number" and "What is the document number." By combining these queries with the existing functionality of the AnalyzeExpense API, we built a robust, precise, and versatile invoice extraction service for Tapi.
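To make the multiple-queries-per-alias idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Arcanum's actual implementation: the names QUERY_VARIANTS, build_queries_config, best_answers and SAMPLE_BLOCKS are hypothetical, the "keep the highest-confidence answer" rule is one plausible post-processing choice, and the sample blocks are hand-written to mimic the shape of a Textract response rather than taken from real output. No AWS call is made; the generated config is what would typically be passed as QueriesConfig to Textract's AnalyzeDocument operation with the QUERIES feature type.

```python
from typing import Dict, List

# Hypothetical mapping: one alias, several phrasings of the same question.
QUERY_VARIANTS: Dict[str, List[str]] = {
    "INVOICE_NUMBER": [
        "What is the invoice number",
        "What is the document number",
    ],
}


def build_queries_config(variants: Dict[str, List[str]]) -> dict:
    """Flatten alias -> phrasings into a QueriesConfig-shaped dict."""
    return {
        "Queries": [
            {"Text": text, "Alias": alias}
            for alias, texts in variants.items()
            for text in texts
        ]
    }


def best_answers(blocks: List[dict]) -> Dict[str, str]:
    """For each alias, keep the answer text with the highest confidence.

    Assumes the Textract block layout: QUERY blocks point at their
    QUERY_RESULT blocks through an ANSWER relationship.
    """
    by_id = {block["Id"]: block for block in blocks}
    best: Dict[str, dict] = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        alias = block["Query"].get("Alias")
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for answer_id in rel["Ids"]:
                answer = by_id.get(answer_id)
                if answer is None:
                    continue
                current = best.get(alias)
                if current is None or answer["Confidence"] > current["Confidence"]:
                    best[alias] = answer
    return {alias: answer["Text"] for alias, answer in best.items()}


# Hand-written sample (not real Textract output): both phrasings found
# something, and the second one is more confident.
SAMPLE_BLOCKS = [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": "What is the invoice number", "Alias": "INVOICE_NUMBER"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}]},
    {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "INV-001", "Confidence": 62.0},
    {"Id": "q2", "BlockType": "QUERY",
     "Query": {"Text": "What is the document number", "Alias": "INVOICE_NUMBER"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["a2"]}]},
    {"Id": "a2", "BlockType": "QUERY_RESULT", "Text": "INV-0017", "Confidence": 97.5},
]

if __name__ == "__main__":
    print(build_queries_config(QUERY_VARIANTS))
    print(best_answers(SAMPLE_BLOCKS))
```

Grouping phrasings under one alias means downstream code only deals with a single field name per value, however a vendor happens to label it on the page.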

As invoice processing is a core function behind Tapi's service offering, they were able to improve their customer experience by providing faster and more accurate processing of customer invoices, enabling their customers to get more done with less effort.

“We were able to process more than 60,000 invoices in the first month with no issues and a very high accuracy rate. We are also really impressed with how well it has handled handwritten and blurry invoices.”

Chad, CTO, Tapi