“This has literally saved us the days and days of work that it would have taken to manually enter this information into a spreadsheet.”

The challenge
A property developer with limited admin support had multiple invoices they needed to process.
The invoices were PDFs, and the information in them had to be transferred by hand into a spreadsheet to support cost and finance planning.
The work was time consuming and laborious, and it is a problem that many business owners face.
The solution
We held an initial consultation and diagnostic session with the client to understand their challenges and frustrations.
It was clear the challenge was one of data conversion and categorisation. Getting data out of PDF files is a common problem, and one to which we already had a solution built.
Our system uses optical character recognition (OCR) to read PDFs. It can extract 70 to 80% of the data, saving the time it would otherwise take to find and enter that data manually.
How it works
We set up our system on an automation platform for the customer and linked it to their own cloud storage (Google Drive), the OCR technology, and a simple AI agent.
We used the OCR technology to extract the following data from each invoice:
- the name of the company issuing the invoice,
- invoice date,
- amount,
- currency.
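As a rough illustration of the four extracted fields, the record below treats every value as optional, since OCR will not always find everything. This is a minimal sketch, not the system's actual data model: the class name, field names and the `missing_fields` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class ExtractedInvoice:
    """Fields read from one invoice; None means OCR could not find the value."""
    supplier_name: Optional[str] = None   # company issuing the invoice
    invoice_date: Optional[date] = None
    amount: Optional[Decimal] = None
    currency: Optional[str] = None        # e.g. "GBP"


def missing_fields(invoice: ExtractedInvoice) -> list[str]:
    """Return the names of fields a human reviewer still needs to fill in."""
    return [f.name for f in fields(invoice) if getattr(invoice, f.name) is None]


# Example: OCR found everything except the invoice date.
invoice = ExtractedInvoice(
    supplier_name="Example Groundworks Ltd",  # made-up supplier
    amount=Decimal("1250.00"),
    currency="GBP",
)
print(missing_fields(invoice))  # ['invoice_date']
```

Keeping missing values explicit, rather than guessing, is what lets the gaps be flagged for review later.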
The AI agent categorised and reasoned over this data, while keeping the output as deterministic as possible.
For each invoice we added the following:
- date and time processed,
- PDF file name,
- link to the cloud copy of the PDF invoice,
- a bookkeeping category,
- notes.
‘Category’ is used by the agent to assign the invoice to a bookkeeping category (such as: ‘groundworks’, ‘bricklaying’, ‘plumbing’, ‘electrics’).
‘Notes’ is a space for the agent to highlight any missing data, information for review, and to explain how it has reached certain decisions that might need double checking.
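One way to keep the agent's output predictable is to accept a category only if it appears on a fixed list, and to record anything unexpected in the notes. The sketch below shows that idea under stated assumptions: the function name, the dictionary keys and the 'uncategorised' fallback are hypothetical, and the allowed list simply reuses the four example categories above.

```python
from datetime import datetime, timezone

# Allowed list taken from the example categories in the text; a real list would be longer.
ALLOWED_CATEGORIES = {"groundworks", "bricklaying", "plumbing", "electrics"}
FALLBACK_CATEGORY = "uncategorised"  # hypothetical fallback for unknown categories
REQUIRED_FIELDS = ("supplier", "invoice_date", "amount", "currency")


def build_row(extracted: dict, file_name: str, file_link: str,
              agent_category: str, agent_notes: str = "") -> dict:
    """Combine extracted data with the added columns for one spreadsheet row."""
    notes = [agent_notes] if agent_notes else []

    # Normalise the agent's answer and reject anything outside the fixed list.
    category = agent_category.strip().lower()
    if category not in ALLOWED_CATEGORIES:
        notes.append(f"Category '{agent_category}' not recognised; please review.")
        category = FALLBACK_CATEGORY

    # Flag missing data instead of guessing it.
    for field in REQUIRED_FIELDS:
        if extracted.get(field) in (None, ""):
            notes.append(f"Missing {field}.")

    row = {field: extracted.get(field, "") for field in REQUIRED_FIELDS}
    row.update({
        "processed_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "file_name": file_name,
        "file_link": file_link,
        "category": category,
        "notes": " ".join(notes),
    })
    return row


row = build_row(
    {"supplier": "Example Roofing Ltd", "amount": "480.00", "currency": "GBP"},
    file_name="invoice-001.pdf",
    file_link="https://example.com/invoice-001.pdf",  # placeholder link
    agent_category="Roofing",
)
print(row["category"], "|", row["notes"])
```

Here the unknown category and the missing invoice date both end up in the notes column, so the reviewer knows exactly which lines to open.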
Each line of the spreadsheet links to the cloud copy of the PDF invoice, so the human reviewer can check the sheet line by line for accuracy and add any missing information simply by clicking the link.
As invoices are processed they are moved from an ‘Inputs’ folder to a ‘Processed’ folder within Google Drive, so the customer can see the automation happening.
If the optical character recognition step fails because the PDF is damaged, the file is automatically moved to a folder marked ‘Scan failed’. In this way the customer knows that no information from this invoice was recorded, and that it must be entered manually.
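The folder routing can be pictured with a small sketch. It uses local folders as a stand-in for Google Drive, and the OCR step is passed in as a function so that any engine could be plugged in. The function name `route_invoice`, the `run_ocr` parameter and the choice to treat empty text as a failed scan are assumptions for illustration only.

```python
import shutil
from pathlib import Path
from typing import Callable


def route_invoice(pdf: Path, root: Path, run_ocr: Callable[[Path], str]) -> str:
    """Run OCR on one PDF, then move it to 'Processed' or 'Scan failed'.

    Returns the name of the folder the file ended up in.
    """
    try:
        text = run_ocr(pdf)
        # Assumption: OCR that returns no text counts as a failed scan.
        scanned_ok = bool(text.strip())
    except Exception:
        # A damaged PDF is expected to make the OCR step raise an error.
        scanned_ok = False

    target = root / ("Processed" if scanned_ok else "Scan failed")
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(target / pdf.name))
    return target.name
```

Because every file leaves the inputs folder one way or the other, anything still sitting there has simply not been picked up yet.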
What the client said
“We have many different suppliers and contractors, from architects to waste collection, with invoices in many different layouts. I was impressed with how the system recorded and handled the data.”
Ongoing value
Now that this system has been set up for the customer, it will be there to use the next time invoices need processing, providing them with long-term value and efficiency.
Security and compliance
We designed this solution with data security and UK GDPR compliance in mind. The customer’s documents remain in their own secure cloud storage. Any processing is carried out using trusted, industry-standard providers with appropriate data protection agreements in place.
We minimise the amount of data shared between services, avoid storing sensitive information unnecessarily, and ensure that any temporary processing data is short-lived.
In simple terms, the customer’s information stays under their control, is handled securely at every step, and is processed only for its intended purpose.
Next steps
Does your organisation have data in a format that cannot easily be used, or that needs to be converted before it can be processed?
If so, let’s start a conversation.
