Monitoring Website Changes with a Controlled Audit Trail

A client approached us during a dispute with a public body. They needed a reliable way to monitor changes to that body’s website and to keep a record of those changes over time.

Existing tools did not meet their requirements. Services such as the Internet Archive’s Wayback Machine and standard page-monitoring tools either gave the client no control over how data was captured or did not provide a structured audit trail.

The challenge

The client needed more than simple alerts.

They wanted to:

  • control when pages were crawled,
  • capture and store page content in their own cloud environment,
  • track changes between versions,
  • receive clear summaries of changes by email.

This called for a controlled, repeatable process rather than reliance on third-party snapshots.

The approach

We built a system to monitor a defined set of website URLs.

On the first run, the system:

  • accessed each page,
  • extracted the main text content,
  • removed non-essential elements such as navigation and footers,
  • stored the cleaned content as a baseline record.

This created a consistent starting point for comparison.
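The extraction step can be sketched with Python’s standard-library HTML parser. This is an illustration, not the system we delivered: it assumes the page HTML has already been fetched, and the set of tags treated as non-essential (SKIP_TAGS) and the helper names are hypothetical. The resulting record, with a content hash, could then be written to whatever storage the client’s cloud environment provides.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical choice of tags treated as non-essential page furniture.
SKIP_TAGS = {"nav", "footer", "header", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the cleaned text, one text node per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def baseline_record(url: str, html: str) -> dict:
    """Build a baseline record: source URL, cleaned text and its SHA-256 hash."""
    text = extract_main_text(html)
    return {
        "url": url,
        "text": text,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


page = ("<html><body><nav>Home | About</nav><main><h1>Notice</h1>"
        "<p>Opening hours changed.</p></main><footer>Footer text</footer></body></html>")
print(baseline_record("https://example.org/notices", page)["text"])
```

Storing the hash alongside the text makes it cheap to confirm later that a stored version has not been altered.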

Tracking changes over time

The system was then scheduled to repeat the process daily.

Each new version of a page was compared against the previous day’s version. Where differences were detected, the system recorded them and linked them back to the source page.
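A day-to-day comparison of this kind can be sketched with Python’s difflib. The function name and the shape of the change record are assumptions for illustration; the point is that each detected difference stays tied to the URL it came from.

```python
import difflib


def diff_versions(url, previous_text, current_text):
    """Return a change record for one page, or None if the text is identical."""
    diff = list(difflib.unified_diff(
        previous_text.splitlines(),
        current_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))
    if not diff:
        return None
    # Skip the two file-header lines; "+"/"-" lines in the hunks are the changes.
    body = diff[2:]
    added = [line[1:] for line in body if line.startswith("+")]
    removed = [line[1:] for line in body if line.startswith("-")]
    return {"url": url, "added": added, "removed": removed, "diff": "\n".join(diff)}


change = diff_versions(
    "https://example.org/opening-hours",
    "Opening hours: 9-5\nClosed Sundays",
    "Opening hours: 10-4\nClosed Sundays",
)
print(change["removed"], change["added"])
```

Keeping the full unified diff in the record, not just the added and removed lines, preserves the surrounding context for anyone reviewing the audit trail later.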

Only meaningful changes to the main content were flagged, which reduced noise and avoided unnecessary alerts.
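What counts as “meaningful” depends on the site. One possible rule, sketched below under our own assumptions, is to compare normalised text: collapse whitespace, drop blank lines, and ignore lines matching patterns known to change on every visit. The patterns in VOLATILE_PATTERNS are hypothetical examples, not rules taken from the client’s system.

```python
import re

# Hypothetical patterns for text that changes routinely without being news.
VOLATILE_PATTERNS = [
    re.compile(r"^last updated:", re.IGNORECASE),
    re.compile(r"^page generated in [\d.]+s$", re.IGNORECASE),
]


def normalise(text):
    """Collapse whitespace and drop blank or volatile lines."""
    lines = []
    for raw in text.splitlines():
        line = " ".join(raw.split())
        if not line:
            continue
        if any(pattern.search(line) for pattern in VOLATILE_PATTERNS):
            continue
        lines.append(line)
    return lines


def is_meaningful_change(previous_text, current_text):
    """Flag a change only if the normalised versions differ."""
    return normalise(previous_text) != normalise(current_text)


print(is_meaningful_change("Notice\nLast updated: 1 May", "Notice  \n\nLast updated: 2 May"))
print(is_meaningful_change("Fee: 10", "Fee: 12"))
```

The raw text can still be stored in full; normalisation only decides whether a change is worth reporting.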

The results were then sent to the client by email, including direct links to the affected pages.

Managing notifications

One practical issue emerged during development.

Because each page was processed individually, the system initially sent a separate email for each change, which made the output difficult to review.

To resolve this, we added a final step to consolidate all detected changes into a single summary. The client now receives one email per run, containing a clear overview of any updates.
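A consolidation step along these lines can be sketched with Python’s standard email library. It assumes each page’s changes arrive as a dictionary with hypothetical "url", "added" and "removed" keys; the addresses and function name are placeholders. The message is only built here; sending it, for example with smtplib, is left out.

```python
from email.message import EmailMessage


def build_summary_email(changes, run_date, sender, recipient):
    """Combine all page changes from one run into a single message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Website changes for {run_date}: {len(changes)} page(s) updated"
    if not changes:
        body = f"No changes were detected on {run_date}."
    else:
        sections = []
        for change in changes:
            # Each section starts with a direct link to the affected page.
            lines = [change["url"]]
            lines += [f"  + {text}" for text in change["added"]]
            lines += [f"  - {text}" for text in change["removed"]]
            sections.append("\n".join(lines))
        body = "\n\n".join(sections)
    msg.set_content(body)
    return msg


changes = [
    {"url": "https://example.org/fees", "added": ["Fee: 12"], "removed": ["Fee: 10"]},
    {"url": "https://example.org/consultations", "added": ["New consultation"], "removed": []},
]
print(build_summary_email(changes, "2024-05-02", "monitor@example.org", "client@example.org")["Subject"])
```

Building one message per run, even when nothing changed, also gives the client a daily confirmation that the check ran.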

What this enables

The system provides a structured record of how a website changes over time, stored within the client’s own environment.

This gives them:

  • visibility of when content changes occur,
  • a consistent audit trail,
  • a reliable way to reference historical versions.

In a dispute, or in any situation where accuracy and accountability matter, this level of control is important.

Next steps

Is there publicly available data that your organisation needs to monitor carefully and keep a record of?

If so, let’s start a conversation.