Touchless Audit Trail: Enabling Audit and Building Data Traction Analytics

Mar 29, 2024

Revolutionising Data Auditing with a Non-Invasive Framework

Overview

The “Touchless Audit Trail” system is designed to enable auditing of primary source data without requiring any code changes to existing systems. This solution uses a plug-in Change Data Capture (CDC) architecture, which ensures seamless integration with minimal or no disruption to existing processes. The ability to interpret and react to data changes in real time is a critical differentiator.

Design

In the realm of data management, ensuring the integrity and accountability of primary source data is paramount. However, traditional auditing methods often entail cumbersome code modifications and system disruptions, posing significant challenges for organizations striving to maintain operational efficiency while meeting regulatory standards. At its core, the “Touch-less Audit Trail” system is built upon a sophisticated architecture that seamlessly integrates with primary data sources, leveraging Change Data Capture (CDC) technology to capture data changes in real time.

Design Components

Architecture

Debezium: Acting as the source connector, Debezium captures change data directly from primary data stores such as MongoDB or PostgreSQL by reading their transaction logs. Because it streams change events to Apache Kafka topics, Debezium preserves data integrity without requiring disruptive modifications to the source system.
JavaPoet/Java CodeGen Connector: This component plays a pivotal role in processing Data Definition Language (DDL) statements and generating corresponding Java code. By using this generated code, organisations can track and audit data changes without altering their existing codebase, ensuring seamless integration with the “Touchless Audit Trail” system.
Javers 7.1: Leveraging the generated Java code, Javers facilitates the capture of object changes and commit history. This robust library enables comprehensive tracking of data modifications, empowering organizations to maintain a detailed audit trail without impeding the functionality of their source systems.
History Connector: Serving as the orchestrator of the commit process, the History Connector integrates the generated Java code, data changes from Kafka topics, and the Javers library. By facilitating the storage of change history and snapshots in the Entity Version Database (DB), this component establishes a centralized repository for audit and reconciliation purposes.
Entity Version DB: Dedicated to maintaining a comprehensive record of entity versions, commit history, and snapshots, the Entity Version DB serves as a cornerstone of the “Touchless Audit Trail” system. This centralized repository enables efficient querying and analysis of historical data, supporting various audit, debugging, and reconciliation efforts.
History-service: Acting as a dedicated service component, the History-service interfaces with the Entity Version DB, allowing controlled access to historical data through JaVers Query Language (JQL) queries. This gateway provides organizations with the means to retrieve commit history and snapshots for auditing, debugging, and reconciliation purposes.
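To make the hand-off between Debezium and the History Connector more concrete, here is a minimal sketch of the first step: turning a Debezium change event into an audit event. It assumes the event's payload has already been deserialized into a Map with Debezium's standard "op", "before" and "after" fields, as produced for a relational source such as PostgreSQL (MongoDB envelopes differ depending on the connector's capture mode). AuditEvent, AuditType and DebeziumEnvelopeMapper are hypothetical names; AuditType only echoes JaVers' INITIAL/UPDATE/TERMINAL snapshot types and is not the JaVers API.

```java
import java.util.Map;

// Hypothetical snapshot types, echoing JaVers' INITIAL/UPDATE/TERMINAL (not the JaVers API).
enum AuditType { INITIAL, UPDATE, TERMINAL }

// Hypothetical event handed from the Kafka consumer to the commit step.
final class AuditEvent {
    final String table;
    final Object entityId;
    final AuditType type;
    final Map<String, Object> state; // row image to snapshot

    AuditEvent(String table, Object entityId, AuditType type, Map<String, Object> state) {
        this.table = table;
        this.entityId = entityId;
        this.type = type;
        this.state = state;
    }
}

final class DebeziumEnvelopeMapper {
    private DebeziumEnvelopeMapper() {}

    /**
     * Maps a deserialized Debezium payload ("op", "before", "after") to an AuditEvent.
     * idColumn names the table's primary-key column (an assumption of this sketch).
     */
    @SuppressWarnings("unchecked")
    static AuditEvent toAuditEvent(String table, String idColumn, Map<String, Object> payload) {
        String op = (String) payload.get("op");
        Map<String, Object> before = (Map<String, Object>) payload.get("before");
        Map<String, Object> after = (Map<String, Object>) payload.get("after");

        switch (op) {
            case "c": // insert
            case "r": // row read during Debezium's initial snapshot
                return new AuditEvent(table, after.get(idColumn), AuditType.INITIAL, after);
            case "u": // update: "before" may be null or key-only, depending on REPLICA IDENTITY
                return new AuditEvent(table, after.get(idColumn), AuditType.UPDATE, after);
            case "d": // delete: "after" is null, so the key comes from "before"
                return new AuditEvent(table, before.get(idColumn), AuditType.TERMINAL, before);
            default: // e.g. truncate ("t") events carry no single row to audit
                throw new IllegalArgumentException("Unsupported op: " + op);
        }
    }
}
```

In the full pipeline the resulting state would then be passed to JaVers for the commit, which decides whether anything actually changed.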

For business applications, it often makes sense to view the audit from the perspective of the parent table in a relational database. The solution is to maintain parent-child relationships in a configuration and bump the version of the parent object whenever one of its child references changes. This provides a seamless way to see the lineage and review the change history from a business perspective.
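A minimal sketch of the parent version bump, assuming the configuration maps each child table to its parent table and foreign-key column. ParentLink and ParentVersionBumper are hypothetical names, and versions are kept in memory purely for illustration; in the described system they would live alongside the parent's snapshots in the Entity Version DB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical config entry: which parent a child table rolls up to, and through which column.
final class ParentLink {
    final String parentTable;
    final String foreignKeyColumn;

    ParentLink(String parentTable, String foreignKeyColumn) {
        this.parentTable = parentTable;
        this.foreignKeyColumn = foreignKeyColumn;
    }
}

final class ParentVersionBumper {
    private final Map<String, ParentLink> links;                              // child table -> parent link
    private final Map<String, Integer> versions = new HashMap<>();            // "table:id" -> parent version
    private final Map<String, List<String>> changedChildren = new HashMap<>(); // "table:id" -> child refs

    ParentVersionBumper(Map<String, ParentLink> links) {
        this.links = links;
    }

    /** Bumps the parent's version for a child change; returns the new version, or -1 if no parent is configured. */
    int onChildChange(String childTable, Object childId, Map<String, Object> childRow) {
        ParentLink link = links.get(childTable);
        if (link == null) {
            return -1; // table is not part of any configured aggregate
        }
        String parentKey = link.parentTable + ":" + childRow.get(link.foreignKeyColumn);
        // Record which child reference caused this bump, so the lineage can be shown later.
        changedChildren.computeIfAbsent(parentKey, k -> new ArrayList<>()).add(childTable + ":" + childId);
        return versions.merge(parentKey, 1, Integer::sum);
    }

    int versionOf(String parentTable, Object parentId) {
        return versions.getOrDefault(parentTable + ":" + parentId, 0);
    }

    List<String> changedChildrenOf(String parentTable, Object parentId) {
        return changedChildren.getOrDefault(parentTable + ":" + parentId, List.of());
    }
}
```

The parent's own updates would increment the same counter, so its version history interleaves direct edits with child-driven bumps.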

It is also important to understand the traction of entity changes, for example by author. Since CDC events are already available, we can build metrics by user, table, and customer and present them as a heat map for traction analytics.
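The heat map itself is a simple aggregation over CDC events. This sketch assumes each event has already been reduced to a hypothetical ChangeFact carrying author, table and customer. How the author is obtained (for example, from an audit column on the row) is also an assumption, since the database, and therefore Debezium, does not know the application-level user by itself.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, already-enriched view of one CDC event.
final class ChangeFact {
    final String user;
    final String table;
    final String customer;

    ChangeFact(String user, String table, String customer) {
        this.user = user;
        this.table = table;
        this.customer = customer;
    }
}

final class TractionHeatMap {
    private TractionHeatMap() {}

    /**
     * Counts change events per (row, column) cell, e.g. rows = users and columns = tables.
     * TreeMaps keep both axes sorted so the grid renders deterministically.
     */
    static Map<String, Map<String, Long>> grid(List<ChangeFact> facts,
                                               Function<ChangeFact, String> rowKey,
                                               Function<ChangeFact, String> columnKey) {
        Map<String, Map<String, Long>> cells = new TreeMap<>();
        for (ChangeFact f : facts) {
            cells.computeIfAbsent(rowKey.apply(f), k -> new TreeMap<>())
                 .merge(columnKey.apply(f), 1L, Long::sum);
        }
        return cells;
    }
}
```

Passing f -> f.user or f -> f.customer as the row key yields the user-by-table or customer-by-table views from the same events.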

The generated code can also provide factory methods for looking up data, which internal SDKs can use in a type-safe manner.
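The code generator described earlier emits its classes with JavaPoet; to keep this sketch self-contained, it builds the source text with a plain StringBuilder instead. Given a table's columns (a stand-in for parsed DDL), it emits an immutable entity class with a static of(Map) factory, so SDK callers work with typed accessors rather than raw row maps. The type mapping, class layout and the name of(...) are assumptions for illustration, and the generated casts assume row values were already converted to the listed Java types.

```java
import java.util.Map;
import java.util.StringJoiner;

final class EntitySourceGenerator {
    private EntitySourceGenerator() {}

    // Hypothetical SQL-to-Java type mapping; anything unmapped falls back to String.
    static String javaType(String sqlType) {
        switch (sqlType.toLowerCase()) {
            case "bigint":    return "Long";
            case "integer":   return "Integer";
            case "boolean":   return "Boolean";
            case "numeric":   return "java.math.BigDecimal";
            case "timestamp": return "java.time.Instant";
            default:          return "String";
        }
    }

    /** Emits an immutable entity class; pass an insertion-ordered map (e.g. LinkedHashMap) of column -> SQL type. */
    static String generate(String className, Map<String, String> columns) {
        StringBuilder src = new StringBuilder("public final class " + className + " {\n");

        // One private final field per column.
        columns.forEach((name, sqlType) ->
                src.append("    private final ").append(javaType(sqlType)).append(' ')
                   .append(name).append(";\n"));

        // Private constructor: instances can only be created through the factory.
        StringJoiner params = new StringJoiner(", ");
        columns.forEach((name, sqlType) -> params.add(javaType(sqlType) + " " + name));
        src.append("\n    private ").append(className).append('(').append(params).append(") {\n");
        columns.keySet().forEach(name ->
                src.append("        this.").append(name).append(" = ").append(name).append(";\n"));
        src.append("    }\n");

        // Type-safe factory; the casts assume values were already converted to these Java types.
        StringJoiner args = new StringJoiner(", ");
        columns.forEach((name, sqlType) ->
                args.add("(" + javaType(sqlType) + ") row.get(\"" + name + "\")"));
        src.append("\n    public static ").append(className)
           .append(" of(java.util.Map<String, Object> row) {\n")
           .append("        return new ").append(className).append('(').append(args).append(");\n")
           .append("    }\n");

        // Read-only accessors for SDK callers.
        columns.forEach((name, sqlType) ->
                src.append("\n    public ").append(javaType(sqlType)).append(' ').append(name)
                   .append("() { return ").append(name).append("; }\n"));

        return src.append("}\n").toString();
    }
}
```

For a customer table with id bigint and email varchar, generate("CustomerRecord", columns) produces a class whose factory body reads return new CustomerRecord((Long) row.get("id"), (String) row.get("email"));.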

Summary

By implementing the “Touchless Audit Trail” system, organizations can achieve comprehensive auditing of their primary source data without disrupting existing systems or modifying critical codebases. This approach minimizes risk, reduces development effort, and helps meet data governance and regulatory requirements while maintaining operational efficiency and integrity.

Why JaVers, and how does it work?

At the heart of JaVers lies its core functionality: tracking changes, managing snapshots, and providing query capabilities to retrieve historical data. JaVers uses a combination of reflection and annotations to read the state of Java objects; on each commit it captures what has changed and stores it in a repository designed for querying versions.
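To illustrate the reflection-and-annotation idea, here is a minimal stand-alone sketch that captures an object's fields into a property map and skips fields marked with an ignore annotation. It is not JaVers code: @AuditIgnore is a hypothetical stand-in for the role JaVers' own @DiffIgnore plays, and CustomerRow is an illustrative entity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker: fields carrying it are excluded from snapshots.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AuditIgnore {}

// Illustrative entity.
final class CustomerRow {
    private final long id;
    private final String name;
    @AuditIgnore private final String cachedDisplayName; // derived value, not worth auditing

    CustomerRow(long id, String name) {
        this.id = id;
        this.name = name;
        this.cachedDisplayName = "#" + id + " " + name;
    }
}

final class ReflectiveSnapshot {
    private ReflectiveSnapshot() {}

    /** Reads instance fields (including inherited ones) into a sorted property map. */
    static Map<String, Object> capture(Object obj) {
        Map<String, Object> state = new TreeMap<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                boolean skip = Modifier.isStatic(mods) || Modifier.isTransient(mods)
                        || f.isSynthetic() || f.isAnnotationPresent(AuditIgnore.class);
                if (skip) continue;
                if (state.containsKey(f.getName())) continue; // subclass field already recorded under this name
                try {
                    f.setAccessible(true); // fine for application classes; JDK internals may refuse
                    state.put(f.getName(), f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + f, e);
                }
            }
        }
        return state;
    }
}
```

For new CustomerRow(1L, "Ada"), capture returns {id=1, name=Ada}; the annotated cache field never reaches the snapshot.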

When developing applications, we often need to store information about how data has changed over time. This information can be used both to debug the application more easily and to meet design requirements. In this article, I discuss a pattern for implementing an audit trail with the JaVers library, which automates this process by recording changes in the state of database entities.

Why MongoDB
MongoDB’s horizontal scalability lets it store and retrieve versioned data for thousands of entities efficiently. Given the query patterns and storage semantics involved, our experiments found it more efficient in both storage and query performance than an RDBMS (PostgreSQL).

In MongoDB, JaVers uses two collections:
jv_head_id: stores the document containing the latest commitId.
jv_snapshots: contains detailed information about changes made to a business entity as a result of a create, update, or delete operation.

JaVers supports three kinds of queries:
Shadow: historical versions of a domain object, reconstructed from snapshots
Change: the difference between the properties of two versions of an object
Snapshot: the historical state of a domain object, represented as a map of property values
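The Change view boils down to comparing two snapshot property maps. The sketch below does that for flat maps; PropertyChange is a hypothetical type loosely modelled on the left/right values a JaVers ValueChange reports, and nested objects and collections (which JaVers handles with dedicated change types) are out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical property-level change with left (older) and right (newer) values.
final class PropertyChange {
    final String property;
    final Object left;
    final Object right;

    PropertyChange(String property, Object left, Object right) {
        this.property = property;
        this.left = left;
        this.right = right;
    }

    @Override
    public String toString() {
        return property + ": " + left + " -> " + right;
    }
}

final class SnapshotDiff {
    private SnapshotDiff() {}

    /** Compares two flat snapshots property by property; an absent property counts as null. */
    static List<PropertyChange> between(Map<String, Object> left, Map<String, Object> right) {
        TreeSet<String> names = new TreeSet<>(left.keySet()); // sorted for stable output
        names.addAll(right.keySet());
        List<PropertyChange> changes = new ArrayList<>();
        for (String name : names) {
            Object l = left.get(name);
            Object r = right.get(name);
            if (!Objects.equals(l, r)) {
                changes.add(new PropertyChange(name, l, r));
            }
        }
        return changes;
    }
}
```

Because any two stored snapshots can be passed in, comparing version 1 with version 3 is no harder than comparing adjacent versions.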

Changed properties are tracked so that any version can be compared with any other, much like git history. JaVers is a framework designed for object auditing and diffing in Java: each snapshot holds the object's state as a property map together with the list of properties that changed since the previous version, and a new snapshot is written only when the object has actually changed. Diffs between versions are computed from these snapshots on demand, which keeps the audit focused on real modifications and makes them easy to track and review.

In summary, JaVers records a new version only when an object changes and derives the differences between any two versions from its snapshots, which avoids redundant versions and provides clear audit trails for modifications.
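The commit semantics can be sketched with an in-memory store: each stored version keeps the full state plus the names of the properties that changed, a commit that changes nothing stores nothing, and any two versions can be compared afterwards. This mirrors how JaVers persists snapshots (a new snapshot only when the object has changed, together with the list of changed properties), but EntityVersionStore and StoredSnapshot are hypothetical names, not the JaVers repository API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// One stored version: the full state plus the names of properties changed since the previous version.
final class StoredSnapshot {
    final int version;
    final Map<String, Object> state;
    final List<String> changedProperties;

    StoredSnapshot(int version, Map<String, Object> state, List<String> changedProperties) {
        this.version = version;
        this.state = state;
        this.changedProperties = changedProperties;
    }
}

final class EntityVersionStore {
    private final Map<String, List<StoredSnapshot>> history = new HashMap<>();

    /** Stores a new version only if the state differs from the latest one; returns the current version. */
    int commit(String globalId, Map<String, Object> state) {
        List<StoredSnapshot> versions = history.computeIfAbsent(globalId, k -> new ArrayList<>());
        Map<String, Object> previous = new HashMap<>();
        if (!versions.isEmpty()) {
            previous = versions.get(versions.size() - 1).state;
        }
        List<String> changed = diff(previous, state);
        if (!versions.isEmpty() && changed.isEmpty()) {
            return versions.size(); // nothing changed: no new snapshot is written
        }
        // Copy defensively; HashMap (unlike Map.copyOf) tolerates null column values.
        Map<String, Object> copy = Collections.unmodifiableMap(new HashMap<>(state));
        versions.add(new StoredSnapshot(versions.size() + 1, copy, changed));
        return versions.size();
    }

    Map<String, Object> stateAt(String globalId, int version) {
        return history.get(globalId).get(version - 1).state;
    }

    /** Like a diff between two commits: property names that differ between any two versions. */
    List<String> changesBetween(String globalId, int from, int to) {
        return diff(stateAt(globalId, from), stateAt(globalId, to));
    }

    private static List<String> diff(Map<String, Object> a, Map<String, Object> b) {
        TreeSet<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        List<String> changed = new ArrayList<>();
        for (String n : names) {
            if (!Objects.equals(a.get(n), b.get(n))) changed.add(n);
        }
        return changed;
    }
}
```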

Scaling MongoDB

Because queries target either a single tenant or span multiple tenants, this instance is easy to scale both vertically and horizontally (well supported by most cloud vendors), including with independent shard instances. Depending on security and compliance requirements, we can run a dedicated instance per tenant or consolidate multiple tenants into one.
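Routing each tenant to a dedicated or shared instance can be driven by a small piece of configuration. In this sketch, TenantStoreRouter and the rule that isolation-required tenants must have a dedicated instance are hypothetical; in practice the routing data would come from the deployment's compliance configuration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical router deciding which MongoDB deployment holds a tenant's audit history.
final class TenantStoreRouter {
    private final Map<String, String> dedicatedUris; // tenantId -> URI of its own instance
    private final String sharedUri;                  // URI of the consolidated, multi-tenant instance

    TenantStoreRouter(Map<String, String> dedicatedUris, String sharedUri, Set<String> isolationRequired) {
        // Fail fast if a tenant whose compliance rules demand isolation would land on the shared instance.
        for (String tenant : isolationRequired) {
            if (!dedicatedUris.containsKey(tenant)) {
                throw new IllegalArgumentException("Tenant " + tenant + " requires a dedicated instance");
            }
        }
        this.dedicatedUris = Map.copyOf(dedicatedUris);
        this.sharedUri = sharedUri;
    }

    String uriFor(String tenantId) {
        return dedicatedUris.getOrDefault(tenantId, sharedUri);
    }

    boolean isShared(String tenantId) {
        return !dedicatedUris.containsKey(tenantId);
    }
}
```

Validating the isolation rule at construction time means a misconfigured tenant is caught at startup rather than after its data has been written to the shared instance.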

References

JaVers - The Leading Framework for Object Audit and Diff in Java (javers.org)

Kafka Connectors | Confluent Documentation (docs.confluent.io)

Introducing JavaPoet (developer.squareup.com)