Advanced Data Security: Scope, Touch-less Design, and Implementation Strategies

Praveen Manvi
7 min readMay 19, 2024

--

Opt-In Security Architecture for Relational Databases with Automated Detection, Breach notification and Compliance Enforcement.

Overview

Data security is often viewed unfavourably by developers and users, much like taxes. However, drawing from the wisdom of Chanakya’s taxation principles, the goal of security measures should be to fortify the digital environment, ensuring public welfare, safety, and sustainable development. Chanakya eloquently described the ideal approach: “Collect taxes from the citizens as honeybees collect nectar from flowers — gently and without causing harm.” This principle can be applied to data security and observability, emphasising the need for effective, unobtrusive methods. Here I am explaining a method that can result in zero changes to existing system (code and deployment) following plugin architecture principles.

Most organisations already have robust role-based systems that define CRUD operations at the entity level. However, these systems often do not support on-demand control of data, such as tagging arbitrary users or entities to control access to specific data rows within a set of entities.

Supporting field-level and row-level security is generally challenging because these cases are rare (assuming 1% of use cases) and can complicate the established role/group hierarchy that has evolved over time. This document tries to come up with an approach that aims to provide a simple solution without disturbing the existing system.

The idea is to develop a non-intrusive solution for implementing on-demand security for Row Level Security (RLS) and Field Level Security (FLS). For RLS, The core concept is to prevent access to specific data rows based on a policy, which can be arbitrary. The assumption is that this is a special case and falls into a small percentage of use cases. The goal is to implement this without disrupting the existing system (Users, Roles, Groups, Permissions), with the data residing in an RDBMS.

Use cases

Let me start with 2 requirements that needs to work with existing system and its “explicit include” and by default its include all.

We have ‘SalaryDetails’ entity that has ‘salary’ column needs to be hidden by all the queries except owner and few departments (Finance, Executives, HR) and restricted to few users among them as well. So Role level security will not help over here. Many a time these are transient set of users that will be permitted only for some time.

We have ‘ScanReportDetails’ entity that has ‘report’ (or say some of the ePHI fields) that needs to be encrypted and should be visible to arbitrary set of medical professionals that has consent from owner of the document (patient in this case). Many a time document uploaded by not the owner but some one of hospitals hence it becomes important to allow dynamic set of users to access.

Above use cases won’t fit into standard CRUD based standard ROLE, GROUP, ACLs. Its generally complicated implement hierarchy and allowing these kind of use cases. Its not worth to complicate implementing these special requirements.

Let’s also make sure from whom we are trying save this information from and implications. The Developers, System Admins, DevOps and any user who knows the primary key and can retrieve the data. Please note that it can be accidental and un-intentional updates happening from DevOps who generally will have access to databases without going through the databases. Of course we also want to save from applications accesing these content.

It’s security issues are ‘P0' issues all the time as implications can be huge with data leakage. Having quick ability to recover from this will be great handle and this approach helps make it fast.

Customer identity theft results in financial losses/credit score damage etc…
Loss of trust and reputation of the enterprise in event of data breach and exposure.
Legal liability to the enterprise managing the PII Data
Export, Data pipeline, Caching Service and Reporting Service accessing this data.

Design

The outline of encryption process is depicted in below diagram

Encrypting Fields having sensitive information

Bring your own key (BYOK) functionality, Hashicorp is good solution with managing allowing customers to provide their own encryption keys for enhanced security and also integrates well with other Key Management Systems (KMS). Google Tink provides field-level encryption using industry-standard AES-256 encryption with symmetric.

Data flow diagram identifying the scope.

The “touch less auditing architecture” that I presented in earlier blog can augment this very well audit and compliance requirement in a transparent manner. “Encryption” and “Redaction” are approaches to implement the depending on the requirement although I provided the encryption redaction can be used in the same fashion. Following example provides how to use the library.

https://gist.github.com/pmanvi/45b32d1d34392e9990104bf830a0ee86

import com.google.crypto.tink.Aead
import com.google.crypto.tink.KeysetHandle
import com.google.crypto.tink.aead.AeadConfig
import com.google.crypto.tink.aead.PredefinedAeadParameters
import com.nerdvision.redact.pii.PiiRedactor

fun main() {
val input = "pii data like aadhar number email(pmanvi@outlook.com), phone ( 988641195 )"

val redactedText = PiiRedactor.getInstance().redact(input);
println(redactedText.contains("EMAIL_ADDRESS"))
println(redactedText)
// pii data like aadhar number email(EMAIL_ADDRESS), phone ( PHONE_NUMBER )

AeadConfig.register();
// simple and robust symmetric encryption (AES) that employs a single key to
// both encrypt and decrypt the data.
val aad = "shared secret read from Hashicorp vault".toByteArray()

val aead = KeysetHandle.generateNew(PredefinedAeadParameters.AES256_GCM).getPrimitive(Aead::class.java)

println("Original: $input")
val ciphertext = aead.encrypt(input.toByteArray(), aad);

// Will be stored in database.
println("Ciphertext: ${String(ciphertext)}")

val original = String(aead.decrypt(ciphertext, aad))

println("Decrypted: $original")

}


/**
* built in regex based redaction rules for:
* credentials
* creditCardNumber
* emailAddress
* ipAddress
* name
* phoneNumber
* streetAddress
* usSocialSecurityNumber
* zipcode
* url
* digits
*/

Automated tagging of fields and controlling visibility

If the system is already having 1000s of tables and its almost impossible and its not fruitful also go back and tag the fields. Even for new green field projects, expecting user to be technically/legally aware to tag field correctly is stretch ask & when they look for ePHI and PII fields information available in internet which will too broader and makes system in-efficient, for PaaS provider its there responsibility to make sure that there is no security breach and depending on user to tag the columns will not be scalable option.

Using CDC and ML libraries to mark/encrypt the sensitive fields.

Row Level Security By Group

It really helps to have separate concept as ‘Row Level Security Group’ dedicated for a managing such use cases. Please note the caveat that the whole assumption is that its just 1% use cases and we do not want to complicate existing CRUD Role, ACLs for this. If this is the main use case, I don’t think this approach should not be considered.

One of the reason I did this POC/experiment was to evaluate ‘exposed’ ORM framework from Kotlin and used SpringBoot as framework. xposed is an open-source library (Apache license) developed by JetBrains, which provides an idiomatic Kotlin API. I don’t think I can recommend for large scale applications. It was from brilliant JetBrains engineer. I will have separate write up on the same.

Picking of user sets for RlSGroup and and entity can be driven by Rules Engine based ‘User’ and ‘Entity’ supported by rule engine editor.

Database Entities and relationships, Here User/Entity tables are considered existing ones
Kotlin Data class representation of schema

Sample Data and Queries

Sample Data of Entities and and Row Level Security Groups
SET @TEST_USER_ID = 5;
SET @TEST_GROUP_ID = NULL;

SELECT entity.*
FROM ScanReportDetails entity
JOIN Users u ON entity.ownerId = u.id
LEFT JOIN (
SELECT userId, 1 AS groupId
FROM UserRlsGroups
WHERE groupId = @TEST_GROUP_ID
UNION ALL
SELECT entityId AS userId, 1 AS groupId
FROM EntityRlsGroups
WHERE entityType = 'ScanReportDetails' AND groupId = @TEST_GROUP_ID
) ug ON u.id = ug.userId
WHERE ug.userId IS NOT NULL OR u.id = @TEST_USER_ID;

For userId 5 with group_id (NULL), he will see only his record as he owner, but userId 1 & 2 the rls_group_id is 1 they will see records of other docs as well as their group_id has access to. This query is optimised thinking most of the users will have NULL as group_id without relying on ‘IN subquery’

HIPAA mandates specific security standards for protecting ePHI, including requirements for access controls, encryption, audit controls, and safeguards against un-authorised access or disclosure. Compliance with HIPAA requires adherence to the HIPAA Security Rule, Privacy Rule, and Breach Notification Rule. Under HIPAA, certain information about a person’s health or health care services is classified as Protected Health Information (PHI)

https://github.com/microsoft/presidio

--

--

Praveen Manvi

Senior (Architect, Director, VP) Software Engineer. Building web scale SaaS/Multi-Tenant Solutions in Cloud (AWS,GCP,Azure). Ex-Yahoo, Ex-Oracle, Ex-{startups}.