Prevent Data Breaches: Identity Logs and Machine Learning

An identity platform like ForgeRock sits at the heart of an enterprise, with a view of all apps, identities, devices, and resources attempting to connect with each other. This is an ideal position from which to gather rich identity log data that can be used to prevent data breaches.

Prevent Data Breaches? It's Hard.

An attacker has the luxury of finding the easiest way to break in, whereas a defense team has to secure every possible attack surface. There were 12,440 new breaches in 2018, a 424% increase over the known breach count in 2017. A total of 14.9 billion identity records were found to have been exposed during the year, up from 8.7 billion in 2017. Some of the hardest breaches to find are micro data breaches, which are spread over a long period of time. Data breaches through micro transactions are becoming more prevalent and are very hard to detect.

Identity Logs and Machine Learning: How To Approach the Problem
  1. We are in the right position: All authentication (AuthN) and authorization (AuthZ) requests, along with identity behavior events, are tracked and logged by our IAM products.

  2. We stream raw logs into a big data store and retain a few months of data.

  3. We analyze behavioral patterns in the logs generated by identities. Once we represent these patterns in a latent space, we can use them to train models that detect anomalous behavior.
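
Before any modeling, the raw logs from steps 1 and 2 have to be turned into per-identity event sequences. The following is a minimal sketch of that grouping step, not our production pipeline. The record fields (`ts`, `identity`, `event`) and the event names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical log records: the field names and event names are
# assumptions for illustration, not the actual IAM log schema.
raw_logs = [
    {"ts": 3, "identity": "alice", "event": "authz_granted"},
    {"ts": 1, "identity": "alice", "event": "authn_success"},
    {"ts": 2, "identity": "bob", "event": "authn_failure"},
    {"ts": 4, "identity": "bob", "event": "authn_success"},
    {"ts": 5, "identity": "alice", "event": "resource_read"},
]


def build_event_sequences(logs):
    """Group events by identity, ordering each group by timestamp."""
    by_identity = defaultdict(list)
    # Sorting once by timestamp keeps every per-identity list in time order.
    for record in sorted(logs, key=lambda r: r["ts"]):
        by_identity[record["identity"]].append(record["event"])
    return dict(by_identity)


sequences = build_event_sequences(raw_logs)
```

Each resulting sequence plays the role of a "sentence" of events, which is the shape of input that embedding models expect.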

Machine Learning Algorithms Showing Promise
Log Embedding

We used word embeddings to learn temporal context from the logs. This helped us learn which events naturally occur together for an identity and group them in a latent space. After further experimentation with a customized version of Non Contrastive Loss, we converged on a 50-dimensional temporal representation of an identity's behavior in the latent space.

 

Identity Log Blog 1.png
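
To make the idea of temporal context concrete, the sketch below generates skip-gram style (target, context) training pairs from one identity's event sequence. It assumes a skip-gram formulation, which is one common way to train word embeddings. The window size and event names are illustrative, and the sketch does not include the customized loss described above.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for events within `window` steps."""
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # an event is not its own context
                pairs.append((target, sequence[j]))
    return pairs


# Hypothetical event names for a single identity, in time order.
events = ["authn_success", "authz_granted", "resource_read", "logout"]
pairs = skipgram_pairs(events, window=1)
```

An embedding model trained on such pairs places events that tend to occur near each other in time close together in the latent space.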
 
Autoencoders

We use a stacked autoencoder to compress the log embeddings, with artificial Bayesian noise added to the input. The bottleneck layer compresses the higher-dimensional log embeddings into a principal lower-dimensional representation. The decoder learns to reconstruct the input from that lower-dimensional representation. We use simple reverse-indexing methods to map results back to the log entries and extract information from them.

 

Identity Log Blog 2.png
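
The following toy example illustrates the denoising idea behind the autoencoder described above. It is a sketch under several assumptions: a single hidden layer instead of a stacked network, Gaussian noise standing in for the noise we add, and small synthetic 4-dimensional vectors instead of real 50-dimensional log embeddings. All names and hyperparameters are hypothetical.

```python
import math
import random


def train_denoising_autoencoder(data, hidden=2, epochs=200, lr=0.02,
                                noise=0.1, seed=0):
    """Train a one-hidden-layer denoising autoencoder with plain SGD.

    Encoder: h = tanh(W1 @ noisy_x + b1). Decoder: x_hat = W2 @ h + b2.
    The target is always the clean input x, not the noisy one.
    """
    rng = random.Random(seed)
    n = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    b2 = [0.0] * n

    def forward(x):
        h = [math.tanh(sum(W1[k][i] * x[i] for i in range(n)) + b1[k])
             for k in range(hidden)]
        x_hat = [sum(W2[i][k] * h[k] for k in range(hidden)) + b2[i]
                 for i in range(n)]
        return h, x_hat

    def mean_loss():
        """Mean squared reconstruction error on the clean data."""
        total = 0.0
        for x in data:
            _, x_hat = forward(x)
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        return total / len(data)

    initial = mean_loss()
    for _ in range(epochs):
        for x in data:
            # Corrupt the input (Gaussian noise is an assumption here).
            noisy = [v + rng.gauss(0.0, noise) for v in x]
            h, x_hat = forward(noisy)
            err = [x_hat[i] - x[i] for i in range(n)]
            # Backpropagate through the decoder before updating W2.
            dh = [sum(err[i] * W2[i][k] for i in range(n)) * (1 - h[k] ** 2)
                  for k in range(hidden)]
            for i in range(n):
                for k in range(hidden):
                    W2[i][k] -= lr * err[i] * h[k]
                b2[i] -= lr * err[i]
            for k in range(hidden):
                for i in range(n):
                    W1[k][i] -= lr * dh[k] * noisy[i]
                b1[k] -= lr * dh[k]
    return forward, initial, mean_loss()


# Synthetic stand-in for embeddings: 4-d vectors driven by 2 hidden factors.
data_rng = random.Random(1)
data = []
for _ in range(20):
    a, b = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append([a, b, 0.5 * (a + b), 0.5 * (a - b)])

reconstruct, initial_loss, final_loss = train_denoising_autoencoder(data)
```

Here the 2-unit hidden layer plays the role of the bottleneck: the network can only reconstruct the 4-dimensional input well if it learns the two underlying factors.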
 
Initial Results

Our model achieves over 90% accuracy in predicting anomalies, and it is exposed through a GraphQL API to predict micro data breaches. Our t-SNE visualization corroborates these results.

Identity Log Blog 3.png
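
This post does not detail how anomalies are scored. One common approach with autoencoders, assumed here purely for illustration, is to flag an identity when its reconstruction error is far above the errors seen on normal behavior. The error values, identity names, and the choice of k = 3 below are all hypothetical.

```python
import statistics


def fit_threshold(baseline_errors, k=3.0):
    """Return mean + k * standard deviation of baseline errors."""
    mean = statistics.fmean(baseline_errors)
    std = statistics.pstdev(baseline_errors)
    return mean + k * std


# Hypothetical reconstruction errors observed on normal behavior.
baseline = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08]
threshold = fit_threshold(baseline)

# Hypothetical errors for newly observed identity behavior.
new_errors = {"alice": 0.11, "bob": 0.95}
flags = {identity: err > threshold for identity, err in new_errors.items()}
```

The multiplier k trades false positives against missed detections, so in practice it would be tuned against labeled or back-tested data.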

In Part 2 of this blog series on how to prevent data breaches, which will appear next month, we will delve into metrics, derived metrics, A/B testing, back-testing, and how we improved on this model.

To learn more about ForgeRock Identity Platform, visit us here. If you prefer to speak to someone directly, contact us today.

 

Who Is Nach Mishra?

Nach is the head of our AI/Data engineering platform team. He has over 10 years of experience integrating AI into cloud products. Before joining ForgeRock, Nach worked at Apple and Oracle in technical lead roles, building AI into products that have been used by millions of users. Beyond work, Nach is an avid aviator. On weekends, you will find him hanging out with his family or flying around the Bay Area.
