Prevent Data Breaches: Identity Logs and Machine Learning

An identity platform like ForgeRock sits at the heart of an enterprise, with a view of all the apps, identities, devices, and resources attempting to connect with each other. This turns out to be an ideal position from which to gather rich identity log data that can be used to prevent data breaches.

Prevent Data Breaches? It's Hard.

An attacker has the luxury of finding the easiest way to break in, whereas a defense team has to secure every possible attack surface. There were 12,440 new breaches in 2018, an increase of 424% over the known breach count in 2017. A total of 14.9 billion identity records were found to have been exposed during the year, up from 8.7 billion in 2017. Some of the hardest breaches to find are micro data breaches, which are carried out through small transactions spread over a long period of time. They are becoming more prevalent and are very hard to detect.

Identity Logs and Machine Learning: How To Approach the Problem
  1. We are in the right position: all authentication (AuthN) and authorization (AuthZ) requests and identity behavior events are tracked and logged by our IAM products.

  2. We stream raw logs into a big data store and retain a few months of data.

  3. We analyze behavioral patterns in the logs generated by identities. Once these patterns are represented in a latent space, we can use them to train models that detect anomalous behavior.
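
As a rough illustration of the steps above, the sketch below turns raw log records into one time-ordered event sequence per identity, which is the kind of input a behavioral model could consume. The record fields (identity, ts, event) and the event names are assumptions made for this example, not the actual ForgeRock log schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records; field and event names are illustrative only.
RAW_LOGS = [
    {"identity": "alice", "ts": "2019-03-01T09:05:00", "event": "AUTHZ_ALLOW"},
    {"identity": "bob", "ts": "2019-03-01T09:01:00", "event": "AUTHN_FAILURE"},
    {"identity": "alice", "ts": "2019-03-01T09:00:00", "event": "AUTHN_SUCCESS"},
    {"identity": "bob", "ts": "2019-03-01T09:02:00", "event": "AUTHN_SUCCESS"},
]


def build_sequences(logs):
    """Group events by identity and order each group by timestamp."""
    by_identity = defaultdict(list)
    for record in logs:
        timestamp = datetime.fromisoformat(record["ts"])
        by_identity[record["identity"]].append((timestamp, record["event"]))
    # Sorting the (timestamp, event) tuples orders each identity's events in time.
    return {
        identity: [event for _, event in sorted(events)]
        for identity, events in by_identity.items()
    }


sequences = build_sequences(RAW_LOGS)
```

Each identity's sequence can then be treated much like a sentence of events when learning embeddings.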

Machine Learning Algorithms Showing Promise
Log Embedding

We leveraged word embedding to learn temporal contextual information. This helped us learn which events naturally occur together for an identity and group them in a latent space. After further experimentation with a customized version of Non Contrastive Loss, we converged on a 50-dimensional temporal representation of an identity's behavior in the latent space.
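
To make the word-embedding idea concrete, here is a minimal sketch that treats each identity's event sequence as a sentence and trains skip-gram vectors with standard negative sampling. It is not our customized loss, which is not reproduced here. The function names, hyperparameters, and event names are assumptions for illustration; only the 50-dimensional default comes from the text above.

```python
import math
import random


def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for events within `window` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


def train_embeddings(sequences, dim=50, window=2, negatives=3,
                     epochs=20, lr=0.05, seed=0):
    """Train skip-gram vectors with negative sampling using plain SGD."""
    rng = random.Random(seed)
    vocab = sorted({event for seq in sequences for event in seq})
    w_in = {e: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for e in vocab}
    w_out = {e: [0.0] * dim for e in vocab}
    pairs = [p for seq in sequences for p in skipgram_pairs(seq, window)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for target, context in pairs:
            # One observed context (label 1) plus random events (label 0).
            samples = [(context, 1.0)]
            samples += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            v = w_in[target]
            grad_v = [0.0] * dim
            for event, label in samples:
                u = w_out[event]
                score = sum(a * b for a, b in zip(v, u))
                pred = 1.0 / (1.0 + math.exp(-score))
                g = lr * (label - pred)
                for k in range(dim):
                    grad_v[k] += g * u[k]
                    u[k] += g * v[k]
            for k in range(dim):
                v[k] += grad_v[k]
    return w_in


# Hypothetical event sequences, one per identity.
toy_sequences = [
    ["AUTHN_SUCCESS", "AUTHZ_ALLOW", "READ_PROFILE", "LOGOUT"],
    ["AUTHN_FAILURE", "AUTHN_SUCCESS", "AUTHZ_ALLOW", "LOGOUT"],
]
event_vectors = train_embeddings(toy_sequences)
```

The window size plays the role of temporal context here: events that tend to occur near each other in an identity's timeline are pulled toward each other in the latent space.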


Identity Log Blog 1.png

We used a stacked autoencoder to compress the log embeddings, with artificial Bayesian noise added to the input. The bottleneck layer compressed the higher-dimensional log embeddings into a lower-dimensional principal representation, and the decoder learned to reconstruct the input from that representation. We used simple reverse indexing methods to map the results back to the log entries and extract information from them.
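
The sketch below shows the general shape of a denoising autoencoder under simplifying assumptions: a single tanh bottleneck layer rather than a full stack, Gaussian noise standing in for the noise model, and plain stochastic gradient descent. The class name, layer sizes, and learning rate are hypothetical choices for illustration and do not describe the production model.

```python
import math
import random


class DenoisingAutoencoder:
    """One tanh bottleneck layer and a linear decoder, trained with SGD."""

    def __init__(self, n_in, n_hidden, noise_std=0.1, lr=0.02, seed=0):
        self.rng = random.Random(seed)
        self.noise_std, self.lr = noise_std, lr
        scale = 1.0 / math.sqrt(n_in)
        self.w_enc = [[self.rng.uniform(-scale, scale) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.w_dec = [[self.rng.uniform(-scale, scale) for _ in range(n_hidden)]
                      for _ in range(n_in)]

    def encode(self, x):
        """Compress an input vector into the bottleneck representation."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.w_enc]

    def decode(self, h):
        """Reconstruct an input-sized vector from the bottleneck."""
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w_dec]

    def reconstruction_error(self, x):
        """Mean squared error between a clean input and its reconstruction."""
        y = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(y, x)) / len(x)

    def train_step(self, x):
        """Corrupt the input with noise, then learn to recover the clean input."""
        noisy = [xi + self.rng.gauss(0.0, self.noise_std) for xi in x]
        h = self.encode(noisy)
        y = self.decode(h)
        err = [yi - xi for yi, xi in zip(y, x)]  # compared against clean x
        # Backpropagate through the decoder before its weights are updated.
        grad_h = [
            sum(err[i] * self.w_dec[i][j] for i in range(len(x))) * (1 - h[j] ** 2)
            for j in range(len(h))
        ]
        for i in range(len(x)):
            for j in range(len(h)):
                self.w_dec[i][j] -= self.lr * err[i] * h[j]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w_enc[j][i] -= self.lr * grad_h[j] * noisy[i]


def mean_error(model, data):
    """Average reconstruction error over a list of input vectors."""
    return sum(model.reconstruction_error(x) for x in data) / len(data)
```

Calling train_step repeatedly over the embedding vectors trains the model. A stacked version would add further encoder and decoder layers around the same bottleneck idea.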


Identity Log Blog 2.png

Initial Results

Our model predicts anomalies with over 90% accuracy, and its predictions are served through a GraphQL API to flag micro data breaches. Our t-SNE visualization corroborates these results.
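
This post does not spell out how an anomaly score is derived from the model. One common approach with autoencoders, sketched here purely as an assumption, is to flag inputs whose reconstruction error exceeds a threshold fitted on normal traffic, then measure accuracy against labeled examples. The function names and the 0.95 quantile are hypothetical.

```python
def fit_threshold(normal_errors, quantile=0.95):
    """Pick a reconstruction-error cutoff from errors seen on normal behavior."""
    ordered = sorted(normal_errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomaly(error, threshold):
    """Flag an input whose reconstruction error exceeds the cutoff."""
    return error > threshold


def accuracy(errors, labels, threshold):
    """Fraction of examples where the flag matches the label (1 = anomaly)."""
    correct = sum(
        is_anomaly(error, threshold) == bool(label)
        for error, label in zip(errors, labels)
    )
    return correct / len(errors)
```

The quantile trades false positives against missed anomalies, so in practice it would be tuned on held-out data rather than fixed in advance.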

Identity Log Blog 3.png

In Part 2 of this blog series on how to prevent data breaches, which will appear next month, we will delve into metrics, derived metrics, A/B testing, back-testing, and how we improved on this model.



Who Is Nach Mishra?

Nach is the head of our AI/Data engineering platform team. He has more than 10 years of experience integrating AI into cloud products. Before joining ForgeRock, Nach worked at Apple and Oracle in technical lead roles, building AI into products that have been used by millions of users. Beyond work, Nach is an avid aviator. On weekends, you will find him hanging out with his family or flying around the Bay Area.
