How AI can alleviate data lifecycle risks and challenges


The volume of business data worldwide is growing at an astounding pace, with some estimates showing it doubling every year. Over time, every company generates and accumulates a massive trove of data, files and content – some of it inconsequential, some of it highly sensitive and confidential.



Throughout the data lifecycle, there are a variety of risks and considerations to manage. The more data you create, the more you must track, store and protect against theft, leaks, noncompliance and more.


Faced with massive data growth, most organizations can no longer rely on manual processes to manage these risks. Many have instead adopted a vast web of tracking, endpoint detection, encryption, access control and data policy tools to maintain security, privacy and compliance. But deploying and managing so many disparate solutions creates tremendous complexity and friction for IT and security teams as well as end users, and this patchwork approach still falls short of the integration and intelligence needed to manage enterprise files and content at scale.


Let’s explore several of the most common data lifecycle challenges and risks businesses are facing today and how to overcome them:


Maintaining security – As companies continue to build up an ocean of sensitive files and content, the risk of data breaches grows exponentially. Smart data governance means applying security at the points where the risk is greatest. In just about every case, that means protecting both the integrity of company data and content and every user with access to it. Every layer of enterprise file sharing, collaboration and storage must be protected by controls such as automated user behavior monitoring (to deter insider threats and catch compromised accounts), multi-factor authentication, secure storage in certified data centers, end-to-end encryption, and both signature-based and zero-day malware detection.
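
To make the behavior-monitoring control concrete, here is a minimal sketch in Python of one way such monitoring might work: flagging a user whose daily download volume suddenly far exceeds their historical baseline. The event format, field names and threshold are illustrative assumptions, not any particular product’s API; a real system would draw on many more signals and typically use statistical or ML models rather than a fixed cutoff.

# Minimal sketch: flag users whose daily downloads far exceed their baseline.
# Event format, field names and the z-score threshold are illustrative only.
from collections import defaultdict
from statistics import mean, stdev

def build_baselines(history):
    """history: {user: [daily download counts]} -> {user: (mean, stdev)}"""
    return {user: (mean(counts), stdev(counts))
            for user, counts in history.items() if len(counts) >= 2}

def flag_anomalies(todays_events, baselines, z_threshold=3.0):
    """todays_events: iterable of (user, files_downloaded) tuples."""
    totals = defaultdict(int)
    for user, files_downloaded in todays_events:
        totals[user] += files_downloaded
    alerts = []
    for user, total in totals.items():
        avg, sd = baselines.get(user, (0.0, 0.0))
        if sd > 0 and (total - avg) / sd > z_threshold:
            alerts.append((user, total, avg))
    return alerts

# Example: a user who normally downloads ~20 files a day suddenly pulls 500
history = {"jsmith": [18, 22, 19, 21, 20, 23, 17]}
events = [("jsmith", 350), ("jsmith", 150)]
for user, total, avg in flag_anomalies(events, build_baselines(history)):
    print(f"ALERT: {user} downloaded {total} files today (baseline ~{avg:.0f})")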


Classification and compliance – Gone are the days when organizations could require users to label, categorize or tag company files and content, or task IT with managing and manually enforcing data policies. Not only is manual data classification and management impractical, it’s also far too risky. You might house millions of files accessible by thousands of users – there’s simply too much data, spread too broadly.

Moreover, regulations like GDPR, CCPA and HIPAA add further complexity to the mix, with intricate (and sometimes conflicting) requirements. The definition of PII (personally identifiable information) under GDPR alone encompasses potentially hundreds of pieces of information, and one mistake could result in hefty financial penalties.


Incorrect categorization can lead to a variety of issues, including data theft and regulatory penalties. Fortunately, machines can do in seconds, and often with better accuracy, what might take a human years. AI and ML technologies help companies quickly scan files across data repositories to identify sensitive information such as credit card numbers, addresses, dates of birth, social security numbers and health-related data, and apply classifications automatically. They can also track files across popular data sources such as OneDrive, Windows File Server, SharePoint, Amazon S3, Google Cloud, GSuite, Box, Microsoft Azure Blob, and generic CIFS/SMB repositories to give you better visibility into and control over your data.
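
To illustrate the detection step, here is a minimal sketch in Python that scans a local folder for a few common sensitive-data patterns (credit card and US social security number formats) and tags each file with the labels it matches. The folder path, patterns and labels are illustrative assumptions; production classification engines combine ML models with validation logic and cover far more data types than simple regular expressions can.

# Minimal sketch: regex-based scan for common sensitive-data patterns.
# Patterns, labels and the "./shared-drive" path are illustrative only;
# real engines pair ML classifiers with checks such as Luhn validation.
import re
from pathlib import Path

PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "date_of_birth": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def classify_file(path):
    """Return the set of sensitive-data labels detected in a text file."""
    try:
        text = Path(path).read_text(errors="ignore")
    except OSError:
        return set()
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

def scan_repository(root):
    """Walk a folder and map each file to the sensitive labels found in it."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            labels = classify_file(path)
            if labels:
                results[str(path)] = labels  # e.g. feed into tagging or DLP policies
    return results

if __name__ == "__main__":
    for path, labels in scan_repository("./shared-drive").items():
        print(f"{path}: {', '.join(sorted(labels))}")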


Retention – As data storage costs have plummeted over the past 10 years, many organizations have fallen into the trap of simply “keeping everything” because it’s (deceptively) cheap to do so. This approach carries many security and regulatory risks, as well as potential costs. Our research shows that exposure of just a single terabyte of data could cost you $129,324; now think about how many terabytes of data your organization stores today. The longer you retain sensitive files, the greater the opportunity for them to be compromised or stolen.
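
For a rough sense of scale, here is a small illustrative calculation in Python that applies the per-terabyte figure above to some made-up repository sizes and compares “keeping everything” against a retention cutoff; every input other than the $129,324 figure is a hypothetical example.

# Illustrative arithmetic only: estimate potential exposure cost using the
# ~$129,324-per-terabyte figure cited above. Repository names, sizes and
# ages are made-up example inputs.
COST_PER_TB = 129_324

# (repository, terabytes stored, years since last access) - hypothetical data
repositories = [
    ("file-server", 40, 6),
    ("sharepoint", 15, 2),
    ("s3-archive", 120, 9),
]

def exposure_cost(repos, max_age_years=None):
    """Sum exposure cost, optionally excluding data older than a retention cutoff."""
    tb = sum(size for _, size, age in repos
             if max_age_years is None or age <= max_age_years)
    return tb * COST_PER_TB

print(f"Keep everything:      ~${exposure_cost(repositories):,}")
print(f"5-year retention cap: ~${exposure_cost(repositories, max_age_years=5):,}")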