Getting a Grip on Unstructured Data

By Matt Kelly posted 10-07-2025 11:00 AM


Good data governance is the cornerstone of any successful GRC program, so today let’s talk about something that can often be a mortal threat to good data governance — unstructured data.

Unstructured data is just what the name suggests: data that exists somewhere within your enterprise, but lacks any specific structure to help you understand the information it contains. Examples include…

Text-based data, such as emails on an employee’s laptop, AI-generated marketing materials, or customer reviews left on your corporate website
Multimedia data including audio recordings, training videos, security camera footage, or photos posted to a departmental website
Machine data such as log files, sensor readings, or clickstream data

Essentially, any data that isn’t based on a neatly predetermined format (such as the headers and labels you’d see in an Excel spreadsheet or an ERP database) qualifies as unstructured data.

Unstructured data can pose all sorts of risks to your GRC efforts. Hackers could steal it and cause a privacy breach; employees might see something they shouldn’t and dive into insider trading; an artificial intelligence app could consume erroneous data as part of its learning process, and start reaching incorrect conclusions.

Unstructured, however, doesn’t necessarily mean ungoverned. CISOs and other GRC leaders can use several practices to bring order to their unstructured data and keep its risks in check.

How Unstructured Data Brings Risk

First let’s be clear that unstructured data is not inherently bad. On the contrary, you can put unstructured data to good use in several ways. For example, AI systems will happily churn through reams of unstructured data so they can improve their learning models and give better answers. Or you might want to preserve unstructured data so that in the event of litigation, an e-discovery tool could sift through it all and find relevant records of business communications or events.

Instead, risks arise when you don’t know how unstructured data is generated within your enterprise. That is, if you don’t know which business processes generate what types of data, then you also won’t know where that data might be stored, how much of it you have, or what should be done with it.

That’s a governance problem, which GRC teams can address in several ways.

Start With Business Processes

Begin by examining your business processes to understand how new data is created in your enterprise in the first place. Ideally, work with first- and second-line operating teams or business analysts to map out specifically how each team creates different types of unstructured data, even if you don’t know the exact contents within each type.

For example, presumably all teams will generate email and text communications. Most will likely use other desktop software tools too, such as Word or PowerPoint apps; and a growing number will use generative AI tools. Other teams will generate more exotic types of data such as activity logs, sensor readings, and so forth.

Also remember to consider whether any third parties upload data into your enterprise. That could be anything from vendor contracts to social media posts, customer reviews, or comments on corporate blogs.

You want to answer three fundamental questions.

Which business processes create what types of data? That’s not quite the same as asking “Who owns this trove of data?” to assign responsibility for it. The question is more like, “Which types of data do you, Business Function X, end up creating?” so that you’ll understand the compliance and security risks each business function poses.

Where does the data exist? You want to know where the data exists physically, so you can understand which data privacy obligations around the world are applicable. You also want to understand where the data exists “logically” within your IT environment. For example, “Vendor contracts are housed in this drive; customer reviews are stored on that one.”

What security, compliance, or contractual obligations might exist for each type of data? This helps to clarify any duties you have to, say, preserve data for litigation; or which databases should have multi-factor authentication for added security; or which types of data can be deleted without worry.
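One way to keep the answers to those three questions in a consistent shape is a simple inventory record per data type. The sketch below is only an illustration: the `DataInventoryEntry` class, its field names, and the sample entries are all hypothetical, not a prescribed schema. It records which process creates the data, where the data lives physically and logically, and which obligations apply, then flags entries where a question is still unanswered.

```python
from dataclasses import dataclass, field


@dataclass
class DataInventoryEntry:
    """One row in a hypothetical unstructured-data inventory."""
    business_process: str   # which process creates the data
    data_type: str          # e.g. "vendor contract", "customer review"
    physical_location: str  # country or region where the data is stored
    logical_location: str   # drive, share, or system that holds it
    obligations: list[str] = field(default_factory=list)  # security, compliance, contractual duties


def entries_missing_answers(entries):
    """Return entries where a location or the obligations list is still blank."""
    return [
        e for e in entries
        if not (e.physical_location and e.logical_location and e.obligations)
    ]


# Illustrative entries only; the values are made up for the example.
inventory = [
    DataInventoryEntry(
        business_process="Procurement",
        data_type="vendor contract",
        physical_location="US",
        logical_location="contracts drive",
        obligations=["preserve for litigation", "multi-factor authentication"],
    ),
    DataInventoryEntry(
        business_process="Marketing",
        data_type="customer review",
        physical_location="",  # not yet known
        logical_location="website CMS",
    ),
]

gaps = entries_missing_answers(inventory)
for e in gaps:
    print(f"{e.business_process}: {e.data_type} needs follow-up")
```

A spreadsheet would serve the same purpose. The point is that every data type gets the same three questions, and the blanks show where the mapping work is unfinished.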

But What About the Unstructured Data You Already Have?

Some readers might now be muttering, “This is all great for any new unstructured data my company might create; but I have piles of unstructured data causing risk right now. What about that stuff?”

Fair point. GRC teams will also need a data discovery process to identify all the unstructured data you already have.

After all, the very nature of unstructured data means that it isn’t created on any formal schedule; employees and third parties just create the stuff as needed. You can’t be sure of how much unstructured data you have unless you have a process to go look for it. You certainly can’t determine the completeness or accuracy of the data if you don’t even know how much you have.

Right now, lots of companies still rely on manual processes for data discovery. That’s cumbersome, and it invites trouble if your manual process happens at the wrong time, overlooks certain parts of your enterprise, or can’t handle unusual new data types.

So one question for GRC teams is how you’ll automate data discovery in the future, to give you a more accurate and immediate picture of the unstructured data you have. Then you can match what that picture tells you with the information governance you want to achieve — finding gaps, correcting business processes, applying controls, and so forth.
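As a rough illustration of what automated discovery might look like at its very simplest, here is a minimal sketch that walks a file share, tallies files by broad category, and flags anything sitting outside the locations you already govern. It assumes the data lives on a file system you can read, and the extension-to-category mapping, the `discover` function, and the `find_gaps` function are hypothetical names invented for this example. Commercial discovery tools inspect file contents and cover far more sources.

```python
import os
from collections import defaultdict

# Hypothetical mapping from file extension to a broad data category.
CATEGORY_BY_EXTENSION = {
    ".eml": "text", ".txt": "text", ".docx": "text", ".pptx": "text",
    ".mp3": "multimedia", ".mp4": "multimedia", ".jpg": "multimedia",
    ".log": "machine",
}


def discover(root):
    """Walk `root` and tally file count and total bytes per (directory, category)."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            category = CATEGORY_BY_EXTENSION.get(ext, "unknown")
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[(dirpath, category)]["files"] += 1
            totals[(dirpath, category)]["bytes"] += size
    return dict(totals)


def is_governed(directory, governed_dirs):
    """True if `directory` is one of the governed directories or sits beneath one."""
    norm = os.path.normpath(directory)
    for g in governed_dirs:
        g = os.path.normpath(g)
        if norm == g or norm.startswith(g + os.sep):
            return True
    return False


def find_gaps(found, governed_dirs):
    """Return the (directory, category) pairs found outside governed locations."""
    return sorted(key for key in found if not is_governed(key[0], governed_dirs))
```

Run `discover` against a share on a schedule and pass the result to `find_gaps` along with the locations your inventory already covers. What comes back is a list of places where data exists but governance does not, which is the picture you then compare against the governance you want.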

That’s what information governance is all about: GRC teams helping the rest of the enterprise understand how the data they create should be handled, so that business teams don’t foist compliance or security risks onto the enterprise unnecessarily. Build the principles and processes to guide those efforts, and your GRC risks will become much easier to manage.

About Matt Kelly

Matt Kelly is an independent compliance consultant specializing in corporate compliance, governance, and risk management. He shares insights on business issues on his blog, Radical Compliance, and is a frequent speaker on compliance, governance, and risk topics.

Kelly was recognized as a "Rising Star of Corporate Governance" by the Millstein Center in 2008 and named to Ethisphere's "Most Influential in Business Ethics" list in 2011 and 2013. He served as editor of Compliance Week from 2006 to 2015.

Based in Boston, Mass., Kelly can be contacted at mkelly@RadicalCompliance.com.

#SOXPro
#Cybersecurity
#ProcessImprovement
#ControlsManagement
#InternalAudit
#RiskManagement
#SOX
