What is Shadow Data? 

In simple terms, shadow data is company data that has been copied, backed up or housed in a data store that is not governed by security or IT, not covered by the same security structure, and not kept up to date.

As an example, think about your main production data store. This is where your content, applications and data are accessible to everyone who requires them, but you are also keenly aware of it, keep it up to date, and have rigid security protocols in place. By contrast, consider the copies of that production data that are not being secured: the copy that exists in a test environment, the unmanaged backup that started as a lift and shift, or the orphaned backups and abandoned databases.
 

What Is the Difference Between Shadow IT and Shadow Data?

You’ve likely heard the term “shadow IT.” This is the hardware, software, applications or technology projects that are run outside the governance and oversight of your corporate IT.

At one point, shadow IT was scary, a major threat to the security of an organization’s data. However, as the challenge became more known and companies took it seriously, teams figured out how to manage and contain it. 

Since then, major advancements in technology – like the mass migration to the cloud – have brought us data democratization, which in itself is a boon to all organizations and consumers. Your data is important, and allowing greater access to this data for those who need it creates more opportunities, more effectiveness. 

However, the cloud also allowed data to be spread around to various places you may not even be tracking. Gone are the days of completely self-contained, on-premises systems. With greater access comes greater risk. And now a new threat has arrived, one that dwarfs the risk of shadow IT. It’s the largest threat to your data security: shadow data.

Do you know where your sensitive data lives? And do you have the tools and resources to manage it? Shadow data is a prominent yet frequently overlooked problem, but there are tools and resources to tackle it and secure your most valuable currency – your data.

Why Does Shadow Data Occur?

As more companies move to the cloud, the landscape of cloud technologies expands and becomes more complex. As more developers use the flexibility of the cloud to spin up new data storage assets with the click of a button, without consulting security or IT, the data attack surface also grows. Add in data democratization and the lack of a perimeter, and shadow data becomes increasingly prevalent, as does the risk of a data breach, because traditional data security strategies fail to keep up.

There are four major factors that have changed cloud data protection and given rise to the spectre of shadow data:

  1. The proliferation of technology and the associated high complexity: Dozens of technologies are used to store, use and share data in the cloud. They can be managed by the service provider or developers directly, and often each one is configured differently. This has created multiple architectures that rapidly change and bring new risks. Today, developers can spin up or copy an entire datastore in seconds.

  2. Data protection teams have fallen behind: Today, data protection teams can’t stop developers from making changes; they can only try to set guardrails that reduce mistakes. They are relegated to a ‘catch up’ mode. Continually kept in the dark, they can no longer assume they know where all the data is. So they spend more time asking questions and hoping that policies are being followed.

  3. Data democratization: As more value is placed on the concept of making data available to all that need it, the risks increase. And manual efforts to categorize and secure all the data stores are ineffective.

  4. No on-premises perimeter: Cloud data is a shared data model. It’s meant to be accessible from anywhere, given the right credentials. There is no longer a single choke point of protection and monitoring.

What Are Examples of Shadow Data?

Think about where all of your data might live. And then think about where copies of this data may exist. In a typical example, you likely have the following:

  • Test Environment: Most organizations have a partial copy of their production or RDS database in a development or test environment, where developers are building applications and testing programs. Developers are often moving quickly and may take a snapshot of the data but then fail to remove or secure the copy, or simply forget about it.

  • S3 Backups: You’ll also have at least one backup data store, so that you are prepared for any breaches or damage to your production environment. It’s your contingency plan and it stores exact copies of your production data. But these backups are often an afterthought and less closely monitored, so they can mistakenly expose large amounts of data to the public.

  • Leftover Data from Cloud Migration: As organizations move to the cloud, many run a “lift and shift” data migration project, in which the original database is moved into a modern cloud data store. But more often than not, the original data is never deleted, so that lingering instance remains unmanaged, unmaintained and often forgotten.

  • Toxic Data Logs: Developers and log frameworks log sensitive data, which creates sensitive files that are not classified as sensitive, lack the proper access control and encryption, and can be easily exposed.

  • Analytics Pipeline: Of course, your data is only useful if you can consistently reference and analyze it, so many companies will store data in some type of analytics pipeline using the likes of Snowflake or others. 

All of these are unique data stores in and of themselves and any of them can be a dangling S3 backup, an unlisted embedded data store, or just become a stale data store. The problem is, they all contain sensitive data: customer information, employee information, financial data, applications, intellectual property, etc. And most likely they’re not visible to your data protection teams. They’ve become invisible, unmanaged, and unsecured. 
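One way to keep the “toxic data logs” described above from forming is to scrub sensitive-looking values before a log line is ever written. The following is a minimal sketch in Python using the standard logging module. The two patterns and the names redact, RedactingFilter and build_logger are hypothetical, chosen only for illustration, and the regular expressions would miss many real-world formats.

```python
import io
import logging
import re

# Hypothetical patterns for illustration; real classification needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact(text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub("[REDACTED_NUMBER]", text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message of every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already formatted
        return True  # keep the record, now redacted


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Return a demo logger that writes redacted lines to the given stream."""
    logger = logging.getLogger("redaction_demo")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactingFilter())
    logger.addHandler(handler)
    return logger
```

Attaching the filter to the handler rather than to individual call sites means developers do not have to remember to redact each message, which matches the point above that these logs are usually created unintentionally.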

This is your Shadow Data. Check out our insightful shadow data infographic for a visual representation of the threat.

Data Breach Examples Caused by Shadow Data

Shadow data can be your biggest vulnerability. In a lot of cases, this data is no longer used, and it has been forgotten or is not even visible or accessible to corporate IT teams. On the whole, the people in your organization who should know about these data stores don’t know about them, leaving the data open prey to cybercriminals.

In fact, data breaches often occur in shadow data environments.

Take for example the very recent SEGA Europe data breach, where the massive gaming company inadvertently left users’ personal information publicly accessible on an Amazon Web Services S3 bucket. 

The mishap left many of SEGA Europe’s cloud services wide open to hackers and cybercriminals, along with API keys to its instances of MailChimp and Steam, which provided full access to these services for anyone who found them.

Fortunately for SEGA, through the joint efforts of its internal security team and a team of external security researchers, the mishap was discovered and access to sensitive data was contained.

How did this happen? Shadow data. Someone inadvertently stored sensitive files in a publicly accessible AWS S3 bucket and didn’t realize the extent of the vulnerability. It is quite easy to misconfigure an S3 bucket, and that little mistake could have caused the company irreparable damage.

Twitter experienced something quite similar when a ‘glitch’ caused users’ personal information and passwords to be stored in a readable text format on its internal system, rather than disguised by the process known as “hashing”.

The mishap caused embarrassment and scrutiny for Twitter. The major social platform had to publicly urge its more than 330 million users to change their passwords. 
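To make the “hashing” mentioned above concrete, here is a minimal Python sketch of storing a salted password hash instead of readable text. It is not a description of Twitter’s system. It assumes PBKDF2-HMAC-SHA256 from the standard library, and the iteration count and the function names hash_password and verify_password are illustrative choices rather than recommendations.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these two values are stored, never the password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The protection only holds if the readable password never lands anywhere else, such as a log file or a debug dump, which is exactly how a copy can end up as shadow data.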

For many organizations, a simple breach of one of their shadow data environments could be crippling. 

How to Discover, Monitor and Minimize the Risks of Shadow Data

Unmanaged data stores inevitably occur. Shadow data occurs. It’s unintentional, and it’s a normal byproduct of an organization moving at the pace of the cloud. But there are ways to ensure you’re protected and have the proper visibility into every place your data may live. 

  • Continuous Monitoring

  • Catalog Data – Relationships, Flows & Dependencies

  • Data Hygiene

  • Proactive Data Protection

Cloud-native monitoring solutions built in the cloud, for the cloud now exist to combat shadow data and allow data protection teams to move at the speed of the cloud. These cloud data security solutions must discover and classify data continuously for complete visibility, secure and control it to improve risk posture, and detect and remediate leaks without interrupting data flow.

As you evaluate solutions to protect your sensitive cloud data, ensure you have a platform that can scan your entire cloud account and automatically detect all data stores and assets, not just the known ones. Ensure that once data is scanned, the solution can categorize and classify the data, maintaining a cloud datastore framework that allows you to prioritize and manage all of your assets effectively. 
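The discover-and-classify step can be pictured as comparing everything a scan finds against what the security team already manages. The sketch below is a simplified Python illustration under stated assumptions: the DataStore fields, the 90-day staleness threshold and the find_shadow_data function are all hypothetical, and a real platform would populate the inventory from cloud provider APIs rather than hand-built records.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set

STALE_AFTER = timedelta(days=90)  # assumed threshold for "forgotten" data


@dataclass(frozen=True)
class DataStore:
    """One discovered data store (hypothetical inventory record)."""
    name: str
    contains_sensitive: bool
    publicly_accessible: bool
    last_accessed: date


def find_shadow_data(
    discovered: Iterable[DataStore],
    managed_names: Set[str],
    today: date,
) -> Dict[str, List[str]]:
    """Map each risky sensitive store to the reasons it was flagged."""
    findings: Dict[str, List[str]] = {}
    for store in discovered:
        reasons: List[str] = []
        if store.name not in managed_names:
            reasons.append("unmanaged")  # unknown to security or IT
        if store.publicly_accessible:
            reasons.append("public")  # reachable without credentials
        if today - store.last_accessed > STALE_AFTER:
            reasons.append("stale")  # likely abandoned or orphaned
        if store.contains_sensitive and reasons:
            findings[store.name] = reasons
    return findings
```

Returning the reasons alongside each store name is a deliberate choice: it gives the data protection team something to prioritize on, for example treating a public, unmanaged backup as more urgent than a stale but private test copy.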

Having full data observability lets you understand where your shadow data stores are and who owns them. Doing so leads to a secure environment, faster, smarter decision-making across the enterprise, and the ability to thrive in a fast-moving, cloud-first world.