
What is data as an asset?

Teams first notice the problem when more of their meetings are spent talking about where data is and what it is than what they learned from it. Conversations get confused by different names for the same dataset and delayed as the team searches through folders to remember where they put it. Projects can’t start until they find the one person who knows where the data is. And if that one person has left the company, it will take even longer.
These problems arise when organizations treat data as a single-use, expendable resource instead of a reusable asset that will continue to grow in value. To begin using data as an asset, organizations need to tame the chaos, address the reproducibility crisis, then create the dream of reusability.


The problem we've all faced

Uncontrolled flexibility and speed create data chaos

It's always at the worst time that organizations discover their data has become a tangled mess of data sources, storage solutions and datasets. Because each team, or each team member, needs flexibility to move fast, they adopt their own tools, practices and conventions. Some data ends up in Dropbox, Google Drive, Box or SharePoint. Other data ends up in AWS or Google Cloud. Each of these decisions is a reaction to an immediate need. And every day this continues, the harder it gets to turn these data resources into assets.


The first step toward managing data as an asset, without sacrificing the flexibility that teams need, is to organize it into atomic packages that can be copied from a shared location to wherever they're needed while maintaining their integrity. Each team creates packages in its preferred environment and syncs them with the shared storage. This gives the best of both worlds: the flexibility that teams need to move fast, plus a shared source of truth where anyone else can find and access what they need.
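One way to picture "copied anywhere while maintaining its integrity" is a manifest of content hashes that travels with the package. The sketch below is only an illustration using the Python standard library. The function names `build_manifest` and `verify_copy` and the example paths are hypothetical, and this is not a description of how Quilt implements packages.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def verify_copy(manifest: dict[str, str], copy_root: Path) -> bool:
    """Return True only if the copy has exactly the same files and contents."""
    return build_manifest(copy_root) == manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical shared location holding one package.
        shared = Path(tmp) / "shared" / "assay-results"
        shared.mkdir(parents=True)
        (shared / "readings.csv").write_text("sample,value\nA,1.0\n")
        manifest = build_manifest(shared)

        # A team member copies the package to their own environment.
        local = Path(tmp) / "laptop" / "assay-results"
        shutil.copytree(shared, local)
        print(verify_copy(manifest, local))
```

Because the manifest depends only on relative paths and file contents, the same check works whether the copy lives on a laptop, a shared drive or cloud storage.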

The Reproducibility Crisis

Even with data in a shared location, many organizations find it difficult or impossible to reproduce results and analysis from just a few months ago. Sometimes it's because of a new version of the data. Sometimes it's a new version of the code.

Often they can't tell which it is.


The next step to begin using data as an asset is to maximize reproducibility:
  • Store contextual metadata alongside the data so teams can find the exact data they're looking for.
  • Track each version of the data so teams know they're using the same version every time.
  • Link data to code so teams can run the same analysis every time.

If you can't find the data, you can't reproduce the analysis.
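The three practices above can be sketched as a single version record that holds contextual metadata, the data's content hashes and a pointer to the code that produced the analysis. This is a minimal illustration only. `PackageVersion`, `find_versions`, the metadata keys and the `code_ref` values are hypothetical names chosen for the example, not part of any real library.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PackageVersion:
    name: str
    file_hashes: dict  # relative path -> content hash of each data file
    metadata: dict     # contextual metadata used to find the data later
    code_ref: str      # assumed to be an identifier such as a git commit

    def version_id(self) -> str:
        """Derive a stable ID from data, metadata and code reference together."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def find_versions(versions, **criteria):
    """Return the versions whose metadata matches every key=value criterion."""
    return [
        v for v in versions
        if all(v.metadata.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    v1 = PackageVersion(
        name="assay-results",
        file_hashes={"readings.csv": "hash-of-first-upload"},
        metadata={"instrument": "plate-reader", "run": 1},
        code_ref="commit-abc",
    )
    v2 = PackageVersion(
        name="assay-results",
        file_hashes={"readings.csv": "hash-of-second-upload"},
        metadata={"instrument": "plate-reader", "run": 2},
        code_ref="commit-abc",
    )
    # Same code, different data: the IDs differ, so the change is visible.
    print(v1.version_id() != v2.version_id())
    print([v.metadata["run"] for v in find_versions([v1, v2], run=2)])
```

Because the ID covers both the data hashes and the code reference, a changed result can be traced to whichever of the two actually changed.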


The Dream of Reusable Assets

Ultimately, it isn't enough for data to be well organized and results to be reproducible. If organizations can't reuse datasets for new purposes, the data is effectively still a single-use resource.

The main thing that prevents datasets from being usable in a wider context is an incomplete understanding of their context and their relationship to other data.

Data without context is not reusable


The final step in moving from data as an expendable resource to data as a reusable asset is to link each dataset to the other datasets and resources that let teams recreate and understand its original context, and see how the data can be used in new ways.

Ready to get started?

These are hairy problems, and they’re more common than most biotech startups are willing to admit. Overcoming them takes time, resources and bandwidth that are in short supply. But more importantly, the problem can feel overwhelming for a biotech team whose expertise is in computational biology and data science, not in data management.
That’s why we created Quilt to address the specific challenges of managing biotech data by minimizing the effort needed to turn your data from a resource into an asset.

Quilt is an AWS Advanced Technology Partner

Quilt Data is an AWS Advanced Technology Partner. Quilt brings seamless collaboration to Amazon S3 by connecting people, pipelines, and machines using visual, verifiable, versioned data packages. Amazon Web Services provides secure, cost-effective, and scalable big data services that can help you build a Data Lake to collect, store, and analyze massive volumes of heterogeneous data.