What We Are Doing

Our work is focused around "Data Packages", a lightweight "packaging" format for data that provides a basis for convenient publication, installation and management of datasets.

Data Packages aim to deliver functionality similar to "packaging" in software and "containerization" in shipping: a simple wrapper and basic structure for transporting data that significantly reduces friction in data sharing and integration, supports automation, and does so without imposing major changes on the underlying data being packaged.

The specification is now fairly mature and has a growing set of tooling around it. There is a particular focus on tabular data, but any kind of data can be "packaged". Because it is lightweight and simple, it is easy for data publishers, data users and data tool makers alike to adopt.
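As a minimal sketch of what packaging a CSV file can look like, the Python function below writes a `datapackage.json` descriptor next to the file, listing the CSV as the single entry in the descriptor's `resources` list, with a simple schema of field names. The function name `package_csv`, the sample file, and the choice to type every field as `"string"` (no type inference) are illustrative assumptions, not part of any official tool; the Data Package specification defines the full set of descriptor properties.

```python
import csv
import json
import tempfile
from pathlib import Path


def package_csv(csv_path: Path, package_name: str) -> Path:
    """Write a minimal datapackage.json describing one CSV file.

    Hypothetical helper: assumes the CSV has a header row and types every
    field as "string", because this sketch does no type inference.
    """
    with csv_path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))

    descriptor = {
        "name": package_name,  # package names should be lowercase and URL-friendly
        "resources": [
            {
                "name": csv_path.stem,
                # The path is relative to the directory holding datapackage.json.
                "path": csv_path.name,
                "format": "csv",
                "schema": {
                    "fields": [{"name": col, "type": "string"} for col in header]
                },
            }
        ],
    }
    out = csv_path.parent / "datapackage.json"
    out.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Illustrative usage with made-up sample data in a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        data_file = Path(d) / "sample.csv"
        data_file.write_text("id,name,amount\n1,alpha,3.5\n", encoding="utf-8")
        print(package_csv(data_file, "example-package").read_text(encoding="utf-8"))
```

The data file itself is left untouched; the descriptor is simply added alongside it, which reflects the goal of packaging data without imposing changes on it.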

Standards & Patterns

A small set of lightweight 'data package' standards and patterns providing a base structure on which tooling and integration can build.

Tooling & Integration

Making it easy to use and publish data packages from your existing apps and workflows whether that's Excel, R, or Hadoop!

Outreach & Community

Engaging with and evangelizing the concepts, standards and tooling, and building a community of users and contributors.

Our Vision

There's too much friction in working with data: friction getting data, friction processing data, friction sharing data.

This friction stops people: stops them creating, sharing, and collaborating with data.

It stops the cycle of find, improve, share that would make for a dynamic and productive data ecosystem.

We need to build an open data ecosystem that, like open source for software, is useful and attractive even to those without any principled interest in openness: the vast majority, who simply want the best tool for the job and the easiest route to their goal.


Our Key Principles

1 Focused

We have a sharp focus on one part of the data chain, one specific feature (packaging), and a few specific types of data (e.g. tabular).

2 Web Oriented

We build for the web, using formats that are web "native", such as JSON, and that work naturally with HTTP, such as plain-text CSVs (which can be streamed).

3 Distributed

Distributed rather than centralized: we design for a distributed ecosystem with no centralized, single point of failure or dependence.

4 Open

Anyone should be able to freely and openly use and reuse what we build. Our community is open to everyone.

5 Existing Tooling

Integrate as easily as possible with existing tools, both by building integrations and by designing for direct use. For example, we like CSV because everyone has a tool that can read it.

6 Simple, Lightweight

Add the minimum, do the least required, keep it simple. For example, use the most basic formats, require only the most essential metadata, and make sure the data carries nothing extraneous.
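The Web Oriented principle notes that plain-text CSVs can be streamed. As a minimal sketch using only the Python standard library, rows can be processed one at a time as lines arrive, without waiting for the whole file. The functions `stream_rows` and `chunked_source` and the sample rows are hypothetical; `chunked_source` merely stands in for an HTTP response body.

```python
import csv
from typing import Iterable, Iterator


def stream_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield CSV rows one at a time from any iterable of text lines.

    csv.DictReader pulls lines lazily, so apart from the header only the
    current row needs to be held in memory.
    """
    yield from csv.DictReader(lines)


def chunked_source() -> Iterator[str]:
    # Hypothetical stand-in for lines arriving over HTTP. A real response could
    # be wrapped as io.TextIOWrapper(urllib.request.urlopen(url), encoding="utf-8").
    yield "id,name,amount\n"
    yield "1,alpha,3.5\n"
    yield "2,beta,7.0\n"


if __name__ == "__main__":
    for row in stream_rows(chunked_source()):
        # Each row can be handled as soon as its line has been read.
        print(row["name"], row["amount"])
```

Because each CSV record is just a line of text, a consumer can start work on the first rows while later ones are still downloading, which is harder with formats that must be parsed as a whole.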

Related Projects at Open Knowledge