This project is a community-driven effort of Open Knowledge Foundation Labs.
Anyone can get involved and contributions are welcome. There are 4 main areas of work:
Infrastructure & Tools
We want to make it one click or one line to create or use data provided in our standard formats in any relevant tool or application.
Help make this happen – check open requests »
Documentation & Outreach
We need to provide simple tutorials and guides. We need to engage users and tell relevant communities what we're doing.
Help us make this happen – get in touch »
Join the Discussion
Join the discussion on our mailing list:
We need help suggesting, preparing and maintaining datasets. Note that:
- We are not a general registry for data – we have specific criteria for the datasets we list
- We package data rather than create it – our focus is to take source data and ensure it is of high quality and in a standard form
- We preserve a clean separation between the data source, the data package and this registry – for example, data packages are stored in git repos hosted separately (preferably on GitHub)
Suggest a Dataset
To propose a dataset for addition, open an issue in the Registry with the details of the proposed dataset.
Preparing and Submitting a Dataset
The key steps are:
Preparing a Dataset
All datasets MUST be provided in source form as "data packages" and SHOULD be in Simple Data Format. We also recommend storing in a git repo on GitHub. Simple Data Format Data Packages are designed to be very simple but still have just enough structure to be usable by tools. You can check out an existing, exemplar data package (GDP) on GitHub here.
A Data Package has the following structure on disk:
    datapackage.json  # metadata and data schemas for this data package
    README.md         # [Optional] README in markdown format
    my-data.csv       # one or more data files in CSV format
datapackage.json is the central file in a Data Package as it provides both general metadata and schema information in a structured form that is machine usable. It is a JSON file and full details of its structure are in the data packages specification.
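As a rough illustration of what such a file might contain, the sketch below builds a minimal descriptor in Python and prints it as JSON. The helper name `build_descriptor`, the dataset name, and the field list are hypothetical. The keys used (`name`, `title`, `resources`, `path`, `schema`, `fields`) are our reading of the data packages specification and should be checked against the current version before relying on them.

```python
import json


def build_descriptor(name, title, csv_path, fields):
    """Return a dict that can be serialised as datapackage.json.

    `fields` is a list of (column_name, column_type) pairs.
    The key names are assumptions based on the data packages
    specification; verify them against the spec you are targeting.
    """
    return {
        "name": name,
        "title": title,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical example values, matching the my-data.csv file shown above.
descriptor = build_descriptor(
    name="my-data",
    title="My example dataset",
    csv_path="my-data.csv",
    fields=[("year", "integer"), ("value", "number")],
)

# Serialise with indentation so the file stays readable in a git diff.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

Writing `descriptor_json` to a file called datapackage.json next to the CSV file would give the on-disk layout described above.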
All data must be provided as CSV files (UTF-8 encoded). We recommend having only one data file in a data package. See the Simple Data Format specification for more information.
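Before submitting, it can help to confirm that a data file really is UTF-8 encoded CSV. The following is a minimal sketch of such a check using only the Python standard library. The function name `check_csv_bytes` is hypothetical, and the "every row has the same number of columns" rule is our own sanity check, not a requirement taken from the specification.

```python
import csv
import io


def check_csv_bytes(raw):
    """Return (ok, message) for the raw bytes of a candidate CSV file."""
    # Step 1: the bytes must decode cleanly as UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return False, f"not valid UTF-8: {exc}"

    # Step 2: parse as CSV and make sure there is at least a header row.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return False, "file is empty"

    # Step 3 (our own assumption): every row should match the header width.
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            return False, f"row {number} has {len(row)} columns, expected {width}"

    return True, f"{len(rows) - 1} data rows, {width} columns"


# Hypothetical sample content standing in for my-data.csv.
sample = "year,value\n2001,1.5\n2002,1.7\n".encode("utf-8")
ok, message = check_csv_bytes(sample)
print(ok, message)
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result to the function.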
For a dataset to become officially published – and therefore featured on this site – it is assessed against the following criteria: