Core Data Curators Guide
- Please take two minutes to introduce yourself in the discussion forum so that other team members can get to know you
- Read the contributing guide below so you:
- understand the details of the curator workflow
- can work out where you'd like to contribute
- Stop: have you read the contributing guide? The next items only make sense if you have!
Now you can dive in with one or both of:
- Researching: start reviewing the current queue - add new items, comment on existing ones etc
- Packaging: check out the “Ready to Package” section of the queue and assign yourself (drop a comment in the issue claiming it)
Fig 1: Overview of the Curation Workflow [Source Drawing - Full Size]
There are two areas of activity:
- Preparing datasets as Core Data Packages - finding them, cleaning them, data-packaging them
- Maintaining Core Data Packages - keeping them up to date with the source dataset, handling changes, responding to user queries
Each of these has sub-steps, which we detail below, and you can contribute to any and all of them. [In fact, given how few of us there are, you will almost certainly end up doing several of these at once!]
Preparing Datasets as Core Data Packages
There are four areas where people can contribute:
- Researching and selecting datasets
- Packaging up data
- Quality assurance
- Final publication into the official core datasets list
Often you will contribute in all four by taking a dataset all the way from a suggestion to a fully packaged Data Package published online.
1. Researching and Selecting Datasets
This involves researching and selecting datasets as core datasets and adding them to the queue for packaging - no coding or data-wrangling skill is needed for this.
- To propose a dataset for addition, open an issue in the Registry with the details of the proposed dataset.
- Identify a relevant source (or sources) for the dataset.
- You do not have to know where to get the data from in order to propose a dataset (e.g. you could suggest “US GDP” as a core dataset without yet knowing where the data lives).
- Discuss with the Queue Manager(s) - they will spot your submission and start commenting in the GitHub issue.
- If the dataset is a good candidate, it is shortlisted for packaging: add the label “Status: Ready to Package”.
2. Packaging up data
Once we have a suggested dataset marked as "ready to package" we can move to packaging it up.
How to package up data is covered in the general publishing guide.
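To give a rough sense of what packaging produces: at minimum, a data folder plus a datapackage.json descriptor. The sketch below is purely illustrative - the name, paths, and fields are made up, and the publishing guide and the Data Package specification are the authoritative references:

```json
{
  "name": "example-gdp",
  "title": "Example GDP Dataset (illustrative only)",
  "licenses": [
    {"name": "odc-pddl", "path": "http://opendatacommons.org/licenses/pddl/"}
  ],
  "resources": [
    {
      "name": "data",
      "path": "data/data.csv",
      "schema": {
        "fields": [
          {"name": "date", "type": "date"},
          {"name": "value", "type": "number"}
        ]
      }
    }
  ]
}
```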
3. Quality Assurance
This involves validating and checking packaged datasets to ensure they are of high quality and ready to publish.
- Validate the Data Package and review the data in the Data Package
- Post a validation link and a view link in the comments for the issue in the Registry related to your Data Package.
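Before posting the validation link, it can help to run a quick sanity check on the CSV yourself. The sketch below is illustrative only (the function name and the particular checks are our own, not part of the official tooling) and uses just the Python standard library:

```python
import csv
import io

def basic_csv_checks(text):
    """Run minimal sanity checks on a CSV file's contents.

    Returns a list of problem descriptions (empty list = nothing found).
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if any(not h.strip() for h in header):
        problems.append("blank column name in header")
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    # Every data row should have the same number of fields as the header
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append("row %d has %d fields, expected %d"
                            % (i, len(row), len(header)))
    return problems

print(basic_csv_checks("year,gdp\n2013,16768\n2014,17393\n"))  # → []
```

This is no substitute for the proper validation step above; it only catches the most obvious structural problems before you involve a reviewer.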
We have a few extra specific requirements:
Maintaining Data Packages
Many data packages contain data that changes over time - for example, many time series are updated monthly or daily.
We need people to become the "maintainer" for a given dataset and keep it up to date by regularly adding the new data.
List of datasets needing a maintainer
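The regular update a maintainer performs can be sketched as a merge of newly published rows into the existing CSV. This is an illustrative standard-library sketch (the function name and key-column approach are our own, not prescribed tooling); skipping rows whose key already exists makes re-running an update safe:

```python
import csv
import io

def append_new_rows(existing_csv, new_csv, key):
    """Merge newly published rows into an existing dataset.

    Rows whose `key` column value already appears in the existing
    data are skipped, so re-running the same update is harmless.
    """
    existing = list(csv.DictReader(io.StringIO(existing_csv)))
    seen = {row[key] for row in existing}
    for row in csv.DictReader(io.StringIO(new_csv)):
        if row[key] not in seen:
            existing.append(row)
            seen.add(row[key])
    # Write the merged rows back out as CSV text
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=existing[0].keys())
    writer.writeheader()
    writer.writerows(existing)
    return out.getvalue()
```

For example, merging a source file that repeats January but adds February yields a file with exactly one row per month.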
Core Data Assessment Criteria
For a dataset to be designated as "core" it should meet the following criteria:
- Quality - the dataset must be well structured
- Relevance and importance - the focus at present is on indicators and reference data
- Ongoing support - it should have a maintainer
- Openness - data should be open data and openly licensed in accordance with the Open Definition
Guide for Managing Curators
Intro Email for New Joiners
You are being added to the Core Data Curators mailing list as you indicated your interest in the project through the online form.
This list is announce-only and will be used rarely. General discussion takes place in the public forum:
To kick-off your core data curatorship we encourage you to:
Introduce yourself in forum here: http://discuss.okfn.org/t/core-data-curators-introductions/145/24
Take a look at the Core Data Curators guide: http://data.okfn.org/doc/core-data-curators