Open data publication guide
Guidance for public bodies and other organisations in Wales on publishing their data openly.
Introduction
This guide is intended to help public bodies and other organisations in Wales publish their data openly and make it as accessible as possible so that it is available for reuse.
Open data is data that is freely available for everyone to access, use and share. To count as open data, it must be published under an open licence.
Publishing data openly should be the default unless there are specific reasons not to.
The Open Data Institute (ODI) Data Spectrum helps highlight when data may not be suitable for open publication. It provides a visual representation of data classed as open, shared or closed and the associated limitations on access. Where data is not suitable for open publication, it may still be possible to share it with other individuals or organisations if certain conditions are met.
For the purposes of this guide, open data means public sector data that is neither commercially sensitive nor personal. The following should therefore not be published as open data:
- any personal or sensitive data
- any data that you do not own
- any data under copyright that you do not have permission to publish
Organisations are encouraged to make their open data as accessible as possible. The following 5-star rating scheme, developed by Sir Tim Berners-Lee, provides an indication of the differing levels of openness. It is recommended that, where possible, organisations aim to publish their data at the 3 star level as a minimum.
5 Star rating scheme
1 star - Make your data available online (whatever the format) under an open licence
2 stars - Make your data available as structured data in a proprietary format (e.g. Excel instead of an image scan of a table)
3 stars - Make your data available in a non-proprietary open format (e.g. CSV as well as Excel)
4 stars - Use identifiers (URIs) to denote things, so that people can point at your data
5 stars - Link your data to other data to provide context
Who is this guide for?
Although this guide is primarily for public sector bodies in Wales, the principles outlined in this document may also be applicable to third sector, academic and private sector organisations in Wales.
This guide is for publishers and potential publishers of open data. It aims to help organisations publish the datasets specified within the guide, as well as any other datasets they wish to publish openly.
Why publish open data?
Public sector bodies have a responsibility to make their data openly available. There are also significant benefits to taking this approach, including:
Greater openness, transparency and accountability
Helps people understand how decisions are made, where money is spent and how organisations are performing.
Improved services
Enables better planning and targeting of services so that they meet people’s needs.
Innovation and economic growth
Increasing availability of data provides opportunities to drive innovation, which in turn helps increase economic growth.
Improved data quality
More people using the data makes it more likely that inaccuracies are identified, leading to improved data quality.
Improved decision-making
Enables people to make more informed decisions.
Efficiencies
Helps save time and money by simplifying processes and reducing requests for data.
Which datasets to publish openly?
It is recommended that all public bodies in Wales regularly publish the following data openly:
- Organisational charts
- Senior management pay
- Public Sector Equalities Duty (PSED) data
- Welsh Language data
- Freedom of Information (FoI) requests
- Expenditure exceeding £500
- Government procurement card transactions
- Procurement information on contracts
1. Organisational charts
Organisations should publish an organisation chart covering staff in at least the top 3 levels of the organisation. The following information should be included for each member of staff shown in the chart:
- grade
- job title
- department and team
- whether permanent or temporary staff
- contact details
2. Senior management pay
As a minimum, the following information should be published, in line with the Accounts and Audit (Wales) Regulations 2014:
- the number of employees whose remuneration in that year was at least £60,000, grouped in bands of £5,000
- details of remuneration and job title of certain senior employees whose salary is at least £60,000, and
- employees whose salaries are £150,000 or more must also be identified by name
Note that where an employee or officer is employed or engaged on a temporary or part-time basis, the thresholds of £60,000 and £150,000 should be reduced pro rata.
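The banding and pro rata rules above can be sketched in Python. This is a minimal illustration, not an implementation of the Regulations: it assumes "pro rata" means scaling each threshold by the employee's full-time-equivalent (FTE) fraction, assumes staff are placed in the band of their actual remuneration, and uses hypothetical function names and made-up figures.

```python
from collections import Counter

REPORTING_THRESHOLD = 60_000  # £: employees at or above this are reported in bands
NAMING_THRESHOLD = 150_000    # £: employees at or above this are identified by name
BAND_WIDTH = 5_000            # £: width of each reporting band


def pro_rata(threshold: float, fte: float) -> float:
    # Assumption: "pro rata" scales the threshold by the full-time-equivalent
    # fraction, e.g. fte=0.5 for someone working half of full-time hours.
    return threshold * fte


def is_reportable(remuneration: float, fte: float = 1.0) -> bool:
    return remuneration >= pro_rata(REPORTING_THRESHOLD, fte)


def must_be_named(remuneration: float, fte: float = 1.0) -> bool:
    return remuneration >= pro_rata(NAMING_THRESHOLD, fte)


def band_label(remuneration: float) -> str:
    # Bands start at multiples of £5,000, e.g. "£60,000 to £64,999".
    lower = int(remuneration // BAND_WIDTH) * BAND_WIDTH
    return f"£{lower:,} to £{lower + BAND_WIDTH - 1:,}"


def count_by_band(staff: list[tuple[float, float]]) -> Counter:
    """Count reportable employees per band; each entry is (remuneration, fte).

    Assumption: employees are banded by their actual remuneration.
    """
    return Counter(band_label(pay) for pay, fte in staff if is_reportable(pay, fte))


# Made-up figures: (annual remuneration in £, FTE fraction)
staff = [(61_000, 1.0), (64_999, 1.0), (72_500, 1.0), (35_000, 0.5), (152_000, 1.0)]
print(count_by_band(staff))
```

Because the thresholds are kept as named constants, a change in the Regulations only needs to be made in one place.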
3. Public Sector Equalities Duty (PSED) data
Organisations are required under the Equality Act 2010 to publish equality data annually. Organisations should publish information on their workforce in relation to the following protected characteristics:
- age
- disability
- gender reassignment
- marriage and civil partnership
- pregnancy and maternity
- race
- religion or belief
- sex
- sexual orientation
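Only aggregated figures should be published, never the underlying employee records. The sketch below, using hypothetical field names and made-up records, turns one-row-per-employee data into counts per category for each protected characteristic. The min_count parameter and the "*" marker are illustrative assumptions for withholding very small counts that could identify individuals; whether and how to do this is a decision for your organisation.

```python
from collections import Counter

# Hypothetical HR extract: one row per employee. These rows are personal
# data and are NOT published; only the aggregated counts below are.
workforce = [
    {"sex": "Female", "age": "25-34"},
    {"sex": "Male", "age": "25-34"},
    {"sex": "Female", "age": "35-44"},
    {"sex": "Female", "age": "45-54"},
]

SUPPRESSED = "*"  # hypothetical marker for counts withheld as too small


def summarise(records, characteristics, min_count=1):
    """Return tidy (characteristic, category, count) rows.

    Counts below min_count are replaced by SUPPRESSED. The default of 1
    withholds nothing; any real threshold is an organisational decision.
    """
    rows = []
    for characteristic in characteristics:
        counts = Counter(record[characteristic] for record in records)
        for category, n in sorted(counts.items()):
            rows.append((characteristic, category, n if n >= min_count else SUPPRESSED))
    return rows


for row in summarise(workforce, ["sex", "age"], min_count=2):
    print(row)
```

A "long" layout like this (one row per characteristic and category) keeps the same columns whichever characteristics are added later.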
4. Welsh Language data
Organisations that are subject to the Welsh language standards should aim to openly publish data in relation to the compliance notices they have received from the Welsh Language Commissioner.
On an annual basis, organisations should publish information on their employees’ Welsh language capabilities in relation to oral, listening, reading, writing and understanding skills, including:
- the number of employees with Welsh language skills by level of capability
- the percentage of employees with Welsh language skills by level of capability
- the Welsh language skills assessment framework used
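As an illustration, the number and percentage of employees at each level of capability for one skill could be calculated as below. The level names are placeholders; use the levels defined by the assessment framework your organisation actually uses. Percentages are rounded to one decimal place, so a column may not sum to exactly 100.

```python
from collections import Counter

# Placeholder level names: replace with those defined by your assessment framework.
LEVELS = ["None", "Entry", "Foundation", "Intermediate", "Advanced", "Proficient"]


def capability_table(levels: list[str]) -> list[tuple[str, int, float]]:
    """Return (level, number of employees, percentage of employees) for one skill."""
    counts = Counter(levels)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        # Fail loudly rather than silently dropping unrecognised levels.
        raise ValueError(f"Levels not in the framework: {sorted(unknown)}")
    total = len(levels)
    return [
        (level, counts[level], round(100 * counts[level] / total, 1) if total else 0.0)
        for level in LEVELS
    ]


# Made-up assessed levels of oral skills for eight employees
oral = ["Entry", "None", "Advanced", "Entry", "Proficient", "Entry", "None", "Intermediate"]
print("level,number,percentage")
for level, number, percent in capability_table(oral):
    print(f"{level},{number},{percent}")
```

Listing every level, including those with no employees, keeps the published table the same shape from year to year.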
5. Freedom of Information (FoI) requests
The Freedom of Information Act 2000 provides public access to information held by public authorities. To support this, organisations should aim to publish the following information about each FoI request they receive:
- date request was received
- information requested
- response
- date of response
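A request log can be published as a CSV file with one row per request. This sketch assumes hypothetical column names matching the four items above and writes dates in ISO 8601 form (YYYY-MM-DD) so that they sort and parse consistently; the example request is invented.

```python
import csv
import io
from datetime import date

FIELDS = ["date_received", "information_requested", "response", "date_of_response"]

# Hypothetical log entry. Do not include requesters' names or other personal data.
requests = [
    {
        "date_received": date(2024, 1, 8),
        "information_requested": "Number of electric vehicle charging points owned by the council",
        "response": "Information provided in full",
        "date_of_response": date(2024, 2, 2),
    },
]


def foi_log_csv(rows: list[dict]) -> str:
    """Return the log as CSV text, with dates in ISO 8601 (YYYY-MM-DD) form."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            **row,
            "date_received": row["date_received"].isoformat(),
            "date_of_response": row["date_of_response"].isoformat(),
        })
    return buffer.getvalue()


print(foi_log_csv(requests))
```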
6. Expenditure exceeding £500
Organisations should publish details of each individual item of expenditure that exceeds £500. This includes items of expenditure such as:
- individual invoices
- grant payments
- expense payments
- payments for goods and services
- grants
- grant in aid
- rent
- credit notes over £500, and
- transactions with other public bodies
For each individual item of expenditure, the following information should be published:
- date the expenditure was incurred
- department which incurred the expenditure
- beneficiary
- summary of the purpose of the expenditure
- amount
- Value Added Tax that cannot be recovered, and
- merchant category (e.g. computers, software, etc)
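One way to select the items to publish is to test each individual line, rather than totals, against the £500 threshold. This sketch assumes credit notes are stored as negative amounts (so it compares the absolute value) and that the threshold applies to the amount as recorded; the field names and ledger lines are hypothetical.

```python
from decimal import Decimal

THRESHOLD = Decimal("500")

# Hypothetical ledger lines. Decimal avoids floating-point rounding on money.
ledger = [
    {"date": "2024-04-02", "department": "Highways", "beneficiary": "Supplier A",
     "purpose": "Road resurfacing", "amount": Decimal("12500.00"),
     "irrecoverable_vat": Decimal("0.00"), "merchant_category": "Construction"},
    {"date": "2024-04-03", "department": "IT", "beneficiary": "Supplier B",
     "purpose": "Software licences", "amount": Decimal("499.99"),
     "irrecoverable_vat": Decimal("0.00"), "merchant_category": "Software"},
    {"date": "2024-04-05", "department": "IT", "beneficiary": "Supplier C",
     "purpose": "Credit note: returned laptops", "amount": Decimal("-750.00"),
     "irrecoverable_vat": Decimal("0.00"), "merchant_category": "Computers"},
]


def items_to_publish(lines):
    """Keep each individual item whose value exceeds £500 (strictly greater).

    Assumption: credit notes are negative, so the absolute value is compared.
    """
    return [line for line in lines if abs(line["amount"]) > THRESHOLD]


for item in items_to_publish(ledger):
    print(item["date"], item["beneficiary"], item["amount"])
```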
7. Government procurement card transactions
Organisations should publish details of every transaction on a Government Procurement Card. For each transaction, the following details should be published:
- date of the transaction
- department which incurred the expenditure
- beneficiary
- amount
- Value Added Tax that cannot be recovered
- summary of the purpose of the expenditure, and
- merchant category (e.g. computers, software, etc)
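Unlike the expenditure data above, there is no threshold here: every card transaction is published. A simple pre-publication check, sketched below with hypothetical field names and made-up transactions, reports any transaction that is missing one of the seven required details.

```python
# Hypothetical column names for the seven details listed above.
REQUIRED_FIELDS = (
    "date", "department", "beneficiary", "amount",
    "irrecoverable_vat", "purpose", "merchant_category",
)


def missing_fields(transaction: dict) -> list[str]:
    """Return the required fields that are absent or blank in one transaction."""
    return [
        name for name in REQUIRED_FIELDS
        if transaction.get(name) is None or str(transaction[name]).strip() == ""
    ]


def check_all(transactions: list[dict]) -> dict[int, list[str]]:
    """Map the position of each incomplete transaction to its missing fields."""
    problems = {}
    for index, transaction in enumerate(transactions):
        gaps = missing_fields(transaction)
        if gaps:
            problems[index] = gaps
    return problems


complete = {"date": "2024-05-01", "department": "Facilities", "beneficiary": "Supplier D",
            "amount": "42.50", "irrecoverable_vat": "0.00",
            "purpose": "Cleaning materials", "merchant_category": "Cleaning supplies"}
incomplete = {**complete, "purpose": " ", "merchant_category": None}
print(check_all([complete, incomplete]))
```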
8. Procurement information
When publishing procurement-related data, public bodies that are part of the devolved Welsh public sector should publish transparency information via Sell2Wales in line with the Open Contracting Data Standard (OCDS).
Further guidance is contained in WPPN 01/24: Transparency – publication of contract award notices.
How to publish data openly?
The main steps for publishing data openly are:
- Use an accessible open format for publishing your data
- Structure your data clearly
- Provide metadata
- Apply an open licence
- Make your data available
1. Use an accessible open format
When formatting your data, you should ensure it is well structured and, where practicable, available in an open format so that it is as accessible as possible. Where possible, you should also make your data available in a machine-readable format.
Open formats are also known as non-proprietary formats because they do not require licensed software to access the data. Open formats are preferable because proprietary formats, such as Microsoft Excel, can limit people’s ability to access the data.
A machine-readable format is a structured format that a computer can process automatically, for example using code.
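To illustrate what machine-readable means in practice, the few lines of Python below total spending by department directly from CSV text using only the standard library. The same figures in a scanned image of a table would have to be retyped first. The data and column names are made up.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# A small machine-readable extract in CSV; values are illustrative only.
CSV_TEXT = """date,department,amount
2024-04-02,Highways,12500.00
2024-04-05,IT,750.00
2024-04-09,Highways,2300.00
"""


def total_by_department(text: str) -> dict[str, Decimal]:
    """Sum the amount column for each department."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["department"]] += Decimal(row["amount"])
    return dict(totals)


print(total_by_department(CSV_TEXT))
```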
Which open format should I use?
The most appropriate open format will depend on the type of data you are planning to publish. Some of the open formats available for different types of data are listed below:
Tabular data
The most common structure for data is tabular. Data is organised into rows and columns listing values, such as expenditure.
Open formats include ODS and CSV
Complex data
Data in which relationships exist between data points. Geospatial data is a good example of complex data.
Open formats include JSON, XML, KML, GML and GeoJSON
Textual data
Data presented in the form of words, sentences and paragraphs, such as guidance documents and reports.
Open formats include ODT, ODP and HTML
Images
Graphical or pictorial data.
Open formats include JPEG 2000 and PNG
Audio
Sound or audio data.
Open formats include FLAC and Ogg
Video
Data comprising a series of moving images.
Open formats include MPEG-4/MP4, WebM and MKV
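For geospatial data, GeoJSON (RFC 7946) stores each feature's geometry alongside its properties. The sketch below builds a GeoJSON FeatureCollection of points with Python's standard json module; the site names and coordinates are hypothetical. GeoJSON lists coordinates as longitude first, then latitude, in the WGS 84 coordinate reference system.

```python
import json


def to_geojson(sites):
    """Convert (name, longitude, latitude) tuples to a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # RFC 7946 order: [longitude, latitude] in WGS 84.
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"name": name},
            }
            for name, lon, lat in sites
        ],
    }


# Hypothetical sites; coordinates are illustrative only.
sites = [("Example site A", -3.1791, 51.4816), ("Example site B", -4.0829, 52.4153)]
print(json.dumps(to_geojson(sites), indent=2))
```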
2. Structure your data clearly
Whatever type of data you are publishing, it is important that it is well structured. By this, we mean that the data is clearly laid out so that it is easy to use and understand.
Your data should be structured consistently and in line with relevant data standards. Using agreed data standards when structuring your data will enable it to be easily understood, used and shared. In addition to cross-cutting data standards, such as those for formatting dates and times, there are also topic-specific standards, such as data standards relating to health care and contracting. Further information on open standards for UK government data is available on GOV.UK.
Try to keep the structure of your data consistent over time. When updating your data, avoid changing how it is structured or formatted where possible. For example, try not to change the names of your data fields or the order in which they appear.
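Before releasing an update, it can help to compare it automatically against the previous release. This sketch checks whether any field names were removed, renamed, added or reordered, and whether a date value follows ISO 8601 (YYYY-MM-DD), used here as an example of a cross-cutting date standard. The function names and sample headers are hypothetical.

```python
import csv
import io
from datetime import date


def header_changes(previous_csv: str, new_csv: str) -> list[str]:
    """Describe differences between the header rows of two releases."""
    old = next(csv.reader(io.StringIO(previous_csv)))
    new = next(csv.reader(io.StringIO(new_csv)))
    removed = [name for name in old if name not in new]
    added = [name for name in new if name not in old]
    problems = []
    if removed:
        problems.append(f"removed or renamed: {removed}")
    if added:
        problems.append(f"added: {added}")
    if not removed and not added and old != new:
        problems.append("same fields but in a different order")
    return problems


def is_iso_date(value: str) -> bool:
    """True only for a calendar date written exactly as YYYY-MM-DD."""
    try:
        return date.fromisoformat(value).isoformat() == value
    except ValueError:
        return False


previous = "date,department,amount\n2024-04-02,Highways,12500.00\n"
print(header_changes(previous, "department,date,amount\n"))
print(header_changes(previous, "date,dept,amount\n"))
print(is_iso_date("2023-06-05"), is_iso_date("05/06/2023"))
```

The round-trip check in is_iso_date rejects other forms that newer Python versions of date.fromisoformat also accept, such as 20230605.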
3. Provide metadata
Metadata is descriptive information that helps users understand your data and use it effectively. It can tell people who produced the data and when, how often it will be updated, and any limitations they need to be aware of. It can also make data easier to find.
The amount of metadata and the level of detail you present alongside your data will vary depending upon the data itself and its intended use.
Where appropriate, you are advised to use a recognised metadata standard to ensure consistency in the metadata presented. However, if this is not proportionate, as a minimum you should provide the following metadata elements, which appear in most metadata standards, alongside your data:
Metadata elements
Title
A clear, easily understandable title.
For example, ‘Listed buildings in Wales, 2023’.
Summary
A brief description of what your data covers.
For example, ‘This dataset shows the location of buildings and structures of national importance that are given legal protection by being placed on a ‘List’ of Buildings of Special Architectural or Historic Interest.’
Notes
Notes about the limitations of the data, revisions or adjustments to the data, how regularly the data will be updated, etc.
For example, ‘The Listing of buildings and structures is an ongoing process. Buildings and structures can also be removed from the List in a process called ‘De-Listing’. To avoid re-using old data, users should periodically obtain the latest version of the data.’
Published by
The organisation and/or team responsible for the data published.
For example, ‘Cadw - Historic Environment Service of the Welsh Government’.
Contact details
Details of who to contact if users have queries about the data.
For example, ‘CADW@gov.wales’.
Last updated
The date the data was last updated.
For example, 05 June 2023.
Licence
The licence the data is available under.
For example, ‘Open Government Licence (OGL)’.
Keywords
Words commonly used in relation to the data being published.
For example, ‘Listed buildings, Historic buildings, Historic environment’.
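The minimum elements can be checked before publication and saved alongside the data as a small JSON file. In this sketch the field names are hypothetical, the values are taken from the examples above, and the last updated date is written in ISO 8601 form. The example record deliberately leaves out the summary and notes to show the check reporting them.

```python
import json

# Hypothetical field names for the minimum metadata elements listed above.
REQUIRED_ELEMENTS = [
    "title", "summary", "notes", "published_by",
    "contact_details", "last_updated", "licence", "keywords",
]


def missing_elements(metadata: dict) -> list[str]:
    """Return the minimum elements that are absent or empty."""
    return [name for name in REQUIRED_ELEMENTS if not metadata.get(name)]


metadata = {
    "title": "Listed buildings in Wales, 2023",
    "published_by": "Cadw - Historic Environment Service of the Welsh Government",
    "contact_details": "CADW@gov.wales",
    "last_updated": "2023-06-05",
    "licence": "Open Government Licence (OGL)",
    "keywords": ["Listed buildings", "Historic buildings", "Historic environment"],
}

gaps = missing_elements(metadata)
if gaps:
    print("Add before publishing:", ", ".join(gaps))
else:
    # ensure_ascii=False keeps Welsh characters such as ŵ readable in the file.
    print(json.dumps(metadata, indent=2, ensure_ascii=False))
```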
Metadata standards: which one to use?
If you plan to use a metadata standard, first consider whether any statutory or non-statutory requirements specify which metadata standard should be used.
For example, the EU INSPIRE Directive mandates the collection of metadata for spatial data that meet certain requirements. Therefore, a standard such as the UK GEMINI metadata standard would need to be used as this is compatible with the INSPIRE requirements.
If you are planning to publish your data via a publishing platform/website then you may be required to follow a specified metadata standard.
If none of these restrictions apply, select the most appropriate standard by considering what information people might need to be able to find, understand and use your data.
Where appropriate, it is recommended that you use one of the following metadata standards:
DCAT
The Data Catalog Vocabulary (DCAT) defines a standard way to publish machine-readable metadata about a dataset.
Dublin Core
Dublin Core is a widely used metadata standard for describing a variety of physical and digital resources.
UK GEMINI
UK GEMINI (GEo-spatial Metadata INteroperability Initiative) is the UK geographic metadata standard.
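As an illustration of DCAT, the example metadata from the previous section could be expressed as JSON-LD using DCAT and Dublin Core (dct) terms. This is a sketch, not a complete DCAT profile: it assumes version 3.0 of the OGL, omits the notes element for brevity, and leaves out any fields a particular publishing platform might require. The function name is hypothetical.

```python
import json

# Assumption: the dataset is published under OGL version 3.0.
OGL_V3 = "http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/"


def dcat_dataset(title, description, publisher, email, modified, licence_uri, keywords):
    """Return a DCAT dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "vcard": "http://www.w3.org/2006/vcard/ns#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:publisher": {"@type": "foaf:Organization", "foaf:name": publisher},
        "dcat:contactPoint": {
            "@type": "vcard:Organization",
            "vcard:hasEmail": {"@id": f"mailto:{email}"},
        },
        "dct:modified": {"@value": modified, "@type": "xsd:date"},
        "dct:license": {"@id": licence_uri},
        "dcat:keyword": keywords,
    }


record = dcat_dataset(
    title="Listed buildings in Wales, 2023",
    description=("This dataset shows the location of buildings and structures of national "
                 "importance that are given legal protection by being placed on a ‘List’ "
                 "of Buildings of Special Architectural or Historic Interest."),
    publisher="Cadw - Historic Environment Service of the Welsh Government",
    email="CADW@gov.wales",
    modified="2023-06-05",
    licence_uri=OGL_V3,
    keywords=["Listed buildings", "Historic buildings", "Historic environment"],
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```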
4. Apply an open licence
Data and information are subject to copyright, so the terms and conditions for re-using open data must be clearly specified. These may take the form of a licence, such as the Open Government Licence, or a statement issued by the copyright owner. Either way, they explain to users how they can re-use the data.
If you are unsure about which licence to apply or how to write a copyright statement, please seek advice from your own organisation.
Which licence to use?
The default open licence for public bodies is the Open Government Licence (OGL). The licence sets out the terms and conditions of re-use.
Further information on the OGL and how to apply it can be found on the National Archives website.
Other open licence information
Where the OGL is not suitable, other open licences can be used. The Open Data Institute (ODI) recommends choosing a licence that supports your open data business model. To help you make an informed decision about which type of open licence is most suitable for your data, the ODI website provides 2 comprehensive overviews:
Publisher’s Guide to Open Data Licensing – Open Data Institute (ODI)
Reuser’s Guide to Open Data Licensing – Open Data Institute (ODI)
5. Make your data available
You should publish your data via your own website or a specific publishing platform. For your data to be as accessible as possible, you should also take account of Welsh language and accessibility standards.
Welsh Language
When publishing your data, you must consider the Welsh language duties that apply to your organisation. Whilst there may not be a requirement to publish your data bilingually, it is recommended that, where possible, you make your data and metadata available in both Welsh and English.
How best to present data and metadata bilingually will need to be considered on a case-by-case basis. It is preferable to display Welsh and English alongside each other, where possible. However, this might not be the best option when presenting detailed and/or complex data. If, for example, you are presenting detailed data with lots of metadata in spreadsheet format, people may find the data easier to consume and understand if you present the Welsh and the English in different sheets or workbooks.
Therefore, when presenting your data bilingually, you should consider how complex and detailed your data is and which option will make it as accessible and re-usable as possible for users.
One resource that may prove useful in making your data and metadata available bilingually is the Welsh Government’s website for translators, BydTermCymru.
Accessibility standards
The Public Sector Bodies (Websites and Mobile applications) Accessibility Regulations 2018 require public sector bodies to take the necessary measures to make their websites and mobile applications more accessible by making them perceivable, operable, understandable and robust, unless it would be disproportionate to do so.
Whilst the regulations only apply to websites and mobile applications, the purpose of open data is to make data accessible to everyone by removing barriers to its use. You should therefore consider how and where you publish your data so that it is as accessible as possible.
Where to publish open data?
There are generally 2 options for publishing your data openly:
- Use your existing website
- Use a publishing platform
The simplest and most cost-effective option, particularly where you only have a small number of datasets, is to publish your data on your existing website. Be aware that this option may limit the accessibility of your data. You will also need to consider how you will keep your data up to date and how you will manage the growing number of datasets you produce.
Using a publishing platform offers greater accessibility in terms of both finding and using your data. Publishing platforms are essentially websites that host data which people can search and download. Some platforms also provide functionality that ensures the data is machine-readable.
If you opt to use a publishing platform, you must then decide whether to:
- use an existing publishing platform
- purchase an open data publishing platform
- develop your own publishing platform
Where possible, it is recommended that you use an existing platform. From a user’s point of view, fewer publishing platforms make it easier to find the open data they need. Re-using existing platforms also saves costs and resources.
If no suitable platform is available, purchasing an open data platform may be more cost-effective than developing your own.
Which publishing platform should I use?
DataMapWales
DataMapWales is an authoritative geospatial publishing platform for the public sector in Wales. Direct access to data is also provided via APIs. It supports geospatial data.
Contact: Data@gov.wales
OpenDataWales
OpenDataWales is a website for public sector bodies to publish their open data. It supports tabular data.
Contact: Enquiries@data.cymru
StatsWales
StatsWales is a publishing platform for official and national statistics, and management data. The platform allows users to manipulate and access data via download and APIs. It supports tabular data.
Contact: stats.web@gov.wales
When choosing which platform is best for you, you may wish to consider the following:
- Is there a platform that publishes open data for your specific sector/topic/data type?
- Are there any costs attached to using the platform, both initially and on an ongoing basis?
- Does the platform provide all the functionality that you require?
- Does it impose any restrictions such as certain data and metadata standards?
- How sustainable is the platform? Are you confident that it will continue to be supported?
- How easy is it for you to upload your data to the platform?
- Does the platform conform to any organisational standards you may have?