Five Star Linked Open Data

Arguments in support of and real world impediments to delivery

The Data Revolution and Government Transparency

In this second post in the series on the topic of The Data Revolution and Government Transparency, I wish to cover some of the benefits of linked open data for public sector bodies and their projects, and some of the issues holding back delivery of such work.

Background

Video (May 26, 2010; 10:16): Tim Berners-Lee addresses an audience in Washington DC to introduce and explain the 5-star rating system for Linked Open Data.

As a background to this post, a degree of knowledge about public sector projects is desirable, and an overview of the five star rating system for Linked Open Data is required.

Fortunately Sir Tim Berners-Lee, creator of the World Wide Web, provides an explanation of the rating system for Linked Open Data in this YouTube video, recorded on his visit to the Gov 2.0 Expo in Washington DC in 2010.

Berners-Lee summarised the Principles of Linked Data in a Design Issues note he published to the World Wide Web Consortium (W3C) back in 2006:

  • Use URIs as names for things
  • Use HTTP URIs so that people can look up those names
  • When someone looks up a URI, provide useful information
  • Include links to other URIs, so that they can discover more things
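
As a minimal sketch of how the four principles look in practice, the Python below writes two statements about one thing in N-Triples, a simple RDF syntax. The example.org and example.net URIs are hypothetical placeholders; only the two RDFS terms are real vocabulary. Treating any value that starts with http as a URI is a simplification made for brevity.

```python
# Real W3C vocabulary terms.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def is_http_uri(value):
    """Simplifying assumption: anything starting with http(s):// is a URI."""
    return value.startswith(("http://", "https://"))


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if is_http_uri(o):
            obj = f"<{o}>"
        else:
            # Escape backslashes and quotes in plain string literals.
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Principles 1 and 2: an HTTP URI names the thing (hypothetical URI).
school = "https://data.example.org/id/school/100001"
triples = [
    # Principle 3: useful information is available about the thing.
    (school, RDFS_LABEL, "Example Primary School"),
    # Principle 4: a link to another URI lets consumers discover more.
    (school, RDFS_SEEALSO, "https://other.example.net/id/area/E09"),
]
print(to_ntriples(triples))
```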

5-Star Linked Data Primer

"5 Star Steps" by Michael Hausenblas

5-star linked data is a concept developed by Berners-Lee to describe the highest level of openness and accessibility of data on the web. It represents a set of principles and practices that enable data to be easily connected and understood by both humans and machines.

At the core of 5-star linked data is the use of standardized formats, such as RDF (Resource Description Framework), to structure and represent data in a machine-readable way. This allows for seamless integration and linking of information from diverse sources. Additionally, 5-star linked data emphasizes the use of URIs (Uniform Resource Identifiers) as unique identifiers for resources, enabling reliable referencing and cross-referencing.

Another crucial aspect of 5-star linked data is the incorporation of semantic annotations. By using ontologies and controlled vocabularies, data can be enriched with meaningful context, facilitating better interpretation and understanding. This semantic layer enables advanced reasoning and inference, supporting more sophisticated analysis and knowledge discovery.

Furthermore, 5-star linked data encourages the publication of data openly and with a license that permits reuse. This openness promotes transparency, collaboration, and the creation of value-added services by allowing others to build upon and integrate the data into their own applications.

Overall, 5-star linked data represents a comprehensive approach to organizing, connecting, and sharing data on the web. By adhering to its principles, data becomes more interoperable, discoverable, and reusable, unlocking the potential for enhanced data integration, advanced analytics, and new insights in various domains.

The World Wide Web Consortium offers a more detailed primer: Linked Data Platform 1.0 Primer.

Open Data (1-Star → 3-Star)

The requirements to publish open datasets are often narrow in nature, coming in the form of top-down mandates to expose particular metrics, or Freedom of Information (FOI) requests to answer particular questions. Neither amounts to full transparency; both are selective.

Publishing such datasets most often means getting database administrators and/or developers to perform extracts (queries/reports) from data repositories (most often relational in nature) that they are likely familiar with, so domain knowledge is most often a prerequisite. That data is then often transformed and possibly aggregated into another representational view before being released to the requesting party or the wider world. It is most often released in human-readable formats such as PDF reports (1-star data), or in machine-readable formats such as Excel or CSV (2-star and 3-star data respectively).

The consumers of that information then have an onerous activity of their own: transforming what they receive into a form suitable for their own needs and, where appropriate, relating that transformed data to other datasets (possibly from other parties) in order to query and report across datasets.

So it is a burdensome and time-consuming activity both for the providers/publishers of sub-4-star data and for its consumers.
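
To make the star levels concrete, here is a rough sketch of how a publication might be scored. The format groupings and the function name `star_rating` are my own illustrative assumptions, not an official classification; the thresholds follow the usual reading of the scheme (open licence, structured, non-proprietary, URIs, links).

```python
# Illustrative groupings only (assumption): real-world cases need more nuance.
STRUCTURED = {"xls", "xlsx", "csv", "json", "xml", "rdf", "ttl"}
NON_PROPRIETARY = {"csv", "json", "xml", "rdf", "ttl"}


def star_rating(open_licence, fmt, uses_uris=False, links_out=False):
    """Return an approximate 0-5 star score for a published dataset."""
    fmt = fmt.lower().lstrip(".")
    if not open_licence:
        return 0  # without an open licence it is not open data at all
    stars = 1  # on the web, any format, open licence (e.g. a PDF report)
    if fmt in STRUCTURED:
        stars = 2  # machine-readable structured data (e.g. Excel)
        if fmt in NON_PROPRIETARY:
            stars = 3  # non-proprietary format (e.g. CSV)
            if uses_uris:
                stars = 4  # things are named with URIs
                if links_out:
                    stars = 5  # and linked to other people's data
    return stars


print(star_rating(True, "pdf"))
print(star_rating(True, "xlsx"))
print(star_rating(True, "csv"))
```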

Linked Open Data (4-Star & 5-Star)

It is actually fine to publish PDF reports, Word documents, Excel spreadsheets or CSV data. However, to further the open data cause, these publications should be issued alongside more machine-readable, more raw, more granular, 4-star and 5-star data files / endpoints.
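
As a hedged illustration of issuing a 4-star file alongside an existing CSV, the sketch below turns each CSV row into N-Triples statements about a URI-named thing. The namespace, column names and sample rows are all invented for the example; only the XSD integer datatype URI is real.

```python
import csv
import io

BASE = "https://data.example.org"  # hypothetical publisher namespace
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Invented sample data standing in for an existing 3-star CSV release.
CSV_TEXT = """school_id,name,pupils
100001,Example Primary School,210
100002,Sample Academy,845
"""


def rows_to_ntriples(csv_text):
    """Turn each CSV row into N-Triples statements about a URI-named school."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}/id/school/{row['school_id']}>"
        # Escape backslashes and quotes so the literal stays valid.
        name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        pupils = int(row["pupils"])  # fail loudly on non-numeric input
        lines.append(f'{subject} <{BASE}/def/name> "{name}" .')
        lines.append(f'{subject} <{BASE}/def/pupils> "{pupils}"^^<{XSD_INTEGER}> .')
    return lines


for line in rows_to_ntriples(CSV_TEXT):
    print(line)
```

The CSV can continue to be published as before; the triples are an additional, more granular view of the same rows.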

4-star and 5-star linked data provide a more structured, interconnected, and accessible web of data, enabling more powerful data integration, analysis, and discovery. These qualities contribute to increased efficiency, improved decision-making, and the potential for new insights and innovations in various domains.

Key Benefits:

  • Data Quality: By following the principles of linked data and incorporating semantic annotations, 4-star / 5-star linked datasets often have higher data quality. The use of standardized ontologies and controlled vocabularies helps ensure consistency and accuracy, reducing ambiguity and errors in data interpretation.
  • Data Reuse: Linked data of higher star ratings is more reusable and adaptable. It provides a solid foundation for the fusion of data from different sources, for creating data mashups, and for repurposing data for different use cases. This flexibility encourages collaboration, innovation, and the creation of value-added services.
  • Data Interoperability: 4-star / 5-star linked data adheres to well-established open standards, principles and best practices, making it easier for applications from different domains to exchange and interpret data. This promotes greater interoperability, enabling seamless integration of data from various sources.
  • Data Integration: 4-star / 5-star linked data facilitates improved integration of diverse data sources. With the use of common identifiers, consistent linking, and alignment of data structures, it becomes easier to combine and relate information from different domains and disciplines. This enables more comprehensive analysis, cross-domain insights, and the development of innovative applications.
  • Data Discoverability: Higher-rated linked data, particularly 5-star linked data, provides additional metadata and context, improving the discoverability of information. This enables users and applications to find and access relevant data more effectively.
  • Knowledge Discovery: The improved structure and expressiveness of 4-star / 5-star linked data allow for more advanced reasoning and inference capabilities. This can enable the discovery of new insights and supports advanced data analytics and machine learning algorithms.
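
The integration benefit can be sketched in a few lines. Assuming two publishers describe the same area using one shared URI (all identifiers below are hypothetical), combining their data is simply a set union of triples, with no bespoke join logic. This ignores blank nodes, which need extra care when merging real RDF graphs.

```python
from collections import defaultdict

AREA = "https://data.example.org/id/area/A1"  # hypothetical shared identifier

# Two independently published datasets (invented) about the same area.
education = {
    (AREA, "https://data.example.org/def/schoolCount", "12"),
}
health = {
    (AREA, "https://health.example.net/def/gpSurgeryCount", "7"),
}


def describe(subject, *graphs):
    """Collect every property recorded about one URI across several triple sets."""
    merged = set().union(*graphs)  # merging is a plain set union of triples
    description = defaultdict(list)
    for s, p, o in merged:
        if s == subject:
            description[p].append(o)
    return dict(description)


print(describe(AREA, education, health))
```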

Data Freedom!

Free your data from your apps, from the projects that created them, and from the staff who worked on the original delivery.

5-star linked data promotes the freedom and liberation of data from specific applications or systems. By adhering to the principles of 5-star linked data, data becomes more interoperable and can be easily shared and connected across different platforms, applications, and domains.

The use of standardized formats, such as RDF, and unique identifiers like URIs, allows data to be decoupled from specific applications and infrastructure. This means that data can be accessed, reused, and repurposed by various applications, services, and users without being tightly bound to any specific software or platform.

Furthermore, the emphasis on open publication and licensing encourages the release of data in a way that enables others to freely use, build upon, and integrate it into their own applications and services. This freedom to access and reuse data fosters collaboration, innovation, and the creation of value-added services that can leverage the available data in new and creative ways.

5-star linked data promotes the liberation of data from application and domain specific silos, allowing data to flow more freely, be reused across different contexts, and unlock its full potential in driving innovation and creating value.

The needs of the wider world or open data community are seen as secondary to delivering functionality for known users (whom the project team most often work with or know).

So why aren't more public bodies offering (or consuming) Linked Open Data?

Direction

This has to come from the top, with the drive delivered in an infectious manner that inspires all to follow the cause.

Lack of Awareness

Many government agencies and public sector organisations have limited awareness and understanding of the concept of linked open data. This lack of knowledge about its benefits, technical requirements, and potential use cases can impede its adoption.

Cultural and Organisational Barriers

Adopting linked open data often requires a cultural shift within public sector organisations. This shift includes embracing a more open and collaborative approach to data sharing and breaking down silos that traditionally exist between different departments or agencies. Resistance to change, bureaucracy, and a lack of collaboration between entities can hinder the uptake of linked open data.

Linked Open Data – where the focus is on internal use and not openness to the public or wider world – is highly useful and desirable within large organisations (especially those distributed in nature), and also across disparate public sector bodies: to enforce definition, increase data governance and stewardship, reduce overall costs, etc.

There remains the lack of foresight behind 'It doesn't benefit me [directly]', and the problem of 'Not out of my pot' project-focussed budgetary constraints. Linked Open Data relies on the network effect to succeed. However, adoption of the 5-star way should not be viewed merely as what I'd term 'Open Data Altruism': data provided solely for the benefit of others.

Open Data Altruism is not just good for others (other departments or public bodies, society, private enterprise, or future consumers). The practice of more bodies offering their data in a 5-star manner, being more altruistic with their data, means more datasets and authoritative sources for other projects to consume and link to. This includes you linking to the work of others, lightening your load and pushing governance nearer to [if not directly to] the authoritative source.
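
A small sketch of what linking to the work of others can mean in data terms: rather than keeping your own copy of a ward's name, you publish a single owl:sameAs link and let the authoritative source govern the value. The URIs and data are hypothetical, and both datasets are held in memory here; in practice the authoritative record would be fetched by looking up its URI.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"       # real OWL term
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # real RDFS term

# Hypothetical identifiers for the same ward in two organisations.
LOCAL_WARD = "https://council.example.org/id/ward/17"
AUTHORITY_WARD = "https://statistics.example.net/id/ward/E05000017"

# The local dataset holds only the link; the authority holds the name.
local = {(LOCAL_WARD, OWL_SAME_AS, AUTHORITY_WARD)}
authority = {(AUTHORITY_WARD, RDFS_LABEL, "Example Ward")}


def resolve(subject, predicate, local_triples, authority_triples):
    """Look up a value locally, falling back to the linked authoritative record."""
    for s, p, o in local_triples:
        if s == subject and p == predicate:
            return o
    for s, p, o in local_triples:
        if s == subject and p == OWL_SAME_AS:
            for s2, p2, o2 in authority_triples:
                if s2 == o and p2 == predicate:
                    return o2
    return None


print(resolve(LOCAL_WARD, RDFS_LABEL, local, authority))
```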

Resource Constraints

Public sector organisations invariably face limited budgets and staff shortages. Allocating resources for the development and maintenance of a 5-star linked open data infrastructure, including data publication, curation, and support, can be challenging within these constraints.

Legal & Policy

Public sector entities have to navigate legislation and policy frameworks to ensure compliance and address issues such as data licensing, copyright, intellectual property rights, and data sharing agreements. These legal complexities can create barriers and uncertainties around the release and use of linked open data.

Privacy & Security Concerns

Public sector organisations handle sensitive data, including personal information, that needs to be protected to comply with privacy regulations. Releasing data openly while maintaining privacy can be challenging. Striking the right balance between data openness and privacy/security measures requires careful consideration and robust data anonymisation techniques.
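
By way of illustration only, one simple disclosure-control step is to publish aggregated counts and suppress any count below a threshold. The function name, the threshold of 5 and the sample records are assumptions for this sketch; on its own this is not a complete anonymisation approach.

```python
from collections import Counter


def publishable_counts(records, key, threshold=5):
    """Count records per group, replacing small counts with None (suppressed)."""
    # Only the grouping column is read, so direct identifiers never leave.
    counts = Counter(record[key] for record in records)
    return {group: (n if n >= threshold else None) for group, n in counts.items()}


# Invented person-level records.
records = (
    [{"name": "resident", "ward": "North"}] * 6
    + [{"name": "resident", "ward": "South"}] * 2
)
print(publishable_counts(records, "ward"))
```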

Data Quality and Governance

Ensuring data quality is crucial for effective linked open data. However, many public sector organisations face challenges related to data quality, including inconsistent data formats, incomplete datasets, and data discrepancies across various systems. Establishing proper data governance practices and investing in data cleaning, integration, and validation processes can be resource-intensive and time-consuming.
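
A minimal sketch of the kind of validation step this implies, checking for incomplete records and inconsistent date formats before publication. The field names and the rule that dates must be in YYYY-MM-DD form are assumptions chosen for the example.

```python
from datetime import datetime

REQUIRED = ("school_id", "name", "opened")  # hypothetical required fields


def validate(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")
    opened = record.get("opened")
    if opened:
        try:
            datetime.strptime(opened, "%Y-%m-%d")
        except ValueError:
            problems.append("opened is not in YYYY-MM-DD form")
    return problems


print(validate({"school_id": "100001", "name": "", "opened": "01/09/2001"}))
```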

Technical

Implementing and maintaining a 5-star linked open data infrastructure is complex and requires technical expertise and resources. It involves using standard formats, protocols, and tools such as RDF, SPARQL, and triple stores. Public sector entities may struggle with the technical complexity, particularly if they have limited IT capabilities or outdated infrastructure.
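
For readers unfamiliar with triple stores, the toy class below gives a feel for the idea: data is a set of (subject, predicate, object) statements, and a query is a pattern with wildcards. This is a teaching sketch with hypothetical URIs, not a substitute for a real triple store or SPARQL engine.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real RDF term
SCHOOL = "https://data.example.org/def/School"                 # hypothetical class


class TripleStore:
    """A toy in-memory triple store."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return matching triples; None is a wildcard, like a SPARQL variable."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("https://data.example.org/id/school/100001", RDF_TYPE, SCHOOL)
store.add("https://data.example.org/id/school/100002", RDF_TYPE, SCHOOL)
store.add("https://data.example.org/id/school/100001",
          "https://data.example.org/def/pupils", "210")

# Roughly: SELECT ?s WHERE { ?s rdf:type <https://data.example.org/def/School> }
schools = [s for s, _, _ in store.match(p=RDF_TYPE, o=SCHOOL)]
print(schools)
```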

Whether it be the learning curve of a paradigm shift in the way data is seen, or familiarity with a relational database approach that people have been used to for at least two decades now, the reluctance, amongst many, is there. Additionally, data is often tied to the departmental systems that collect it. This is not necessarily intentional: when you develop a system within the boundaries of a time-based project and to specific requirements, the underlying data structures will correlate heavily with the requirements implemented today.

It's easier to stick to the pattern of delivery that you're used to: a Web or Windows front-end with a Microsoft or Oracle RDBMS back-end. It's a well-trodden path. Many organisations are now also using document databases to store non-relational data, which is great. This does not, though, bake strong typing of granular data, using definitions derived from publicly shared universal definitions, into the design or storage of that data at source.

There is still a general lack of appreciation of the use of URI patterns to access granular data, and a lack of awareness of the semantic technologies that add meaning and classification to that data. Even where people understand and get it, their inclination is 'Nice but not for now, it doesn't help me as such.'
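
To show what a URI pattern for granular data might look like, here is a small sketch that mints predictable URIs from a type and an identifier. The base domain and the `mint_uri` helper are hypothetical, and the id / doc / def path segments are an assumed convention for things, documents about them, and definitions.

```python
from urllib.parse import quote

BASE = "https://data.example.org"  # hypothetical publisher domain
KINDS = {"id", "doc", "def"}       # assumed convention: thing, document, definition


def mint_uri(kind, concept, identifier):
    """Build a URI of the form {BASE}/{kind}/{concept}/{identifier}."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    # Percent-encode each path segment so spaces and slashes stay inside it.
    concept_part = quote(concept, safe="")
    id_part = quote(str(identifier), safe="")
    return f"{BASE}/{kind}/{concept_part}/{id_part}"


print(mint_uri("id", "school", 100001))
print(mint_uri("id", "street", "High Street"))
```

A stable pattern like this lets consumers go straight to one record rather than downloading and filtering a whole extract.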

'Ticking the box' vs Delivering real Raw Data

While achieving a 5-star rating is commendable, it does not guarantee that the dataset truly delivers raw and valuable data. Some organisations may focus on meeting the technical requirements of each star rating without considering the actual quality and usefulness of the information provided. This can result in datasets that fulfill the minimum criteria for each star but lack the depth, accuracy, or relevance that users require.

Delivering real raw data goes beyond the superficial adherence to the 5-Star Linked Data rating system. It requires organisations to invest in data quality, ensuring that the information is accurate, up-to-date, and relevant to the intended users. It also necessitates a comprehensive understanding of the domain or subject matter being addressed, as well as a commitment to ongoing maintenance and updates to ensure the dataset remains valuable over time.

By prioritizing the delivery of real raw data, organisations can provide users with the foundation they need for meaningful analysis, insights, and decision-making. This approach focuses on data usability, aiming to deliver information in its most granular and unaltered form, allowing users to extract maximum value and draw their own conclusions.


An interesting bit of radio

Jim Al-Khalili (Professor of Physics, University of Surrey), April 14, 2015 (27:46)

Sir Nigel Shadbolt, Professor of Artificial Intelligence at Southampton University, believes in the power of open data. With Sir Tim Berners-Lee he persuaded two UK Prime Ministers of the importance of letting us all get our hands on information that's been collected about us by the government and other organisations.

Readers may be interested to hear Professor Sir Nigel Shadbolt, Chairman and Co-Founder of the Open Data Institute (ODI), interviewed by Jim Al-Khalili on the Radio 4 programme 'The Life Scientific'. The programme also features audio from the ODI's Technical Director, Jeni Tennison, a name known to many of you who have read her blog, Jeni's Musings, and the Data.Gov.UK website.

Written By

Richard Fortune

A seasoned Lead Architect delivering complex distributed cloud solutions, with four decades of coding experience.