Choosing the right vocabularies and, in some cases, creating a domain-specific vocabulary is an important part of publishing Linked Data. At the Dutch Land Registry, we’ve developed a structured approach that combines requirements engineering, data structure visualization and SHACL (Shapes Constraint Language), currently a proposed recommendation, as part of the publishing process for Linked Data. We’ve successfully tested this approach in a large-scale governmental project for the Environmental Planning Act that uses Linked Data as the primary method for describing metadata. Part of the approach is a toolset that visualizes the SHACL shape structures and chosen vocabularies, both as webpages and as a visual model, not unlike a traditional UML model. The approach made it easier to separate the concern of eliciting the right requirements from the concern of choosing the right vocabulary. From a business perspective, the benefits are faster adoption of semantic technology by business and IT people and the reuse of existing non-Linked Data models and knowledge, resulting in a faster publishing process with higher quality.
The Dutch Environmental Planning Act regulates activities that have an impact on the environment, such as building activities, industrial activities and waste management. Data from a large variety of sources is needed to determine under which circumstances such activities are allowed. A good understanding of the semantics of the data from all these sources is paramount.
Linked Data was chosen as a well-suited technology for linking the different descriptions of all these data sources. A challenge was that most of the business and IT people who maintain the original datasets are not familiar with Linked Data and instead use more "traditional" technologies such as UML, ERD and XSD.
We address this challenge by translating the original models to a combination of SHACL shapes, SKOS concepts and OWL classes. From this starting point, we use requirements engineering and semantic engineering principles to design the optimal model from a Linked Data perspective. The resulting model is visualized to facilitate the dialog between the business and IT owners and our team. The visualization resembles UML, but is geared to the specifics of a Linked Data model.
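As a rough illustration of the translation step described above, the following Python sketch renders a UML-style class as an OWL class plus a SHACL node shape in Turtle. It is not the toolset used in the project: the input format (UmlClass, Attribute), the function to_shacl, the ex: namespace and the Parcel example are all hypothetical and chosen only for illustration. SKOS concepts are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Standard SHACL, OWL and XSD namespaces; "ex:" is a made-up example namespace.
PREFIXES = """\
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/def#> .
"""


@dataclass
class Attribute:
    """One attribute of a UML-style class (hypothetical input format)."""
    name: str
    datatype: str                     # XSD local name, e.g. "string"
    min_count: int = 0                # UML lower multiplicity bound
    max_count: Optional[int] = None   # None means unbounded ("*")


@dataclass
class UmlClass:
    """A UML-style class with a flat list of attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def to_shacl(cls: UmlClass) -> str:
    """Render the class as an owl:Class and a sh:NodeShape in Turtle."""
    lines = [
        f"ex:{cls.name} a owl:Class .",
        "",
        f"ex:{cls.name}Shape a sh:NodeShape ;",
        f"    sh:targetClass ex:{cls.name}",
    ]
    for attr in cls.attributes:
        parts = [f"sh:path ex:{attr.name}", f"sh:datatype xsd:{attr.datatype}"]
        # UML multiplicities map onto sh:minCount / sh:maxCount.
        if attr.min_count > 0:
            parts.append(f"sh:minCount {attr.min_count}")
        if attr.max_count is not None:
            parts.append(f"sh:maxCount {attr.max_count}")
        lines[-1] += " ;"             # continue the shape's predicate list
        lines.append("    sh:property [ " + " ; ".join(parts) + " ]")
    lines[-1] += " ."                 # terminate the shape statement
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


# Hypothetical example: a parcel with a mandatory identifier and an optional area.
parcel = UmlClass(
    "Parcel",
    [Attribute("cadastralId", "string", 1, 1), Attribute("area", "decimal")],
)
ttl = to_shacl(parcel)
print(ttl)
```

Keeping the class declaration and the shape as separate resources mirrors the separation of concerns in the approach: the shape records the structural requirements, while the class (and property) terms can later be replaced by terms from an existing vocabulary without changing the structure.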
After obtaining a clear understanding of the semantics of the original data, we choose the vocabularies to express this data as Linked Data. We try to reuse existing vocabularies as much as possible, as long as the semantics of the terms in these vocabularies match the semantics of the original data.
We are thankful to everyone involved in the creation of the SHACL recommendation. Although the recommendation states that it is a language for validating RDF graphs, we found that it can also be used to bridge the gap between the semantic web community and the traditional IT community. We’ve learned that this approach, in combination with a visual representation of the model, is an effective way of developing Linked Data models. We’ve also found that such a visual representation should not be exactly UML, as that would mislead users into thinking they are looking at a "traditional" data model. Nor should it be something completely different, as that would make the visualizations too hard for these users to understand.
Marco Brattinga MSc is Ordina’s principal expert for intelligent data-driven organizations. Marco studied Information Technology at the University of Twente. He has more than 15 years of experience as an IT architect and consultant in the public sector. Since 2011 he has been working on semantic web technology for the exchange of information between organizations and for data management, in particular the semantic description of datasets. Marco is actively involved in various communities that promote semantic web technology in the public sector.