Robert David


Integrating data engineering and software engineering to improve the development process.

We present the Unified Governance use case, part of the ALIGNED research project, which brings together data engineering and software engineering. In the PoolParty development process, we identified challenges that can be addressed by aligning data management and the software development process so that they support each other. First, our development process produces a large number of software development artifacts, and managing them can benefit from a semantically enhanced automatic support system. We therefore want to express development process data such as requirements and issue reports, which are maintained in Confluence and JIRA, as Linked Data by modeling them in RDF based on the ALIGNED ontology. This enables semantic evaluations of the data that support the development process, for example detecting duplicates and associating requirements with bug reports. Second, we face inconsistencies in the data of PoolParty projects. These can be caused by application errors, but can also be introduced by a user importing data into the project store. Because these inconsistencies can prevent the application from functioning correctly, we want to identify them and take countermeasures to prevent future occurrences. Unified Governance provides a methodology for supporting the software development process by aligning it with data engineering, presented as an integrated processing framework.

A common problem in software engineering is managing the artifacts that are generated during ongoing development. We use Confluence as a collaboration tool for managing requirements through template-supported specifications. The JIRA issue tracking tool is used for managing development artifacts based on the Scrum methodology, which include epics, stories and bugs. Furthermore, we use JIRA for support, so customers can report bugs as JIRA tickets. Confluence and JIRA can be integrated by linking JIRA issues to Confluence pages. This helps in associating requirements with stories and is done as part of sprint planning. The problem is that many more associations could support the development process and save considerable time and effort. First, we would like to identify duplicate issues, which is especially important for issues that represent bugs. Duplicates occur quite often because several people take a similar amount of time to detect and report the same bug, which results in duplicate reports. We therefore need an easy and fast method for identifying duplicates on demand so that we can prevent them. Second, many JIRA issues are not associated with requirements because no manual linking was done. Again, this is the case for bugs, but also for improvements. It would support the development process if we could automatically associate these issues with the requirements that they address or affect. We want to automatically generate a semantic representation of these development artifacts. Based on a common ontology and knowledge model, we will be able to solve these problems better than with lexical approaches alone. Furthermore, we can exploit the information as Linked Data for further use. The system also supports the calculation of similar issues based on a semantic model and provides the user with a graph-based search with navigation options for evaluations.
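To make the idea of on-demand duplicate detection over a semantic model concrete, the following is a minimal sketch, not the method used in the prototype. It assumes each issue has already been annotated with a set of concept identifiers from a shared knowledge model. The issue keys, the `ex:` concept names, the Jaccard measure and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical data: issue key -> concepts the issue was annotated with.
# In a real setup these would come from the RDF representation of the issues.
issues = {
    "PP-101": {"ex:Import", "ex:SkosLabel", "ex:Validation"},
    "PP-102": {"ex:Import", "ex:SkosLabel", "ex:Validation", "ex:Ui"},
    "PP-103": {"ex:Search", "ex:Autocomplete"},
}


def jaccard(a, b):
    """Share of concepts two issues have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicate_candidates(issues, threshold=0.7):
    """Return (key1, key2, score) for every issue pair at or above threshold."""
    candidates = []
    for (k1, c1), (k2, c2) in combinations(sorted(issues.items()), 2):
        score = jaccard(c1, c2)
        if score >= threshold:
            candidates.append((k1, k2, score))
    return candidates


if __name__ == "__main__":
    for k1, k2, score in find_duplicate_candidates(issues):
        print(f"{k1} and {k2} may be duplicates (similarity {score:.2f})")
```

Because the comparison works on concepts and not on raw words, two reports that describe the same bug in different wording can still be matched, which is the advantage over purely lexical approaches described above.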
We present a prototype for integrated issue management based on semantic technologies. The system is aligned with the PoolParty application to generate tickets semi-automatically and to support the data requirements by helping to deal with inconsistencies introduced into the application's project data. By analyzing software defects as part of life cycle management, we can establish constraint checks that validate data when it is introduced into PoolParty projects and provide repair strategies to correct the data. This gives the user the possibility to prevent the import of inconsistent data in the PoolParty application.
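As an illustration of a constraint check applied at import time, the sketch below validates incoming data before it reaches a project store. It is an assumption-laden example, not the prototype's implementation: triples are plain Python tuples, the store is a list, and the single constraint shown (at most one `skos:prefLabel` per concept and language, taken from the SKOS integrity conditions) is chosen only as an example. The function names and the "reject the import" strategy are hypothetical.

```python
from collections import defaultdict

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def find_pref_label_violations(triples):
    """Find concepts with more than one preferred label in the same language.

    Each triple is (subject, predicate, object); for prefLabel triples the
    object is assumed to be a (text, language) pair.
    """
    labels = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_PREF_LABEL:
            text, lang = obj
            labels[(subject, lang)].add(text)
    return {key: sorted(texts) for key, texts in labels.items() if len(texts) > 1}


def safe_import(store, triples):
    """Add triples to the store only if the combined data passes the check.

    Returns the violations found; an empty dict means the import succeeded.
    """
    triples = list(triples)
    violations = find_pref_label_violations(list(store) + triples)
    if violations:
        return violations  # reject the import and leave the store untouched
    store.extend(triples)
    return {}


if __name__ == "__main__":
    store = []
    safe_import(store, [("ex:c1", SKOS_PREF_LABEL, ("Car", "en"))])
    problems = safe_import(store, [("ex:c1", SKOS_PREF_LABEL, ("Automobile", "en"))])
    print(problems)
```

Rejecting the whole import is only the simplest possible strategy. A repair strategy could instead keep one label and demote the other to an alternative label, subject to user confirmation.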


Robert David is CTO of the Semantic Web Company. He is the Lead Developer of PoolParty Semantic Suite and head of the development team. Robert holds a degree in software engineering and has extensive professional experience in web technologies.