The police in the UK are undergoing a transformation in both their services and their technology. Part of this transformation involves prioritising their role in public protection and safeguarding. There is intense pressure from government and the public to ensure that the police do all they can to intervene before early indications of vulnerability escalate into serious harm, or low-level criminal activity escalates into prolific or violent offending.
Data, and the IT systems that hold it, have a huge role to play in enabling this shift to happen efficiently. But these tools are not evolving fast enough to keep pace with the change: the processes for gathering and sharing information are still largely laborious, time-consuming and manual, and the tools provide no additional context for the data.
Using semantic technologies to understand the underlying relationships between people, locations and events gives officers a complete, 360-degree view of a vulnerable individual, and that view allows the right kind of intervention and support to be assigned.
In our presentation we will demonstrate how adopting a multi-model approach helped us solve this problem for one police force. We will show how we used RDF triples alongside XML data imports to harmonise data from 11 different databases, and provided a single interface allowing police analysts to search, explore and analyse the data.
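As a minimal sketch of what that harmonisation step looks like, the snippet below maps records from two hypothetical source schemas onto one canonical shape and emits RDF-style triples from it. The schema names, field mappings and predicate IRIs are illustrative assumptions, not the actual police data model:

```python
def harmonise(record: dict, source: str) -> dict:
    """Map a source-specific record onto one canonical person shape.

    The source names and field mappings here are hypothetical examples.
    """
    mappings = {
        "custody_db": {"forename": "first_name", "surname": "last_name", "dob": "date_of_birth"},
        "intel_db":   {"given":    "first_name", "family":  "last_name", "born": "date_of_birth"},
    }
    canonical = {target: record[src]
                 for src, target in mappings[source].items() if src in record}
    canonical["_source"] = source  # keep track of where the record came from
    return canonical


def to_triples(person_iri: str, canonical: dict) -> list:
    """Emit (subject, predicate, object) triples for a canonical record."""
    return [(person_iri, f"ex:{key}", value)
            for key, value in canonical.items() if not key.startswith("_")]


# Two differently-shaped source records describing the same person:
a = harmonise({"forename": "Dawn", "surname": "Smith", "dob": "1980-02-01"}, "custody_db")
b = harmonise({"given": "Dawn", "family": "Smith", "born": "1980-02-01"}, "intel_db")

# After harmonisation both share one shape, so they can be compared and linked.
assert {k: v for k, v in a.items() if k != "_source"} == \
       {k: v for k, v in b.items() if k != "_source"}
```

The same idea scales to all 11 sources: each gets its own mapping onto the canonical model, while the triples carry the semantic relationships between the resulting entities.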
A key problem that semantic technology helped us tackle was de-duplicating objects across these databases. Due to the nature of the systems, we could not rely on the presence of unique identifiers for people, places, objects and events. The problem was compounded by dirty data, inconsistent spellings of names and places between databases, and the fact that people move, change names and use aliases that are not properly checked. We used RDF to provide a flexible model for the relationships between people, algorithms such as Double Metaphone to normalise spellings of names, and reference data from the Ordnance Survey to accurately identify and plot addresses and place data.
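To illustrate the phonetic-key idea behind this matching, here is a sketch using classic American Soundex, a much simpler stand-in for the Double Metaphone algorithm the engagement actually used: differently spelled names reduce to the same key, which then serves as a blocking key for de-duplication candidates:

```python
def soundex(name: str) -> str:
    """Classic American Soundex: a simple phonetic key for a surname.

    Shown as a stand-in for Double Metaphone, which handles far more
    spelling variation but follows the same encode-and-compare idea.
    """
    codes = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
    code_of = {ch: d for letters, d in codes.items() for ch in letters}

    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""

    result = name[0]                   # keep the first letter verbatim
    prev = code_of.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue                   # H and W do not reset the previous code
        d = code_of.get(ch, "")        # vowels map to "" and act as separators
        if d and d != prev:
            result += d
        prev = d
    return (result + "000")[:4]        # pad/truncate to the standard 4 characters


# Variant spellings collapse to the same key:
assert soundex("Smith") == soundex("Smyth")   # both "S530"
assert soundex("Jon") == soundex("John")      # both "J500"
```

Candidate pairs that share a key are then scored with the other evidence (dates of birth, addresses, RDF relationships) before being merged.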
In police work, analysts must be able to trace the provenance of all data used in an investigation and know that the information presented to officers is as correct and complete as possible. There is also a need to guarantee information and data security. By using the envelope pattern to store the harmonised data and semantic information together with the source document from which they were derived, we have a lightweight and durable way of guaranteeing the provenance of all information. Furthermore, the database's role-based access controls prevent data from leaking to unauthorised users.
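A minimal sketch of one such envelope is shown below as a Python/JSON structure. The section names ("headers", "instance", "triples", "attachments") follow common MarkLogic envelope conventions, but the exact shape, field names and URIs here are illustrative assumptions rather than the deployed schema:

```python
import json

# One envelope document per entity: provenance metadata, the harmonised
# instance, its triples, and the untouched source record all travel together.
envelope = {
    "headers": {                      # provenance and lineage metadata
        "source_system": "custody_db",
        "ingested_at": "2019-03-04T10:15:00Z",
        "source_uri": "/raw/custody_db/record-4711.xml",
    },
    "instance": {                     # the harmonised, canonical entity
        "type": "Person",
        "first_name": "Dawn",
        "last_name": "Smith",
    },
    "triples": [                      # semantic relationships extracted on ingest
        {"subject": "ex:person/4711",
         "predicate": "ex:livesAt",
         "object": "ex:address/99"},
    ],
    "attachments": {                  # the unmodified source record, kept verbatim
        "original": "<custody><forename>Dawn</forename></custody>",
    },
}

# The envelope survives serialisation unchanged, so provenance is durable.
assert json.loads(json.dumps(envelope)) == envelope
```

Because the source record rides along inside the envelope, any harmonised value can be traced back to the exact document it came from without a separate lineage store.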
We used MarkLogic as a multi-model database to store the unmodified input data from the 11 source databases, the harmonised representations of people, objects, locations and events, and the RDF relationships between them. By taking advantage of its ability to query across all indexes at once, we built out the prototype application and demonstrated real value to the police analysts we were working with in only 12 weeks.
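To make the cross-index idea concrete, here is a tiny in-memory sketch of what such a combined query does: a word search over documents narrows the candidates, and a triple join then follows semantic links from the hits. In the real system MarkLogic resolves both against its indexes in a single request; the data, URIs and predicate names below are purely illustrative:

```python
# Toy stand-ins for the document store and the triple store.
documents = {
    "/person/1": "Dawn Smith seen near the riverside warehouse",
    "/person/2": "Unrelated intelligence report",
}
triples = [
    ("/person/1", "ex:associateOf", "/person/3"),
    ("/person/3", "ex:livesAt", "ex:address/99"),
]


def search(term: str) -> set:
    """Word query: URIs of documents whose text contains the term."""
    return {uri for uri, text in documents.items() if term.lower() in text.lower()}


def related(subjects: set, predicate: str) -> set:
    """Triple join: objects linked from any matching subject by the predicate."""
    return {o for s, p, o in triples if s in subjects and p == predicate}


hits = search("warehouse")                    # documents mentioning the term
associates = related(hits, "ex:associateOf")  # semantic hop from those hits
assert associates == {"/person/3"}
```

The analyst-facing interface chains steps like these, so a free-text lead can be expanded into a network of related people, places and events in one query.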
Jennifer Shorten is Technical Delivery Architect for Consulting Services, EMEA. She joined the company in 2014 after six years working on MarkLogic implementations with some of the world’s leading media and publishing companies. In her seventeen years in the industry, Jen has helped customers meet their strategic goals by overcoming data and content challenges. Jen has a particular interest in semantics and has had the opportunity to lead the UK consulting team on some of the most advanced semantics implementations for MarkLogic customers globally.
Originally from New York City, Jennifer holds a Bachelor’s Degree in Biopsychology from Hampshire College and an MSc in Technology Management from NYU Polytechnic School of Engineering.