A brief talk with Bernhard Haslhofer

April 28, 2017

Bernhard Haslhofer is a Data Science Chair at SEMANTICS 2017. He is looking forward to the Data Science Track, which premieres this year, and is especially curious about submissions that go beyond traditional Linked Data topics. In the interview he talks about his projects, his hopes for the submissions, and how they may help shape the bigger picture.

Can you tell us something about your work and research focus?

I am working as a Senior Scientist at the Digital Insight Lab, the Data Science group of the Austrian Institute of Technology (AIT) - Austria's largest research and technology organization. We are a group of computer scientists, statisticians, applied mathematicians, and data engineers and we cooperate with researchers and industry in various fields to solve data-oriented problems in specific domains. As Data Scientists, we often act as a bridge between research fields and provide the practical methods and tools required for the typical data science workflow, which - according to our definition - consists of data aggregation, normalization, analytics, and visualization, as well as data publication and preservation.

At the moment, I spend most of my time developing new methods for analyzing the structure and dynamics of virtual currency ecosystems, such as Bitcoin. Virtual currencies are a fantastic object of study because all transactions ever executed in these systems are openly available, which gives us the opportunity to compute a complete picture of an economic ecosystem. We use clustering algorithms to identify economic actors (e.g., marketplaces, exchanges), network analytics techniques to understand the monetary flows within the ecosystem, and anomaly detection techniques to find patterns in the data that do not conform to normal, expected behavior. The big challenge lies in the data volume: we are dealing with network data structures consisting of billions of nodes and edges, and we need to find horizontally scalable solutions for our algorithms.

Predictive maintenance is my second research topic. The goal is to lower the maintenance costs of manufacturing or production plants by predicting machine outages and maximizing the interval between repairs. We do this by monitoring the mechanical condition of machines via sensor and process data and by building statistical models that predict outages using all sorts of anomaly detection and machine learning techniques.

Which trends and challenges do you see for Linked Data and the Semantic Web?

Linked Data is a great technique for exposing or exchanging structured data in a semantically well-defined and interoperable format. In my opinion, this method makes sense for all sorts of (enterprise) data integration scenarios spanning several system boundaries.

My favorite development within the last decade is the combination of the Linked Data technique with the Open Data philosophy, because this led to the release of numerous openly available datasets and - even more importantly - the development of new applications and the realization of new ideas. So, it is all about finding the combination of a great idea, the data you need to realize that idea, and the selection of appropriate algorithms, tools and techniques to get things done.

For Linked Data I see two major trends and challenges:

First, techniques and models developed by the academic community will be further grounded by practitioners. This has already happened with the introduction of formats like JSON-LD and the development of easy-to-use semantic models like schema.org, and my guess is that this trend will continue. I would be interested in seeing more research and development in the direction of large-scale data processing and horizontal scalability - meaning data-center scale.

Second, I believe that Linked Data, which still has a very strong data-engineering focus, will become part of a bigger picture and an enabling technology for all sorts of subsequent data-oriented tasks, ranging from the automated extraction of (enterprise) knowledge graphs from large document collections and the building of semantic information retrieval indices to all sorts of predictive modeling tasks.

What are your expectations for SEMANTICS 2017 in Amsterdam, especially for the data science track?

My hope is that the data science track, which premieres at SEMANTICS 2017, attracts many exciting submissions that go beyond traditional Linked Data topics and help shape the bigger picture I mentioned before. Amsterdam is the perfect place for this attempt because it is home to a very strong Linked Data community and a leading European Data Science Center. So I am very optimistic that this year's SEMANTICS will be a success, and I am eager to see high-quality contributions from different fields. And, of course, I am looking forward to meeting and exchanging ideas with great people in Amsterdam, one of my favorite cities.