Jane Frazier

Industry

NERDing out: Job Title Normalisation in an Online Employment Marketplace

One of the key features of a successful online employment marketplace is the ability to match people with the most relevant job opportunities. Our business uses data about candidates, jobs and hirers to perform this task. One valuable data point in this process is the job title, which we find in semi-structured form in a candidate’s employment history and in a hirer’s job advertisement. For retrieval, recommendation and analysis purposes, it is essential to normalise the various forms in which users provide a given job title on-site to an authoritative form, and to understand the relationships between that job title and others. Our team has improved our job title normalisation process by constructing NERD (Named Entity Recognition and Discovery), a suite of web services that leverage data represented using Linked Data standards and housed in ontology management software. This role title data is then continuously improved using automated processes that gather insights from our marketplace. This presentation will outline the details of these innovations, the challenges faced along the way, and the key ways we measure success.

This presentation will cover problems and opportunities we faced, solutions implemented, and lessons learned.
Previously, our job title normalisation processes had a number of limitations, including:

  • Our controlled vocabulary of job titles was term-based rather than concept-based, so we had no persistent way to identify job titles and only limited means of relating them to one another
  • Curation of the vocabulary was managed in spreadsheets, which required constant manual migration before any beneficial change could reach consumers
  • Because the vocabulary was managed in this inefficient manner, we had no single point of truth for job title data. This increased the risk of low data quality and allowed multiple inconsistent versions of the vocabulary to exist at once
  • Although it has historically been simple for us to measure the quantity of normalisation (i.e. how many job ads include a normalised job title), it was difficult to measure the quality of normalisation (i.e. how valuable the normalised job title was for matching purposes, or how close it was to the original intent of the candidate or hirer)

Our new normalisation processes include the following developments:

  • The construction of a ‘gold set’ of optimally normalised job titles for measuring normalisation quality
  • Representation using Linked Data standards (SKOS with custom extensions) in an ontology management tool, providing a single location for curation and consumption
  • The construction of a new normalisation API which dynamically consumes data directly from our ontology and ultimately enriches our customer experience
  • A semi-automated approach to discovering synonymous terms and building relationships
  • The development of a continuous improvement cycle that leverages marketplace insights and our normalisation API
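To make the difference between concept-based normalisation and the two kinds of measurement concrete, here is a minimal sketch in Python. It is not the NERD implementation: the `Concept` class, the example URIs, the labels and the tiny gold set are all hypothetical, and an in-memory dictionary stands in for the SKOS data and the normalisation API described above. The preferred and alternative labels loosely mirror `skos:prefLabel` and `skos:altLabel`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Concept:
    """A job-title concept, loosely modelled on skos:Concept (hypothetical)."""
    uri: str                          # persistent identifier for the concept
    pref_label: str                   # authoritative form, like skos:prefLabel
    alt_labels: Tuple[str, ...] = ()  # synonymous forms, like skos:altLabel


def _key(text: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.casefold().split())


def build_index(concepts: Iterable[Concept]) -> Dict[str, Concept]:
    """Map every known label (preferred or alternative) to its concept."""
    index: Dict[str, Concept] = {}
    for concept in concepts:
        for label in (concept.pref_label, *concept.alt_labels):
            index[_key(label)] = concept
    return index


def normalise(raw_title: str, index: Dict[str, Concept]) -> Optional[Concept]:
    """Return the concept for a raw title, or None if the title is unknown."""
    return index.get(_key(raw_title))


def coverage(raw_titles: Iterable[str], index: Dict[str, Concept]) -> float:
    """Quantity: share of titles that receive any normalised form at all."""
    titles = list(raw_titles)
    hits = sum(normalise(t, index) is not None for t in titles)
    return hits / len(titles)


def gold_set_accuracy(gold: Dict[str, str], index: Dict[str, Concept]) -> float:
    """Quality: share of gold-set titles mapped to the expected concept URI."""
    correct = 0
    for raw, expected_uri in gold.items():
        concept = normalise(raw, index)
        if concept is not None and concept.uri == expected_uri:
            correct += 1
    return correct / len(gold)


# Hypothetical vocabulary; "Developer" is deliberately an over-broad synonym.
concepts = [
    Concept("https://example.org/role/1", "Software Engineer",
            ("Software Developer", "Developer")),
    Concept("https://example.org/role/2", "Registered Nurse", ("RN",)),
]
index = build_index(concepts)

# Hypothetical gold set: raw title -> URI a curator judged to be correct.
gold = {
    "software developer": "https://example.org/role/1",
    "  RN ": "https://example.org/role/2",
    "Developer": "https://example.org/role/3",  # meant as a property developer
}

quantity = coverage(gold.keys(), index)   # every title is normalised
quality = gold_set_accuracy(gold, index)  # but one is normalised wrongly
```

In this toy data every title receives a normalised form, so the quantity measure is 1.0, while only two of the three match the curator's judgement, so the quality measure is about 0.67. Because each concept carries a persistent URI, labels can be added, corrected or moved without changing the identifier that downstream consumers rely on.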

As we developed our new processes, we faced various opportunities to learn and adapt our solutions. Understanding and meeting the needs of the consumers of our data and services within the business is a constant priority. At the beginning of our journey these needs were quite straightforward, but as the success of our work became known across the business, we began to encounter more normalisation-related requirements, some of which were in direct conflict with others. As I will discuss, it became clear to us that successful job title normalisation in our marketplace is only possible through a combination of improvements to our data, our web services, and the business processes that bring them together.

 

CV

Jane is an information scientist working to improve user access to information on the Web. As Ontology Operations Lead at SEEK, the world’s largest online employment marketplace, Jane and the Ontology Services team manage and make available SEEK’s ontology and related web services for search, enrichment, matching, and analytics. She has previously worked as Data Librarian at the Australian National Data Service (ANDS), developing tools for the creation and discovery of controlled vocabularies relevant to the Australian research community; at the Dryad Digital Repository, curating data underlying publications in the biosciences; and at the University of North Carolina at Chapel Hill Metadata Research Center, exploring automatic subject indexing processes for Dryad. From 2013 to 2014 Jane led the research and development of a cataloguing system for collectible items with Stanley Gibbons, one of the world’s oldest stamp collecting firms. Jane holds a Master of Science in Information Science from the University of North Carolina at Chapel Hill, a Master of Music from the University of Michigan and a Bachelor of Music from the University of North Carolina at Greensboro.
 

Interested in this talk?

Register for SEMANTiCS conference