JobsTheWord’s ability to analyse hundreds of millions of tiny snippets of information, in a multitude of different formats extracted from the internet each day, means its algorithms and training data have been built into something akin to a PhD in recruitment market knowledge.
The term ‘big data’ is currently used by many companies in the recruitment industry simply because it is a buzzword. In reality, it is extremely difficult to apply big data to our market, and even harder to get something useful from it.
JobsTheWord discovered a problem which we were able to resolve by marrying our advanced analytic skills with our in-depth knowledge of the recruitment industry. We then evolved our idea to create business value for our clients from our data.
After just two years, we have revolutionised the way employers source fresh talent and are now the UK market leaders in the data-driven approach to recruitment. This means we can offer a tailored service to our clients, which has simply “blown them away”!
However, managing big data successfully on a technology level is one thing; managing big data so that it supports business goals successfully is another.
So What’s our Secret?
Firstly, we invested heavily in enabling our Data Science team (who live above the Big Data team) to build enough domain knowledge to make better decisions when formulating our “Recruitment Market PhD Platform”.
Algorithms are generally cited as the ‘big secret’, but there are not that many base algorithms; the key, we found, is exploring to find the right mixture of them and applying the right type of intelligence.
We use ‘Real Data’
As some forms of big data stream in real-time, they merit analytic processing in real-time. Streams pour in from Web servers, robots and other machinery, sensors, social media, supply chains, RSS feeds, events, transactions, and customer interactions. Some of this fresh information gives JobsTheWord a competitive edge.
Real-time functionality at this extreme level is beyond the average tools and platforms, so we complement our systems with a tool for Complex Event Processing (CEP), which can be programmed to spot opportunities and problems in streaming data in real-time.
CEP excels with multi-data-source correlations, even when the sources are an eclectic mix of traditional enterprise sources (enterprise applications, relational databases), and new ones (streaming data, machine data, social media data, and NoSQL data platforms such as Hadoop).
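As an illustration only (not JobsTheWord’s actual platform), a CEP-style correlation rule can be sketched as a sliding time window over a stream of events. The `Event` type, the sources, and the 60-second window below are all invented for the example:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    source: str   # e.g. "rss", "social", "web"
    topic: str    # e.g. a skill or job title
    time: float   # seconds since the stream started

def detect_correlations(stream, window=60.0):
    """Flag a topic when it appears in events from two different
    sources within `window` seconds -- a minimal CEP-style rule."""
    recent = deque()  # events still inside the time window
    alerts = []
    for ev in stream:
        # Evict events that have fallen out of the window.
        while recent and ev.time - recent[0].time > window:
            recent.popleft()
        # Correlate the new event against everything in the window.
        for old in recent:
            if old.topic == ev.topic and old.source != ev.source:
                alerts.append((ev.topic, old.source, ev.source))
        recent.append(ev)
    return alerts

stream = [
    Event("rss", "data analyst", 0.0),
    Event("social", "sous chef", 20.0),
    Event("social", "data analyst", 45.0),  # within 60s of the RSS event
    Event("web", "data analyst", 200.0),    # too late to correlate
]
print(detect_correlations(stream))
```

Production CEP engines apply far richer pattern languages, but the core idea is the same: rules evaluated continuously over a bounded window of the stream.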
We use Profiling
JobsTheWord profiling involves the use of algorithms and training data that allow the discovery of patterns, or correlations, in large quantities of data. We profile on many levels: not only people, jobs and companies, but also skills, geospatial data and numerous others.
The real challenge is being able to use data to create an image of the ideal employee for a given role and maximise the chance of the employer (our client) developing them into a valuable asset.
Our secret is knowing what combination of keywords builds a picture of, for example, a Sous Chef, a Data Analyst, or a software sales company. With the multitudinous array of data now available – not just via social media, but right across the web – we understand what language and keywords relate to which subjects.
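As a toy sketch of keyword-based profiling (the profiles and the CV text below are invented; real profiles would be learned from training data rather than hand-written), a document can be scored against each role by keyword overlap:

```python
import re

# Hypothetical keyword profiles -- in practice these combinations
# are learned from large quantities of training data.
PROFILES = {
    "sous chef": {"kitchen", "menu", "brigade", "service", "prep"},
    "data analyst": {"sql", "dashboard", "excel", "python", "reporting"},
}

def profile_scores(text):
    """Score free text against each profile by fraction of
    profile keywords found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        role: len(keywords & words) / len(keywords)
        for role, keywords in PROFILES.items()
    }

cv = "experienced in SQL and Python, built reporting dashboards"
scores = profile_scores(cv)
best = max(scores, key=scores.get)
print(best, scores[best])
```

A real system would weight keywords, handle word forms ("dashboard" vs "dashboards"), and combine many more signals, but the overlap score captures the basic mechanism.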
We Use Co-occurrence Grouping
We also utilise co-occurrence rules to create categories. These enable the discovery and grouping of concepts that are strongly related within a set of records. When concepts are found together in records, co-occurrence reflects an underlying relationship that is of value in specific category definitions. This process creates co-occurrence rules that can be used to create a new category, extend a category, or serve as input to another category. There are, of course, various other methodologies in the mix, some of which are defined below.
Our big data expert, Will Crandle, is always happy to advise companies on how they can beat their competitors.
UNDERSTANDING THE TECHNICAL TERMINOLOGY
The Definition of Big Data
According to the Leadership Council for Information Advantage (2012), big data is not a precise term. It describes “data sets that are growing exponentially and that are too large, too raw, or too unstructured for analysis using relational database techniques.” The following, however, are the generally accepted definitions.
The base definition of big data: Big data is first and foremost about data volume, namely large data sets measured in tens of terabytes, or sometimes in hundreds of terabytes or petabytes. Before the term big data became common parlance, we talked about very large databases (VLDBs). VLDBs usually contain exclusively structured data, managed in a database management system (DBMS). In many organisations, big data and its management follow the VLDB paradigm.
The extended definition of big data: In addition to very large data sets, big data can also be an eclectic mix of structured data (relational data), unstructured data (human language text), semi-structured data (RFID, XML), and streaming data (from machines, sensors, Web applications, and social media).
Big Data and Analytics Technologies
Traditional relational databases and structured queries are no longer sufficient to manage and exploit the quantities, varieties and velocities of emergent data sources, so companies need to incorporate new technologies into their IT systems in order to progress. This has led to the development of new distributed computing paradigms, known collectively as big data and analytics technologies, such as NoSQL and others that handle unstructured data in its native state. Hadoop (HDFS), for example, manages and processes the extremes of big data for data integration, data warehousing, and analytics. HDFS clusters are known to scale out to hundreds of nodes, handling hundreds of terabytes of data.
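Hadoop’s processing model can be illustrated with a single-process sketch of MapReduce, the paradigm its clusters run at scale. The word-count task and the two text chunks below are invented; each chunk stands in for a data block held on a separate node:

```python
from collections import defaultdict

# Map phase: each "node" emits (key, value) pairs from its chunk.
def mapper(chunk):
    for word in chunk.split():
        yield word.lower(), 1

# Shuffle phase: group all values by key across the mappers.
def shuffle(mapped):
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate each key's values independently.
def reducer(key, values):
    return key, sum(values)

chunks = ["big data big", "data data"]  # pretend each chunk sits on a node
mapped = [pair for chunk in chunks for pair in mapper(chunk)]
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)
```

The point of the paradigm is that the map and reduce steps are independent per chunk and per key, so a cluster can spread them across hundreds of nodes without changing the logic.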
Unsupervised Learning, Supervised Learning and Reinforcement Learning
Unsupervised Learning: no labelled data is available.
Supervised Learning: labelled data is available.
Reinforcement Learning: learning and relearning based on actions and the effects or rewards of those actions.
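The supervised case is easiest to see with labelled data in hand. As a minimal sketch (the salary figures and labels are hypothetical, and the nearest-centroid classifier is chosen for brevity, not as anyone’s production method):

```python
# Supervised learning in miniature: labelled 1-D data
# (hypothetical salary figures, in thousands) trains a
# nearest-centroid classifier.
labelled = [(22, "junior"), (25, "junior"), (48, "senior"), (55, "senior")]

# "Training": compute the mean value (centroid) per label.
centroids = {}
for label in {l for _, l in labelled}:
    values = [x for x, l in labelled if l == label]
    centroids[label] = sum(values) / len(values)

def predict(x):
    """Classify an unlabelled point by its nearest centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

print(predict(30), predict(50))
```

Remove the labels from `labelled` and the same data becomes an unsupervised problem: the structure must then be discovered by clustering rather than learned from the given classes.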
Causal Models
A causal model is an abstract model that describes the causal mechanisms of a system. The model must express more than correlation (things linked in some way, not independent of each other), because correlation does not imply causation.
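A small numeric sketch of why correlation does not imply causation: two variables driven by a hidden common cause correlate perfectly even though neither causes the other. The city-size scenario and the figures are invented for illustration:

```python
# Hypothetical confounder: city size drives both quantities below,
# so they correlate without either causing the other.
city_size = [1, 2, 3, 4, 5]
ice_cream = [x * 10 for x in city_size]      # caused by city size
vacancies = [x * 7 + 3 for x in city_size]   # also caused by city size

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfect correlation, yet ice cream sales do not cause vacancies:
# a causal model would include city_size as the common cause.
print(pearson(ice_cream, vacancies))
```

A causal model of this system would place `city_size` as the parent of both variables, which is exactly the structure a correlation coefficient alone cannot reveal.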
Profiling
The extrapolation of information about something based on known qualities. This process involves the use of algorithms or other mathematical techniques that allow the discovery of patterns or correlations in large quantities of data, aggregated in databases. When these patterns or correlations are used to identify or represent people, they can be called profiles.
Clustering
Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in one sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields.
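A minimal one-dimensional k-means sketch makes the idea concrete (illustrative only; the points are invented and the initialisation is deliberately naive):

```python
# Minimal 1-D k-means: alternate between assigning points to their
# nearest centroid and recomputing centroids as cluster means.
def kmeans_1d(points, k, iters=20):
    centroids = sorted(points)[:k]  # naive initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(centroids[i] - p))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1, 2, 3, 20, 21, 22]
centroids, clusters = kmeans_1d(points, 2)
print(centroids, clusters)
```

The two groups emerge purely from the similarity structure of the data: no labels are supplied, which is what makes clustering an unsupervised technique.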
Visual Similarity
A data-driven technique to find visual similarity, which does not depend on any particular image domain or feature representation.
Classification
A key capability of any intelligent system is correctly identifying and classifying data items into a pre-defined set of classes.