Machine Learning Goes Back to the Future

Share

In the first installment of this blog series, we described how the quest for artificial intelligence (AI) gave us the discipline of machine learning – the study of how to enable an intelligent agent to learn from data to improve its performance. But what has any of that got to do with commercial analytics?

Learning and predicting from data pre-dates the study of AI – and dusting off centuries old, tried and tested mathematical techniques like linear regression and Bayesian statistics turned out to be (much) easier than some of the “hard” problems in AI.

Not only that, but as as more and more business processes and systems were computerized during the 70s and 80s – and as commercial databases begin to proliferate as a result – applying these methods to the data in those databases also turned out to have very valuable commercial applications, like forecasting demand for perishable products in grocery retail, for example, or identifying potentially fraudulent transactions in retail finance.

“Knowledge discovery in databases” started as an off-shoot of machine learning, with the first Knowledge Discovery and Data Mining workshop taking place at an AI conference in 1989 and helping to coin the term “data mining” in the process – a term that we will come back to a little later in this blog. And so, AI gave rise to the study of machine learning – which led in turn to data mining.

Supervised and unsupervised methods

Machine learning is often concerned with making so-called “supervised predictions”, i.e. in learning from a training set of historical data in which objects or outcomes are known and are labelled, so that the intelligent agent can differentiate between, say, a cat and a mat. Or so that it can learn to identify the signals in petabytes of sensor data that characterize the imminent failure of a train, a jet engine or a paper mill. The objective, in both cases, is to produce a model that can predict a target variable – whether an object is a cat or a mat, or whether a train will fail or not within the next 36 hours – from input data – images harvested from the Internet, or the readings from the temperature, pressure and vibration sensors on the train.

By contrast, data mining is often also concerned with the discovery of previously unknown patterns or structures in data. Retailers, for example, have long been interested in finding groups of customers who behave in similar ways and in “clustering” shopping missions, to understand consumer behavior and how stores are shopped. These are examples of the applications of “unsupervised methods”; we are still feeding the clustering algorithms historical data, but the data aren’t labelled - because we don’t know exactly which outcomes we are looking for. When one of us undertook our first customer behavioural segmentation project using an unsupervised approach, for example, we were not expecting to find a large group of consumers shopping our stores between 5pm and 9pm and whose baskets almost exclusively contained breath mints, flowers and chocolates – nor another, buying almost exclusively frozen products, apparently for immediate consumption. But there they were!

Four things to remember

We don’t want you to get too hung-up on history or terminology, but we do want you to understand four things.

Firstly, you simply can’t “machine learn everything” – not least because supervised methods pre-suppose that you have a relevant, labelled training data-set to learn from and because the results of an unsupervised analysis may be hard to interpret, or even irrelevant. But also because in many cases there are anyway better routes to goal. By and large the big web properties don’t try to “machine learn” how big or which colour to make the “buy it now button” – they mostly run multiple, concurrent A/B tests instead. It’s quicker and it’s easier. And the output is not a prediction that may – or may not – prove to be accurate; but instead is a measurement of whether treatment A is more effective than treatment B for a particular customer segment right now and that can be easily compared with other similar measurements.

Secondly, that “data mining” was once the cool new term – popularised, in part, by vendors and marketing departments who thought that “knowledge discovery in databases” wasn’t catchy enough – and who wanted to try and distinguish the application of these methods to commercial data captured in databases from dry, dusty and apparently far-off academic concerns about machine leaning and AI. Fast-forward three decades - and now vendor marketing departments are in many cases attempting to differentiate their offers from existing data mining technologies by applying the label “machine learning” to them, apparently without realising that the term pre-dates the term “data mining”. In a very real sense, the marketing hype has literally come full circle.

Thirdly, that data mining started as an off-shoot of machine learning – itself a product of the pursuit of AI – and that the fields remain closely linked and continue to share multiple techniques, algorithms, and researchers. So closely linked, in fact, that the two expressions are often used interchangeably - and in many situations are practically synonymous. When a mobile telecommunications company builds a model to predict which customers are likely to churn based on historical data that describes customers who have already recently cancelled their service, we can – and probably we should - call that “machine learning” (because we are using a computer to build a model from labelled historical data), even if we use a mathematical method, like linear regression, that pre-dates Turing, digital computers and the Dartmouth Conference. In practice, you will find plenty of practitioners describing the same activity as “data mining”, “data science” or just plain old “analytics”. You say tom-ay-to, and I say tom-ah-to.

Lastly, whilst you should absolutely embrace some of the newer machine learning techniques and technologies – as we’ll see later in this series of blogs, the deep learning family of methods in particular has already become the de facto solution for a whole range of high-value business problems - you would be unwise to throw out the more established methods and techniques in the process. Because as we’ll also see later, in many cases we may prefer a simple solution that is sufficiently accurate to a more complex one.

(Author):
Martin Willcox

Martin leads Teradata’s EMEA technology pre-sales function and organisation and is jointly responsible for driving sales and consumption of Teradata solutions and services throughout Europe, the Middle East and Africa. Prior to taking up his current appointment, Martin ran Teradata’s Global Data Foundation practice and led efforts to modernise Teradata’s delivery methodology and associated tool-sets. In this position, Martin also led Teradata’s International Practices organisation and was charged with supporting the delivery of the full suite of consulting engagements delivered by Teradata Consulting – from Data Integration and Management to Data Science, via Business Intelligence, Cognitive Design and Software Development.

Martin was formerly responsible for leading Teradata’s Big Data Centre of Excellence – a team of data scientists, technologists and architecture consultants charged with supporting Field teams in enabling Teradata customers to realise value from their Analytic data assets. In this role Martin was also responsible for articulating to prospective customers, analysts and media organisations outside of the Americas Teradata’s Big Data strategy. During his tenure in this position, Martin was listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data- driven business in 2016. His Strata (UK) 2016 keynote can be found at: www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid; a selection of his Teradata Voice Forbes blogs can be found online here; and more recently, Martin co-authored a series of blogs on Data Science and Machine Learning – see, for example, Discovery, Truth and Utility: Defining ‘Data Science’.

Martin holds a BSc (Hons) in Physics & Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a solo glider pilot, supporter of Sheffield Wednesday Football Club, very amateur photographer – and an even more amateur guitarist.

View all posts by Martin Willcox

Follow Connect

(Author):
Dr. Frank Säuberlich

Dr. Frank Säuberlich leads the Data Science & Data Innovation unit of Teradata Germany. It is part of his repsonsibilities to make the latest market and technology developments available to Teradata customers. Currently, his main focus is on topics such as predictive analytics, machine learning and artificial intelligence.
Following his studies of business mathematics, Frank Säuberlich worked as a research assistant at the Institute for Decision Theory and Corporate Research at the University of Karlsruhe (TH), where he was already dealing with data mining questions.

His professional career included the positions of a senior technical consultant at SAS Germany and of a regional manager customer analytics at Urban Science International. Frank has been with Teradata since 2012. He began as an expert in advanced analytics and data science in the International Data Science team. Later on, he became Director Data Science (International).

His professional career included the positions of a senior technical consultant at SAS Germany and of a regional manager customer analytics at Urban Science International.

Frank Säuberlich has been with Teradata since 2012. He began as an expert in advanced analytics and data science in the International Data Science team. Later on, he became Director Data Science (International).

View all posts by Dr. Frank Säuberlich

Follow Connect