CXOs: You're Doing Data Analytics Wrong

I’m Sorry CXOs, but You’re Mostly Doing Analytics All Wrong


We tend to talk about digitisation a bit like our parents and grandparents talked about sex in the sixties. Like we invented it.

Only we didn’t.

The first wave of digitisation arguably started in the late 80s and early 90s with the client-server revolution. During that period, organisations shovelled money at Information Technology (IT) like it was going out of fashion. But as Robert Solow wryly observed in 1987, “you can see the computer age everywhere but in the productivity statistics”: at the level of the whole economy, productivity growth actually slowed over this period.

Put simply, organisations were doing IT all wrong.

The so-called “productivity paradox” was not resolved until Hammer, Davenport and the Business Process Re-engineering crowd taught the rest of us that effective digitisation was not about automating existing business processes; it was about re-inventing end-to-end processes to make them more streamlined, more efficient and more customer-focused.

Fast-forward to the present day and, as Erik Brynjolfsson among others has pointed out, the parallels are striking. Once again, organisations are shovelling money at IT – this time at Machine Learning (ML) and Artificial Intelligence (AI) initiatives housed in swanky innovation labs (check the shiny espresso machine count – if it’s higher than four-per-floor, you have probably stumbled into the lab). And once again, productivity growth in the OECD countries is underwhelming – even before the impact of the current COVID-19 pandemic has worked its way through the wider economy.

There is no return on investment in technology until we deploy in production and change the way we do business – and Gartner’s research indicates that at least 65% of the predictive models built in organisations today are never implemented in production. For all of the caffeine-driven work done in all of those innovation labs, most of the insight that is generated never leaves the lab except as PowerPoint. And since you can’t connect a slide deck to an operational business process, mostly we are not changing the way we do business – and so are never likely to generate any ROI. As a former boss of mine once remarked when declining to sign off one of my less well thought-through project proposals 20-odd years ago, “old business process + expensive new technology = expensive old business process.”

Business leaders need instead to start to think about Analytics as 1-2-3.

#1 is for data preparation and management. Data wrangling still accounts for up to 80% of the cost and effort of analytic projects. If Machine Learning really is to become ubiquitous we need to bring very significant pressure to bear on that number. And that means that we need to stop handcrafting one-data-pipeline-per-machine-learning process and to focus instead on enabling re-use by investing in curated “feature stores” of variables with proven predictive value.
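To make the re-use argument concrete, here is a minimal sketch of the idea in Python. It is a toy, not a description of any real feature store product: the `FeatureStore` class, the feature names and the sample row are all hypothetical, invented purely for illustration. The point is only that a variable is defined and curated once, and every model then asks for it by name rather than rebuilding its own pipeline.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class FeatureStore:
    """Toy in-memory registry of named, reusable feature definitions (hypothetical)."""

    def __init__(self) -> None:
        self._features: Dict[str, Callable[[Row], float]] = {}

    def register(self, name: str, fn: Callable[[Row], float]) -> None:
        # Curate once: refuse silent redefinition of an existing feature.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def build_vector(self, names: List[str], row: Row) -> List[float]:
        # Any model asks for features by name instead of rebuilding them.
        return [self._features[name](row) for name in names]


store = FeatureStore()
store.register("basket_value", lambda r: r["units"] * r["unit_price"])
store.register("discount_rate", lambda r: r["discount"] / (r["units"] * r["unit_price"]))

# Made-up input row, for illustration only.
row = {"units": 4.0, "unit_price": 2.5, "discount": 1.0}

# Two different (hypothetical) models re-use the same curated definitions.
churn_inputs = store.build_vector(["basket_value", "discount_rate"], row)
demand_inputs = store.build_vector(["basket_value"], row)
```

In a real deployment the definitions would live in a governed, shared platform rather than in a Python dictionary, but the contract is the same: one definition, many consumers.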

#3 is for deployment and operationalisation. There are several different analytic operationalisation design patterns that make more-or-less sense, depending on the use-case. But in very many cases, scoring predictive models in production and at scale requires us to performantly process multiple features and to ship the result (typically a prediction or a next best action) to a connected operational touchpoint. A feature store built on a high-performance, enterprise-grade, parallel Data Warehouse platform gives us scale, performance, high availability and reliability, channel integration – and, of course, rapid access to all of the variables in the feature store. And even where this design pattern doesn’t make sense – for example, where performance or network considerations mean that we need to score models at the “edge” – we still need to instrument these models to understand where they have been deployed and the predictions they are making, because predictive models have a shelf-life and need to be maintained, updated and, ultimately, retired.
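What “instrumenting” a model might look like can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the `InstrumentedModel` wrapper, the model name, the version string, the `edge-node-7` location and the stand-in scoring function are all hypothetical. The sketch simply records which model version scored what, where and when, which is the minimum needed to maintain, update and eventually retire it.

```python
import time
from typing import Callable, Dict, List, Union

LogRecord = Dict[str, Union[str, float]]


class InstrumentedModel:
    """Wraps a scoring function and records every prediction it makes (hypothetical)."""

    def __init__(self, model_id: str, version: str, location: str,
                 predict_fn: Callable[[List[float]], float]) -> None:
        self.model_id = model_id
        self.version = version
        self.location = location  # where this copy of the model is deployed
        self.predict_fn = predict_fn
        self.log: List[LogRecord] = []

    def score(self, features: List[float]) -> float:
        prediction = self.predict_fn(features)
        # Keep enough context to audit, monitor and later retire the model.
        self.log.append({
            "model_id": self.model_id,
            "version": self.version,
            "location": self.location,
            "scored_at": time.time(),
            "prediction": prediction,
        })
        return prediction


# A stand-in "model": any callable that maps features to a prediction.
model = InstrumentedModel("demand_forecast", "1.0", "edge-node-7",
                          lambda f: 2.0 * f[0])
prediction = model.score([3.0])
```

In practice the log would be shipped back to a central platform rather than held in memory, so that models scored at the edge remain visible to the people responsible for them.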

#2 is for model training. It’s the sexy bit, where we get to geek-out with cool technologies and to play with a dozen different algorithms (Support Vector Machines are soooo last year, dahling). And that’s a problem – because as an industry, we have dramatically over-rotated on the fun part at the expense of #1 and #3. That means that the productivity of our Data Scientists is lousy, our time-to-market is measured in months (if we’re lucky) – and that in 65% of cases, we never even get to production. For the foreseeable future, you will probably need to use multiple different technologies to support your model building activities. But you should insist that those tools pull data from the feature store, that any new features created are added back to the feature store – and that models are published in a format that can be consumed and run on your centralised and integrated data platform or an Edge Node. No exceptions. None.

Machine Learning and AI really are going to become ubiquitous – and really will be the basis of competitive advantage in most industries. And that means we’re going to have to scale them; even on fairly conservative maths, within the next few years a typical, national grocery retailer is going to need to score at least 150M predictive models in production every single day just to run a competitive supply chain. Only organisations that think about Machine Learning and Artificial Intelligence in terms of Analytics 1-2-3 – with a heavy emphasis on #1 and #3 – are going to make the cut.
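To put the 150M figure into operational terms, a back-of-the-envelope calculation in Python helps. Only the 150M-per-day number comes from the text above; the four-hour overnight batch window is an assumption added here for illustration, as is the `required_rate` helper.

```python
SCORES_PER_DAY = 150_000_000  # figure quoted in the text
SECONDS_PER_HOUR = 60 * 60


def required_rate(scores: int, window_hours: float) -> float:
    """Scores per second needed to finish `scores` within `window_hours`."""
    return scores / (window_hours * SECONDS_PER_HOUR)


# Spread evenly across the whole day.
sustained = required_rate(SCORES_PER_DAY, 24)

# Squeezed into a hypothetical four-hour overnight batch window (assumption).
overnight = required_rate(SCORES_PER_DAY, 4)

print(f"Sustained over 24 hours: {sustained:,.0f} scores per second")
print(f"Within a 4-hour window:  {overnight:,.0f} scores per second")
```

Even spread across a full day, that is well over a thousand scores every second, which is why scoring at this volume is a production engineering problem rather than a lab exercise.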

All the rest of us are doing is boosting the profits of the espresso machine makers.

(Author):
Martin Willcox

Martin leads Teradata’s EMEA technology pre-sales function and organisation and is jointly responsible for driving sales and consumption of Teradata solutions and services throughout Europe, the Middle East and Africa. Prior to taking up his current appointment, Martin ran Teradata’s Global Data Foundation practice and led efforts to modernise Teradata’s delivery methodology and associated tool-sets. In this position, Martin also led Teradata’s International Practices organisation and was charged with supporting the delivery of the full suite of consulting engagements delivered by Teradata Consulting – from Data Integration and Management to Data Science, via Business Intelligence, Cognitive Design and Software Development.

Martin was formerly responsible for leading Teradata’s Big Data Centre of Excellence – a team of data scientists, technologists and architecture consultants charged with supporting Field teams in enabling Teradata customers to realise value from their Analytic data assets. In this role Martin was also responsible for articulating to prospective customers, analysts and media organisations outside of the Americas Teradata’s Big Data strategy. During his tenure in this position, Martin was listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data-driven business in 2016. His Strata (UK) 2016 keynote can be found at: www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid; a selection of his Teradata Voice Forbes blogs can be found online here; and more recently, Martin co-authored a series of blogs on Data Science and Machine Learning – see, for example, Discovery, Truth and Utility: Defining ‘Data Science’.

Martin holds a BSc (Hons) in Physics & Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a solo glider pilot, supporter of Sheffield Wednesday Football Club, very amateur photographer – and an even more amateur guitarist.
