The Cloud used to mean “renting someone else’s server” – but increasingly it is coming to represent a set of architecture and design patterns that define how we design and deliver 21st century Enterprise computing solutions. For data and analytic platform solutions, that means re-thinking solutions in terms of Services, APIs and Data Products.
What changes in the Cloud? Everything! Compute, storage and network services are configured as code and available on-demand. Services can be started, stopped and scaled-up-and-out as required. Operations are automated by default. Pay-as-you-go (PAYG) pricing models limit risk and enable rapid experimentation, lowering barriers to entry and improving time-to-value. Perhaps most importantly, a rich ecosystem of composable services with well-defined interfaces (“APIs”) enables “snap together” system development: Services can be called and connected to create re-usable components that encapsulate important business processes. At a time of huge economic uncertainty, what increasingly matter are time-to-value and agility; Cloud architectures combined with “DataOps” methods enable agile, incremental development of the data and analytic products without which any digital transformation project is doomed to irrelevance at best, outright failure at worst.
Conceptually, you could argue that much of this isn’t new. The Service Oriented Architecture movement from the noughties, the Object-Oriented programming revolution that preceded it – and even the structured programming movement that dates back to the 50s and 60s – all shared many of the same objectives. But as a British Prime Minister once famously observed, “what matters is what works.” Cloud design patterns have been shown not just to work well for very many use-cases, but also to scale rapidly and economically. And that’s why they matter.
Durable and flexible Cloud Object storage is already providing organisations with “any data, any format” flexibility. And it’s doing so economically, enabling organisations to retain “cold” data indefinitely – or at least, for as long as the business requires and the regulator allows. But reliable and durable object storage will also enable radical architectural simplification at petabyte scale, for example by dramatically simplifying high availability, backup and recovery, and operations, and by supporting the deployment of “Enterprise Data Operating Systems.”
Of course, by itself, storing data adds cost, not value - and an Operating System is only one component of a successful computing platform – so Cloud Analytic Architectures include pluggable processing engines that can read and write directly to the Enterprise Data OS so that data can be processed and value created. In many cases, those processing engines maintain local copies of integrated and modelled data in formats optimised for scalability and performance, rather than for durability and economy. But because the pluggable processing engines also read and write directly to the object storage layer, they enable data to be loaded from the object storage layer, archived to it – and queried and combined dynamically and at run-time, for example, to support exploration and discovery workloads. With data increasingly being streamed into these data backbones, that means that any application or platform that can plug in to the data operating system is connected to data from across the Enterprise - and in close to real time. That gives organisations an opportunity to modernise decades old data acquisition processes so that analytics can move from being predominantly batch-oriented, to event-driven – which is a critical concern, as more and more business and business processes are digitised and move online.
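To make the load, archive and query-in-place pattern described above more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than any vendor's API: `ObjectStore` is an in-memory stand-in for a cloud object store, `Engine` is a hypothetical pluggable processing engine, and the keys and table names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for durable cloud object storage (hypothetical)."""
    objects: dict = field(default_factory=dict)

    def put(self, key, rows):
        self.objects[key] = list(rows)

    def get(self, key):
        return list(self.objects[key])


@dataclass
class Engine:
    """Hypothetical pluggable engine that keeps its own optimised local copies."""
    store: ObjectStore
    local: dict = field(default_factory=dict)

    def load(self, key, table):
        # Load: copy data from the object store into local, modelled storage.
        self.local[table] = self.store.get(key)

    def archive(self, table, key):
        # Archive: write a local table back to the object store.
        self.store.put(key, self.local[table])

    def join(self, table, key, on):
        # Query: combine a local table with object-store data at run time,
        # without loading the remote data into local storage first.
        remote = {row[on]: row for row in self.store.get(key)}
        return [{**row, **remote[row[on]]}
                for row in self.local[table] if row[on] in remote]


store = ObjectStore()
store.put("raw/customers", [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}])
store.put("raw/clicks", [{"id": 1, "clicks": 42}])

engine = Engine(store)
engine.load("raw/customers", "customers")                 # integrated, local copy
result = engine.join("customers", "raw/clicks", on="id")  # combined at run time
engine.archive("customers", "archive/customers")          # written back to the store
```

Any second engine constructed over the same `store` would see the same objects, which is the sense in which the object storage layer acts as a shared backbone in this sketch.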
So far, so good. But there’s a “but.”
If you have a background in large-scale, Enterprise data management, look at the reference architectures and design patterns touted by the big Cloud Service Providers (CSPs) and ask yourself “what’s wrong with this picture?” There are Services galore, APIs aplenty – but data products are frequently depicted as little more than an S3-compliant bucket. And that’s leading a generation of architects and designers to make poor decisions, resulting in the rapid accumulation of technical debt that ultimately slows the business down, rather than speeding it up. Data that are siloed and hard-wired to just the analytic business process that each pipeline supports necessarily result in the creation of more stovepipes and more silos – at which point Cloud design patterns can become less “virtuous circle” and more “vicious cycle.”
Data and analytics only have real value when they are used by organisations to improve performance, by reducing costs, increasing customer satisfaction or driving new growth. In a time of huge economic uncertainty, what matter are time to value and agility. The best way I know to go faster is to eliminate unnecessary work – and to automate as much as possible of the rest. Re-using data products is the ultimate “eliminate unnecessary work” play – and it is how successful organisations are able to move rapidly from experimentation and testing to the deployment of predictive analytics in production and at scale.
By themselves, Enterprise Data Operating Systems are necessary rather than sufficient: raw data must be refined and processed into data products before it can be widely exploited to drive improved business results; and integrated and connected data are still required to enable the optimisation of end-to-end business processes. Agility and time to value require us to architect and design data products for re-use – and re-use requires us to think critically about which data products are required to support the analytic use cases that will enable the business, how they will be used and the characteristics that are required.
Many Hadoop-based Data Lakes failed because organisations de-emphasised data management. As data and analytics migrate to the Cloud, organisations that continue to take a laissez-faire approach to data management are likely to fail for a second time with Cloud Object Store based Data Lakes.
What all of that means is that getting your Cloud data architecture right starts with understanding which data products you need to support your analytic processes, the roles that they perform – and the functional and non-functional characteristics that those roles demand. And it means understanding which data will be combined and re-used so frequently that we should incur the cost of centralising and integrating them – and which data need merely to be connected, to enable frictionless ETL processing and “whatever, whenever, wherever” query freedom, so that more users get access to more data, more quickly, wherever that data resides in the ecosystem.
We call that architecture pattern “the Connected Cloud Data Warehouse” – and more on that next time.
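As a rough illustration of the centralise-versus-connect decision discussed above, the sketch below classifies data products by how often they are re-used and combined. The `DataProduct` fields, the `placement` function and the threshold values are all assumptions made for the example; a real assessment would also weigh the functional and non-functional characteristics that each role demands.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    name: str
    consumers: int    # analytic use cases that re-use this product (assumed metric)
    joined_with: int  # other products it is routinely combined with (assumed metric)


def placement(product, min_consumers=3, min_joins=2):
    """Return 'integrate' for heavily re-used and combined data, else 'connect'."""
    # Thresholds are illustrative defaults, not recommendations.
    if product.consumers >= min_consumers and product.joined_with >= min_joins:
        return "integrate"
    return "connect"


catalogue = [
    DataProduct("customer", consumers=12, joined_with=8),
    DataProduct("web_clickstream", consumers=2, joined_with=1),
]
decisions = {p.name: placement(p) for p in catalogue}
```

Here the heavily shared `customer` product would be centralised and integrated, while `web_clickstream` would simply be connected and queried where it resides.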
Martin leads Teradata’s EMEA technology pre-sales function and organisation and is jointly responsible for driving sales and consumption of Teradata solutions and services throughout Europe, the Middle East and Africa. Prior to taking up his current appointment, Martin ran Teradata’s Global Data Foundation practice and led efforts to modernise Teradata’s delivery methodology and associated tool-sets. In this position, Martin also led Teradata’s International Practices organisation and was charged with supporting the delivery of the full suite of consulting engagements delivered by Teradata Consulting – from Data Integration and Management to Data Science, via Business Intelligence, Cognitive Design and Software Development.
View all posts by Martin Willcox
Martin was formerly responsible for leading Teradata’s Big Data Centre of Excellence – a team of data scientists, technologists and architecture consultants charged with supporting Field teams in enabling Teradata customers to realise value from their Analytic data assets. In this role Martin was also responsible for articulating to prospective customers, analysts and media organisations outside of the Americas Teradata’s Big Data strategy. During his tenure in this position, Martin was listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data-driven business in 2016. His Strata (UK) 2016 keynote can be found at: www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid; a selection of his Teradata Voice Forbes blogs can be found online here; and more recently, Martin co-authored a series of blogs on Data Science and Machine Learning – see, for example, Discovery, Truth and Utility: Defining ‘Data Science’.
Martin holds a BSc (Hons) in Physics & Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a solo glider pilot, supporter of Sheffield Wednesday Football Club, very amateur photographer – and an even more amateur guitarist.