Stages of Grief for Data Scientists and IT Alike: Making Open Source Work in Paranoid Corporations, Part II

In part one of this blog, we covered the first two stages of grief, leaving stage two having inched forward – from the data scientist’s perspective – in the battle to gain access to open source tools and functionality on the corporate network.

However, even with that success, the ‘free’ open source software still requires installation, management and change control by IT. All of this requires budget – a difficult sell given the initial expectations of zero capital expenditure and instant value!

This leads us to our next stage in the grief process: depression.

Stage three: depression

Fighting for a solution that fits all needs at once – the latest, most flexible tools that give data scientists quick, up-to-date insights while still meeting the IT department’s requirements – is time-intensive and exhausting. Unexpected setbacks on the last mile, such as costs that were never budgeted for, are depressing for everyone involved. Often this leads to a stalemate between management and employees, and the process grinds to a halt with no real solution.

In despair, short-term fixes, such as ValidR, may seem attractive to get the job done. However, data scientists should work with their IT department and business colleagues to implement a solution that can take advantage of the explosion of analytical tools and engines and address the many use cases required.

A longer-term solution has come with the emergence of analytical platforms, which are designed to empower the analytic community to build and operationalise analytics to drive business innovation.

For example, Teradata provides an integrated data and analytics environment that delivers analytic functions and engines at scale, allowing users to easily build and use analytics through support for their preferred analytic tools and languages.
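
To make that tangible from the data scientist’s side, here is a minimal sketch using the teradataml Python client library; the host, credentials and the “transactions” table are hypothetical placeholders, not a definitive setup.

```python
# A minimal sketch: running analytics in the platform from a familiar
# Python environment. Assumes the teradataml client library is installed;
# the host, credentials, and "transactions" table are placeholders.
from teradataml import create_context, remove_context, DataFrame

# Open a connection to the analytics platform (placeholder credentials).
create_context(host="analytics.example.com", username="ds_user", password="***")

try:
    # Work with a database table through a familiar DataFrame interface;
    # the heavy lifting is pushed down to the platform, not done locally.
    transactions = DataFrame("transactions")
    print(transactions.head(5))

    # Pull a small sample into pandas for local, open source tooling.
    sample = transactions.head(1000).to_pandas()
    print(sample.describe())
finally:
    remove_context()  # Always release the connection when finished.
```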

But how can we get new software and tools like this into operation? This is where data scientists and IT professionals can work together to deliver a viable business case.

Costs are easy to articulate, but what about benefits? Well, this all comes down to the use cases the data scientist is trying to address. So, for example, if fraud detection could be improved by 10% using a new suite of machine learning models, that improvement has a value that can be quantified.

The typical approach is to include as many of these use cases as are required to meet the return on investment threshold for your organisation. All the value cases are much more compelling to approvers if you involve someone from finance in the calculation process. Then they are ‘their’ numbers too!
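
To make the arithmetic concrete, here is a back-of-the-envelope sketch; every figure in it is invented for illustration and should be replaced with your own numbers (ideally agreed with finance).

```python
# Back-of-the-envelope business case arithmetic. All figures are
# illustrative placeholders, not benchmarks or Teradata numbers.

# Quantified annual benefit per use case.
annual_fraud_losses = 20_000_000     # current yearly fraud losses (GBP)
fraud_detection_uplift = 0.10        # 10% improvement from new ML models
fraud_benefit = annual_fraud_losses * fraud_detection_uplift

churn_benefit = 750_000              # e.g. improved retention models
marketing_benefit = 400_000          # e.g. better campaign targeting

total_annual_benefit = fraud_benefit + churn_benefit + marketing_benefit

# Costs of the platform, tooling, and support (placeholder).
total_annual_cost = 1_200_000

roi = (total_annual_benefit - total_annual_cost) / total_annual_cost
print(f"Total annual benefit: £{total_annual_benefit:,.0f}")
print(f"ROI: {roi:.0%}")  # compare against your organisation's threshold
```

If the result falls short of the threshold, add the next quantified use case to the stack and recalculate.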

So, there is light at the end of the tunnel and the depression felt earlier should, hopefully, be lifting – which brings us to the final stage…

Stage four: acceptance

It’s important that data scientists and IT don’t rely on point solutions, but work together to establish a longer-term solution. This last stage looks at other solutions that a forward-looking organisation should include in its armoury.

Let’s assume we’ve solved the problem of getting open source tools into the corporate ecosystem. Once one issue is resolved, it will usually be followed up by the data scientist saying something like:

“It would be really great if we could add Python to the stack.”

And then: “We’d like to experiment with AI – can we have TensorFlow?”

It should be noted that any data scientist worth their salt (and pay!) will want a path to production for whatever they end up developing. So how does an organisation create a space for adventurous data scientists to try out new toys, as well as allowing IT to insulate the organisation from problems that may result, whilst also learning something about the demands of new open source capabilities?

Cloud solutions can meet the needs of data scientists through standalone data labs. These allow them to create environments using the very latest versions of the software and the supported tool sets, while utilising the data structures and security profiles of existing production systems.

Businesses can then import defined data sets into these data labs for any ad-hoc and experimental work that the data scientists need to do.

A major advantage of the cloud option is the ability to scale up or down the analytic processing capability, on-demand. The type of work performed by data scientists means that the elasticity of demand for resources is usually very high, so the flexibility of the cloud provides the perfect working model for these types of data labs/test/development workloads.

This makes developing the business case for any lab instance much easier – you get a bigger bang for your buck when you need it. It’s a bit like renting a nice car for your holiday.
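
As a purely illustrative comparison – the rates, hours and capacity units below are invented – the rent-versus-own arithmetic might look like this:

```python
# Why elasticity suits bursty data lab workloads. All rates, hours,
# and capacity units are invented for illustration.

hours_per_month = 730
peak_units = 50        # capacity needed during heavy experimentation
baseline_units = 5     # capacity needed the rest of the time
busy_hours = 80        # hours per month actually running at peak
unit_hour_rate = 2.00  # cost per capacity unit per hour (placeholder)

# Fixed sizing: pay for peak capacity around the clock.
fixed_cost = peak_units * hours_per_month * unit_hour_rate

# Elastic sizing: pay for peak capacity only while you use it.
elastic_cost = (busy_hours * peak_units
                + (hours_per_month - busy_hours) * baseline_units) * unit_hour_rate

print(f"Fixed sizing:   £{fixed_cost:,.0f}/month")
print(f"Elastic sizing: £{elastic_cost:,.0f}/month")
```

The spikier the workload, the wider that gap becomes – which is exactly the profile of experimental data lab work.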

This is all aligned with the future architecture too – so that when we upgrade the enterprise environment, we can bring these users back in-house without them having to change a single line of code.

Security is often seen as the number one issue when considering a cloud offering. We feel that the emphasis cloud vendors place on security means this should not be an issue if their guidance is followed.
This does require a change in focus for IT teams, though: moving away from managing on-premise platforms to policing data transfer and storage protocols, such as encryption or tokenisation (the process of substituting a sensitive data element with a non-sensitive equivalent that has no exploitable value).
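
As a toy sketch of the tokenisation idea – an in-memory vault for illustration only, not a production design:

```python
# A toy illustration of tokenisation: replace a sensitive value with a
# random token and keep the mapping in a secure vault. In production the
# vault would be a hardened, access-controlled service, not a dict.
import secrets

vault = {}  # token -> original value (the only place the real data lives)

def tokenise(sensitive_value: str) -> str:
    """Return a random token with no exploitable relationship to the input."""
    token = secrets.token_hex(8)
    vault[token] = sensitive_value
    return token

def detokenise(token: str) -> str:
    """Recover the original value; only callers with vault access can."""
    return vault[token]

card_token = tokenise("4929-1234-5678-9012")
print(card_token)              # safe to store or ship to a data lab
print(detokenise(card_token))  # restricted operation
```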

Most on-premise platforms allow remote access – and we’re not sure that this confers magical security advantages over the major vendor offerings. This approach also allows managed isolation from production systems without fiddling about in data centres.

Summary

Going through these stages of grief, we’ve tried to suggest a mix of actions, projects and approaches to map a pathway through open-source grief to genuinely new capabilities for innovation and business transformation, for both data scientists and their colleagues in IT.

Will this work for you in your organisation? Hopefully some ideas will – if there was a perfect solution, we wouldn’t have needed to write this series! We’d be really interested to know how readers get on.

If you have found this interesting and want to explore further with Teradata, feel free to reach out to the authors – Stewart.Robbins@Teradata.com to talk about exciting use cases, or Greg.Loxton@Teradata.com for the boring technical stuff.


Greg Loxton (Author)

Greg is an accomplished Solution Architect with over 25 years’ experience of Data and Analytics in a variety of industries. Specialising in Analytical and Big Data Architecture, Greg has extensive experience of building solutions in numerous enterprise environments. He complements his technical expertise with a business focus, ensuring that customer requirements are understood so that solutions are developed that meet the business’s needs and satisfy their ongoing requirements.
Stewart Robbins (Author)

Stewart Robbins is responsible for providing expertise within the Financial Services sector and in defining strategic data and analytics capabilities within banking, delivering business-focused support to all dimensions of projects. He brings subject matter expertise in the areas of Analytics Consulting, Data Science, Customer Insight, Predictive Modelling, and CRM in presales, solution development and delivery activities.

Stewart has extensive client-side and consulting experience in the design of analytic capabilities and delivering measurable value through data discovery and analysis. Whilst Stewart has past hands-on experience of data science, he is also an experienced leader of such teams and programmes of activity. He has helped senior business leaders develop their analytics visions with detailed supporting strategies and roadmaps.  

Prior to joining Teradata, Stewart was an analytics consultant at Hewlett Packard (now DXC Technology) and spent 25 years as an analytics practitioner and leading analytics teams at E.ON Energy, Barclays Bank and Nationwide Building Society.

Stewart is a graduate of University College London with a BSc in Economics and Geography and is also an Associate of the Chartered Institute of Banking.

