In this series on accelerating innovation in the analytic ecosystem, we focus on understanding the larger problem domains from both an IT and a business perspective. The first two articles covered Flexibility and Simplicity. In this article we will focus on Accessibility.
As stated originally, recognizing the conflict between business and IT is the first step in understanding the broader problem. No one is at fault and both are only focusing on meeting their objectives. It can appear we are at an impasse, but there is a solution.
The goal is a win-win outcome enabled by Business and IT simultaneously achieving their objectives with the least friction possible. We will continue to leverage the analytics capability framework in Figure 1 to focus on the different needs of the business and IT and provide recommendations for meeting those needs.
The framework below depicts the three foundational capabilities required for success in a modern analytic architecture. These capabilities are:
- Flexibility – The ability to choose the most appropriate software resources, e.g. tools, languages and libraries, to accelerate the user’s time to insight and minimize operationalization efforts.
- Simplicity – The ability to quickly provision and decommission analytic resources, e.g. compute, storage and network, in a simplified, manageable and cost-effective manner for business users and IT.
- Accessibility – The ability to efficiently find, secure and govern information and analytics within the entire analytic ecosystem without slowing down the business users or jeopardizing production.
Figure 1 – Analytics Capability Framework
Enabling accessibility to data with enough security and governance to maintain control
Data is rapidly expanding across multiple data stores within a modern ecosystem, which makes it harder than ever to find data.
Without the proper use of metadata tools and thoughtful oversight, data becomes difficult to find and hard to understand, reducing the enterprise’s ability to create actionable analytics.
So far, we have enabled flexibility in the choice of tools and simplified the provisioning of analytic sandbox resources, but users need one more capability: accessibility. Accessibility includes the ability to search for and locate data within the enterprise regardless of its location, format, or origin, the ability to obtain access rights to that data, and the ability to know how much trust to place in it.
Metadata management is a critical capability which simplifies the ability to find, understand and utilize data within the enterprise. The business contributes business glossary and collaborative metadata, while IT contributes technical and operational metadata. For a typical enterprise there are four kinds of metadata that provide four different views of the data:
- Technical metadata: Data stores, databases, tables, columns, directory structures, file names, sizes, data types, volumes, plus items documented in the source system
- Business metadata: Business knowledge that users possess about their data, e.g. business descriptions, relationships, comments, annotations, and classifications such as subject areas, fitness for use and quality ratings
- Operational metadata: Processing metadata, e.g. data freshness, usage, security and logging
- Collaborative metadata: Analytic datasets, models, methods, techniques and algorithms shared between analysts
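To make the four views concrete, here is a minimal Python sketch of how a single catalog entry might hold them side by side. The `CatalogEntry` class, its field names and the sample values are illustrative assumptions, not the data model of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """One data set seen through the four metadata views (hypothetical layout)."""
    name: str
    technical: Dict[str, str] = field(default_factory=dict)    # contributed by IT
    business: Dict[str, str] = field(default_factory=dict)     # contributed by the business
    operational: Dict[str, str] = field(default_factory=dict)  # contributed by IT
    collaborative: List[str] = field(default_factory=list)     # contributed by the business

    def missing_views(self) -> List[str]:
        """Return the names of the metadata views that are still empty."""
        views = {
            "technical": self.technical,
            "business": self.business,
            "operational": self.operational,
            "collaborative": self.collaborative,
        }
        return [view for view, content in views.items() if not content]


# Example: IT has loaded its metadata, the business has not contributed yet.
entry = CatalogEntry(
    name="sales.orders",
    technical={"data_store": "data warehouse", "order_id": "INTEGER"},
    operational={"last_refreshed": "2024-01-31"},
)
print(entry.missing_views())
```

A check like `missing_views` is one simple way to see which side, business or IT, still owes its share of the metadata for a given data set.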
Governance is an important capability for accelerating innovation while ensuring appropriate data management processes are in place. Modern metadata tools enable finding and understanding data, but governance needs to be modernized to empower the business to innovate via discovery-type analytics. Historically, data governance focused on security and access control of production data, as well as on conserving system resources to ensure that service level agreements were achieved for operational workloads.
Governance for discovery/exploration sandboxes should be more flexible to promote innovation, yet still protect enterprise assets. Sandbox users often cannot size their exact data and resource needs in advance. They need access to existing production data, the ability to easily add new data to their sandbox, and adequate compute and storage resources to perform complex analytics on both.
The prudent aspect of governance relates to the numerous privacy regulations that governance must enforce, protecting assets from hackers and the company from legal penalties and reputational damage. The governance processes must ensure that sensitive data is appropriately protected (e.g. encrypted, masked, anonymized), and that access is only granted to users with appropriate authorization. While IT has a good understanding and control of the production environment, discovery environments create an opportunity for exposure.
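As a rough illustration of protecting sensitive data while still granting access, the Python sketch below masks sensitive columns unless the user is explicitly authorized to see them. The function names, the "last four characters visible" rule and the sample row are assumptions made for the example. A real deployment would normally rely on the masking and access-control features of the underlying data platform.

```python
from typing import Dict, Set


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_row(row: Dict[str, str],
             sensitive_columns: Set[str],
             authorized_columns: Set[str]) -> Dict[str, str]:
    """Return the row with sensitive columns masked unless the user is authorized."""
    result = {}
    for column, value in row.items():
        if column in sensitive_columns and column not in authorized_columns:
            result[column] = mask_value(value)
        else:
            result[column] = value
    return result


# Example: a sandbox user without authorization for the "ssn" column.
row = {"name": "Ann", "ssn": "123-45-6789"}
print(read_row(row, sensitive_columns={"ssn"}, authorized_columns=set()))
```

The same row read by a user whose `authorized_columns` includes `"ssn"` would come back unmasked, which keeps the authorization decision in one place.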
The following lists summarize the key needs of the Business and IT to enable data access while still ensuring proper controls.
Business needs:
- Easily find the location, meaning, quality, sensitivity, and context of data within the enterprise
- Simple process to obtain access to existing data
- Simple permissions to load new data from outside sources into the exploration area
- Faster path to move new analytics and data into production
- Ability to share analytic data and processes between analysts to make analysts across the enterprise more efficient
- Access to current technical and operational metadata
IT needs:
- Comprehensive metadata management capabilities
- Understand how data is used in discovery analytics
- Ensure data is properly secured and access protections are enforced to limit exposure
- Continuously monitor sandbox workloads and data to limit security risks
- Ensure that newly loaded user data from unknown sources is not sharable with other users
- Identify operationalization opportunities within and across user sandboxes
- Access to current business and collaborative metadata
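One way to identify operationalization opportunities within and across sandboxes is to count how often each workload runs. The sketch below is hedged accordingly: the run-log format, the workload names and the `min_runs` threshold are all hypothetical, and a real implementation would read from the platform's own query or job logs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def operationalization_candidates(
    run_log: Iterable[Tuple[str, str]],
    min_runs: int = 3,
) -> Dict[str, Set[str]]:
    """Map each frequently run workload to the sandboxes that run it.

    `run_log` holds one (sandbox, workload) pair per execution.
    A workload qualifies once its total runs across all sandboxes
    reach `min_runs` (an assumed, tunable threshold).
    """
    runs = Counter()
    sandboxes = defaultdict(set)
    for sandbox, workload in run_log:
        runs[workload] += 1
        sandboxes[workload].add(sandbox)
    return {w: sandboxes[w] for w, count in runs.items() if count >= min_runs}


# Example: "churn_score" runs three times across two sandboxes,
# while "adhoc_check" runs only once.
log = [
    ("sandbox_a", "churn_score"),
    ("sandbox_a", "churn_score"),
    ("sandbox_b", "churn_score"),
    ("sandbox_a", "adhoc_check"),
]
print(operationalization_candidates(log))
```

Grouping by workload rather than by sandbox is a deliberate choice here: it surfaces analytics that several analysts depend on, which are often the strongest candidates for a move to production.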
Recommendations for enabling accessibility via metadata, security and governance:
For IT:
- Implement a modern data catalog which automatically gathers business, operational and technical metadata from key analytic data stores such as the data warehouse, data lake and other analytic data stores
- Partner with the business to ensure comprehensive data and analytics governance practices are in place for the production environment and lightweight, flexible governance is in place for the discovery environment
- Establish basic principles for the discovery environment (i.e. sandboxes), such as:
  - Limit data sharing in the exploration zone
  - Minimize data movement from production to exploration zone via on-demand access to production data
  - Leverage IT-approved tools where possible
  - Document sandbox data transformations
  - Favor trusted data sources over less trusted sources
  - Only use approved, sanctioned data stores for storing sandbox data
- Create simplified self-service processes for the business to request access to data for use in discovery zones, allowing users to search and review objects in the data catalog based on location, quality, fit for purpose, access privileges, etc.
- Establish data classification rules that describe data trustworthiness to enable user awareness of the state of data within the analytic ecosystem
  - User managed sandbox - Bronze
  - IT managed, landed/raw, standardized/cleansed - Silver
  - IT managed, highly integrated, highly reusable, high quality - Gold
- Proactively monitor sandbox environments for opportunities to move workloads from exploration to production
- Proactively audit discovery zone data for regulatory, privacy and auditing considerations and work with users as appropriate to reconcile potential compliance violations
For the business:
- Follow the basic principles for discovery environments listed above
- Leverage the data catalog to locate the most trusted data products to enable the highest quality analytics and streamline operationalization efforts
- Ensure business subject matter experts contribute business knowledge in the form of an enterprise business glossary, tags, associations, user-defined annotations, classifications, ratings, etc. to the data catalog
- Include data quality and confidentiality levels in access requests to expedite access approval and implementation workflow
- Use a discovery environment that leverages data virtualization under the covers to simplify cross-platform data access
- Use authorized technologies and IT development standards wherever possible to minimize IT time and effort to move discovery analytics into production
- Document sandbox data transformation steps to expedite IT operationalization of discovery analytics into production
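The Bronze, Silver and Gold classification and the idea of searching the catalog for the most trusted data can be sketched in a few lines of Python. Everything named here (`TrustTier`, `CatalogObject`, `search`, `is_sharable`, the sample catalog) is a hypothetical illustration. The rule that only Silver and Gold data may be shared is likewise an assumption, derived from the principle of limiting data sharing in the exploration zone.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class TrustTier(IntEnum):
    BRONZE = 1  # user-managed sandbox data
    SILVER = 2  # IT-managed, landed/raw or standardized/cleansed
    GOLD = 3    # IT-managed, highly integrated, reusable, high quality


@dataclass(frozen=True)
class CatalogObject:
    name: str
    location: str
    tier: TrustTier


def search(catalog: List[CatalogObject], keyword: str,
           min_tier: TrustTier = TrustTier.BRONZE) -> List[CatalogObject]:
    """Find objects whose name contains the keyword, most trusted first."""
    matches = [
        obj for obj in catalog
        if keyword.lower() in obj.name.lower() and obj.tier >= min_tier
    ]
    return sorted(matches, key=lambda obj: obj.tier, reverse=True)


def is_sharable(obj: CatalogObject) -> bool:
    """Assumed policy: user-managed (Bronze) data stays inside its sandbox."""
    return obj.tier >= TrustTier.SILVER


catalog = [
    CatalogObject("sales_upload_2024", "sandbox_a", TrustTier.BRONZE),
    CatalogObject("sales_raw", "data lake", TrustTier.SILVER),
    CatalogObject("sales_integrated", "data warehouse", TrustTier.GOLD),
]
for obj in search(catalog, "sales", min_tier=TrustTier.SILVER):
    print(obj.name, obj.tier.name, is_sharable(obj))
```

Using an ordered enumeration for the tiers keeps both the "favor trusted sources" ranking and the sharing rule down to simple comparisons.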
Business and IT have very different needs but must work together in a cooperative manner to deliver success to the organization.
When IT gives up burdensome controls over discovery environments, it can focus more on getting analytics and data into production. When the business embraces more flexible, consistent controls, it gains speed to business value.
IT should shift from being focused on control to becoming a provider of services focused on increasing the agility of the business. Enabling low-friction access to enterprise data via the user’s choice of tools goes a long way toward providing a win-win outcome for Business and IT.
IT is viewed as an ally and valuable partner of the business when it enables the business users by:
- Agreeing to support a diverse set of analytic tools for business users in both exploration and production.
- Enabling simplified access to cost-effective elastic analytic compute and storage resources
- Enabling data virtualization so that users on an analytic platform can access data located in multiple data stores on demand
- Implementing an automated data catalog, so users can rapidly locate and understand the data to create better analytics
- Proactively monitoring discovery environments for analytics which are being run repeatedly and are now ready for operationalization
The business is also viewed as an ally and valuable partner of IT by:
- Using the approved analytic tools when possible and communicating with IT early on when new tools are needed
- Using IT provisioning processes, which ensures sanctioned resources are manageable and monitorable
- Using elastic computing resources in a responsible manner to minimize costs, e.g. dropping discovery resources when no longer needed
- Adhering to the basic principles for discovery environments
- Partnering with IT to ensure smooth transition of discovery analytics to production
- Preventing the sharing of sandbox data with other users to minimize security compliance concerns and prevent cross sandbox analytic development on a “house of cards”
The analytic capability framework provides an approach which values both Business and IT needs, creating a win-win environment. Organizations that adopt such frameworks are better positioned to accelerate innovation and outpace their competitors.
Dwayne Johnson is a principal ecosystem architect at Teradata, with over 20 years' experience in designing and implementing enterprise architecture for large analytic ecosystems. He has worked with many Fortune 500 companies in the management of data architecture, master data, metadata, data quality, security and privacy, and data integration. He takes a pragmatic, business-led and architecture-driven approach to solving the business needs of an organization.
Mark Mitchell is a principal ecosystem architect at Teradata with 25+ years of data warehouse experience as a system engineer, enterprise architect, and implementer of large data warehouses for Fortune 500 companies. He has performed consulting engagements at many large Fortune 1000 companies, leveraging his analytic skills and technical knowledge to implement innovative data-driven solutions focused on delivering value by optimizing efficiency or growing sales. He has spoken at the Teradata user conference on topics ranging from dual active implementations to workload visualization techniques. He holds a patent related to applying state machine concepts to managing high availability of systems.