Connect Teradata QueryGrid to Azure HDInsight

Share

Many Teradata customers are interested in integrating Teradata Vantage with Microsoft Azure first party services. This Getting Started Guide will help you connect Teradata QueryGrid to Azure HDInsight service.

Although this approach has been implemented and tested internally, it is offered on an as-is basis. Teradata QueryGrid does support Azure HDInsight and it is highly encouraged to work with Teradata to determine the optimal configuration and architecture. This paper approach does not replace official product documentation.

We encourage your feedback. We want to understand what you found useful and how we can improve this guide. Please send your feedback to rupal.shah@teradata.com.

Disclaimer: This guide includes content from both Microsoft and Teradata product documentation.

Overview

This article describes the minimal steps to connect Teradata QueryGrid to a HDInsight (Hadoop) cluster and query it from Teradata Vantage on Azure. It will highlight deployment requirements for Teradata (DIY) on Azure, Teradata Viewpoint, Teradata QueryGrid and Azure HDInsight. We will use Teradata Viewpoint to configure these services to connect seamlessly between platforms. Finally, we will create a foreign server connection and leverage QueryGrid fabric to execute SQL from Vantage to a HDInsight cluster.

This is a diagram of the workflow.

About Azure HDInsight

Azure HDInsight is a Big Data service from Microsoft that brings 100% Apache Hadoop and other popular Big Data solutions to the cloud. A modern, cloud-based data platform that manages data of any type. Whether your data is structured or unstructured, and of any size, HDInsight makes it possible for you to gain the full value of Big Data.

With HDInsight, you can seamlessly process data of all types through Microsoft’s modern data platform. Our platform provides simplicity, ease of management, and an open Enterprise-ready Big Data solution. HDInsight provides a platform for all of your Big Data needs including Batch, Interactive, No SQL and Streaming. It also comes with a strong eco-system of tools and developer environment.

Supported cluster types include: Hadoop (Hive), HBase, Storm, Spark, Kafka, Interactive Hive (LLAP), and ML Services.

For more information see documentation.

About Teradata Vantage

Vantage is the platform for Pervasive Data Intelligence; delivering real-time intelligent answers to users and systems across all parts of an organization. It leverages 100 percent of a business’s data, regardless of scale, volume, or complexity. Running Teradata Database software on Azure is similar to running a Teradata Database in your on-site data center.
Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.
Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.
Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.
For more information see documentation.

About Teradata QueryGrid^™

Teradata QueryGrid 2.x is a data analytics fabric that provides seamless, high-performing data access, processing, and movement across one or more data sources. Teradata QueryGrid supports the following connectors:

Teradata Database
Hive
Spark SQL
Oracle (only as a target connector)
Starburst Enterprise Presto

For more information see documentation.

Prerequisites

You are expected to be familiar with Teradata Vantage, Teradata QueryGrid and Azure HDInsight
You will need the following accounts, and systems:

Azure Pay-as-you-go subscription

Teradata Vantage (DIY) on Azure instance (version 16.20) or later.

Procedure

Once you have met the prerequisites, follow these steps:

Requirements
Deploy Teradata (DIY) on Azure with Teradata Viewpoint and Teradata QueryGrid Manager
Create subnet for Azure HDInsight cluster
Create Azure HDInsight cluster
Create public IP address and inbound port
Configure Teradata QueryGrid Manager
Configure Teradata Viewpoint
Configure QueryGrid in the Viewpoint QueryGrid Portlet
Configuring and using QueryGrid connectors
Load Data to HDInsight
Query HDInsight data from Vantage
Cleanup

Note: Capture all usernames, passwords, private and public IP addresses for later use.

Step 1: Requirements
Before deploying a Teradata ecosystem, your Azure subscription must have sufficient level of permissions assigned with Contributor role at the subscription level and sufficient quota limits.

Logon to Azure portal, click on Subscription icon for quick check. Otherwise, on the left pane click on Cost Management + Billing > Cost Management > Azure subscriptions

For more information, see the Azure Documentation Center and search for Contributor role azure subscription.

Next, check if your subscription has sufficient quota limits (recommended: 128 cores). On your subscription page, Click Usage + quotas, filter on Select a Provider (e.g., Microsoft Compute) and All locations (e.g., deployment region). If nothing shows up, change Show only items with usage to Show All.

Click Request Increase if quota limit is not sufficient, click Quota type and choose Compute-VM (cores-vCPUs) subscription limit increases and follow instructions to submit request.

For more information see this link for instructions on requesting a core quota increase and on Azure subscription and service limits, quotas, and constraints, read this

Step 2: Deploy Teradata on Azure (DIY)

A solution template allows you to deploy multiple Teradata products simultaneously. At deployment, you can create a new resource group or use an existing, empty resource group that contains the VNet, VMs, storage accounts, network security groups, and so on.

In the portal, click Create a resource sign located in the upper left and search for Teradata. A list of Teradata products appears under the Results pane.
Select the Teradata Vantage (DIY) solution template and click Create
Configure Basic blade settings – Enter new Resource group click Create new, choose Region, enter Password for VM and click Next
Configure Database blade settings – Enter DBC Password, choose Database Tier: Enterprise and Number of nodes: 2 and click Next
Configure Viewpoint blade settings – Choose Yes and enter Password and Viewpoint Image version: Multiple Systems and click Next through to QueryGrid Manager blade
Configure QueryGrid Manager blade settings – Choose Yes and click Next through to General Settings (blade11)
Configure General Settings – Choose defaults and click Next
In Review + create blade, click Create after validation.

Deployment process can take anywhere between 40 to 60 minutes depending on your configuration and the availability of resources in your region.

Note: Unlike Teradata Vantage (DIY) Developer tier, Enterprise tier software is not free and Azure infrastructure for VM, Virtual Network and Storage cost exists.

For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

Step 3: Create Subnet for Azure HDInsight Cluster

Before we deploy HDInsight, we need to create a subnet for HDInsight cluster in the Teradata ecosystem virtual network.

In the portal, open the Teradata resource group or click on Go to resource group
Click on vnet-teradataSubnets and click + Subnet and add subnet name (e.g., HDISubnet) and leave options at their default values and click OK

3. Refresh Subnets to view newly added HDISubnet

Step 4: Create Azure HDInsight cluster

Next, create HDInsight (Hadoop) cluster with 2 headnodes and 2 workernodes.

In the portal, click Create a resource sign and search for HDInsight
Select the Azure HDInsight solution template and click Create
Configure Basics blade settings – Enter new Resource group click Create new, enter Cluster name, choose Region (same as Teradata deployment), select Cluster type and version (e.g., Hadoop 2.7.3/HDI 3.6) and enter Cluster login password and click Next: Storage

Note: Check Teradata® QueryGrid™ Components and Connectors Compatibility Matrix for HDInsight version support details.

4.Configure Storage blade settings – Choose Primary storage account select (New) <HDI Cluster name> and leave other properties default (blank) values and click Next: Security + networking
5. Configure Security + networking blade settings – Choose Virtual network vnet-teradata and the Subnet you created in step 3 (e.g., HDISubnet) and click Next: Configuration + pricing
6. Configure Configuration + pricing blade settings – Change Worker node Number of nodes from 4 to 2 for this article and click Review + create

7. Click Create once validation passes. Expect deployment to take 15-20 minutes.

Step 5: Create public IP address and inbound port

For outside clients to gain access to Teradata Database and VMs you need to create a public IP address to a NIC interface and create an Inbound security rule for port 1025. By default, Teradata on Azure and Teradata products (Teradata Viewpoint and Teradata QueryGrid Manager) does not deploy with a public IP address or inbound port to access the database.

Create Public IP addresses

In the portal, open the Teradata on Azure resource group and click on a NIC interface: database-nic00
Click on IP configurations and click on an ip Name
1. Set Public IP address to Associate
2. Under the Choose public IP address dropdown, click Create new
3. In the Add a public IP address prompt enter Name and change Assignment to Static
4. Click OK and Save

4. Refresh Inbound security rules to view new port entry

Step 6: Configure Teradata QueryGrid Manager

After deploying a QueryGrid Manager VM, you need to logon to the VM to add a user and set a password for the Viewpoint account to access the QueryGrid Manager VM. You will need the QueryGrid Manager VM username and password created during deployment.

Log on to the QueryGrid Manager VM using PuTTY and public IP address created in the previous step.
Switch to the root user environment.

> sudo su -

Add user ‘tdqgm’ to the QueryGrid Manager user group to prevent permission errors.

# sudo usermod -A tdqgm azureuser

Set the default viewpoint password

# /opt/teradata/tdqgm/bin/reset-password.sh
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

Step 7: Configure Teradata Viewpoint

Before we can configure QueryGrid, we first need to add Teradata and QueryGrid Manager systems to the Viewpoint Monitored Systems portlet.

From your browser, open the Viewpoint portal logon screen using public IP address created in step 4 for Viewpoint VM.

Note: Ignore Your connection is not private message and click on Advance and click Proceed to link

Log in to the Viewpoint portal as admin and password created during VM deployment.
Click admin portlet (gear icon) and click on Monitored Systems portlet.

For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.

e. Add QueryGrid Manager in the Monitored Systems portlet:First, install a QueryGrid Root Certificate

Click Select Portlet dropdown
Click the Certificates.
From the Setup list, click Certificate Authority.
Click Install Certificate.
Type an alias (e.g., cert) for the certificate, up to 30 characters.
Select the A trusted SSL-enabled service option
Type the private-IP address for hostname of a Teradata QueryGrid Manager instance (e.g.10.0.0.9)
Type 9443 as the port number.
Click Install

Next, add QueryGrid Manager in the Monitored Systems portlet

Click Select Portlet dropdown
Click Monitored Systems.
Click + next to Systems and select Add QueryGrid.
Under General System Details, enter a system nickname (e.g., QGM), up to 8 characters.
Select the Enable system check box.
Enter the private IP address for Host ID of the Teradata QueryGrid Manager.
Under Login, enter the credentials viewpoint and the reset password performed in step 6 to enable Viewpoint to access QueryGrid Manager.
Click Apply and Close Admin portlet

For more information see Teradata® QueryGrid™ Installation and User Guide.

Step 8: Configure QueryGrid in the Viewpoint QueryGrid Portlet

In this section, we will add the QueryGrid portlet to Viewpoint and configure QueryGrid to access the HDInsight cluster. This will include adding a Data Center, Data Source Systems for Teradata and HDInsight cluster and add appropriate Teradata and HDInsight nodes and QueryGrid software to each Data Source System. Next, define a Fabric and add Connectors for Teradata and HDInsight. Finally, create a Link to define Teradata (as the initiating source) and HDInsight (as the target).

Note: This article will use the QueryGrid software version deployed in the solution template. We will not cover uploading the latest version.

Add QueryGrid portlet to the Viewpoint Portal

In the portal, click Add Content.
Locate and click the QueryGrid portlet.
Click Add.

b. Review Data Center

Under Fabric Components, select Data Centers.
(Optional) Click Edit on default Data Center created during QueryGrid Manager deployment to rename and click Save

c. Adding a Data Source System

Under Fabric Components, select Systems.
Click + next to Systems.
At Add System, enter system name (e.g.TD) for Teradata Data Source System
Select Data center
Select Node software version deployed from solution template
Resource Allocation, set Max memory per node to 1GB
Click Save
Repeat to add HDInsight Data Source System (e.g., HDI) with same properties

d. Adding Nodes to a System

You will need to add the private IP addresses of your Nodes to the respective Data Source Systems created in the previous step. Teradata nodes to the TD Data Source System and HDInsight (Hadoop) nodes (head and worker nodes) to the HDI Data Source System. This will enable the tdqg-node package to auto install across all nodes.

Under Fabric Components, select Systems.
Select the Data Source System you want to add nodes to. (e.g., TD)
On the Nodes tab, click + next to Nodes.

For more information see Teradata® QueryGrid™ Installation and User Guide.

Step 9: Configuring and Using Connectors

For this article and simple demonstration, use any Teradata client tool to create an authorization object, grant privileges and create a foreign server connection to our HDInsight cluster.

Log on as an Administrator, such as dbc, to the initiating Teradata Database system, and create an authorization object for the target server, for example:

CREATE AUTHORIZATION td_server_db.hiveservice AS DEFINER TRUSTED USER 'hive' PASSWORD 'hive' ;
An authorization object is created in the td_server_db database. Using the DEFINER clause make the authorization available globally to all users.

Grant the CREATE SERVER and EXECUTE FUNCTION privileges on the td_server_db database to the Administrator user, for example:

GRANT CREATE SERVER ON td_server_db TO dbc;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB TO dbc;

Create the foreign server connection to HDInsight (Hadoop) cluster

CREATE FOREIGN SERVER target_server_name
EXTERNAL SECURITY DEFINER TRUSTED target_server_auth
USING
LINK('linkname')
VERSION ('version')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;

For example, where Teradata Database is the initiating system and HDInsight cluster is the target/remote system using the QueryGrid Link definition called TD2HDI.

CREATE FOREIGN SERVER remotehdi
EXTERNAL SECURITY DEFINER TRUSTED hiveservice
USING
LINK('TD2HDI')
VERSION('active')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;

For more information see Teradata® QueryGrid™ Installation and User Guide.

Step 10: Load Data to HDInsight

Before we can query our HDInsight (Hadoop) cluster, we need to create a table and some data.

In the portal, open HDInsight resource group and click on publicIpheadnode- for the public IP address for your HDI headnode0 VM.
Log on as sshuser and password using Putty
Change to root user: sudo su –
At the prompt use Beeline (Hive server2 command line interface), type:

beeline -u "jdbc:hive2://<headnode0 private IP address>:10001/default;transportMode=http"

create table t_test(id int);
insert into t_test values(100);
insert into t_test values(101);

Step 11: Query HDInsight Data from Vantage

Finally, use any Teradata client tool to query HDInsight data.

Log on as dbc to the initiating Teradata Database system using the public IP address
Insert and Select data from HDInsight
1. insert into t_test@remotehdi values(1);
2. select * from t_test@remotehdi;

3. Join data from HDInsight and Vantage

create table td_server_db.localtbl (c1 int)
insert into td_server_db.localtbl values (200)
insert into td_server_db.localtbl values (201)
insert into td_server_db.localtbl values (2)
select * from t_test@remotehdi union all select * from td_server_db.localtbl

4. Create a View

create view td_server_db.t_test_v as select * from t_test@remotehdi;
select * from td_server_db.t_test_v;

Users do not even know they are accessing a foreign server

5. Show Foreign Server - Allows you to see a server object definition that contains the name value pairs that the associated table operators use to connect to the foreign serverSHOW FOREIGN SERVER remotehdi

6. Drop Foreign Server - In addition to deleting the server object and its associated information from the dictionary tables, all dependent entries on the associated table operators are deleted.DROP FOREIGN SERVER TD_SERVER_DB.remotehdi;

Step 12: Cleanup

To avoid incurring charges to your Azure subscription for the resources used, follow these steps.
Delete Resource Group

Logon in Azure portal, click on Resource groups in left pane.
Click on Resource group and click on Delete Resource Group in menu
In Delete Resource group pane type in resource group name to verify deletion.
Click Delete
Repeat to delete other Resource Groups

(Author):
Rupal Shah

Rupal Shah is a member of Teradata Partners Technical Consultant team. Prior to consulting on the Microsoft partnership, he was a technical consultant for the IBM Cognos and Oracle Hyperion partnerships. Along with his extensive experience working with business intelligence and ‘in-database’ solutions, Rupal has worked with various Teradata application organizations for whom he provided database consulting. He received his B.A. in Math and Computer Science from the University of California at San Diego, and he is currently based in San Diego. View all posts by Rupal Shah

Connect

Your privacy is important. Your personal information will be collected, stored, and processed in accordance with the Teradata Global Privacy Policy.

Connect Teradata QueryGrid to Azure HDInsight

Overview

About Azure HDInsight

About Teradata Vantage

About Teradata QueryGrid^™

Prerequisites

Procedure

Step 2: Deploy Teradata on Azure (DIY)

Step 3: Create Subnet for Azure HDInsight Cluster

Step 4: Create Azure HDInsight cluster

Step 5: Create public IP address and inbound port

Step 6: Configure Teradata QueryGrid Manager

Step 7: Configure Teradata Viewpoint

Step 8: Configure QueryGrid in the Viewpoint QueryGrid Portlet

Step 9: Configuring and Using Connectors

Step 10: Load Data to HDInsight

Step 11: Query HDInsight Data from Vantage

Step 12: Cleanup

(Author):
Rupal Shah

The Top FinServ Trends & Predictions for 2022

Credit Risk Reloaded For A Modern World

Data Mesh and the City Planner

Turn your complex data and analytics into answers with Teradata Vantage.

Your privacy is important. Your personal information will be collected, stored, and processed in accordance with the Teradata Global Privacy Policy.

Connect Teradata QueryGrid to Azure HDInsight

Overview

About Azure HDInsight

About Teradata Vantage

About Teradata QueryGrid™

Prerequisites

Procedure

Step 2: Deploy Teradata on Azure (DIY)

Step 3: Create Subnet for Azure HDInsight Cluster

Step 4: Create Azure HDInsight cluster

Step 5: Create public IP address and inbound port

Step 6: Configure Teradata QueryGrid Manager

Step 7: Configure Teradata Viewpoint

Step 8: Configure QueryGrid in the Viewpoint QueryGrid Portlet

Step 9: Configuring and Using Connectors

Step 10: Load Data to HDInsight

Step 11: Query HDInsight Data from Vantage

Step 12: Cleanup

(Author): Rupal Shah

Related Posts

The Top FinServ Trends & Predictions for 2022

Credit Risk Reloaded For A Modern World

Data Mesh and the City Planner

Turn your complex data and analytics into answers with Teradata Vantage.

About Teradata QueryGrid^™

(Author):
Rupal Shah