Teradata Vantage provides a Native Object Store (NOS) feature to query JSON files on Microsoft Azure Blob Storage, where you can keep vast amounts of semi-structured or unstructured data. If you already have JSON files in your blob storage account, you can skip to the Connect Teradata Vantage Native Object Store to Azure Blob Storage section.
There are many ways to create and process JSON data. In this getting started guide, we will use the Raspberry Pi online simulator to send JSON strings to an Azure IoT Hub device, route the messages to an Azure Event Hub, process them with an Azure Stream Analytics job, and write them to an Azure Blob Storage container as JSON files. Finally, we will use the NOS feature in Vantage to connect to and query the JSON files.
You are expected to be familiar with Azure services and Teradata Vantage with Native Object Store (NOS).
You will need the following accounts, objects, and systems. Links have been included with setup instructions.
1. Logon to the Azure portal and create a storage account. Leave the remaining fields set to their default values. Click Create.
We suggest that you use the same location for all services you create. This avoids confusing errors for this example.
2. Once the deployment is complete, click Go to resource to create a container for your JSON files.
3. Click the Containers icon, then click +Container and provide a new container name. Leave Public access level as "Private" for this example.
4. Click Access keys and copy the Storage account name and key1, which we will use later in the Connect Teradata Vantage Native Object Store to Azure Blob Storage – Create Authorization Object section.
Create an Azure Event Hub namespace and hub
We need to create an Event Hubs namespace and an event hub endpoint for the IoT Hub device to send messages to.
1. Using the portal, create an Event Hubs namespace and event hub. Provide a Namespace name, choose the Standard tier, and create a new or use an existing Resource Group and Location. Leave the remaining fields set to their default values. Click Create. You may have to wait a few minutes for the system to fully provision the resources.
Open the Resource Group to see your Event Hub namespace.
We do not cover the Throughput Units and Enable Auto-Inflate properties in this guide. Please see the Azure documentation for details.
Next, select your Event Hub namespace and create an event hub by clicking +Event Hub. Provide an Event Hub name. Leave the remaining fields set to their default values and click Create.
The Event Hub Capture feature supports only the Avro format when writing to Azure Blob Storage. This guide uses Stream Analytics to move the data instead, to accommodate the JSON format.
After the event hub is created, you should see it in the list of event hubs below.
Create an Azure Stream Analytics job
We want to move Event Hub messages to Azure Blob Storage. We have chosen to process the data from our event hub using Stream Analytics.
1. Click on the created event hub (for example, eventhub1) and click Process data.
2. Click Explore to display the Query dialog. If there are messages in your Event Hub, you can process them by clicking Create.
3. Click Deploy query to create a Stream Analytics job that moves Event Hub messages to Azure Blob Storage.
4. Enter a Job name in the New Stream Analytics job pane and leave the remaining fields set to their default values.
5. Click Create.
6. Click on the Outputs UI and click +Add to define Blob Storage as an output.
7. Provide the Output alias, Storage account, and container information in the Blob Storage/Data Lake Storage Gen2 pane. The Path pattern (for example, <directory>/{date}/{time}) is optional.
8. Ensure that Event serialization format is set to JSON. Leave the remaining fields set to their default values and click Save.
You can find more information on the Path pattern property at Azure Stream Analytics custom blob output partitioning.
9. Exit the Output pane by clicking the X in the upper right corner.
10. Edit and save the query with the new Output alias (for example, output1).
11. Exit the Query pane by clicking the X in the upper right corner.
12. Click Start job with the Now option.
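The optional Path pattern controls how output blobs are organized into folders. As a rough sketch (assuming the default token formats of yyyy/MM/dd for {date} and HH for {time}; check your job's configuration), the pattern expands like this:

```python
from datetime import datetime, timezone

def blob_path(pattern: str, directory: str, ts: datetime) -> str:
    # Expand a Stream Analytics-style path pattern; {date} and {time}
    # are assumed here to use the default formats yyyy/MM/dd and HH.
    return (pattern
            .replace("<directory>", directory)
            .replace("{date}", ts.strftime("%Y/%m/%d"))
            .replace("{time}", ts.strftime("%H")))

ts = datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc)
print(blob_path("<directory>/{date}/{time}", "telemetry", ts))
# telemetry/2021/03/14/09
```

This is only an illustration of how the folder hierarchy ends up looking in the container; the actual expansion is done by the Stream Analytics service.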
Create Azure IoT Hub to route messages
We need to create an IoT Hub instance, register an IoT hub device, and set up a route/custom endpoint to get messages from our Raspberry Pi online simulator to the Azure IoT hub and finally to the Event Hub.
1. In the portal, create an IoT Hub, register a new device, and define message routing.
2. Click on Create a resource at the top left and search for IoT Hub. Click Create.
3. In the Basic tab, create a new or use an existing Resource Group and provide an IoT Hub Name. Choose the B1: Basic tier in the Size and Scale tab. Leave the remaining fields set to their default values and click Review + create to validate, then click Create.
For more information, see choosing the right IoT Hub tier. Configuring the Number of IoT Hub units property, and performance considerations more generally, are not covered here.
The online simulator requires a device identity in the registry to connect to a hub.
4. Click Go to resource, or in your IoT hub pane, open IoT Devices, then select +New to add a device to your IoT hub. Provide a Device ID name and click Save.
5. Click Refresh in the IoT devices dialog to display your device (for example, rasppi) and click on it. Copy the Primary Connection String, which we will use later to connect our Raspberry Pi online simulator to our IoT hub device.
We need to configure a route and endpoint for the IoT Hub device to send messages to the Event Hub.
6. In your IoT Hub pane, click Message routing under Messaging to define a route and custom endpoint.
Routes is the first tab on the Message Routing pane.
7. Click +Add to add a new route. Enter a Name for your route and choose an endpoint. You can select an endpoint from the dropdown list or add a new one; in this example, click on +Add endpoint and choose Event Hub.
8. In the Add an event hub endpoint pane, provide an Endpoint name, an existing Event hub namespace, and an Event hub instance. Click Create.
9. Click Save to save the routing rule. You should see the new routing rule displayed.
10. In the Custom endpoints tab, click Refresh to see your custom endpoint rule displayed under Event Hubs. Ensure that the Status is Healthy.
Configure Raspberry Pi online simulator for new device
We will use the Raspberry Pi online simulator as our source of JSON strings. Data will be sent to the registered IoT hub device, which we created in the previous section.
This is an example of a JSON string that the simulator will generate. (Line wrapping was added for clarity).
{"messageId":2,
"deviceId":"Raspberry Pi Web Client",
"temperature":29.288325949023434,
"humidity":77.5147906}
We need to add the IoT hub device Primary Connection String for the online simulator to connect.
1. Click on START RASPBERRY PI SIMULATOR.
2. Edit line 15 and replace '[Your IoT hub device connection string]' with your Primary Connection String.
3. Run the online simulator for a few minutes. You should see JSON files in your blob storage container.
4. Click Stop to stop the simulator.
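The Primary Connection String pasted in step 2 is a semicolon-delimited list of key=value pairs (HostName, DeviceId, SharedAccessKey). As a quick sanity check, you can pull it apart like this (the string below is an entirely made-up placeholder, not a real credential):

```python
# Hypothetical connection string; a real SharedAccessKey is much longer.
conn = "HostName=myhub.azure-devices.net;DeviceId=rasppi;SharedAccessKey=bXlrZXk="

# Split on ';' first, then on the first '=' only, so that any '='
# padding at the end of the base64 key is preserved.
parts = dict(kv.split("=", 1) for kv in conn.split(";"))
print(sorted(parts))
# ['DeviceId', 'HostName', 'SharedAccessKey']
```

If the DeviceId does not match the device you registered in the IoT hub, the simulator will fail to connect.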
If desired, you can view your blob storage data.
A. Logon to the Azure portal and click Resource groups.
B. Find and click your resource group with the storage account.
C. Click the container name.
D. Click on a JSON file and click Edit to view the data.
Alternatively, you can use Azure Storage Explorer, although configuring Azure Storage Explorer is not discussed in this guide.
Connect Teradata Vantage Native Object Store to Azure Blob Storage
Native Object Store is a new capability included with Teradata Vantage 17.0 that makes it easy to explore data sets located in external object stores, like Azure Blob Storage, using standard SQL.
Once configured, an object on Azure Blob will look just like a table in Vantage.
Detailed information can be found in the
Native Object Store – Teradata Vantage Advanced SQL Engine 17.0 (Orange Book).
Vantage needs an authorization object to gain access to an Azure Blob Storage account. Authorization objects provide more security than relying on Azure Blob Storage authorization alone.
The CREATE AUTHORIZATION DDL statement requires the
storage account name to be specified in the USER field and the Azure
access key we saved earlier in the PASSWORD field.
You must create the authorization object in the same database as the foreign table that will reference it.
1. Create an authorization object using syntax similar to the following.
CREATE AUTHORIZATION DefAuth_AZ
AS DEFINER TRUSTED
USER 'mystorageacctrs' /* storage account name */
PASSWORD '********'; /* storage account key */
If you delete and recreate your storage account with the same name, the access keys will change. Therefore, you
must drop and recreate your authorization objects.
A foreign table allows external data to be easily referenced within the Vantage Advanced SQL Engine and makes the data available in a structured, relational format. Below is an example of the syntax to create a simple foreign table.
The storage LOCATION information includes three parts, separated by slashes: the
AZ prefix, the
storage account name including the “blob.core.windows.net” suffix, and the
container name.
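As a small sanity check, the three-part LOCATION string can be assembled from the account and container names used in this guide (a throwaway sketch, not a Teradata utility):

```python
def nos_location(account: str, container: str) -> str:
    # Join the AZ prefix, the storage account's blob endpoint,
    # and the container name, separated by slashes.
    return f"/AZ/{account}.blob.core.windows.net/{container}"

print(nos_location("mystorageacctrs", "json"))
# /AZ/mystorageacctrs.blob.core.windows.net/json
```

The result should match the LOCATION clause in the foreign table DDL exactly; a typo in any of the three parts will cause queries against the foreign table to fail.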
If external security authentication is used, then include the EXTERNAL SECURITY DEFINER TRUSTED DefAuth_AZ syntax in the DDL statement.
2. Create a foreign table using syntax similar to the following.
CREATE MULTISET FOREIGN TABLE json ,FALLBACK ,EXTERNAL SECURITY DEFINER TRUSTED DefAuth_AZ,
MAP = TD_MAP1
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload JSON(8388096) INLINE LENGTH 32000 CHARACTER SET UNICODE)
USING
(
LOCATION ('/AZ/mystorageacctrs.blob.core.windows.net/json')
MANIFEST ('FALSE')
PATHPATTERN ('$Var1/$Var2/$Var3/$Var4/$Var5/$Var6/$Var7/$Var8/$Var9/$Var10/$Var11/$Var12/$Var13/$Var14/$Var15/$Var16/$Var17/$Var18/$Var19/$Var20')
ROWFORMAT ('{"record_delimiter":"\n", "character_set":"UTF8"}')
STOREDAS ('TEXTFILE')
)
NO PRIMARY INDEX ;
Now that the foreign table is created, you can access the object on Azure Blob Storage.
JSON is essentially a list of keys and values. You can obtain a list of the keys (attributes) with the JSON_KEYS table operator with the following statement.
SELECT DISTINCT * FROM JSON_KEYS (ON (SELECT payload FROM json)) AS j;
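Conceptually, JSON_KEYS extracts the distinct attribute names across all payloads, much as this Python sketch does for a list of JSON strings (the sample payloads mirror the simulator messages shown earlier):

```python
import json

# Two sample payloads shaped like the simulator's messages.
payloads = [
    '{"messageId":1,"deviceId":"Raspberry Pi Web Client","temperature":29.2,"humidity":77.5}',
    '{"messageId":2,"deviceId":"Raspberry Pi Web Client","temperature":28.9,"humidity":76.1}',
]

# Distinct top-level keys across all documents, analogous to
# SELECT DISTINCT * FROM JSON_KEYS (ON (SELECT payload FROM json)).
keys = sorted({k for p in payloads for k in json.loads(p)})
print(keys)
# ['deviceId', 'humidity', 'messageId', 'temperature']
```

Knowing the key names up front is what lets you write the dot-notation casts (payload.messageId, payload.deviceId, and so on) used later in this guide.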
You can obtain a list of the actual values by selecting a small number of rows without any filtering.
SELECT Payload.* FROM json;
You can obtain the name-value pairs, including both the fields and their values, by omitting the asterisk from the Payload keyword.
SELECT Payload FROM json;
It is best practice to create a view on top of a table to provide a layer between users and tools and the underlying table.
Below is an example of a view on the JSON foreign table.
CREATE VIEW json_perm_cols_v AS (
SELECT
CAST(payload.messageId AS INT) MessageID,
CAST(payload.deviceId AS VARCHAR(25)) DeviceID,
CAST(payload.temperature AS FLOAT) Temperature,
CAST(payload.humidity AS FLOAT) Humidity
FROM json
);
At this point, the view may be used by users and tools.
Native Object Store does not automatically make a persistent copy of the external object that it reads through the foreign table.
You can copy the data in the external object into a table in Vantage. Below are a few simple COPY options.
The CREATE TABLE AS…WITH DATA statement creates a No Primary Index (NoPI) relational table as the target table by default. If you do not take further action, this NoPI table will have just the two columns: Location and Payload.
CREATE TABLE json_perm AS (SELECT * FROM json) WITH DATA;
It is better to create a column for each payload attribute. The example below does this.
CREATE TABLE json_perm_cols AS
(SELECT
CAST(payload.messageId AS INT) MessageID,
CAST(payload.deviceId AS VARCHAR(25)) DeviceID,
CAST(payload.temperature AS FLOAT) Temperature,
CAST(payload.humidity AS FLOAT) Humidity
FROM json
)
WITH DATA
NO PRIMARY INDEX;
Another option is using an INSERT…SELECT statement. This approach does require that the permanent table be created beforehand. The example below first creates the table and then performs the insert.
CREATE SET TABLE json_perm_empty ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO,
MAP = TD_MAP1
(
MessageId INTEGER,
DeviceId VARCHAR(25) CHARACTER SET LATIN NOT CASESPECIFIC,
Temperature FLOAT,
Humidity FLOAT)
PRIMARY INDEX ( MessageId );
INSERT INTO json_perm_empty
SELECT
payload.messageId,
payload.deviceId,
payload.temperature,
payload.humidity
FROM json;