Many Teradata customers are interested in integrating Teradata Vantage with Amazon Web Services (AWS) first-party services. This guide will help you connect Teradata Vantage to Salesforce using Amazon AppFlow.
The procedure offered in this guide has been implemented and tested by Teradata. However, it is offered on an as-is basis. Amazon does not validate Teradata Vantage with Amazon AppFlow.
This guide includes content from both Amazon and Teradata product documentation.
Overview
This article describes how to retrieve customer information from Salesforce using Amazon AppFlow and combine it with order and shipping information from Vantage to derive analytical insights.
Amazon AppFlow transfers the customer data from Salesforce to Amazon S3. Teradata Vantage then uses Native Object Store (NOS) functionality to join the data on S3 with data in Vantage within a single query.
About Amazon AppFlow
Amazon AppFlow is a fully managed integration service that enables users to securely transfer data between Software-as-a-Service (SaaS) applications, like Salesforce, Marketo, Slack, and ServiceNow, and AWS services, like Amazon S3 and Amazon Redshift.
AppFlow automatically encrypts data in motion and allows users to restrict data from flowing over the public internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats.
About Teradata Vantage
Vantage is the connected multi-cloud data platform for enterprise analytics.
Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.
Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.
Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.
Teradata Vantage Native Object Store (NOS) can be used to explore data in external object stores, like Amazon S3, using standard SQL. No special object storage-side compute infrastructure is required to use NOS. You can explore data located in an S3 bucket by simply creating a NOS table definition that points to your bucket. With NOS, you can quickly import data from S3 or even join it with other tables in the Vantage database.
See the Teradata Vantage documentation for more information.
Prerequisites
You should be familiar with Amazon Web Services concepts, Amazon AppFlow, and Teradata Vantage.
You will need the following accounts and systems:
- An AWS account with appropriate permissions (you can create a free account),
- An S3 bucket,
- Teradata Vantage with the Advanced SQL Engine 17.0 or higher, and
- A Salesforce account that satisfies the following requirements:
- Your Salesforce account must be enabled for API access.
API access is enabled by default for Enterprise, Unlimited, Developer, and Performance editions.
- Your Salesforce account must allow you to install connected apps.
If this is disabled, contact your Salesforce administrator. After you create a Salesforce connection in Amazon AppFlow, verify that the connected app named "Amazon AppFlow Embedded Login App" is installed in your Salesforce account.
- The refresh token policy for the "Amazon AppFlow Embedded Login App" must be set to "Refresh token is valid until revoked". Otherwise, your flows will fail when your refresh token expires.
- You must enable Change Data Capture in Salesforce to use event-driven flow triggers. From Setup, enter "Change Data Capture" in Quick Find.
- If your Salesforce app enforces IP address restrictions, you must whitelist the addresses used by Amazon AppFlow. For more information, see AWS IP address ranges in the Amazon Web Services General Reference.
- If you are transferring over 1 million Salesforce records, you cannot choose any Salesforce compound field. Amazon AppFlow uses Salesforce Bulk APIs for the transfer, which does not allow transfer of compound fields.
- To create private connections using AWS PrivateLink, you must enable both "Manage Metadata" and "Manage External Connections" user permissions in your Salesforce account. Private connections are currently available in the Northern Virginia (us-east-1) and Oregon (us-west-2) AWS Regions.
- Some Salesforce objects can't be updated, such as history objects. For these objects, Amazon AppFlow does not support incremental export (the "Transfer new data only" option) for schedule-triggered flows. Instead, you can choose the "Transfer all data" option and then select the appropriate filter to limit the records you transfer.
Procedure
These are the steps to connect Teradata Vantage to Salesforce using Amazon AppFlow:
- Create a flow in Amazon AppFlow
- Explore the data using NOS in Vantage
Create a flow in Amazon AppFlow
This step creates a flow using Amazon AppFlow. For this example, we are using a Salesforce developer account to connect to Salesforce.
Log into the AWS Console
Sign into the AppFlow console with your AWS login credentials.
Click Create flow.
Ensure that you are in the right region.
Ensure that you have an S3 bucket created. We will use the bucket “ptctsoutput” for this guide.
Specify flow details
This step provides basic information for your flow.
Fill in Flow name (e.g. salesforce) and Flow description (optional).
Leave Customize encryption settings (advanced) unchecked.
Click Next.
Configure flow
This step provides information about the source and destination for your flow.
For this guide, we will be using Salesforce as the source, and AWS S3 as the destination.
For Source name, choose Salesforce, then Create new connection for Choose Salesforce connection.
Use the defaults for Salesforce environment and Data encryption.
Give your connection a name (e.g. salesforce).
Click Continue.
At the Salesforce login window, enter your Username and Password.
Click Log In.
Click Allow to allow AppFlow to access your Salesforce data and information.
On the AppFlow Configure flow window, choose Salesforce objects.
Choose Account to be the Salesforce object.
Use Amazon S3 as Destination name.
Select your S3 bucket (referenced above) where you want the data to be stored (i.e. ptctsoutput).
Click Run on demand for the Flow trigger.
Click Next.
Map data fields
This step determines how data is transferred from the source to the destination.
Select Manually map fields for the Mapping method.
For simplicity, choose Map all fields directly for the Source to destination field mapping.
Once you click on Map all fields directly, all of the fields will display under Mapped fields.
You may click on the checkbox for the field(s) you want to Add formula (e.g. concatenate), Modify values (e.g. mask or truncate field values), or Remove selected mappings.
For simplicity, no checkboxes will be ticked in this guide.
For Validations, add a condition to ignore records that contain no “Billing Address”.
Click Next.
Add a filter
You can add filters to determine which records to transfer.
For this guide, add a condition to filter out the records that are Deleted.
Click Next.
Review and create the flow
Review all the information you just entered. Modify if necessary.
Click Create flow.
A “salesforce successfully created” message will display with the flow information once the flow is created.
Run the flow
Click Run flow in the upper right corner.
Upon successful completion of the flow, a message will display.
Click the link to the bucket to view the data.
The data from Salesforce will be in JSON format.
Change the data file properties
By default, the data from Salesforce is encrypted. We need to remove the encryption for NOS to access it.
Click on the data file in your S3 bucket, then click the Properties tab.
On the Properties tab, click on the Encryption tile.
Click on the None option to remove the encryption (e.g. AWS-KMS).
Click Save.
Explore the data in S3 using NOS dynamically
The Native Object Store (NOS) feature in Vantage has built-in functionality to explore and analyze data in object storage, such as Amazon S3.
This section describes how you may dynamically read an object (file) in S3 without creating any permanent objects in Vantage. This is an agile method when you are investigating data.
Use the READ_NOS Table Operator
The READ_NOS table operator can be used to sample and explore a percentage of the data without having to define a foreign table or a view on an object in S3.
The following SQL example shows the reading of an object in S3 using the full name of the object. You can use Teradata Studio or Teradata Studio Express to run SQL commands.
The ACCESS_ID is the access key ID for your AWS account and the ACCESS_KEY is your secret access key.
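A minimal sketch of such a READ_NOS query follows; the object path under the “ptctsoutput” bucket is an assumption based on the flow name used earlier, and the placeholder credentials must be replaced with your own.

```sql
/* Sample rows directly from the S3 object; no foreign table required.
   The bucket/prefix below is illustrative -- adjust to your flow's output path. */
SELECT TOP 5 payload.*
FROM READ_NOS (
  ON (SELECT CAST(NULL AS JSON CHARACTER SET UNICODE))
  USING
    LOCATION ('/s3/ptctsoutput.s3.amazonaws.com/salesforce/')
    ACCESS_ID ('<your-access-key-id>')
    ACCESS_KEY ('<your-secret-access-key>')
    RETURNTYPE ('NOSREAD_RECORD')
) AS d;
```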
This is output from the above SQL command:
Import data from a file on S3 into Vantage with CREATE TABLE
If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign table does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.
This method uses the CREATE TABLE AS…WITH DATA statement. This approach allows you to choose which attributes within the foreign table payload you want to include in the target table, and what the relational table columns will be named.
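A minimal sketch of such a statement follows; the payload attribute names (Id, Name, BillingCity) are assumptions based on the Salesforce Account object, and the bucket path and credentials are placeholders.

```sql
/* Persist selected payload attributes into a Vantage table.
   Attribute names are illustrative Salesforce Account fields. */
CREATE MULTISET TABLE salesforceReadNOS AS (
  SELECT CAST(payload.Id AS VARCHAR(20))          AS CustomerID,
         CAST(payload.Name AS VARCHAR(100))       AS CustomerName,
         CAST(payload.BillingCity AS VARCHAR(50)) AS BillingCity
  FROM READ_NOS (
    ON (SELECT CAST(NULL AS JSON CHARACTER SET UNICODE))
    USING
      LOCATION ('/s3/ptctsoutput.s3.amazonaws.com/salesforce/')
      ACCESS_ID ('<your-access-key-id>')
      ACCESS_KEY ('<your-secret-access-key>')
      RETURNTYPE ('NOSREAD_RECORD')
  ) AS d
) WITH DATA
NO PRIMARY INDEX;
```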
The above statement creates a table named salesforceReadNOS.
Use the following SQL statement to query it.
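For example:

```sql
SELECT * FROM salesforceReadNOS;
```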
This is a partial result from the SELECT statement.
Explore the data in S3 using NOS with a foreign table
The Native Object Store (NOS) feature in Vantage has built-in functionality to explore and analyze data in object storage, such as Amazon S3.
Typically, you will define a permanent object, a foreign table, in Vantage to access the data in S3 as if it were a table within Vantage.
This section describes how you may use different capabilities of NOS to explore the data from Salesforce that is in the S3 bucket using a foreign table.
Create Foreign Table
A foreign table allows the data in S3 to be easily referenced within the Vantage SQL Engine and makes the data available in a structured relational format.
Login to Vantage using Teradata Studio or Teradata Studio Express.
Create an AUTHORIZATION object with the access key ID and secret access key for S3 bucket access. The Authorization object enhances security by establishing control over who is allowed to use a foreign table to access AWS S3 data.
The USER is the access key ID for your AWS account and the PASSWORD is your secret access key.
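A sketch of the authorization statement follows; the object name DefAuth_S3 is an arbitrary choice, and the placeholder credentials must be replaced with your own.

```sql
/* DefAuth_S3 is an arbitrary name for this authorization object. */
CREATE AUTHORIZATION DefAuth_S3
AS DEFINER TRUSTED
USER '<your-access-key-id>'
PASSWORD '<your-secret-access-key>';
```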
Create a foreign table on the JSON file in S3 using the following command.
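A sketch of the definition follows, assuming an authorization object named DefAuth_S3 and the bucket path used earlier in this guide.

```sql
/* Foreign table over the JSON objects written by the flow.
   DefAuth_S3 and the bucket path are assumptions -- adjust to your setup. */
CREATE FOREIGN TABLE salesforce
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth_S3
(
  Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
  Payload  JSON(8388096) INLINE LENGTH 32000 CHARACTER SET UNICODE
)
USING (
  LOCATION ('/s3/ptctsoutput.s3.amazonaws.com/salesforce/')
);
```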
At a minimum, the foreign table definition must include a table name and a LOCATION clause, which points to the object store data. The LOCATION requires a top-level single name, which is the S3 bucket.
If the file name doesn’t have a standard extension (.json, .csv, .parquet) at the end, the Location and Payload column definitions are also required to indicate the type of the data file.
Foreign tables are always defined as No Primary Index (NoPI) tables.
Once the foreign table is created, you can query the contents of the S3 data set by using SELECT.
The foreign table only contains two columns: Location and Payload.
Location is the address in the object store system. The data itself is represented in the payload column, with the payload value within each record representing a single JSON object and all its name-value pairs.
This is a sample output from “SELECT * FROM salesforce;”.
This is a sample output from “SELECT payload.* FROM salesforce;”.
Use the JSON_KEYS Table Operator
JSON data may contain different attributes in different records. To determine the full list of possible attributes in a data store, you can use the JSON_KEYS table operator. (You must have first created the foreign table from the previous step.)
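For example, on the salesforce foreign table:

```sql
/* List the distinct JSON attribute names found in the payload column. */
SELECT DISTINCT * FROM JSON_KEYS (
  ON (SELECT payload FROM salesforce)
) AS j;
```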
This is the partial output from the JSON_KEYS table operator.
Create View
Views can simplify the names associated with the payload attributes, which makes it easier to use the foreign table definition. (You must have first created the foreign table from the previous step.)
The following view statement uses attributes discovered from the JSON_KEYS table operator above.
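A sketch of such a view follows; the attribute names are assumptions based on the Salesforce Account object, and salesforceView is an arbitrary name.

```sql
/* Expose payload attributes under simple column names.
   Attribute names are illustrative Salesforce Account fields. */
REPLACE VIEW salesforceView AS (
  SELECT CAST(payload.Id AS VARCHAR(20))             AS CustomerID,
         CAST(payload.Name AS VARCHAR(100))          AS CustomerName,
         CAST(payload.BillingStreet AS VARCHAR(100)) AS BillingStreet,
         CAST(payload.BillingCity AS VARCHAR(50))    AS BillingCity,
         CAST(payload.BillingState AS VARCHAR(10))   AS BillingState
  FROM salesforce
);
```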
With the view created, you can select the data from the file in S3 with a much simpler SQL statement.
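Assuming the view is named salesforceView, the query reduces to:

```sql
SELECT * FROM salesforceView;
```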
This is the partial output from the view.
Join data from a file on S3 to in-database tables
Once a foreign table is defined, it may be treated like most other tables in Vantage, including joining them in a single statement for detailed analysis.
For example, imagine that Vantage has tables for orders and shipping. The following are the DDL statements that define these tables.
This is the Orders table, which holds the header information (one per order).
This is the Order Items table, which holds the individual items on each order (one or more per order).
This is the Address table, which holds the shipping address for customers.
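A sketch of such DDL follows; the table and column names and types are illustrative assumptions, not taken from the original.

```sql
/* Illustrative in-database tables; names and types are assumptions. */
CREATE TABLE Orders (
  Order_ID    INTEGER NOT NULL,
  Customer_ID VARCHAR(20),
  Order_Date  DATE
) PRIMARY INDEX (Order_ID);

CREATE TABLE Order_Items (
  Order_ID INTEGER NOT NULL,
  Item     VARCHAR(100),
  Quantity INTEGER,
  Price    DECIMAL(10,2)
) PRIMARY INDEX (Order_ID);

CREATE TABLE Addresses (
  Customer_ID VARCHAR(20) NOT NULL,
  Street      VARCHAR(100),
  City        VARCHAR(50),
  State       VARCHAR(10),
  Zip_Code    VARCHAR(10)
) PRIMARY INDEX (Customer_ID);
```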
Now further imagine that these tables have the following data.
Orders:
Order items:
Shipping addresses:
By joining the salesforce foreign table to the above tables, we can retrieve a customer’s order information along with the customer’s shipping information.
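A sketch of such a join follows; the payload attribute names and the in-database table and column names are assumptions for illustration.

```sql
/* Join the foreign table's JSON payload to illustrative in-database tables. */
SELECT CAST(sf.payload.Name AS VARCHAR(100)) AS CustomerName,
       o.Order_ID,
       oi.Item,
       a.Street,
       a.City
FROM salesforce sf
JOIN Orders o
  ON CAST(sf.payload.Id AS VARCHAR(20)) = o.Customer_ID
JOIN Order_Items oi
  ON o.Order_ID = oi.Order_ID
JOIN Addresses a
  ON CAST(sf.payload.Id AS VARCHAR(20)) = a.Customer_ID;
```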
This is a sample of the results of the above query.
Import data from a file on S3 into Vantage with CREATE TABLE
If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign table does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.
This method uses the CREATE TABLE AS…WITH DATA statement. This approach allows you to choose which attributes within the foreign table payload you want to include in the target table, and what the relational table columns will be named.
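A minimal sketch follows; the payload attribute names are assumptions based on the Salesforce Account object.

```sql
/* Persist selected payload attributes from the foreign table.
   Attribute names are illustrative Salesforce Account fields. */
CREATE MULTISET TABLE salesforceVantage AS (
  SELECT CAST(payload.Id AS VARCHAR(20))          AS CustomerID,
         CAST(payload.Name AS VARCHAR(100))       AS CustomerName,
         CAST(payload.BillingCity AS VARCHAR(50)) AS BillingCity
  FROM salesforce
) WITH DATA
NO PRIMARY INDEX;
```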
The above statement creates a table named salesforceVantage.
Use the following SQL statement to query it.
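For example:

```sql
SELECT * FROM salesforceVantage;
```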
This is a partial result from the SELECT statement.
Import data from a file on S3 into Vantage with INSERT SELECT Statement
If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign table does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.
This method uses the INSERT...SELECT statement. This approach allows you both to choose which attributes within the foreign table payload you want to include in the target table and to change the target table’s attributes. For example, you can specify whether the target table is MULTISET, or you can choose a different primary index.
This approach does require the target table to be created beforehand, which the following SQL statement does.
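A sketch of such a target table follows; the column names and types are illustrative assumptions.

```sql
/* Target table created ahead of the INSERT...SELECT; definition is illustrative. */
CREATE MULTISET TABLE salesforcePerm, FALLBACK (
  CustomerID   VARCHAR(20),
  CustomerName VARCHAR(100),
  BillingCity  VARCHAR(50)
) PRIMARY INDEX (CustomerID);
```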
The data must then be added (inserted) from the file on S3 into the target table that was just created.
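A sketch, assuming a target table salesforcePerm whose columns match the selected payload attributes:

```sql
/* Attribute names are illustrative Salesforce Account fields. */
INSERT INTO salesforcePerm
SELECT CAST(payload.Id AS VARCHAR(20)),
       CAST(payload.Name AS VARCHAR(100)),
       CAST(payload.BillingCity AS VARCHAR(50))
FROM salesforce;
```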
The statement above created a table named salesforcePerm.
Use the following SQL statement to query it.
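For example:

```sql
SELECT * FROM salesforcePerm;
```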
This is a partial result from the SELECT statement.
Cleanup
This is an optional step to help you avoid extra charges in your AWS account.
Remove Amazon AppFlow
Delete the Connections that you created for the flow.
Delete the flow.
Delete data in the S3 bucket
Delete the data from Salesforce in the S3 bucket.
If you no longer need the bucket, you may also delete the bucket.
Stop the Vantage instance
If you no longer need the Vantage instance, you may stop and terminate it.