
What Working “at Scale” Really Means

Solving the ten hundred thousand problem

Things seem to come in waves, and in the past few weeks I have had the same conversation at least seven times. I call it solving the “ten hundred thousand” problem (and no, that is not the same as a million). What this means is that if you have 10 users running 10 queries against 10 tables, the environment is not that hard to manage. Increase the scale to hundreds instead of tens and you may get by with a squad of good DBAs and sysadmins. Keep going to thousands (or even millions!), and the environment grows beyond what people can manage by hand.

When you need to move from a departmental solution or POC to “operational and production systems working at scale,” you encounter three main challenges: optimizing the individual queries, managing the diverse workload, and monitoring the overall system. Let’s take a look at each of these in a bit of detail.

Query Optimization

The first challenge is that each individual query has to be effectively optimized every time it runs. That means you cannot rely on people to provide hints, expect that users will write the best code, or assume that a business tool generating generic SQL will produce code well suited to every platform. The optimizer must be fully aware of data relationships, fully leverage parallelism, and be able to execute snippets of a query whose results can in turn be used to better optimize the next steps in that query.

Teradata Vantage is built upon the Teradata database, which has the richest optimizer for analytic workloads. Vantage takes that foundation and expands it to include optimization of machine learning and graph functions, as well as incorporating data from other systems.
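To make the feedback idea concrete, here is a minimal, hypothetical sketch of adaptive optimization: execute one step of a query, then use that step’s *actual* result size, rather than a static estimate, to choose the strategy for the next step. All names and thresholds here are illustrative assumptions, not Teradata’s actual optimizer.

```python
# Hypothetical sketch of feedback-driven query optimization: run an early
# query "snippet", then use the real cardinality of its result to pick
# the plan for the next step. Names/thresholds are illustrative only.

SMALL_TABLE_ROWS = 1000  # assumed cutoff for choosing a broadcast join

def run_filter_step(rows, predicate):
    """Execute one query snippet and return its intermediate result."""
    return [r for r in rows if predicate(r)]

def choose_join_strategy(intermediate_rows: int) -> str:
    # With the actual row count in hand, the planner can choose better
    # than it could from a pre-execution estimate.
    if intermediate_rows < SMALL_TABLE_ROWS:
        return "broadcast_join"
    return "hash_redistribute_join"

orders = [{"id": i, "region": "west" if i % 4 == 0 else "east"}
          for i in range(10_000)]
west = run_filter_step(orders, lambda r: r["region"] == "west")
strategy = choose_join_strategy(len(west))
print(len(west), strategy)
```

The point of the sketch is the ordering: the join strategy is chosen *after* the filter runs, so a wildly wrong cardinality estimate cannot lock the plan into a bad join.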

Workload Management

Once each query has been optimized, the workload as a whole needs to be managed. Not every query carries the same importance. Web access queries have stringent service levels that must be met, or the customer experience suffers. At the other end of the spectrum are long-running queries of a “what if” nature, where run time is less of a concern as long as insight can be gained from the completed query.

Of course, the mix along this spectrum is constantly changing. Some hours are heavily skewed to the tactical while other times are skewed to the strategic. As concurrency grows, overlapping priorities and conflicting requirements will become a problem to manage. 

Teradata Vantage has world-class workload management: rules and service levels are broadly defined, and the system takes over from there to ensure all workloads receive the proper resources to satisfy user expectations.
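As a rough illustration of rule-based workload management (a toy sketch, not Teradata’s implementation), the snippet below classifies queries into the two ends of the spectrum described above and runs them through a priority queue. The classes, thresholds, and shares are all invented for the example.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical workload rules: lower priority number runs first;
# cpu_share stands in for the resource allocation a real system enforces.
RULES = {
    "tactical":  {"priority": 0, "cpu_share": 0.60},  # SLA-bound web queries
    "strategic": {"priority": 1, "cpu_share": 0.40},  # long "what if" queries
}

@dataclass(order=True)
class Query:
    priority: int
    name: str = field(compare=False)
    workload: str = field(compare=False)

def classify(est_runtime_sec: float) -> str:
    # Toy rule: anything expected to finish in under a second is tactical.
    return "tactical" if est_runtime_sec < 1.0 else "strategic"

def schedule(queries):
    """Return queries in the order a priority scheduler would start them."""
    heap = []
    for name, est_runtime in queries:
        workload = classify(est_runtime)
        heapq.heappush(heap, Query(RULES[workload]["priority"], name, workload))
    return [heapq.heappop(heap) for _ in range(len(heap))]

order = schedule([("monthly_rollup", 3600), ("web_lookup", 0.05),
                  ("dash_kpi", 0.2)])
print([(q.name, q.workload) for q in order])
```

The broader point from the text holds even in the toy version: the administrator defines the rules once, and the system, not a person, decides query by query who goes first.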

System Monitoring

While it is good to have queries optimized and workloads managed, there is the broader challenge of keeping the whole environment running smoothly as a single system. This includes message traffic, monitoring, error recovery, and space management, as well as continuity in systems where loads and queries are happening all day. Adding to the challenge are concepts like failover, where multiple systems are kept in sync for highly operational environments.

This is no small task, and it is where Teradata Vantage again leverages the rich experience of the past, excelling not only in single-system management but also bringing a full suite of tools within IntelliSphere to simplify and automate multi-system ecosystems.
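The failover idea can be reduced to a very small sketch: probe each system’s health and route to a standby when the primary stops responding. This is an invented illustration of the concept only; real ecosystem tooling such as IntelliSphere handles synchronization, routing, and recovery with far more machinery.

```python
# Hypothetical two-system failover check. The System class and its
# heartbeat are stand-ins for real health probes (test query, port
# check, replication-lag check, etc.).

class System:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def heartbeat(self) -> bool:
        # A real probe would issue a query or ping; here it is simulated.
        return self.healthy

def pick_active(primary: System, standby: System) -> System:
    """Route traffic to the primary unless its heartbeat fails."""
    return primary if primary.heartbeat() else standby

primary = System("prod-east")
standby = System("prod-west")
print(pick_active(primary, standby).name)

primary.healthy = False  # simulate an outage on the primary
print(pick_active(primary, standby).name)
```

The hard part in practice is not this routing decision but everything around it, such as keeping the standby’s data in sync so the switch is invisible to users, which is exactly why this becomes a systems problem rather than a DBA task at scale.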


There is a big difference between running a POC and running “at scale.” There is also a difference between a departmental or point solution and running across your enterprise with consistent and integrated data. The worst time to understand the limitations of a solution is once value is shown and you need to grow by adding more users, more data, and more queries.

Teradata Vantage brings the power of all the above in addition to unparalleled scalability. And as new advanced analytics engines are added into the mix, Teradata will be working to bring the same rigor to those operations as well.

All of this combines to solve the “ten hundred thousand” problem, allowing your business to go from insight to production without worry and to drive millions to your bottom line.

Rob Armstrong

Starting with Teradata in 1987, Rob Armstrong has contributed in virtually every aspect of the data warehouse and analytical processing arenas. Rob’s work in the computer industry has been dedicated to data-driven business improvement and more effective business decisions and execution.  Roles have encompassed the design, justification, implementation and evolution of enterprise data warehouses.

In his current role, Rob continues the Teradata tradition of integrating data and enabling end-user access for true self-driven analysis and data-driven actions. Increasingly, he incorporates the world of non-traditional “big data” into the analytical process.  He also has expanded the technology environment beyond the on-premises data center to include the world of public and private clouds to create a total analytic ecosystem.

Rob earned a B.A. degree in Management Science with an emphasis in mathematics and relational theory at the University of California, San Diego. He resides and works from San Diego.

