Understanding TPC and TPC-like Database Benchmarks

This post is Part 1 in a series about running benchmarks to measure the performance of relational databases.  It provides a very brief introduction to the popular industry-standard benchmarks TPC-C, E, H, and DS, and to the software programs you can use to run a variation of each benchmark.

The discussion here is generic and applies to any relational database, such as Oracle, SQL Server, MySQL, or PostgreSQL.  The TPC also has many benchmarks for non-relational databases that are not discussed here.  The discussion is also intended for informal benchmarking without publication.

Why Benchmark?

Benchmark tools can answer a variety of questions and help predict success or failure.  Consider the following examples of why organizations perform internal benchmarking of database systems:

Predict change impact.  Organizations run benchmarks to see how a system change impacts performance.  Run a benchmark to establish a baseline, make a system or database change, and then re-run the benchmark to see how much it helps or hurts performance. 

Determine scalability.  Every TPC benchmark is scalable: you can scale up the amount of data and the number of users to find a server's optimal and peak loads.  Doing this in dev/test lets you predict how many production servers you will need.

Compare two servers.  You can run the exact same benchmark on two servers and see which one performs better.  Some organizations like to compare different processor architectures, or test with varying numbers of CPU sockets and cores.

Compare two DBMS.  You can run the exact same database workload using the same data and users on different DBMS and see which performs better.  Organizations often combine benchmark results with system cost information.

Benchmarking is not about bragging.  You cannot legally publish database performance numbers for a commercial DBMS without the prior written authorization of the DBMS provider (e.g., Oracle, Microsoft, or IBM).  Most open source databases have no such restriction, but you still need approval from the author of the benchmark and the author of the benchmarking tool any time you publish a claim involving their trademarks.  I have another blog post about publishing database benchmark results.

In all database benchmarks the workload is fixed but the system is variable: you can use different processors, more or less memory, different storage, change the O/S or DBMS, but you cannot change the workload.  These facts make it possible to directly compare, for example, a DBMS on one server running Linux to a different DBMS on a different server running Windows.

It is important to know the cost of achieving performance numbers.  What if system A runs twice as fast as system B but costs five times as much?  All formal TPC benchmarks have a pricing requirement, and every result includes a price/performance score expressed as cost per unit of performance (e.g., dollars per tpmC).  You should apply the same concept whenever benchmarking two systems informally.
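
To make that concrete, here is a tiny price/performance illustration with made-up numbers (the systems, costs, and scores are hypothetical; the cost-per-unit-of-performance idea is the same one TPC reports use):

```python
# Hypothetical price/performance comparison; lower $/tpmC is better.
def price_performance(total_cost_usd: float, score: float) -> float:
    """Cost per unit of performance, e.g. dollars per tpmC."""
    return total_cost_usd / score

# System A is twice as fast as System B but costs five times as much.
a = price_performance(total_cost_usd=500_000, score=200_000)  # $2.50/tpmC
b = price_performance(total_cost_usd=100_000, score=100_000)  # $1.00/tpmC
print(f"A: ${a:.2f}/tpmC   B: ${b:.2f}/tpmC")  # B wins on price/performance
```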

Who Defines the Benchmarks?

Industry standard database benchmarks are defined by members of the Transaction Processing Performance Council (TPC), a global non-profit whose home page is tpc.org.  TPC members participate in various activities, but foremost, they work together to define fair benchmarks for measuring total system performance under each type of database workload.

Prior to the TPC's founding in 1988, each system vendor had its own benchmark that favored its own products, such as IBM's TP1 benchmark and the Tandem Debit-Credit benchmark.  That made it impossible for customers to make fair comparisons of systems or databases.

TPC members today include commercial database vendors IBM, Microsoft, and Oracle; server vendors Dell, Cisco, and Lenovo; chip makers AMD and Intel; and operating system vendors Microsoft, Red Hat, and VMware.  All members can participate in working groups to define new benchmarks, but each working group has a limited number of members: the benchmark TPC-DS was designed by a working group consisting of Sybase, Oracle, and HPE before it was presented to the TPC as a whole.

Which Benchmarks are Available?

There are currently four industry standard benchmarks for relational databases: TPC-C and TPC-E have transactional (OLTP) workloads, while TPC-H and TPC-DS have Decision Support System (DSS) workloads.

| Benchmark | Workload | Stress Points | Behavioral Notes | Other Notes |
| --- | --- | --- | --- | --- |
| TPC-C | Transaction processing with a supplier model | CPU, storage latency, network latency | Mostly small-block random reads with occasional scans.  Data is randomly generated and unrealistic.  Designed for 2/3 reads, 1/3 writes.  Work is not well distributed and there is no guarantee all datafiles will be used.  The score, tpmC, is New Orders completed divided by the benchmark run time. | The number of users must be 10x the data scale.  Max score = 12.86 x scale, so a scale 5,000 system can't report above 64,300. |
| TPC-E | OLTP with a brokerage model | CPU, storage latency, network latency | Designed for 90% reads, 10% writes.  Adds more tables, constraints, and transaction types than TPC-C.  Addresses the shortfalls of TPC-C by using realistic data and distributing work across all files.  Also requires steady-state performance. | Max score = (scale / 500) x 1.02.  If performance varies by more than 2% during the test, the run is considered a failure and discarded. |
| TPC-H | DSS, relational schema | CPU scaling, storage bandwidth, network bandwidth | Just 8 tables with no required keys or indexes, and 22 queries.  Allows very small scaling (1 GB of data and 2 clients).  These issues are addressed in TPC-DS. | Includes single-user and multi-user tests, each of which must be run twice.  Scoring uses a complex geometric mean. |
| TPC-DS | DSS, star schema | CPU scaling, storage bandwidth, network bandwidth | 24 tables (17 dimension and 7 fact) and 99 queries.  Raises the scale requirement to a minimum of 1 TB of data and 11 clients. | A "normal" database is scale 10,000 (10 TB) with 15 clients. |

Here's a bit more description of each benchmark:

TPC-C - transactional order entry system based on a wholesale supplier model.  Five types of transactions compete for system resources, with New Orders representing up to 45% of the total.  The final score is based solely on the number of New Order transactions completed divided by the number of minutes in the test period, so we generally call it NOPM, but the official designation is tpmC.  Performance cannot exceed 12.86 transactions per minute times the scale of data, the number of client users must be ten times the scale of data, and all transactions must be displayed on a client terminal to be counted.  For example: a scale 5,000 database requires 50,000 concurrent users and has a maximum allowed score of 64,300 NOPM.  I have a separate blog post on skipping these rules.
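
Those two sizing rules are easy to express in code.  Here's a minimal sketch (my own illustration of the rules above, not anything from the spec's toolkit):

```python
# TPC-C sizing rules described above: users = 10 x scale,
# and tpmC cannot exceed 12.86 x scale.
def tpcc_limits(scale: int) -> tuple[int, float]:
    required_users = 10 * scale        # ten client users per warehouse
    max_tpmc = 12.86 * scale           # theoretical ceiling on the score
    return required_users, max_tpmc

users, ceiling = tpcc_limits(5_000)
print(f"scale 5,000 -> {users:,} users, max {ceiling:,.0f} tpmC")
# scale 5,000 -> 50,000 users, max 64,300 tpmC
```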

TPC-E - online transactional order entry system based on a stockbroker-trader model.  The transactions are more complex than TPC-C's, and it uses a more modern business model (TPC-C predates the Internet and today's online models).  TPC-E uses many concurrent users simultaneously executing a variety of complex tasks, and any transaction type can be executed at any time to create contention and expose bottlenecks.  The goal is to show steady performance: if performance fluctuates by more than 2%, the test is considered a failure and the score is discarded.  The TPC-E score, tpsE, is reported as Completed Trade-Results Per Second, i.e., the total number of completed trades divided by the benchmark run time in seconds.  There are actually 12 transaction types, but only completed trades are scored.
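
Here's a simplified sketch of that scoring and the steadiness rule.  The official sustainable-performance criteria are more involved than a single mean-deviation check, so treat this as illustrative only:

```python
# tpsE as described above: completed trades / run time in seconds,
# plus a naive version of the 2% steady-state check.
def tpse(completed_trades: int, run_seconds: float) -> float:
    return completed_trades / run_seconds

def is_steady(interval_rates: list[float], tolerance: float = 0.02) -> bool:
    """Fail the run if any sampled interval strays more than 2% from the mean."""
    mean = sum(interval_rates) / len(interval_rates)
    return all(abs(r - mean) / mean <= tolerance for r in interval_rates)

samples = [1510.0, 1498.0, 1505.0, 1492.0]   # hypothetical trades/sec samples
print(tpse(completed_trades=5_400_000, run_seconds=3_600))  # 1500.0 tpsE
print(is_steady(samples))                    # True: all within 2% of the mean
```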

TPC-H - measures DSS performance at the analysis end of a data environment, whereas TPC-C is at the operational end.  TPC-H includes three tests: Load Test, Power Test, and Throughput Test.  The Power Test gives all system resources to a single user who must complete 22 predefined queries one after the other.  The Throughput Test has many users competing for system resources, and each user runs the same 22 queries but in a different order and with different predicate values.  The overall test order is Load, Power, Throughput, Power again, and Throughput again.  During each execution of the Power and Throughput tests, two "refresh functions" load new data from an imaginary TPC-C-like system into the active TPC-H database.  TPC-H has two scores: the Load Test score, and the Performance score (QphH), which is calculated using a complex geometric-mean formula that combines the data scale with your Power and Throughput results (when two runs are executed, the lower-scoring run is the one reported).  If running TPC-H informally, just add up the total run time of all tests and that's your score, which you can compare against the same tests on another server or another DBMS.
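
For the curious, here's my simplified rendering of the TPC-H math.  I believe it matches the spec's structure (3,600 seconds per hour, 22 queries, 2 refresh functions, geometric means), but consider it a sketch rather than the authoritative formulas:

```python
import math

def power_at_size(sf: float, query_secs: list[float], refresh_secs: list[float]) -> float:
    """Power test: 3600 * SF over the geometric mean of all 24 timings."""
    timings = query_secs + refresh_secs               # 22 queries + 2 refreshes
    geo_mean = math.prod(timings) ** (1 / len(timings))
    return 3600 * sf / geo_mean

def throughput_at_size(sf: float, streams: int, elapsed_secs: float) -> float:
    """Throughput test: (streams * 22 queries * 3600 / elapsed) * SF."""
    return (streams * 22 * 3600 / elapsed_secs) * sf

def qphh(power: float, throughput: float) -> float:
    """Composite QphH: geometric mean of the Power and Throughput scores."""
    return math.sqrt(power * throughput)

# Hypothetical scale-1,000 (1 TB) run with 7 query streams:
p = power_at_size(1_000, query_secs=[30.0] * 22, refresh_secs=[60.0, 60.0])
t = throughput_at_size(1_000, streams=7, elapsed_secs=9_000)
print(f"QphH ~ {qphh(p, t):,.0f}")
```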

TPC-DS - models the DSS performance of a system that tracks sales and returns for an organization with three sales channels: retail stores, catalogs, and an Internet marketplace.  TPC-DS uses a star schema, whereas TPC-H has a more traditional relational schema.  TPC-DS has three times as many tables and over four times as many queries as TPC-H.  It is also intended for larger systems: it does not allow data scales of less than a terabyte, whereas most people run TPC-H at a terabyte or less.  The test procedures and scoring are similar to TPC-H.  TPC-DS is seldom used due to a lack of tools that really support it: at last check, the only tool supporting TPC-DS was SwingBench, but its implementation is a variant and works with Oracle only.

Historically, the first three benchmarks were TPC-A, B, and C, in that order.  All three were transactional: A was replaced by B, which was replaced by C.  The fourth benchmark, TPC-D, was introduced in 1995 as the Council's first DSS workload benchmark, but it had static, predictable queries, so it was retired in 1999 and replaced by another DSS benchmark, TPC-H, which features variable queries.

Some old benchmarks just won't die.  TPC-C was made obsolete many years ago by TPC-E, and TPC-H was obsoleted by TPC-DS, but the newer benchmarks never really caught on and most people still run C and H.  Today, all four benchmarks are actively supported by the TPC.

Benchmarking Components

To conduct a database benchmark you will need all of the following items:

  • A server running a DBMS with enough storage to support your scale of testing.  This system is called the System Under Test or SUT.
  • For Oracle you must create the database.  For all other DBMS just create the cluster and the benchmark software will add a database.
  • One or more client computers on which to run the benchmark software and drive the workload against your database.  These clients are called Load Generators (a toy driver loop is sketched after this list).
  • A schema generator to create the tables and other objects in your database.
  • A data generator to create test data at the desired scale.
  • A data loader to load the test data into your database, index it, and gather statistics.
  • The benchmark client software to run the predefined workload and report performance.
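
To make the Load Generator role concrete, here's a toy driver loop in the spirit of what these tools do internally.  Everything here is illustrative: run_transaction is a stand-in for a real DBMS call, and the mix percentages are only roughly TPC-C-like.

```python
import random
import threading
import time

# Rough TPC-C-style transaction mix (~45% New Order, 43% Payment, 4% each
# for the rest); only New Orders count toward the NOPM score.
TX_MIX = (["new_order"] * 45 + ["payment"] * 43 + ["order_status"] * 4
          + ["delivery"] * 4 + ["stock_level"] * 4)

counts = {"new_order": 0}
lock = threading.Lock()

def run_transaction(tx_type: str) -> None:
    time.sleep(0.001)                  # placeholder for a real database call

def virtual_user(stop_at: float) -> None:
    while time.time() < stop_at:
        tx = random.choice(TX_MIX)
        run_transaction(tx)
        if tx == "new_order":
            with lock:
                counts["new_order"] += 1

MINUTES, USERS = 1, 8
stop = time.time() + MINUTES * 60
threads = [threading.Thread(target=virtual_user, args=(stop,)) for _ in range(USERS)]
for th in threads: th.start()
for th in threads: th.join()
print(f"{counts['new_order'] / MINUTES:,.0f} NOPM")
```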

You provide the database server(s) hardware, the database software, and the Load Generator clients.  I've read about people using over a thousand database servers and several hundred client computers for a single test.  I typically just have one server to do everything.

Unofficial tools like HammerDB and Quest Benchmark Factory provide everything else you need.  They create tables, indexes, and keys as appropriate; generate the data and load it into your database; and gather optimizer statistics.  They also provide a user interface to run and monitor the benchmark, and they even have options to partition certain tables.  Almost nothing is left out, though these tools can't know if you want to use a wacky database feature like in-memory column stores, so you may still have to customize the database yourself.  Don't over-customize things, because benchmarks have rules about that.

Official TPC tools are much less friendly and frankly can be very challenging to work with.  Consider the following "issues":

  • They provide a script to create the required tables, but it is so generic that you might spend all day hand-customizing with options like partitioning and compression.
  • They provide a utility to generate test data in flat files, but they don't provide a way to load the data into your database (see the loader sketch after this list).
  • It's also your job to create any desired keys and indexes and to collect optimizer statistics.
  • They don't provide an engine to drive the workload. 
  • Sometimes they give you a utility to create the query set, but they don't provide a way to orchestrate query execution with the approved parameters.
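
As an example of the glue you end up writing yourself, here's a hedged sketch that loads the TPC-H generator's flat files into PostgreSQL.  The psycopg2 connection string, table names, and paths are examples only; the one real gotcha is that dbgen's .tbl files end every line with a trailing "|" that COPY rejects as an extra column.

```python
import io
import psycopg2   # example driver; substitute your DBMS's bulk loader

def load_tbl(conn, table: str, path: str) -> None:
    with open(path, "r") as f:
        # Strip the trailing '|' dbgen puts on every line.  Building the
        # whole file in memory is fine for a sketch; stream it for real runs.
        cleaned = io.StringIO(
            "".join(line.rstrip("\n").rstrip("|") + "\n" for line in f))
    with conn.cursor() as cur:
        cur.copy_expert(
            f"COPY {table} FROM STDIN WITH (FORMAT text, DELIMITER '|')", cleaned)
    conn.commit()

conn = psycopg2.connect("dbname=tpch")               # example connection
for t in ["region", "nation", "supplier", "customer",
          "part", "partsupp", "orders", "lineitem"]: # the 8 TPC-H tables
    load_tbl(conn, t, f"/data/tpch/{t}.tbl")         # hypothetical paths
# Remember: keys, indexes, and optimizer statistics are still on you.
```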

Frankly, I avoid using the official TPC tools for those reasons, unless I'm going to publish the results, in which case the official tools are mandatory.  To get around all of their shortcomings, it may be necessary to hire a consulting firm that specializes in running formal TPC benchmarks with auditing and publishing.

That's my bit on understanding TPC and TPC-like benchmarks.  I hope it helps you.  Please see my related posts on benchmarking tools, publication restrictions, and so on.

Cheers!
