Understanding TPC and TPC-like Database Benchmarks
This post is Part 1 in a series about running benchmarks to measure the performance of relational databases. It provides a brief introduction to the popular industry-standard benchmarks TPC-C, TPC-E, TPC-H, and TPC-DS, and to the software programs you can use to run a variation of each. The discussion is generic and applies to any relational database, such as Oracle, SQL Server, MySQL, or PostgreSQL. (The TPC also defines benchmarks for non-relational databases, which are not covered here.) It is aimed at informal benchmarking without publication.
Why Benchmark?
Benchmark tools can answer a variety of questions and help
predict success or failure. Consider the
following examples of why organizations perform internal benchmarking of
database systems:
- Predict change impact. Organizations run benchmarks to see how a system change affects performance: run a benchmark to establish a baseline, make a system or database change, and then re-run the benchmark to see how much it helps or hurts.
- Determine scalability. Every one of these benchmarks is scalable: you can scale up the amount of data and users to find a server's optimal and peak loads. Doing this in dev/test lets you predict how many production servers are needed.
- Compare two servers. You can run the exact same benchmark on two servers and see which one performs better. Some organizations like to compare different processor architectures, or test with varying numbers of CPU sockets and cores.
- Compare two DBMS. You can run the exact same database workload, using the same data and users, on different DBMS and see which performs better. Organizations often combine benchmark results with system cost information.
Benchmarking is not about bragging. You cannot legally publish database performance numbers for a commercial DBMS without the prior written authorization of the DBMS provider (e.g., Oracle, Microsoft, IBM). Most open-source databases have no such restriction, but you still need approval from the author of the benchmark and the author of the benchmarking tool any time you publish a claim involving their trademarks. I have another blog post about publishing database benchmark results.
In all database benchmarks the workload is fixed but the system is variable: you can use different processors, more or less memory, or different storage, and you can change the OS or DBMS, but you cannot change the workload. That is what makes it possible to directly compare, for example, one DBMS on a server running Linux to a different DBMS on a different server running Windows.
It is important to know the cost of achieving a performance number. What if system A runs twice as fast as system B but costs five times as much? All formal TPC benchmarks include a cost requirement, and results are reported as performance per unit of currency as well as raw performance. You should apply the same concept whenever benchmarking two systems informally, as in the sketch below.
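For example, here is a minimal Python sketch of that price/performance calculation, using made-up scores and prices for two hypothetical systems:

```python
# Price/performance comparison with made-up numbers: System A is twice
# as fast as System B but costs five times as much.
systems = {
    "System A": {"tpm": 200_000, "cost_usd": 500_000},
    "System B": {"tpm": 100_000, "cost_usd": 100_000},
}

for name, s in systems.items():
    price_perf = s["cost_usd"] / s["tpm"]  # dollars per transaction/minute
    print(f"{name}: {s['tpm']:,} tpm, ${price_perf:.2f} per tpm")

# System A: 200,000 tpm, $2.50 per tpm
# System B: 100,000 tpm, $1.00 per tpm  <- better value despite losing on raw speed
```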
Who Defines the Benchmarks?
Industry-standard database benchmarks are defined by members of the Transaction Processing Performance Council (TPC), a global non-profit whose home page is tpc.org. Its goal is to establish fair benchmarks for measuring total system performance under a database workload. TPC members participate in various activities, but foremost they work together to define such a benchmark for each type of database workload.
Prior to the TPC's founding in 1988, each system vendor had its own benchmark that favored its own products, such as IBM's TP1 benchmark and the Tandem Debit-Credit benchmark. That made it impossible for customers to make fair comparisons of systems or databases.
TPC members today include commercial database vendors IBM,
Microsoft, and Oracle; server vendors Dell, Cisco, and Lenovo; chip makers AMD
and Intel; and operating system vendors Microsoft, Red Hat, and VMware. All members can participate in working groups
to define new benchmarks, but each working group has a limited number of
members: the benchmark TPC-DS was designed by a working group consisting of Sybase,
Oracle, and HPE before it was presented to the TPC as a whole.
Which Benchmarks are Available?
There are currently four industry-standard benchmarks for relational databases: TPC-C and TPC-E have transactional (OLTP) workloads, while TPC-H and TPC-DS have decision support system (DSS) workloads.
| Benchmark | Workload | Stress Points | Behavioral Notes | Other Notes |
|---|---|---|---|---|
| TPC-C | Transaction processing with a supplier model | CPU, storage latency, network latency | Mostly small-block random reads with occasional scans. Data is randomly generated and unrealistic. Designed for 2/3 reads, 1/3 writes. Work is not well distributed and cannot guarantee all datafiles will be used. The score (tpmC) is New Orders divided by benchmark run time. | The number of users must be 10x the data scale. Max score = 12.86 times scale, so a scale 5,000 system can't report above 64,300. |
| TPC-E | OLTP with a brokerage model | CPU, storage latency, network latency | Designed for 90% reads, 10% writes. Adds more tables, constraints, and transaction types than TPC-C. Addresses the shortfalls of TPC-C by using realistic data and distributing work across all files. Also requires steady-state performance. | If performance varies by more than 2% during the test, the run is considered a failure and discarded. |
| TPC-H | DSS, relational schema | CPU scaling, storage bandwidth, network bandwidth | Just 8 tables with no required keys or indexes, and 22 queries. Allows very small scaling (1 GB of data and 2 clients). These issues are addressed in TPC-DS. | Includes single-user and multi-user tests, each of which must be run twice. Scoring uses a complex geometric mean. |
| TPC-DS | DSS, star schema | CPU scaling, storage bandwidth, network bandwidth | 24 tables (17 dimension and 7 fact) and 99 queries. Raises the scale requirement to a minimum of 1 TB of data and 11 clients. | A "normal" database is scale 10,000 (10 TB) with 15 clients. |
Here's a bit more description of each benchmark:
TPC-C - a transactional order-entry system based on a wholesale supplier model. Five types of transactions compete for system resources, with New Orders representing up to 45% of the total. The final score is based solely on the number of New Order transactions completed divided by the number of minutes in the test period, so we generally call it NOPM, but the official designation is tpmC. Performance cannot exceed 12.86 transactions per minute times the scale of the data. The number of client users must be ten times the data scale, and all transactions must be displayed on a client terminal to be counted. For example: a scale 5,000 database requires 50,000 concurrent users and has a maximum allowed score of 64,300 NOPM. I have a separate blog post on skipping these rules.
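As a quick illustration, here is a minimal Python sketch of the sizing and scoring rules just described (an informal check, not the official specification):

```python
# TPC-C sizing rules: users scale with the data, and the reportable
# score is capped by the scale of the data set.
def tpcc_limits(scale: int) -> dict:
    return {
        "required_users": scale * 10,           # users must be 10x the data scale
        "max_tpmC": round(scale * 12.86, 2),    # performance ceiling per the spec
    }

def nopm(new_orders_completed: int, run_minutes: float) -> float:
    # Only New Order transactions count toward the score.
    return new_orders_completed / run_minutes

print(tpcc_limits(5_000))
# {'required_users': 50000, 'max_tpmC': 64300.0}
print(nopm(new_orders_completed=1_800_000, run_minutes=30))  # 60000.0
```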
TPC-E - an online transactional order-entry system based on a stockbroker-trader model. The transactions are more complex than TPC-C's, and it uses a more modern business model (TPC-C predates the Internet and today's online models). TPC-E uses many concurrent users simultaneously executing a variety of complex tasks. Any transaction type can be executed at any time to create contention and expose bottlenecks. The goal is to show steady performance: if performance fluctuates by more than 2%, the test is considered a failure and the score is discarded. The TPC-E score (tpsE) is reported as completed Trade-Results per second, meaning the total number of completed trades divided by the benchmark run time in seconds. There are actually 12 transaction types, but only completed trades are scored.
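Here is a similar Python sketch of the TPC-E scoring idea along with a simplified reading of the 2% steady-state rule (the sample throughput numbers are invented):

```python
# tpsE: completed trades divided by run time in seconds.
def tpsE(completed_trades: int, run_seconds: float) -> float:
    return completed_trades / run_seconds

# Simplified steady-state check: fail the run if any throughput sample
# deviates from the average by more than 2%.
def is_steady(samples: list[float], tolerance: float = 0.02) -> bool:
    avg = sum(samples) / len(samples)
    return all(abs(s - avg) / avg <= tolerance for s in samples)

print(tpsE(completed_trades=5_238_000, run_seconds=3_600))  # 1455.0
print(is_steady([1450.0, 1462.0, 1448.0, 1455.0]))          # True
```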
TPC-H - measures DSS performance at the analysis end of a data environment, whereas TPC-C sits at the operational end. TPC-H includes three tests: a Load Test, a Power Test, and a Throughput Test. The Power Test gives all system resources to a single user, who must complete 22 predefined queries one after the other. The Throughput Test has many users competing for system resources; each user runs the same 22 queries but in a different order and with different predicate values. The overall test order is Load, Power, Throughput, Power again, and Throughput again. During each execution of the Power and Throughput tests, two "refresh functions" load new data from an imaginary TPC-C-like system into the active TPC-H database. TPC-H has two scores: the Load Test score, and the Performance score, which is calculated with a complex geometric-mean formula that includes the data scale, the faster of your two Power Tests, and the faster of your two Throughput Tests. If running TPC-H informally, just add up the total run time of all tests; that's your score, which you can compare against the same tests on another server or another DBMS.
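For the curious, the published metric has roughly the following shape; this is a paraphrase from memory, not a substitute for the spec's definitions. SF is the scale factor, QI(i,0) and RI(j,0) are the timed query and refresh intervals from the Power Test, S is the number of throughput streams, and T_s is the Throughput Test's elapsed time in seconds:

```latex
% Sketch of the TPC-H metric definitions (paraphrased).
\[
  \text{Power@Size} =
    \frac{3600 \cdot SF}
         {\left(\prod_{i=1}^{22} QI(i,0) \cdot \prod_{j=1}^{2} RI(j,0)\right)^{1/24}}
\]
\[
  \text{Throughput@Size} = \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF
  \qquad
  \text{QphH@Size} = \sqrt{\text{Power@Size} \times \text{Throughput@Size}}
\]
```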
TPC-DS - models the DSS performance of a system that tracks sales and returns for an organization with three sales channels: retail stores, catalogs, and an Internet marketplace. TPC-DS uses a star schema, whereas TPC-H uses a more traditional normalized schema. TPC-DS has three times as many tables and more than four times as many queries as TPC-H. It is also intended for larger systems: it does not allow data scales below one terabyte, whereas most people run TPC-H with a terabyte or less of data. The test procedures and scoring are similar to TPC-H. TPC-DS is seldom used due to a lack of tools that really support it: at last check the only tool supporting TPC-DS was SwingBench, and that is a variant for Oracle only.
Historically, the first three benchmarks were TPC-A, TPC-B, and TPC-C, in that order. All three were transactional: A was replaced by B, which was replaced by C. The fourth benchmark, TPC-D, was introduced in 1995 as the Council's first DSS workload benchmark, but its queries were static and predictable, so it was retired and replaced in 1999 by another DSS benchmark, TPC-H, which featured variable queries.
Some old benchmarks just won't die. TPC-C was made obsolete many years ago by TPC-E, and TPC-H was obsoleted by TPC-DS, but the replacements never really caught on and most people still run C and H. Today, all four benchmarks are actively supported by the TPC.
Benchmarking Components
To conduct a database benchmark you will need all of the
following items:
- A server running a DBMS with enough storage to support your scale of testing. This system is called the System Under Test or SUT.
- For Oracle you must create the database first. For all other DBMS, just create the cluster or instance and the benchmark software will add a database.
- One or more client computers on which to run the benchmark software and drive the workload against your database. These clients are called Load Generators.
- A schema generator to create the tables and other objects in your database.
- A data generator to create test data at the desired scale.
- A data loader to load the test data into your database, index it, and gather statistics.
- The benchmark client software to run the predefined workload and report performance.
You provide the database server(s) hardware, the database software, and the Load Generator clients. I've read about people using over a thousand database servers and several hundred client computers for a single test. I typically just have one server to do everything.
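To make the Load Generator idea concrete, here is a toy Python sketch of what benchmark client software does at its core: spawn N simulated users, drive transactions at the SUT, and tally completed work (the transaction itself is a placeholder, not a real database call):

```python
import threading
import time

def execute_one_transaction() -> None:
    time.sleep(0.01)  # placeholder for a real database round trip

def run_user(stop: threading.Event, counts: list, idx: int) -> None:
    while not stop.is_set():
        execute_one_transaction()
        counts[idx] += 1  # tally work completed by this simulated user

def drive(users: int, run_minutes: float) -> float:
    stop = threading.Event()
    counts = [0] * users
    threads = [threading.Thread(target=run_user, args=(stop, counts, i))
               for i in range(users)]
    for t in threads:
        t.start()
    time.sleep(run_minutes * 60)  # measurement interval
    stop.set()
    for t in threads:
        t.join()
    return sum(counts) / run_minutes  # transactions per minute

print(drive(users=10, run_minutes=0.1))
```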
Unofficial tools like HammerDB and Quest Benchmark Factory provide everything else you need. They create tables, indexes, and keys as appropriate, generate the data and load it into your database, and gather optimizer statistics. They also provide a user interface to run and monitor the benchmark, and they even have options to partition certain tables. There's nothing left out. Well, these tools don't know whether you want to use a wacky database feature like in-memory column stores, so you may have to customize the database yourself. Don't over-customize, because benchmarks have rules about that.
Official TPC tools are much less friendly and frankly can be
very challenging to work with. Consider
the following "issues":
- They provide a script to create the required tables, but it is so generic that you might spend all day hand-customizing with options like partitioning and compression.
- They provide a utility to generate test data in flat files, but they don't provide a way to load the data into your database (see the loading sketch after this list).
- It's also your job to create any desired keys, indexes, and to collect optimizer statistics.
- They don't provide an engine to drive the workload.
- Sometimes they give you a utility to create the query set, but they don't provide a way to orchestrate query execution with the approved parameters.
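For example, TPC-H's dbgen emits pipe-delimited .tbl flat files, and loading them is left entirely to you. Here is a hedged Python sketch of one way to do it for PostgreSQL with psycopg2; the connection string, database, and table are assumptions, and dbgen's trailing '|' on each line must be stripped before COPY will accept the rows:

```python
import io
import psycopg2  # PostgreSQL driver; any DBMS bulk-load path works similarly

def load_tbl(conn, table: str, path: str) -> None:
    # dbgen ends every line with a trailing '|', which COPY would read
    # as an extra empty column, so strip it while buffering the rows.
    buf = io.StringIO()
    with open(path) as f:
        for line in f:
            buf.write(line.rstrip("\n").rstrip("|") + "\n")
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert(
            f"COPY {table} FROM STDIN WITH (FORMAT text, DELIMITER '|')",
            buf,
        )
    conn.commit()

# Assumed usage: the tpch database and lineitem table already exist.
# conn = psycopg2.connect("dbname=tpch")
# load_tbl(conn, "lineitem", "lineitem.tbl")
```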
Frankly, I avoid the official TPC tools for those reasons, unless I'm going to publish the results, in which case the official tools are mandatory. To get around all of their shortcomings, it may be necessary to hire a consulting firm that specializes in running formal TPC benchmarks with auditing and publishing.
That's my bit on understanding TPC and TPC-like
benchmarks. I hope it helps you. Please see my related posts
on benchmarking tools, publication restrictions, and so on.
Cheers!