Catoctin Data Blog

Showing posts from March, 2021

Using DBGen to Generate TPC-H Test Data

March 26, 2021

In this post I describe the Data Base Generator (DBGen) utility: what it is, how to install it, and how to use it. This post assumes a Linux operating system. There are different build procedures for DBGen on Windows, but the concepts carry over. DBGen is used to create a TPC-H schema (i.e., it provides the CREATE TABLE statements) and to generate the data. DBGen is used in official TPC-H benchmarking, but it can also be used in informal TPC-H-like benchmarking. The schema and data are compatible with most third-party benchmarking tools like HammerDB, so even if you plan to run a TPC-H-like test using HammerDB you may still use DBGen to create the schema and data. The CREATE TABLE statements are ANSI standard SQL and can be run in any DBMS without editing. There are no SQL statements for indexes or foreign key constraints provided by DBGen since those types of objects are not required. The data is created as delimited...

Understanding Database Latency

March 25, 2021

Latency is basically the amount of time you must wait to get a response from a request. It is measured at a specific component like a storage or network device. When you ask a computer to do something, each component involved in that request has a minimum amount of time it takes to reply with an answer even if the answer is a null value. If your database requests a single block from storage, then the time it takes to receive that block is storage latency. If the storage is attached over a network like Fibre Channel or SAS, then the network interfaces and cables each add their own latency. Latencies are often measured in milliseconds (ms). There are 1,000 milliseconds per second, and most people cannot perceive anything smaller than 1 millisecond. However, computers are wicked fast and the latency may need to be measured in microseconds (each μs is a millionth of a second) or even nanoseconds (each ns is a billionth of a second). Hard dr...
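The unit relationships above (milliseconds, microseconds, nanoseconds) and the idea of timing a single request can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_block_read` is a hypothetical stand-in for a real storage request, and its 2 ms delay is an invented figure, not a measurement of any device.

```python
import time

# Unit factors from the text: 1 s = 1,000 ms = 1,000,000 us = 1,000,000,000 ns.
NS_PER_US = 1_000
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def format_latency(ns: int) -> str:
    """Render a nanosecond count in the largest unit that keeps the value >= 1."""
    if ns >= NS_PER_S:
        return f"{ns / NS_PER_S:.3f} s"
    if ns >= NS_PER_MS:
        return f"{ns / NS_PER_MS:.3f} ms"
    if ns >= NS_PER_US:
        return f"{ns / NS_PER_US:.3f} us"
    return f"{ns} ns"


def measure_latency_ns(request) -> int:
    """Time one call to `request` and return the elapsed nanoseconds."""
    start = time.perf_counter_ns()
    request()
    return time.perf_counter_ns() - start


def fake_block_read() -> bytes:
    """Hypothetical stand-in for a single-block storage request (assumption)."""
    time.sleep(0.002)  # pretend the device takes roughly 2 ms to answer
    return b"\x00" * 8192


if __name__ == "__main__":
    elapsed = measure_latency_ns(fake_block_read)
    print(format_latency(elapsed))
```

The timer wraps only the request itself, so the figure it reports is the wait observed by the caller, which includes every component between the caller and the device.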

Understanding TPC and TPC-like Database Benchmarks

March 15, 2021

This post is Part 1 in a series about running benchmarks to measure the performance of relational databases. This post provides a very brief introduction to the popular industry standard benchmarks TPC-C, E, H, and DS and the software programs you can use to run a variation of each benchmark. The discussion here is generic and can be applied to any relational database like Oracle, SQL Server, MySQL, and PostgreSQL. The TPC has many benchmarks for non-relational databases not discussed here. The discussion here is also intended for informal benchmarking without publication.

Why Benchmark?

Benchmark tools can answer a variety of questions and help predict success or failure. Consider the following examples of why organizations perform internal benchmarking of database systems:

Predict change impact. Organizations run benchmarks to see how a system change impacts performance. Run a benchmark to establish a baseline, make a system or ...