Using DBGen to Generate TPC-H Test Data
In this post I describe the Data Base Generator (DBGen) utility: what it is, how to install it, and how to use it. This post assumes a Linux operating system. The build procedure for DBGen on Windows is different, but the concepts carry over. DBGen is used to create a TPC-H schema (i.e., it provides the CREATE TABLE statements) and to generate the data. DBGen is used in official TPC-H benchmarking, but it can also be used in informal TPC-H-like benchmarking. The schema and data are compatible with most third-party benchmarking tools such as HammerDB, so even if you plan to run a TPC-H-like test using HammerDB you can still use DBGen to create the schema and data. The CREATE TABLE statements are ANSI standard SQL and can be run in any DBMS without editing. DBGen provides no SQL statements for indexes or foreign key constraints, since those types of objects are not required. The data is created as delimited flat files, with one file for each of