Understanding Database Latency

Latency is the amount of time you must wait for a response to a request.  It is measured at a specific component, like a storage or network device.  When you ask a computer to do something, each component involved in that request has a minimum amount of time it takes to reply with an answer, even if the answer is a null value.  If your database requests a single block from storage, then the time it takes to receive that block is storage latency.  If the storage is attached over an interconnect like Fibre Channel or SAS, then the network interfaces and cables each add their own latency.

Latencies are often measured in milliseconds (ms).  There are 1,000 milliseconds per second, and most people cannot perceive anything smaller than 1 millisecond.  However, computers are wicked fast and the latency may need to be measured in microseconds (each μs is a millionth of a second) or even nanoseconds (each ns is a billionth of a second).  Hard drives have millisecond latency, flash storage has microsecond latency, and CPUs have nanosecond latency.  I provide examples later.

High latency is bad, and low latency is good.  Later I'll discuss some good and bad numbers.  For now, think about the impact of high latency on database performance: when a user process makes a request, it holds onto its CPU by spinning until the response arrives, so higher latency means a server full of spinning CPUs that aren't doing any real work.  See my post on spin locks.

Response time is the sum of all latencies involved.  That is, if a database client asks to see some data, then the response time is the sum of the latencies for the client application, the database application, the processors, the network, and the storage.
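To make that concrete, here is a minimal sketch of the arithmetic, assuming illustrative per-component latencies (the numbers are made up for this example, not measured from any real system):

# Response time is the sum of the latencies along the request path.
# All values below are illustrative assumptions, expressed in microseconds.
component_latency_us = {
    "client application":   200.0,
    "database application": 300.0,
    "processors":             5.0,
    "network (round trip)": 100.0,
    "storage (one read)":    60.0,
}

response_time_us = sum(component_latency_us.values())
for name, latency in component_latency_us.items():
    print(f"{name:22} {latency:8.1f} us")
print(f"{'response time':22} {response_time_us:8.1f} us")

Shrinking any one of those component latencies shrinks the response time the user sees, which is why we chase latency at every layer.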

Terms are overloaded.  When talking about storage we often say it has two latency components, named service time and response time.  In this context, "service time" is the time it takes for the controller to do something, and "response time" is the time it takes for the physical read from NAND.  If you request an empty page, then the controller replies directly within a microsecond or two with a null.  If you request a data page, then the controller forwards the request to the NAND chip holding the data and waits for it to reply.  It doesn't really matter whether there are one, two, or even three stages involved, because we are only affected by the total wait time, which we simply call latency.  It is up to the hardware manufacturer to worry about the lower levels and try to decrease each time.

Hard disk drives also have two latency components, but their names fortunately don't conflict with other terms.  They use "seek time" to refer to the time it takes to move the head to the correct cylinder or track, and "rotation time" for the time it takes to rotate the platter until the desired sector is under the head.  You don't really need to worry about the individual pieces; you care about the total.  HDD manufacturers worry about the individual numbers and try to decrease each.

Latency occurs at each and every layer in the stack.  Consider that a CPU's L1, L2, and L3 caches each have a defined latency.  Latency is also added by the network interface card, the network cable, and of course every layer of the storage stack.  CPUs have the shortest latency in a computer, followed by DRAM, then Persistent Memory (PMEM), flash storage, and lastly hard disk drives.  External devices like a tape archive are even slower.

We measure latency in sub-second terms like milliseconds (ms), microseconds (μs), and nanoseconds (ns).  Each one is three orders of magnitude smaller than the previous one.  Consider the following chart:

Unit of Measure     Relative to 1 Second       Example
------------------  -------------------------  -----------------------------------------------------------------
second (s)          1 or 10⁰                   The time it takes to say "one Mississippi" (1,000,000,000ns).
millisecond (ms)    1/1,000 or 10⁻³            A block read from local disk might take 13ms (13,000,000ns).
                                               A block read from NAS disk might take 15ms (15,000,000ns).
microsecond (μs)    1/1,000,000 or 10⁻⁶        A block read from NVMe flash might take 60μs (60,000ns).
                                               A page read from PMEM takes roughly 1μs (1,000ns).
                                               Fibre Channel cable adds about 1μs (1,000ns) in each direction.
nanosecond (ns)     1/1,000,000,000 or 10⁻⁹    A DRAM read takes 14ns to 85ns depending on generation.
                                               A CPU cache hit takes 1ns to 10ns depending on cache level.

Latency Table

For databases, we want all layers in the stack to have sub-millisecond latency.  Actually, 1 millisecond won't kill you, but we prefer to use components whose latency does not exceed 100 microseconds (0.1 ms).  However, any system component under a heavy multi-user workload might see latencies creep up into the "several milliseconds" range.  A storage device advertised with 0.56 ms latency might occasionally hit 3 ms under pressure.

Fun tip: you can derive IOPS from latency.  Consider the example below, where we have a 15K RPM hard disk drive with a seek time of 3.4ms, a rotational latency of 2ms, and a transfer time of 0.052ms, which tells us the disk supports roughly 183 IOPS:

                           1 second
IOPS = -------------------------------------------------
       (seek time + rotational latency + transfer time)


              1 second             1 second       1 second
IOPS = ----------------------- = ---------- = ----------------
       (3.4ms + 2ms + 0.052ms)    5.452 ms    0.005452 seconds

IOPS ≈ 183
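If you want to play with those numbers yourself, here is a minimal sketch of the same arithmetic.  The HDD figures are the ones from the worked example above; the flash figure is an assumed 60μs read latency used only for comparison, not a measurement:

# Derive theoretical single-request IOPS from per-request latency.
# HDD numbers come from the worked example above; the flash latency
# is an assumed 60 microsecond figure, not a vendor specification.

def iops_from_latency_ms(*latencies_ms):
    """One second divided by the total latency of a single request."""
    total_ms = sum(latencies_ms)
    return 1000.0 / total_ms

hdd_iops = iops_from_latency_ms(3.4, 2.0, 0.052)   # seek + rotational latency + transfer
flash_iops = iops_from_latency_ms(0.060)           # a single 60us NVMe read

print(f"15K RPM HDD : {hdd_iops:10.0f} IOPS")      # ~183
print(f"NVMe flash  : {flash_iops:10.0f} IOPS")    # ~16,667 for one request at a time

Note this is the one-request-at-a-time number; devices that can service many requests concurrently deliver far more than this, which is the next point.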

The above example is for HDD storage.  Flash storage has no seek or rotation.  Flash also utilizes multiple channels to access many NAND packages concurrently, whereas an HDD has a single actuator.  See your vendor's data sheet for latency numbers in various configurations and use cases.

Latency depends on block size.  Don't expect the same latency when reading an 8K database block, a 4K filesystem page, or a 512-byte HDD sector.  (And don't even bother measuring latency on 1 MB reads.)  As the data size increases so does latency, but it's a non-linear increase.  You might notice latency doubles each time you octuple the block size, such as when you go from a 512-byte block to a 4K block.  But think about that: you are reading 8X the data in 2X the time, for an overall throughput gain of 4X.  In fact, you might find optimal database performance with a 16K block size.  I'm just saying.
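Here is a minimal sketch of that rule of thumb.  The starting 512-byte latency of 100μs is an assumption for illustration, and the "latency doubles when size octuples" relationship is the rough pattern described above, not a law:

# Rule of thumb from the paragraph above: latency roughly doubles each time
# the block size increases 8x.  The starting 100us figure is an assumption.
base_bytes = 512
base_latency_us = 100.0

size = base_bytes
latency = base_latency_us
for _ in range(4):                   # 512B -> 4K -> 32K -> 256K
    throughput = size / latency      # bytes per microsecond
    print(f"{size:>7} bytes  {latency:7.1f} us  {throughput:8.2f} B/us")
    size *= 8
    latency *= 2

Each step reads 8X the data in 2X the time, so the effective throughput column grows 4X per step even though the latency column is getting worse.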

Measuring latency can be done in many ways.  Open source utilities like fio allow you to measure storage latency either at its best possible or as sustained under a heavy load.  Generally fio is used to baseline the performance of new storage devices prior to installation of database software.  I have a separate blog post on fio.
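As a rough sketch of what that looks like, the snippet below drives fio from Python and pulls the mean completion latency out of its JSON output.  The target device and job parameters are assumptions for illustration (point it at your own test device or file), and the JSON field names assume a recent fio release that reports completion latency in clat_ns:

import json
import subprocess

# Single-job 8K random reads at queue depth 1 approximate the "best possible"
# latency; raise iodepth or numjobs to see sustained latency under load.
# The device path is an assumption -- use your own test device or file.
cmd = [
    "fio", "--name=8k-randread", "--filename=/dev/nvme0n1",
    "--rw=randread", "--bs=8k", "--iodepth=1", "--direct=1",
    "--ioengine=libaio", "--runtime=30", "--time_based",
    "--output-format=json",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
job = json.loads(out)["jobs"][0]

mean_ns = job["read"]["clat_ns"]["mean"]   # mean completion latency in nanoseconds
print(f"mean 8K read latency: {mean_ns / 1000:.1f} us")

Run it once at queue depth 1 for the best-case number, then again with a deeper queue to see how far latency creeps up under pressure.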

Many databases can report latency.  Consider the Oracle AWR Report wait event db file sequential read shown below.  We see here that the average latency for a single block read is 6ms and accounts for 45% of all database time:

AWR Report Snippet

Some databases also provide a means to assess your system's best possible latency.  The Oracle Calibrate I/O utility can report storage latency down to the microsecond.  It also provides the latency in milliseconds so you don't have to do the math.  Here's an example of that:

SQL> @calibrate_io 20 10
old 6: DBMS_RESOURCE_MANAGER.CALIBRATE_IO (&1, &2, iops, mbps, lat);
new 6: DBMS_RESOURCE_MANAGER.CALIBRATE_IO (20, 10, iops, mbps, lat);
max_iops = 584268
latency  = .148
max_mbps = 19475
max_iops = 584268
latency  = 0
max_mbps = 19475

You can also monitor latency with host utilities like iostat and netstat.  I often recommend using iostat to monitor multipath storage because it's important to see the overhead imposed during the merge phase: each path might show latency that is a fraction of what the multipath device reports, but it's only that higher number that affects your database.  It's a bit hard to see in the example below, but devices nvme0n1 and nvme1n1 have latencies of 0.12 and 0.11 milliseconds while the multipath device dm-4 has a latency of 0.96 ms.  That's roughly 8x worse latency due to the merge phase overhead.

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
dm-4              0.00     0.00 158513.00    0.00  1238.38     0.00    16.00   153.09    0.96    0.96    0.00   0.01 100.10
nvme0n1           0.00     0.00 79257.00    0.00   619.20     0.00    16.00     9.90    0.12    0.12    0.00   0.01 100.00
nvme1n1           0.00     0.00 79256.00    0.00   619.20     0.00    16.00     9.10    0.11    0.11    0.00   0.01 100.00

I hope that gives you a good understanding of latency.  Check out our other blogs for specific examples of diagnosing latency in Oracle AWR reports, Calibrate I/O, and using the fio utility.

 

