I/O wait, or wait%, is often displayed by command-line Linux system monitoring tools such as top, sar, atop, and others. On its own, it’s one of many performance stats that give us insight into Linux system performance.
I/O wait came up in a recent discussion with a new client. During our support call, they reported load spikes of 60 to 80 on their 32-core system, which resulted in slow page loads, timeouts, and intermittent outages. The cause? A storage I/O bottleneck, initially hinted at by consistently high iowait and later confirmed by further investigation.
What is I/O wait? How does I/O wait affect Linux server performance? How can we monitor and reduce I/O wait related issues? Continue reading for the answers to these questions.
What is I/O wait?
I/O wait applies to Unix and all Unix-based systems, including macOS, FreeBSD, Solaris, and Linux.
I/O wait (iowait) is the percentage of time that the CPU (or CPUs) were idle during which the system had pending disk I/O requests. (Source: man sar) The top man page gives this simple explanation: “I/O wait = time waiting for I/O completion.” In other words, the presence of I/O wait tells us that the system is idle when it could be processing outstanding requests.
“iowait shows the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.” – iostat man page.
When using Linux top and other tools, you’ll notice that a CPU (and its cores) operates in the following states: us (user), sy (system), id (idle), ni (nice), si (software interrupts), hi (hardware interrupts), st (steal) and wa (wait). Together, these values add up to 100%. Note that “idle” and “wait” are not the same. An “idle” CPU has no workload present, whereas “wait” (iowait) means the CPU is sitting idle while I/O requests are still outstanding.
If the CPU is idle, the kernel checks for pending I/O requests (e.g., to SSD or NFS) originating from that CPU. If there are any, the ‘iowait’ counter is incremented. If nothing is pending, the ‘idle’ counter is incremented.
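To make the idle and iowait counters concrete, here is a minimal Python sketch that turns two samples of the aggregate "cpu" line from /proc/stat into percentages. The sample values are made up for illustration, the helper names (parse_cpu_line, state_percentages) are hypothetical, and the field order is assumed to follow the proc(5) man page on a reasonably recent kernel.

```python
# Field order of the aggregate "cpu" line in /proc/stat (assumed from proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return a dict of tick counters from the 'cpu ...' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def state_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up samples standing in for two reads of /proc/stat a few seconds apart.
SAMPLE_1 = "cpu  1000 0 500 8000 500 0 0 0 0 0"
SAMPLE_2 = "cpu  1200 0 600 8500 700 0 0 0 0 0"

pct = state_percentages(parse_cpu_line(SAMPLE_1), parse_cpu_line(SAMPLE_2))
print(f"idle {pct['idle']:.1f}%  iowait {pct['iowait']:.1f}%")
```

On a live system, the two strings would instead come from reading the first line of /proc/stat twice with a short sleep in between. The counters are cumulative since boot, which is why the sketch works on differences and not on raw values.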
I/O wait and Linux server performance
It’s important to note that iowait can, at times, indicate a bottleneck in throughput, while at other times, iowait may be completely meaningless. It’s possible to have a healthy system with high iowait, but also possible to have a bottlenecked system without iowait.
I/O wait is simply one of the indicated states of your CPU / CPU cores. A high iowait means your CPU is waiting on requests, but you’ll need to investigate further to confirm the source and effect.
For example, server storage (SSD, NVMe, NFS, etc.) is almost always slower than the CPU. Because of this, I/O wait may be misleading, especially when it comes to random read/write workloads. This is because iowait only measures how the CPU spends its time, not storage I/O performance.
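Because iowait describes the CPU and not the storage device, it can help to look at the device's own counters as well. The following is a rough sketch, not a replacement for iostat: it computes the average time per completed request from two samples of /proc/diskstats. The device name "sda", the sample numbers, and the helper names are all assumptions for illustration, and the column layout is assumed to match the kernel's I/O statistics documentation.

```python
def parse_diskstats(text):
    """Map device name -> (reads, ms_reading, writes, ms_writing)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpected lines
        name = parts[2]
        reads, ms_reading = int(parts[3]), int(parts[6])
        writes, ms_writing = int(parts[7]), int(parts[10])
        stats[name] = (reads, ms_reading, writes, ms_writing)
    return stats


def average_latency_ms(before, after):
    """Average milliseconds per completed request between two samples."""
    ios = (after[0] - before[0]) + (after[2] - before[2])
    ms = (after[1] - before[1]) + (after[3] - before[3])
    return ms / ios if ios else 0.0


# Made-up samples for a hypothetical device "sda", taken a few seconds apart.
SAMPLE_1 = "   8       0 sda 1000 0 80000 2000 500 0 40000 1500 0 3000 3500"
SAMPLE_2 = "   8       0 sda 1400 0 112000 5200 600 0 48000 3300 0 6000 8500"

first = parse_diskstats(SAMPLE_1)["sda"]
second = parse_diskstats(SAMPLE_2)["sda"]
latency = average_latency_ms(first, second)
print(f"sda average latency: {latency:.1f} ms per request")
```

A rising per-request latency on the device, alongside high iowait, is a stronger hint of a storage bottleneck than the iowait figure on its own.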
Although iowait indicates that the CPU could handle more work, it isn’t always possible to eliminate I/O wait, or even feasible to bring it near zero. How far it can be reduced depends on your server’s workload and how that workload performs computations or makes use of storage I/O.
Based on end-user experience, database query health, transaction throughput, and overall application health, you will have to decide whether or not the iowait reported indicates poor Linux system performance.
For example, if you see a low iowait of 1 to 4 percent, and you then upgrade the CPU to 2x the performance, the iowait will also increase. A 2x faster CPU with the same storage performance means roughly 2x the wait. You’ll want to consider your workload to determine which hardware you should pay attention to first.
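The "roughly 2x" figure can be checked with a deliberately simplified model: a single task that alternates between computing and waiting on storage, with no overlap between the two. The 96-second and 4-second figures below are made-up inputs, and real workloads with concurrency will not follow this model exactly.

```python
def iowait_share(cpu_seconds, io_seconds):
    """Share of wall-clock time spent waiting on I/O in a strictly serial model."""
    return io_seconds / (cpu_seconds + io_seconds)


# Original CPU: 96 s of computation and 4 s of storage waits per 100 s of work.
before = iowait_share(cpu_seconds=96.0, io_seconds=4.0)

# CPU twice as fast: the computation takes 48 s, the storage waits are unchanged.
after = iowait_share(cpu_seconds=48.0, io_seconds=4.0)

print(f"before: {before:.1%}  after: {after:.1%}  ratio: {after / before:.2f}x")
```

In this model the same work finishes sooner overall, yet the iowait percentage nearly doubles because the unchanged storage wait is now a larger share of a shorter run.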
Monitoring and reducing I/O wait related issues
Let’s look at some valuable tools used to monitor I/O wait on Linux.
- atop – run it with the -d option or press d to toggle the disk stats view.
- iostat – try it with the -xm 2 options for extended statistics, in megabytes and in two-second intervals.
- iotop – top-like I/O monitor. Try it with the -oPa options to show the accumulated I/O of active processes only.
- ps – use auxf, then under the “STAT” column “D” usually indicates disk iowait.
- strace – view the actual operations issued by a process. Read the
- lsof – after you’ve identified the process responsible, use -p [PID] to find the specific files.
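The “D” state that ps shows in its STAT column comes from the state field of /proc/[PID]/stat. As a small sketch of how a script could pick out those processes, the code below parses that field from sample lines. The sample lines are shortened and made up, and the helper names (parse_state, uninterruptible) are hypothetical.

```python
def parse_state(stat_line):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    command = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, command, state


def uninterruptible(stat_lines):
    """Return (pid, command) for every process in state D."""
    parsed = (parse_state(line) for line in stat_lines)
    return [(pid, command) for pid, command, state in parsed if state == "D"]


# Made-up, shortened samples; real stat lines carry many more fields.
SAMPLES = [
    "4242 (mysqld) D 1 4242 4242 0 -1 4194560",
    "4243 (Web Content) S 1 4243 4243 0 -1 4194560",
    "4244 (kworker/u8:2) D 2 0 0 0 -1 69238880",
]

blocked = uninterruptible(SAMPLES)
for pid, command in blocked:
    print(f"{pid} {command} is in uninterruptible sleep (D)")
```

On a live system, the lines would come from reading /proc/[PID]/stat for each numeric directory under /proc. Processes can exit between listing and reading, so a real script would need to tolerate missing files.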
Reducing I/O wait related issues
Take the following steps to reduce I/O wait related issues.
- Optimize your application’s code and database queries. This can go a long way in reducing the frequency of disk reads/writes. This should be your first approach because the more efficient your application is, the less you’ll have to spend on hardware long-term.
- Keep your Linux system and software versions up-to-date. Not only is this better for security, but more often than not, the latest supported versions offer notable performance improvements, whether it’s Nginx, Node.js, PHP, Python, or MySQL.
- Make sure that you have free memory available: enough that around half of the server’s memory can be used for in-memory buffers and cache, rather than swapping and paging to disk. Of course, this ratio will differ case by case. Either way, be sure you are not swapping and that kernel cache pressure isn’t high due to a lack of free memory.
- Tweak your system, storage device(s), and the Linux kernel for increased storage performance and lifespan.
- Finally, if all else fails: upgrade storage devices to faster SSD, NVMe, or other high throughput storage devices.
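For the memory step in the list above, a quick check can be sketched by parsing /proc/meminfo. The example below uses a made-up sample in place of the live file, and the helper names and the choice of fields (Buffers, Cached, MemAvailable, SwapTotal, SwapFree) are assumptions about what is worth watching, not a fixed rule.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo values (kB as reported), keyed by field name."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Summarise cache share, available share, and swap in use."""
    cache = info["Buffers"] + info["Cached"]
    return {
        "cache_pct": 100.0 * cache / info["MemTotal"],
        "available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"],
        "swap_used": info["SwapTotal"] - info["SwapFree"],
    }


# Made-up sample standing in for the contents of /proc/meminfo.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    9000000 kB
Buffers:          500000 kB
Cached:          7500000 kB
SwapTotal:       2000000 kB
SwapFree:        2000000 kB
"""

summary = memory_summary(parse_meminfo(SAMPLE))
print(f"buffers+cache: {summary['cache_pct']:.1f}% of RAM")
print(f"available:     {summary['available_pct']:.1f}% of RAM")
print(f"swap in use:   {summary['swap_used']} kB")
```

In this sample, half of RAM is holding buffers and cache and no swap is in use, which matches the rough target described in the list. A low available share combined with growing swap usage would be the pattern to investigate.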
The iowait statistic is a helpful performance stat for monitoring CPU utilization health. It tells the sysadmin when the CPU is idle and could perform more computations. We can then use observability, benchmarking, and tracing tools such as those listed above to put together a complete picture of the system’s overall I/O performance. Your main goal should be to eliminate any iowait directly resulting from waiting on disk, NFS, or other storage-related I/O.
Published: Aug 19th, 2020 | Last updated: Jan 28th, 2022.