Measure Linux web server memory usage correctly

Important update: This article was originally posted back in 2014. However, as I noted in my 2017 blog post, “Does your Linux server need a RAM upgrade? Let’s check with free, top, vmstat and sar,” there was a Linux kernel change to address this in 2016. (This change should motivate New Relic and others to follow suit in how memory usage is reported. New Relic has since made the change.) The Linux kernel now reports memory marked as available: “Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free).”

/proc/meminfo: provide estimated available memory
“Many load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up “free” and “cached,” which was fine ten years ago but is pretty much guaranteed to be wrong today. It is wrong because Cached includes memory that is not freeable as page cache, for example, shared memory segments, tmpfs, and ramfs. It does not include reclaimable slab memory, which can take up a large fraction of the system memory on mostly idle systems with lots of files. Currently, the amount of memory that is available for a new workload, without pushing the system into swap, can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the “low” watermarks from /proc/zoneinfo. However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory. It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place…” – Source.
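
To make the difference concrete, here is a minimal Python sketch that reads /proc/meminfo-style text and compares the kernel's MemAvailable with the older "free + cached" guess. The function names and the sample numbers are my own illustrations, not values from a real server, and the fallback to MemFree simply mirrors the "otherwise the same as free" wording quoted above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def available_kb(meminfo):
    """Prefer the kernel's MemAvailable; fall back to MemFree when it is absent."""
    return meminfo.get("MemAvailable", meminfo["MemFree"])


def naive_free_plus_cached_kb(meminfo):
    """The older 'free + cached' guess that the commit message calls wrong."""
    return meminfo["MemFree"] + meminfo["Cached"]


# Illustrative sample values only (assumption), not from a real server.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    3000000 kB
Buffers:          200000 kB
Cached:          4000000 kB
Shmem:           1200000 kB
"""

info = parse_meminfo(SAMPLE)
print("kernel estimate:", available_kb(info))
print("free + cached:  ", naive_free_plus_cached_kb(info))
```

On a live server you would pass in the contents of /proc/meminfo instead of SAMPLE. In this made-up sample the naive sum overshoots the kernel estimate, partly because Cached also counts shared memory that cannot simply be dropped.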

Original article: Does the screenshot below from New Relic look familiar to you? Let’s say you have a web server with 2GB of RAM or maybe 256GB. Your web apps are running slowly, so you check New Relic and/or other server monitoring and APM tools but unfortunately don’t see any red flags. The swap shown in the graph may worry you a little, but you say… “Hey, there’s plenty of free space left, right!?” Technically, yes, but as it relates to Linux web server performance, no, absolutely not. Let’s discuss.

Free memory? No.


Linux Desktop Memory Usage vs. Linux Web Server Memory Usage

On your Linux-based home computer or laptop, you may be running Ubuntu, Linux Mint, Debian, or maybe Fedora, which is my favorite desktop distro. My laptop’s uptime shows 5 days, 11 hours, and 2 minutes. I use the standby feature a lot and have been up for 30+ days before. I mention this because the average user probably avoids keeping their home systems up for that long. After all, they’re not web servers! :) If you do keep them running, then that’s actually better. Here’s why.

Let’s say you have 8GB of RAM installed on your laptop, and earlier today you used GIMP (image editor), Chrome, and LibreOffice. Chances are, unless you perform heavy computing using other applications afterward or restart your system, Linux will keep a lot of the required files and paths cached and buffered in RAM (memory). This is very useful because if for some reason you decide to re-edit photos, browse the web again, or open a new file in LibreOffice, all of these tasks will open and function noticeably faster the second time around, because the files were cached (saved temporarily) in memory. Over time, if you don’t use GIMP or LibreOffice for a while, the Linux kernel will gradually replace those cached files with data from your new apps. This is perfectly fine because you don’t need to keep old and unused files from hours, days, or even weeks ago stored in memory, especially if you don’t have system memory to spare.

However, Linux servers are different, very different. The same files are requested repeatedly at varying rates throughout the day. Oftentimes files are requested several times per minute or, on busier servers, many times per second. So how often do you want those files and paths removed from the cache? With Linux web servers, we want cached (and buffered) data to be kept as long as possible: long enough that pressure to evict cached files in favor of newer ones doesn’t force the Linux kernel to serve files from disk (much slower!) rather than from cached memory. Looking at the above memory usage graph again, most of that white space is “useful” cache and buffers.

Cached memory is defined nicely by Jonathan DePrizio of
“Cached [memory] is the amount of memory that’s being used to keep copies of actual files in RAM. When there are files that are constantly being written or read, having these in memory reduces the amount of times you need to operate on the physical hard disk; as a result, you can greatly improve performance by using memory for cache.”

With this in mind, we must find the correct memory size for web servers; otherwise, the kernel will end up serving more and more data from disk instead of from cache. Disk storage, even SSD (solid-state drive), is tremendously slower than RAM!


The command ‘free’ will never let you down!

From the Linux command line, the free command (or free -m or free -h) will often reveal that you are “using” more memory than you think! See this example below from Red Hat’s docs:

$ free
              total       used        free    shared    buffers    cached
Mem:        4040360    4012200       28160         0     176628   3571348
-/+ buffers/cache:      264224     3776136
Swap:       4200956      12184     4188772

Notice there’s 28160KB “free.” However, below that line, look at how much memory has been consumed by buffers and cache! Linux always tries to use memory first to speed up disk operations by using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices). This helps the system to run faster because disk information is already in memory which saves I/O operations. If more space is required, Linux will free up the buffers and cache to yield memory for the applications. If there’s not enough “free” space, then the cache will be saved (swapped) to disk. It would be wise to monitor this, keep swap and cache contention within an acceptable range that does not affect performance. – Source: Red Hat.
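
The “-/+ buffers/cache” row in the output above is just arithmetic on the “Mem:” row. A small Python sketch of that calculation follows; the function name is mine, chosen for illustration.

```python
def buffers_cache_adjusted(used, free, buffers, cached):
    """Return (used minus buffers/cache, free plus buffers/cache) in the input unit."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Values in KB from the "Mem:" row of the Red Hat example above.
adj_used, adj_free = buffers_cache_adjusted(
    used=4012200, free=28160, buffers=176628, cached=3571348
)
print(adj_used, adj_free)  # 264224 3776136
```

The two results match the “-/+ buffers/cache” row: applications hold about 264MB, while roughly 3.7GB is cache and buffers that the kernel is putting to work.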

Have a look at the screen capture below. This time remember that the white/unshaded area (under “Physical memory”) is largely used by cache and buffers, which your web server depends on to maintain that blazing fast performance you so crave. Notice the effect of swapping in this case: increased disk I/O latency, which left the CPU blocked waiting on I/O (iowait). The fix for this web server at the time didn’t involve a memory upgrade but instead recovering memory by reconfiguring MySQL, which had been misconfigured in the direction of “larger is always better,” and also by removing a bunch of unused MySQL databases.

Swapping memory to disk = IO bottleneck.

Now, please don’t take away that I’m suggesting to “eliminate” swap completely. :) No. Swap has its place and purpose. Instead, it would be best to find that balance whereby swapping does not get in the way of throughput/performance. There are many tools and services out there for monitoring web server memory usage. Although correct, the previous two graphs may be misleading to those who depend on the “percentage used” in decision-making. Whenever there’s constant swapping, make sure to investigate server health.
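
One way to check for constant swapping without relying on a “percentage used” graph is to compare the cumulative pswpin and pswpout counters in /proc/vmstat between two samples. Below is a minimal sketch, assuming you capture the file contents twice; the helper names and the sample text are hypothetical.

```python
def swap_counters(vmstat_text):
    """Extract cumulative pswpin/pswpout page counts from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before_text, after_text):
    """Pages swapped in and out between two samples."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


# Two hypothetical samples taken some seconds apart (made-up numbers).
EARLIER = "pgfault 1000\npswpin 120\npswpout 450\n"
LATER = "pgfault 1900\npswpin 120\npswpout 980\n"

print(swap_delta(EARLIER, LATER))  # {'pswpin': 0, 'pswpout': 530}
```

Deltas that stay at zero most of the time suggest swap is only being touched occasionally. Steady nonzero deltas in both directions are the kind of constant swapping worth investigating.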


Swapping is not always bad!

Opportunistic swapping is helpful to performance! This is when the system has nothing better to do, so it saves cached data that hasn’t been used for a long time to disk swap space. The cool thing about opportunistic swapping is that the server still keeps and serves a copy of the data in physical memory. But if things later get hectic and the server needs that memory for something else, it can drop those pages without performing additional untimely writes to disk. That’s healthy! That said, if you have tons of RAM, there will be less swapping. For example, the 64GB server below has been up for 60+ days with 5GB of free memory (approximately 8%) and no swap used:

root@server [~]# free -m
             total       used       free     shared    buffers     cached
Mem:         62589      57007       5582          0       1999      31705
-/+ buffers/cache:      23302      39287
Swap:         1999          0       1999
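
For a rough sense of how that memory is split, the “Mem:” row above can be turned into percentages. This is only a sketch with a function name of my own choosing; it treats everything that is neither free nor buffers/cache as application memory.

```python
def memory_shares(total, free, buffers, cached):
    """Return free, buffers+cache, and application shares of RAM as percentages."""
    cache = buffers + cached
    apps = total - free - cache
    return {
        "free": 100.0 * free / total,
        "buffers_cache": 100.0 * cache / total,
        "applications": 100.0 * apps / total,
    }


# Values in MB from the "Mem:" row above.
shares = memory_shares(total=62589, free=5582, buffers=1999, cached=31705)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

On this server a bit under 9% is truly free, while more than half of RAM is doing useful work as cache and buffers.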

Here’s what 8% free memory looks like on New Relic:

8% "free" memory - New Relic

…disk IO utilization is beautiful.


Graphing Linux web server memory usage with Munin

Lastly, let’s look at Munin, an open-source monitoring tool. In the graph below, cached memory is labeled “cache” by Munin. In this example, the server is healthy and swapping has no effect on performance because cache and buffers are large enough for the kernel to selectively swap out to disk. The ratio of used memory to cached/buffered memory shows that this server is caching more than 3 times the memory “used.” On web servers, there’s no hard rule for the recommended size of the cache; it will vary case by case. That said, 50% or more of your RAM being used by cache is great for performance!

Healthy swap

Yet another example: a 30GB web server with no swap. We could probably play with kernel sysctl configuration here…
no disk swap

Remember, unlike on a Linux desktop, on web servers a much higher percentage of previously accessed files are requested repeatedly. So while considering cached memory as “free” on a desktop install is OK, that’s not the case with web servers. Let’s keep ’em cached!

As far as server monitoring solutions go, Instrumental, SemaText SPM, DataDog, and others all count cached memory as used. Check them out!


Another example of counting cached/buffered memory as free vs. used

Last month, a server suffered MySQL failure due to memory being exhausted, which included all of the available limited swap. Here’s the Munin report of memory usage (and also commit):


Here’s New Relic’s report on memory usage for the same period. It shows swapping but also what would seem like a healthy amount of white space and only about 50% “used”:


The server owner was thus unaware of the extent of the problem. If Linux only used swap when out of memory, this New Relic graph might not have been as misleading, but as discussed above, that is not the case. This makes things difficult for non-admins relying on tools that constantly show 50% free space. So, are you measuring Linux web server memory usage “correctly”?

First posted: 2014 / Last updated: Nov 11th, 2018
