Measure Linux web server memory usage correctly

Important update: This article was originally published in 2014. However, as I later wrote in the 2017 post Does your Linux server need a RAM upgrade? Let's check with free, top, vmstat and sar, a Linux kernel change addressed this issue: kernels 3.14 and newer expose a MemAvailable estimate. Hopefully, the change will motivate New Relic and others to follow suit in how memory usage is reported. The kernel now reports memory marked as available: “Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free).”
/proc/meminfo: provide estimated available memory
“Many load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up “free” and “cached”, which was fine ten years ago, but is pretty much guaranteed to be wrong today. It is wrong because Cached includes memory that is not freeable as page cache, for example shared memory segments, tmpfs, and ramfs, and it does not include reclaimable slab memory, which can take up a large fraction of system memory on mostly idle systems with lots of files. Currently, the amount of memory that is available for a new workload, without pushing the system into swap, can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the “low” watermarks from /proc/zoneinfo. However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory. It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place…” – Source.
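If your kernel and procps are new enough, you can read this estimate directly from the command line. A quick sketch; the numbers shown are illustrative, not from a real server:

$ grep MemAvailable /proc/meminfo
MemAvailable:    3847744 kB

$ free -m    # newer versions of procps-ng show an "available" column
              total        used        free      shared  buff/cache   available
Mem:           3945         258         155          10        3531        3758
Swap:          4095          11        4084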

Does the screenshot below, from New Relic, look familiar to you? Let's say you have a web server with 2GB of RAM, or maybe 256GB. Your web apps are running slowly, so you check New Relic and/or other server monitoring and APM tools, but unfortunately don't see any red flags. The swap usage may worry you a little, but you say… “Hey, there's plenty of free space left, right!?” Well, technically yes, but as it relates to Linux web server performance, no, absolutely not. Let's discuss.

Free memory? No.


Linux Desktop Memory Usage vs. Linux Web Server Memory Usage

On your Linux-based home computer or laptop you may be running Ubuntu, Linux Mint, Debian or maybe Fedora, which is my favorite desktop distro. My laptop's uptime shows 5 days, 11 hours and 2 minutes. I use the standby feature a lot and have been up for 30+ days before. I mention this because the average user probably avoids keeping their home systems up for that long. After all, they're not web servers! :) If you do keep them running, then that's actually better. Here's why.

Let’s say you have 8GB of installed RAM on your laptop and earlier today you used GIMP (image editor), Chrome and LibreOffice. Chances are, unless you perform heavy computing with other applications afterwards or restart your system, Linux will keep a lot of the required files and paths cached and buffered in RAM. This is very useful because if for some reason you decide to re-edit photos, browse the web again or open a new file in LibreOffice, all of these tasks will open and function noticeably faster the second time around. This is because they were cached (saved temporarily) to memory. If you then don’t use GIMP or LibreOffice for a while, the Linux kernel will gradually replace those cached files with data from your newer apps. This is perfectly fine, because you don’t need to keep old and unused files from hours, days or even weeks ago stored in memory, especially if you don’t have system memory to spare.

However, Linux servers are different, very different. The same files are requested repeatedly at varying rates throughout the day. Often, files are requested several times per minute, or many times per second on busier servers. So how often do you want those files and paths removed from cache? With Linux web servers, we want cached (and buffered) data to be kept as long as possible: long enough that cache pressure (evicting files to make room for newer ones) doesn’t push the Linux kernel into serving files from disk (much slower!) rather than from memory. Looking at the above memory usage graph again, most of that white space is “useful” cache and buffers.
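The kernel exposes a related tunable, vm.vfs_cache_pressure, which controls how aggressively it reclaims the directory-entry and inode caches (default 100; lower values favor keeping them). Note it affects dentry/inode caches rather than the page cache itself. A hedged sketch, not a universal recommendation; benchmark any change against your own workload:

$ sysctl vm.vfs_cache_pressure
vm.vfs_cache_pressure = 100

# Favor retaining dentry/inode caches (example value only):
$ sudo sysctl -w vm.vfs_cache_pressure=50

# Persist the setting across reboots:
$ echo 'vm.vfs_cache_pressure = 50' | sudo tee -a /etc/sysctl.conf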

Cached memory is defined nicely by Jonathan DePrizio of techthrob.com:
“Cached [memory] is the amount of memory that’s being used to keep copies of actual files in RAM. When there are files that are constantly being written or read, having these in memory reduces the amount of times you need to operate on the physical hard disk; as a result, you can greatly improve performance by using memory for cache.”

With this in mind, we must size web server memory correctly; otherwise, the kernel will start serving more and more data from disk instead of from cache. Disk storage, even an SSD (solid state drive), is tremendously slower than RAM!
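You can see the difference yourself by timing a cold read against a cached read. A sketch for a test box only: drop_caches discards all clean page cache, so don't run it on production, and the file path and timings below are hypothetical:

# Flush the page cache, then read a file cold from disk:
$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
$ time cat /var/www/html/somefile.bin > /dev/null
real    0m1.842s

# The second read is served from the page cache:
$ time cat /var/www/html/somefile.bin > /dev/null
real    0m0.019s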


The command ‘free’ will never let you down!

From the Linux command line, the free command (or free -m, or free -h) will often reveal that you are “using” more memory than you think! See this example below from Red Hat’s docs:

$ free
              total       used        free    shared    buffers    cached
Mem:        4040360    4012200       28160         0     176628   3571348
-/+ buffers/cache:      264224     3776136
Swap:       4200956      12184     4188772

Notice there’s 28160 kB “free”. However, below that line, look at how much memory has been consumed by buffers and cache! Linux always tries to use memory first to speed up disk operations, using available memory for buffers (file system metadata) and cache (pages with the actual contents of files or block devices). This helps the system run faster because disk information is already in memory, which saves I/O operations. If more space is required, Linux will free up the buffers and cache to yield memory for the applications. If there’s not enough “free” space, then data will be saved (swapped) to disk. It would be wise to monitor this and keep swap and cache contention within an acceptable range that does not affect performance. – Source: Red Hat.
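That “-/+ buffers/cache” line is simple arithmetic: used minus buffers minus cached. On older kernels without MemAvailable you can reproduce it straight from /proc/meminfo; the output below uses the Red Hat figures above for illustration:

$ awk '/^MemTotal:|^MemFree:|^Buffers:|^Cached:/ {m[$1]=$2} END {print "used minus buffers/cache:", m["MemTotal:"]-m["MemFree:"]-m["Buffers:"]-m["Cached:"], "kB"}' /proc/meminfo
used minus buffers/cache: 264224 kB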

Have a look at the screen capture below. This time, remember that the white/unshaded area (under “Physical memory”) is largely used by cache and buffers, which your web server depends on to maintain that blazing fast performance you so crave. Notice the effect of swapping in this case: increased disk I/O latency, which blocked the CPU (I/O wait). The fix for this web server at the time didn’t involve a memory upgrade, but instead recovering memory by reconfiguring MySQL, which had been misconfigured in the direction of “larger is always better,” and by removing a bunch of unused MySQL databases.

Swapping memory to disk = IO bottleneck.

Now, please don’t take away that I’m suggesting you “eliminate” swap completely. :) No. Swap has its place and purpose. Instead, you must find the balance whereby swapping does not get in the way of throughput/performance. There are many tools and services out there for monitoring web server memory usage. Although correct, the previous two graphs may be misleading to those who depend on “percentage used” for decision making. Whenever there’s constant swapping, make sure to investigate server health.
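One quick way to tell constant swapping from harmless opportunistic swapping is to watch the si/so columns in vmstat; sustained non-zero values under load are the red flag. Sample output illustrative:

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  12184  28160 176628 3571348   0    0    12    25  110  215  8  2 88  2  0
 0  1  12440  26104 176700 3570212  118  306  2410  1220  450  890 10  4 61 25  0

# si/so: KB/s swapped in from / out to disk; wa: % CPU time stuck in I/O wait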


Swapping is not always bad!

Opportunistic swapping is helpful to performance! This is when the system has nothing better to do, so it saves cached data that hasn’t been used for a long time to disk swap space. The cool thing about opportunistic swapping is that the server still keeps and serves a copy of the data in physical memory. If things later get hectic and the server needs that memory for something else, it can drop those pages without having to perform additional untimely writes to disk. That’s healthy! That said, if you have tons of RAM there will be less swapping. For example, the 64GB server below has been up for 60+ days with 5GB of free memory (approximately 8%) and no swap used:

root@server [~]# free -m
             total       used       free     shared    buffers     cached
Mem:         62589      57007       5582          0       1999      31705
-/+ buffers/cache:      23302      39287
Swap:         1999          0       1999

Here’s what 8% free memory looks like on New Relic:

8% "free" memory - New Relic

…disk IO utilization is beautiful.


Graphing Linux web server memory usage with Munin

Lastly, let’s look at Munin, an open source monitoring tool. In the graph below, cached memory is labeled as “cache” by Munin. In this example – click the image to enlarge – the server is healthy, and swapping has no effect on performance because the cache and buffers are large enough for the kernel to selectively swap out to disk. The ratio of used memory to cached/buffered memory shows that this server is caching more than 3 times the memory “used”. On web servers there’s no hard rule for the recommended size of cache; it varies case by case. That said, 50% or more of your RAM being used for cache is great for performance!

Healthy swap

Yet another example: a 30GB web server with no swap used. We could probably play with kernel sysctl configuration here…
no disk swap
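Speaking of sysctl, vm.swappiness is the usual knob: lower values make the kernel prefer reclaiming page cache over swapping out anonymous memory. A sketch only; the value below is a common web server starting point, not a rule, so benchmark before committing:

$ sysctl vm.swappiness
vm.swappiness = 60

# Prefer dropping cache over swapping:
$ sudo sysctl -w vm.swappiness=10
$ echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.conf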

Remember, unlike on a Linux desktop, with web servers a much higher percentage of previously accessed files is repeatedly requested. So while counting cached memory as “free” is OK on a desktop install, that’s not the case with web servers. Let’s keep ’em cached!

As far as server monitoring solutions go, Instrumental, Sematext SPM, Datadog and others all count cached memory as used. Check them out!


Another example of counting cached/buffered memory as free vs used

Last month, a server suffered a MySQL failure after memory was exhausted, including all of the limited swap that was available. Here’s the Munin report of memory usage (and also committed memory):

munin-memory-week_shows_OOM

Here’s New Relic’s report on memory usage for the same period. It shows swapping, but also what would seem like a healthy amount of white space and only about 50% “used”:

new_relic-memory-week__free_space

The server owner was thus unaware of the extent of the problem. If Linux only used swap when out of memory, this New Relic graph might not have been as misleading, but as we discussed above, that is not the case, which makes things difficult for non-admins relying on tools that constantly show 50% free space. So, Are You Measuring Linux Web Server Memory Usage “Correctly”?
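When a graph leaves you guessing, the kernel log settles it. A quick way to confirm whether the kernel’s OOM killer fired; the output shown is hypothetical:

$ dmesg -T | grep -i 'out of memory'
[Sun Oct 14 03:12:45 2018] Out of memory: Kill process 1234 (mysqld) score 412 or sacrifice child

$ journalctl -k | grep -i 'oom'    # on systemd-based hosts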

First posted: 2014 / Last updated: Nov 11th, 2018
