Are You Measuring Web Server Memory Usage “Correctly”?

Free memory? No. Does the screen capture above from New Relic look familiar? Let’s say you have a web server with 2 GB of RAM, or maybe 256 GB. Your web applications are running slowly, so you check New Relic and, unfortunately, don’t see any red flags. The swap may worry you a little, but you say… “Hey, there’s plenty of free space left. Right?” Well, technically yes, but as it relates to web server performance: no, absolutely not. Let’s discuss.

Linux Desktop Memory Usage vs. Linux Web Server Memory Usage

On your Linux-based home computer or laptop you may be running Ubuntu, Mint, Debian, or maybe Arch, which is my favorite. My current laptop’s uptime is 5 days, 11 hours and 2 minutes. I use standby a LOT and have stayed up for 30+ days before. I mention this because the average user probably doesn’t keep a home system running that long. After all, it’s not a web server! :) If you do keep yours running, that’s actually better for the points that follow.

Let’s say you have 4 GB–8 GB of RAM installed in your laptop, and earlier today you used GIMP (an image editor), Chrome, and LibreOffice. Chances are, unless you perform heavy computing with other applications afterwards or restart your system, Linux will keep many of the required files and paths cached and buffered in RAM (which I’ll also refer to as “memory”). This is ever so useful because if you decide to re-edit photos, browse the web again, or open a new file in LibreOffice, all of these tasks will open and run faster this time around because they are cached in memory. Over time, if you don’t use GIMP or LibreOffice for a while, the Linux kernel will gradually replace those cached files with data from your newer apps. This is perfectly fine: you don’t need files from hours, days, or even weeks ago kept in memory, especially if you don’t have memory to spare.
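If you want to see the page cache at work, here’s a minimal sketch: create a large file, flush the cache, then read the file twice and compare timings. (The /tmp/bigfile path is just an example; flushing caches requires root and temporarily slows everything down, so don’t try this on a busy server.)

$ dd if=/dev/zero of=/tmp/bigfile bs=1M count=512   # create a 512 MB scratch file
# sync; echo 3 > /proc/sys/vm/drop_caches           # as root: flush page cache, dentries and inodes
$ time cat /tmp/bigfile > /dev/null                 # first read: comes from disk
$ time cat /tmp/bigfile > /dev/null                 # second read: served from RAM, much faster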

However, servers are different, very different. The same files are requested repeatedly at varying rates throughout the day: some several times per second or per minute, plus maybe a big search/indexing job every few hours. So how often do you want those files and paths evicted from the cache? With web servers, we want cached and buffered data kept as long as possible, long enough that cache pressure (evicting older files to make room for newer ones) doesn’t force the kernel to serve too many requests from disk rather than from memory, which is slow. Looking at the above graph again, most of that white space was “useful” cache and buffers.
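You can watch this behavior live with standard tools; both of the following show buffers and cache shrinking and growing as the kernel juggles memory:

$ free -m -s 5    # report memory usage in MB every 5 seconds
$ vmstat 5        # the "buff" and "cache" columns show the same data, plus swap and I/O activity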


Cached memory is defined nicely by Jonathan DePrizio of techthrob.com:
“Cached [memory] is the amount of memory that’s being used to keep copies of actual files in RAM. When there are files that are constantly being written or read, having these in memory reduces the amount of times you need to operate on the physical hard disk; as a result, you can greatly improve performance by using memory for cache.”

With this in mind, we must provision the correct amount of memory for web servers; otherwise the kernel will start saving and serving more and more cached data from disk. Disk storage, even an SSD (solid state drive), is tremendously slower than memory!

The command ‘free’ will never let you down!

Using the command free (or free -m for megabytes) will often reveal that you are “using” more memory than you think. See this example below from Red Hat’s docs:

$ free
              total       used        free    shared    buffers    cached
Mem:        4040360    4012200       28160         0     176628   3571348
-/+ buffers/cache:      264224     3776136
Swap:       4200956      12184     4188772

Notice there’s only 28160 KB “free”, but below that line look at how much memory has been consumed by buffers and cache! Linux always tries to put spare memory to work speeding up disk operations, filling it with buffers (file system metadata) and cache (pages holding the actual contents of files or block devices). This helps the system run faster because disk information is already in memory, which saves I/O operations. If more space is required, Linux frees up buffers and cache to yield memory to the applications; if there still isn’t enough, data starts getting swapped out to disk. It would be wise to monitor this and keep swap and cache contention within an acceptable range that does not affect performance. – via Red Hat.
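That “-/+ buffers/cache” line is just arithmetic on the row above it, and it’s the line that actually matters:

used - buffers - cached  =  4012200 - 176628 - 3571348  =   264224 KB  (~258 MB truly used by applications)
free + buffers + cached  =    28160 + 176628 + 3571348  =  3776136 KB  (~3.6 GB available if applications ask)

Newer versions of free drop this line in favor of an “available” column, based on the kernel’s MemAvailable estimate in /proc/meminfo, which answers the same question.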

Have a look at the screen capture below. This time, remember that the white/unshaded area (under “Physical memory”) is largely used by cache and buffers, which your web server depends on to maintain that blazing-fast performance you so crave. Notice the effect of swapping in this case: increased disk I/O latency, which ended up stalling the CPU. The fix for this web server at the time didn’t involve a memory upgrade, but instead recovering memory by reconfiguring MySQL, which had been misconfigured in the “larger is always better” direction, and also removing a bunch of unused MySQL databases (another topic).


Swapping memory to disk = I/O bottleneck. Now, please don’t take away that I’m suggesting you “eliminate” swap completely. :) No. Swap has its place and purpose. Instead, you must find the balance where swapping does not get in the way of throughput/performance. There are many tools and services out there for monitoring web server memory usage. Although technically correct, the previous two graphs may be misleading to those who depend on “percentage used”. Whenever there’s constant swapping, make sure to investigate server health.
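A simple way to tell occasional, healthy swapping from constant swapping is to watch vmstat for a minute or two:

$ vmstat 5
# watch the "si" (swapped in from disk) and "so" (swapped out to disk) columns;
# occasional blips are fine, but sustained non-zero values mean the server is
# actively swapping and your throughput is paying for it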

Swapping is not always bad!

Opportunistic swapping is helpful. This is when the system has nothing better to do, so it copies cached data that hasn’t been used for a very long time out to disk swap space. The cool thing about opportunistic swapping is that the server still keeps and serves a copy of the data in physical memory, but if things later get hectic and the server needs that memory for something else, it can drop those pages without having to perform additional, untimely writes to disk. That’s healthy. That said, if you have tons and tons of RAM there may be no swapping at all by default. For example, this 64 GB server that’s been up for 60+ days with 5 GB of free memory (approximately 8%):

root@server [~]# free -m
             total       used       free     shared    buffers     cached
Mem:         62589      57007       5582          0       1999      31705
-/+ buffers/cache:      23302      39287
Swap:         1999          0       1999
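If swap usage isn’t zero and you want to know which processes own the swapped-out pages, each process reports a VmSwap figure under /proc. A quick one-liner (output format may vary slightly between kernel versions):

$ grep VmSwap /proc/[0-9]*/status 2>/dev/null | sort -k2 -nr | head
# lists the ten processes with the most memory in swap, in kB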

Here’s what 8% free memory looks like on New Relic:


8% free …and that disk I/O utilization is beautiful.

Graphing web server memory usage with Munin

Lastly, let’s look at Munin, an open source monitoring tool. In the graph below, cached memory is labeled “cache” by Munin. In this example (click the image to enlarge) the server is healthy, and swapping has no effect on performance because it’s opportunistic swapping. The ratio of used memory to cached/buffered memory shows that this server is caching more than three times the memory it’s “using”. On web servers there’s no hard rule, as it varies case by case, but 50% or more of your RAM being used by cache is great for performance.

Healthy swap

Yet another example: a 30 GB web server with no swap usage. We could probably play with the kernel sysctl configuration here…
no disk swap
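As a sketch of what “playing with sysctl” might look like, run as root; the values below are illustrative starting points, not recommendations, so test against your own workload:

# sysctl -w vm.swappiness=10           # 0-100: lower values make the kernel less eager to swap
# sysctl -w vm.vfs_cache_pressure=50   # below 100: reclaim dentry/inode cache less aggressively
# echo "vm.swappiness = 10" >> /etc/sysctl.conf    # persist the setting across reboots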

Remember, unlike Linux desktop installs, with web servers a much higher percentage of previously accessed files are repeatedly requested. So while considering cached memory as “free” is OK on a desktop install, that’s not the case with web servers. Let’s keep ’em cached!
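Everything you need to check that ratio on your own server is in /proc/meminfo. For example, this one-liner prints the percentage of total RAM currently holding page cache:

$ awk '/^MemTotal:/ {t=$2} /^Cached:/ {c=$2} END {printf "cache: %.0f%% of RAM\n", c*100/t}' /proc/meminfo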

As far as alternatives go, Sematext SPM, Datadog, CopperEgg, and others all count cached memory as used. Check them out!


Update (Jan 9th, 2015): Last month a server suffered a MySQL failure due to memory being exhausted, including all of the limited swap that was available. Here’s the Munin report of memory usage (and also committed memory):

[Munin memory graph for the week, showing the out-of-memory event]

Here’s New Relic’s report on memory usage for the same 7-day period: a nice, healthy-looking amount of white space and only about 50% “used”.

[New Relic memory graph for the same week, showing plenty of “free” space]


The server owner was thus unaware of the problem. If Linux only used swap when it was truly out of memory, the graph above might have told the story, but as I discussed above that’s not how swap works, which makes it difficult for non-admins relying on their New Relic subscription to recognize memory problems. If you’re curious, the OOM issue was caused by growing PHP memory usage, with over 700 FastCGI processes not closing. The kernel’s OOM killer didn’t kill the PHP processes, but instead killed MySQL.
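When you suspect the OOM killer, the kernel log names its victims, and you can tip the scales so it reaps expendable workers before MySQL. A minimal sketch, assuming a single mysqld process (the oom_score_adj value resets when the process restarts, so in practice you’d set it from the service’s init script or via OOMScoreAdjust= in a systemd unit):

# dmesg | grep -i 'killed process'                   # see what the OOM killer reaped, and when
# echo -1000 > /proc/$(pidof mysqld)/oom_score_adj   # -1000 exempts mysqld from the OOM killer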


