Linux: ncdu and mc to manage large directories

Creating, modifying and deleting files are everyday tasks on any operating system, especially for sysadmins and developers. These tasks are usually fast enough when only a handful of files is involved. On Linux servers, however, you may at some point have to manage millions or even billions of files in a single directory. For example, three weeks ago I had to delete a directory containing 711,057,408 files because the StackLinux VPS hosting client was unable to delete them. I made a note then to blog about the issue. There are already many blog posts and Q&As covering alternative commands for deleting large numbers of files, so although we will touch briefly on rm alternatives, I want to focus on two command line tools for viewing and managing the Linux file system: mc and ncdu. More on those later.

“cannot execute [Argument list too long]” when using rm

Have you tried to delete the contents of a directory with the rm command, only to receive the following error? cannot execute [Argument list too long]. This happens when a wildcard such as rm * expands to more file names than a single command can accept, which is common in a directory containing a very large number of files. In short, the command cannot be executed once the combined size of its arguments exceeds the ARG_MAX limit.

Check the limit with:

getconf ARG_MAX
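
To put that number in context, here is a minimal sketch that compares ARG_MAX with the space the file names in a directory would need if they were all passed to one command. The helper name glob_bytes is made up for this example, and -printf assumes GNU find. Treat the result as a rough lower bound, since the environment also counts against the limit.

```shell
#!/bin/sh
# Estimate how many bytes "rm *" would have to pass for a directory.
# glob_bytes is a hypothetical helper name; -printf requires GNU find.
glob_bytes() {
    # Print every non-hidden entry name followed by a NUL byte, then count
    # the bytes; each argument is stored as a NUL-terminated string.
    find "$1" -mindepth 1 -maxdepth 1 ! -name '.*' -printf '%f\0' | wc -c
}

dir=${1:-.}
limit=$(getconf ARG_MAX)
needed=$(glob_bytes "$dir")

echo "ARG_MAX: $limit bytes"
echo "names in $dir: $needed bytes"
if [ "$needed" -ge "$limit" ]; then
    echo "rm * would fail here with 'Argument list too long'"
else
    echo "rm * would probably fit (the environment also counts against the limit)"
fi
```

Run it with a directory as the first argument, or with no argument to check the current directory.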

Since Linux 2.6.23, ARG_MAX is no longer hardcoded. It is limited to one quarter of the stack size (ulimit -s), which ensures that the program can still run at all; see the kernel git history for fs/exec.c. The same limit is hit by cp, ls, mv and other commands when they are given too many arguments, so the fix is to use alternatives that never build one huge argument list. For example, find -exec or find -delete (faster!):

find . -type f -delete

To delete only files with a specific extension:

find . -name "*.log" -type f -delete
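
Since -delete is irreversible, it can help to rehearse these commands in a throwaway directory first. The sketch below is only an illustration: the file names are made up, and it runs entirely inside a directory created by mktemp. It previews the matches with -print, deletes the .log files, then shows the find -exec form mentioned above.

```shell
#!/bin/sh
# Rehearse the find-based deletions in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch app.log db.log notes.txt
mkdir sub && touch sub/old.log

# 1. Preview: -print lists exactly what the same expression would delete.
find . -name "*.log" -type f -print

# 2. Delete only the .log files, at any depth.
find . -name "*.log" -type f -delete
left_after_logs=$(find . -type f | wc -l)
echo "files left after removing logs: $left_after_logs"

# 3. The -exec form: "+" batches the names so each rm call stays under ARG_MAX.
find . -type f -exec rm -f {} +

# Count what is left; the directories themselves remain.
remaining=$(find . -type f | wc -l)
echo "files remaining: $remaining"
```

Swapping -delete for -print, as in step 1, is a cheap way to check an expression before running it against real data.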

My favorite method, which is about twice as fast as find -delete, is to use rsync. rsync is commonly used to synchronize files between two locations, usually on different machines, but it also works within a single system. The idea is to sync the target directory (the one with the large number of files) with an empty directory, so that rsync deletes everything the empty directory does not contain. In my case, the /path/to/var/session/ directory held over seven hundred million files (a symptom of how sessions were being managed). First create an empty directory; it can be named anything, here empty_dir/:

mkdir empty_dir/

Next, we use rsync's --delete option, the counterpart of the -delete action used with find above. The general form is:

rsync -a --delete [empty directory] [target directory]

Or, for the case mentioned at the outset:

rsync -a --delete /path/to/empty_dir/ /path/to/var/session/

-a or --archive is equivalent to using -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a notable omission). The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied. Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H. (see: man rsync)

What would usually have taken hours completed in about 10 to 15 minutes using rsync.
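
The whole rsync sequence can be rehearsed on scratch directories before it is pointed at real data. This is a minimal sketch under stated assumptions: empty_dir_with_rsync is a made-up helper name, the session directory and its 1000 files are stand-ins, and everything lives under a directory created by mktemp.

```shell
#!/bin/sh
# Rehearse the rsync method on throwaway directories before touching real data.
# empty_dir_with_rsync is a hypothetical helper name, not an rsync feature.
empty_dir_with_rsync() {
    target=$1
    empty=$(mktemp -d) || return 1
    # The trailing slashes matter: they sync the contents of the empty
    # directory into the target rather than the directory itself.
    rsync -a --delete "$empty/" "$target/"
    status=$?
    rmdir "$empty"
    return "$status"
}

# Build a stand-in for the overfull session directory.
work=$(mktemp -d)
mkdir "$work/session"
i=0
while [ "$i" -lt 1000 ]; do
    : > "$work/session/sess_$i"
    i=$((i + 1))
done

if command -v rsync >/dev/null 2>&1; then
    empty_dir_with_rsync "$work/session"
else
    echo "rsync is not installed; nothing was deleted" >&2
fi

# Count the entries left inside the target; the target directory itself stays.
left=$(find "$work/session" -mindepth 1 | wc -l)
echo "entries left: $left"
```

One thing to watch: because -a also applies the source directory's own permissions, timestamps and (when run as root) ownership to the target directory, it is worth checking the mode and owner of the real target afterwards. A directory made by mktemp -d is typically readable only by its creator.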

ncdu (NCurses Disk Usage)

ncdu is a curses-based version of the well-known ‘du’. It provides a fast way to see which directories are using disk space and to manage them. Users can navigate using the arrow keys and delete files that are taking up too much space by pressing the ‘d’ key.

To install on Debian or Ubuntu run:

apt install ncdu

On CentOS, enable the EPEL repository, then install:

yum install epel-release
yum install ncdu

To delete a directory or file, select it and press d. Press ? for a list of shortcuts.
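
Where a curses interface is not an option, for example inside a script or a cron job, a similar ranking can be produced with plain du. The sketch below does not use ncdu at all. largest_dirs is a made-up helper name, and --max-depth assumes GNU du.

```shell
#!/bin/sh
# Non-interactive approximation of ncdu's top-level view, using plain du.
# largest_dirs is a hypothetical helper; --max-depth is a GNU du option.
largest_dirs() {
    # -x: stay on one file system, -k: sizes in KiB,
    # --max-depth=1: report only the directory and its direct subdirectories.
    # Sort largest first and keep the top N lines (default 10).
    du -x -k --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

largest_dirs "${1:-.}"
```

The first line of output is normally the total for the directory itself, followed by its largest subdirectories.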

mc (Midnight Commander) file manager

A much more powerful and feature-rich alternative is Midnight Commander (mc), a directory browser and file manager for Unix-like operating systems. Its features include the ability to view the contents of RPM package files, work with common archive formats, and act as an FTP client. Midnight Commander also includes a standalone editor called mcedit.

To install on Debian or Ubuntu run:

apt install mc

On CentOS:

yum install mc

As the function key bar at the bottom of the mc screen shows, press F8 to delete the selected file or directory. mc and ncdu are useful additions to already well-known commands such as ls, du and df.

Reference:
mc guide (pdf): http://nawaz.org/media/docs/mc/mc.pdf
man mc: https://linux.die.net/man/1/mc
man ncdu: https://linux.die.net/man/1/ncdu
arg_max: https://www.in-ulm.de/~mascheck/various/argmax/
