Improve responsiveness with cgroups

From Gentoo Linux Wiki

The cgroup subsystem (short for "control groups") is a kernel feature that other subsystems use to apply policies to groups of processes. For example, the CPU scheduler (the Completely Fair Scheduler, CFS) and the CFQ I/O scheduler can use cgroups to schedule more "fairly" among groups of processes, so that one group cannot starve another of resources. Other use cases include bandwidth guarantees for disk and network I/O. More information on these use cases can be found here: Manage your performance with cgroups and projects

The setup presented in this article uses cgroups so that each interactive shell (and every process started from it) ends up in its own cgroup. This helps isolate resource-intensive processes started from these shells (such as "emerge" or "make") from other shells, services and graphical applications. It is based on the setup proposed here: Alternative to 200 lines kernel patch

Kernel setup

In order to use cgroups for CPU scheduling, you have to enable the following options:

Linux Kernel Configuration: Group CPU scheduling
General setup  --->
	[*] Control Group support  --->
		[*] Group CPU scheduler  --->
			[*] Group scheduling for SCHED_OTHER

If you also want to use it for disk I/O scheduling, enable the following options, too:

Linux Kernel Configuration: Group I/O scheduling
[*] Enable the block layer  --->
	-*-   Block cgroup support
		IO Schedulers  --->
			<*> CFQ I/O scheduler
				[*] CFQ Group Scheduling support
		Default I/O scheduler (CFQ)  --->
			(X) CFQ
			( ) No-op
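
Before rebuilding, it can help to check whether a kernel configuration already has these options set. The following is a minimal sketch, not part of the original setup: the function name check_kconfig is illustrative, and the CONFIG_ symbol names in the usage comment are assumed to correspond to the menu entries above. It also assumes the running kernel exposes its configuration at /proc/config.gz, which is only the case when that feature is compiled in.

```shell
#!/bin/sh
# check_kconfig FILE SYMBOL...
# Print "SYMBOL=y" or "SYMBOL=m" for every enabled symbol and
# "SYMBOL is not set" for the others. FILE is a plain-text kernel .config.
# Returns non-zero if at least one symbol is not enabled.
check_kconfig() {
	cfg=$1; shift
	rc=0
	for sym in "$@"; do
		if line=$(grep -E "^${sym}=(y|m)$" "$cfg"); then
			echo "$line"
		else
			echo "$sym is not set"
			rc=1
		fi
	done
	return $rc
}

# Example (assumed symbol names, not run automatically):
#   zcat /proc/config.gz > /tmp/running.config
#   check_kconfig /tmp/running.config CONFIG_CGROUPS CONFIG_CGROUP_SCHED \
#       CONFIG_FAIR_GROUP_SCHED CONFIG_BLK_CGROUP CONFIG_IOSCHED_CFQ \
#       CONFIG_CFQ_GROUP_IOSCHED
```

If /proc/config.gz is not available, the same function can be pointed at the .config file of the kernel source tree that was used for the build.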

The cgroup filesystem

cgroups are created and controlled through a virtual filesystem (similar to /dev, /proc and /sys). This filesystem is not mounted automatically; you have to mount it manually.

Since cgroup support is a very new kernel feature, there is no consensus yet on where the cgroup filesystem should be mounted. Depending on your kernel version, there might already be an empty directory at /sys/fs/cgroup that is intended for this purpose. If it does not exist, you have to choose a suitable mount point yourself.
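
To find out whether (and where) a cgroup hierarchy is already mounted, the mount table can be filtered by filesystem type. This is a small sketch; the function name cgroup_mounts is illustrative and not part of the original setup.

```shell
#!/bin/sh
# cgroup_mounts [FILE]
# Print "MOUNTPOINT OPTIONS" for every mounted cgroup hierarchy.
# FILE defaults to /proc/mounts, whose fields are:
#   device mountpoint fstype options dump pass
cgroup_mounts() {
	awk '$3 == "cgroup" { print $2, $4 }' "${1:-/proc/mounts}"
}
```

Calling cgroup_mounts without arguments inspects the running system; no output means that nothing of type cgroup is mounted. The file /proc/cgroups additionally lists the subsystems the kernel was built with.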

Preparing the filesystem

The following script checks for the presence of /sys/fs/cgroup and, if it is not found, creates a mount point at /dev/cgroup. The script then mounts the cgroup filesystem and prepares it so that ordinary users can use it.

File: /usr/local/sbin/cgroup_start
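#!/bin/sh
# Prefer /sys/fs/cgroup; fall back to /dev/cgroup if it does not exist.
if [ -d /sys/fs/cgroup ] ; then
	cdir=/sys/fs/cgroup
else
	cdir=/dev/cgroup
	mkdir -p "$cdir"
fi

# Encode the kernel release as major*10000 + minor*100 + patch (2.6.38 -> 20638).
# Simply deleting the dots would turn 3.0.0 into 300 and misorder it.
kern_version=$(uname -r | awk -F'[.-]' '{ print $1 * 10000 + $2 * 100 + $3 }')
if [ "$kern_version" -lt 20638 ] ; then
	mount -t cgroup cgroup "$cdir" -o cpu
else
	mount -t cgroup cgroup "$cdir" -o cpu,blkio
fi

# World-writable parent directory so that every user can create cgroups below it.
mkdir -m 0777 "$cdir/user"
# Program the kernel runs when a cgroup with notify_on_release set becomes empty.
/bin/echo '/usr/local/sbin/cgroup_clean' > "$cdir/release_agent"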

This script must be run by root at every system start. Add it to /etc/conf.d/local.start (baselayout1) or /etc/conf.d/local (baselayout2):

File: /etc/conf.d/local.start or /etc/conf.d/local
/usr/local/sbin/cgroup_start

Alternatively, copy it to /etc/local.d/cgroup.start.

Now create a second script at /usr/local/sbin/cgroup_clean:

File: /usr/local/sbin/cgroup_clean
#!/bin/sh
if [ -d /sys/fs/cgroup ] ; then
	cdir=/sys/fs/cgroup
else
	cdir=/dev/cgroup
fi

# The kernel passes the path of the empty cgroup, relative to the mount point, as $1.
rmdir "$cdir/$1"

The first script registers the second one as the release agent of the hierarchy. The kernel calls it whenever a cgroup that has notify_on_release enabled becomes empty because the last process in it has terminated, and the second script then deletes that cgroup.
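
The cleanup step can be tried on a scratch directory without touching the real cgroup mount. The sketch below uses a hypothetical helper, clean_cgroup, which performs the same rmdir as cgroup_clean but takes the hierarchy root as an explicit argument. The path /user/1234 stands in for what the kernel would pass to the release agent.

```shell
#!/bin/sh
# clean_cgroup ROOT RELPATH
# Remove the (empty) directory RELPATH below ROOT.
# RELPATH has the form the release agent receives, e.g. "/user/1234".
clean_cgroup() {
	rmdir "$1/$2"
}

# Dry run on a scratch directory instead of the real cgroup filesystem:
scratch=$(mktemp -d)
mkdir -p "$scratch/user/1234"
clean_cgroup "$scratch" /user/1234
ls "$scratch/user"        # prints nothing: 1234 is gone, user/ itself remains
```

Because rmdir only removes empty directories, the parent user directory is never deleted by accident.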

Warning: Both scripts are executed by the root user. Make sure that no ordinary user can alter these files!
chown root:root /usr/local/sbin/cgroup_start /usr/local/sbin/cgroup_clean
chmod 700 /usr/local/sbin/cgroup_start /usr/local/sbin/cgroup_clean
Note: Every script presented here uses /bin/echo instead of the shell built-in echo, because the built-in does not check for write errors. The cgroup subsystem uses write errors to report problems back to the user, so with the built-in you cannot tell whether your command succeeded.
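
To act on such an error, test the exit status of /bin/echo. A minimal sketch follows; the helper name write_checked is illustrative and not used by the scripts in this article.

```shell
#!/bin/sh
# write_checked VALUE FILE
# Write VALUE to FILE with /bin/echo; print a message on stderr and
# return 1 if the file cannot be opened or the write is rejected.
write_checked() {
	if ! /bin/echo "$1" > "$2"; then
		echo "writing '$1' to $2 failed" >&2
		return 1
	fi
}

# Example (not run automatically):
#   write_checked $$ /sys/fs/cgroup/user/$$/tasks || echo "not moved into cgroup"
```

On Linux, /dev/full is a convenient target for provoking a write error when experimenting with this helper.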

Use cgroups automatically

In order to use the cgroup filesystem in the way that was outlined above, every interactive shell has to create its own cgroup. This is done by adding the following code to your ~/.bashrc file:

File: ~/.bashrc
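# Only for interactive shells (PS1 is set).
if [ -n "$PS1" ] ; then
	if [ -d /sys/fs/cgroup ] ; then
		cdir=/sys/fs/cgroup
	else
		cdir=/dev/cgroup
	fi
	# Create a cgroup named after this shell's PID and move the shell into it.
	mkdir -p -m 0700 "$cdir/user/$$" > /dev/null 2>&1
	/bin/echo $$ > "$cdir/user/$$/tasks"
	# Have the kernel call the release agent once this cgroup is empty.
	/bin/echo '1' > "$cdir/user/$$/notify_on_release"
	unset -v cdir
fi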
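
To confirm that a shell really ended up in its own cgroup, look at /proc/self/cgroup, where each line has the form "ID:subsystems:path". The following sketch extracts the path for one subsystem; the function name cgroup_of is illustrative.

```shell
#!/bin/sh
# cgroup_of SUBSYSTEM [FILE]
# Print the cgroup path recorded for SUBSYSTEM (e.g. cpu).
# FILE defaults to /proc/self/cgroup. That file then describes the awk
# process itself, which as a child of the shell is in the same cgroup.
cgroup_of() {
	awk -F: -v want="$1" '
		{
			# The second field is a comma-separated list of subsystems.
			n = split($2, subs, ",")
			for (i = 1; i <= n; i++)
				if (subs[i] == want) { print $3; exit }
		}' "${2:-/proc/self/cgroup}"
}
```

In a shell that ran the ~/.bashrc code above, cgroup_of cpu is expected to print /user/ followed by the PID of that shell.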

To enable this for all users, add the code to /etc/bash/bashrc. A more modular alternative is to create a directory such as /etc/bash/local and move the code into a dedicated file such as /etc/bash/local/cgrouprc. Every user who wants to use cgroups can then include this code in their ~/.bashrc file by adding the following line:

File: ~/.bashrc
source /etc/bash/local/cgrouprc

This is particularly useful if you want to create several such bashrc "modules".
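
With several modules, a guard avoids an error message at every shell start when a module file is missing. A minimal sketch, using the hypothetical helper name source_module:

```shell
# source_module FILE
# Source FILE into the current shell if it exists and is readable;
# do nothing (and return success) otherwise.
source_module() {
	if [ -r "$1" ]; then
		. "$1"
	fi
}

source_module /etc/bash/local/cgrouprc
```

The helper uses "." rather than "source" so that it also works in shells other than bash.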

You can also enable it by default for new users by editing /etc/skel/.bashrc accordingly.

Monitoring cgroups

Find the PID of the shell whose cgroup you want to watch:

echo $$

Open another terminal (or screen window) and change to the corresponding directory below the cgroup mount point (assuming /sys/fs/cgroup here):

cd /sys/fs/cgroup/user/<pid from above command>

And run:

watch -n.1 'cat tasks'

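Start a make -j4 job (or something similar) in the first terminal. The IDs of the processes it spawns appear in the watched tasks file and disappear again when they exit.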
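
A bare list of numbers is hard to read. The following sketch prints the command line next to each ID by reading /proc; the function name show_tasks is illustrative.

```shell
#!/bin/sh
# show_tasks TASKS_FILE
# Print "ID COMMAND-LINE" for every ID listed in a cgroup tasks file.
# Tasks that have already exited are skipped.
show_tasks() {
	while read -r pid; do
		[ -r "/proc/$pid/cmdline" ] || continue
		printf '%s ' "$pid"
		# Arguments in cmdline are separated by NUL bytes.
		tr '\0' ' ' < "/proc/$pid/cmdline"
		echo
	done < "$1"
}

# Example (not run automatically):
#   watch -n 1 "sh -c '. ./show_tasks.sh; show_tasks tasks'"
```

The example assumes the function was saved to a file named show_tasks.sh in the current directory, which is a hypothetical name.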

Drawbacks

  • The above solution only works when resource intensive tasks are started from a shell. It does not help when the application is started from a window manager (terminals within a window manager work, though).
  • There is a slight scheduling overhead associated with cgroups. If throughput is more important than interactivity and responsiveness, do not enable it.
  • The above solution only works for bash shell users. You have to provide similar scripts for users of other shells like tcsh or ksh.
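
For shells that understand POSIX sh syntax (ksh, for example), the ~/.bashrc code can be adapted as sketched below. This is an untested assumption rather than part of the original setup: the function name join_own_cgroup is illustrative, and which startup file the shell reads depends on the shell. tcsh uses a different syntax and needs its own version.

```shell
# join_own_cgroup CDIR
# Create CDIR/user/<PID of this shell>, move the shell into it and enable
# release notification. Stops at the first step that fails.
join_own_cgroup() {
	mkdir -p -m 0700 "$1/user/$$" > /dev/null 2>&1 || return 1
	/bin/echo $$ > "$1/user/$$/tasks" || return 1
	/bin/echo 1 > "$1/user/$$/notify_on_release"
}

# $- contains "i" in interactive shells, a common portable test.
case $- in
*i*)
	if [ -d /sys/fs/cgroup ] ; then
		join_own_cgroup /sys/fs/cgroup
	else
		join_own_cgroup /dev/cgroup
	fi
	;;
esac
```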

Extend for disk I/O scheduling

Currently, the solution presented above only supports the cpu subsystem. In theory, the same solution is applicable to the disk I/O scheduler. However, that subsystem does not support hierarchies. This will change in kernel 2.6.38 when the following patch is applied: [blk-cgroup: Allow creation of hierarchical cgroups]

The cgroup_start script above handles this automatically: it checks the kernel version and adds blkio to the mount options on kernel 2.6.38 and later.
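
To try such a version test by hand on arbitrary release strings, the comparison can be isolated in two small functions. This is a sketch under the assumption that releases look like "major.minor.patch" with an optional suffix; the names kver_code and blkio_ok are illustrative.

```shell
#!/bin/sh
# kver_code RELEASE
# Encode a release string such as "2.6.38-gentoo-r1" as
# major*10000 + minor*100 + patch. Missing parts count as 0.
kver_code() {
	echo "$1" | awk -F'[.-]' '{ print $1 * 10000 + $2 * 100 + $3 }'
}

# blkio_ok RELEASE
# Succeed if RELEASE is 2.6.38 or later.
blkio_ok() {
	[ "$(kver_code "$1")" -ge 20638 ]
}

# Example (not run automatically):
#   blkio_ok "$(uname -r)" && echo "blkio can be mounted together with cpu"
```

Encoding each part separately keeps releases such as 3.0.0 ordered after 2.6.38, which a plain concatenation of the digits would not.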
