Load (computing)

In UNIX computing, the system load is a measure of the amount of computational work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one-, five-, and fifteen-minute periods.

htop displaying a significant computing load (top right: Load average:).

Unix-style load calculation

All Unix and Unix-like systems generate a dimensionless metric of three "load average" numbers in the kernel. Users can easily query the current result from a Unix shell by running the uptime command:

$ uptime
 14:34:03 up 10:43,  4 users,  load average: 0.06, 0.11, 0.09

The w and top commands show the same three load average numbers, as do a range of graphical user interface utilities. In Linux, they can also be accessed by reading the /proc/loadavg file.
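
For instance, a minimal C sketch (assuming Linux, since /proc/loadavg is Linux-specific) that parses the three averages from that file:

#include <stdio.h>

int main(void)
{
   double one, five, fifteen;
   FILE *f = fopen("/proc/loadavg", "r");

   if (f == NULL)
      return 1;
   if (fscanf(f, "%lf %lf %lf", &one, &five, &fifteen) != 3) {
      fclose(f);
      return 1;
   }
   fclose(f);
   printf("load averages: %.2f %.2f %.2f\n", one, five, fifteen);
   return 0;
}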

An idle computer has a load number of 0 (the idle process is not counted). Each process that is using or waiting for the CPU (i.e., is in the ready queue or run queue) increments the load number by 1, and each process that terminates decrements it by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. Linux, however, also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.[1] This includes, for example, processes blocked waiting on a failed NFS server or on slow media (e.g., USB 1.x storage devices). Such circumstances can result in an elevated load average that does not reflect an actual increase in CPU use (but still gives an idea of how long users have to wait).
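
The distinction can be summarized schematically; the following is an illustrative sketch, not actual kernel code, and the state names are simplifications of the real task states:

#include <stdbool.h>

enum task_state { TASK_ON_CPU, TASK_RUNNABLE, TASK_UNINTERRUPTIBLE, TASK_SLEEPING };

/* Illustrative sketch: which task states contribute to the load number. */
bool counts_toward_load(enum task_state s, bool linux_semantics)
{
   if (s == TASK_ON_CPU || s == TASK_RUNNABLE)
      return true;                   /* counted on all Unix variants */
   if (linux_semantics && s == TASK_UNINTERRUPTIBLE)
      return true;                   /* Linux also counts tasks blocked in I/O */
   return false;                     /* ordinary sleep never counts */
}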

Systems calculate the load average as the exponentially damped/weighted moving average of the load number. The three values of load average refer to the past one, five, and fifteen minutes of system operation.[2]

Mathematically speaking, all three values always average all of the system load since the system started up. They all decay exponentially, but at different rates: the 1-, 5-, and 15-minute values decay by a factor of e after 1, 5, and 15 minutes respectively. Hence, the 1-minute load average consists of 63% (more precisely: 1 - 1/e) of the load from the last minute and 37% (1/e) of the average load since startup, excluding the last minute. For the 5- and 15-minute load averages, the same 63%/37% ratio applies over 5 and 15 minutes respectively. Therefore, it is not technically accurate to say that the 1-minute load average includes only the last 60 seconds of activity, since it includes 37% of the activity from before; it is correct to say, however, that it reflects mostly the last minute.
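
Concretely, every five seconds each stored average is updated with an exponential-smoothing step of the following form, where n is the current load number and T is the averaging period (60 s, 300 s, or 900 s). This is a restatement of the kernel's CALC_LOAD macro shown further below:

\text{load}_{\text{new}} = \text{load}_{\text{old}} \cdot e^{-5/T} + n \cdot \left(1 - e^{-5/T}\right)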

Interpretation

For single-CPU systems that are CPU bound, one can think of load average as a measure of system utilization during the respective time period. For systems with multiple CPUs, one must divide the load by the number of processors in order to get a comparable measure.

For example, one can interpret a load average of "1.73 0.60 7.98" on a single-CPU system as:

  • during the last minute, the system was overloaded by 73% on average: there were 1.73 runnable processes, so on average 0.73 processes had to wait for a turn on the single CPU.
  • during the last 5 minutes, the CPU was idle 40% of the time on average.
  • during the last 15 minutes, the system was overloaded by 698% on average: there were 7.98 runnable processes, so on average 6.98 processes had to wait for a turn on the single CPU.

This means that this system (CPU, disk, memory, etc.) could have handled all of the work scheduled for the last minute if it were 1.73 times as fast.

In a system with four CPUs, a load average of 3.73 would indicate that there were, on average, 3.73 processes ready to run, and each one could be scheduled into a CPU.
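
A minimal C sketch of this per-CPU normalization, assuming getloadavg() and _SC_NPROCESSORS_ONLN are available (both are BSD/glibc extensions rather than strict POSIX):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
   double loads[3];
   long ncpu = sysconf(_SC_NPROCESSORS_ONLN);  /* processors online */

   if (getloadavg(loads, 3) != 3 || ncpu < 1)
      return 1;
   /* A per-CPU value above 1.0 means runnable tasks had to wait. */
   printf("1-minute load per CPU: %.2f\n", loads[0] / ncpu);
   return 0;
}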

On modern UNIX systems, the treatment of threading with respect to load averages varies. Some systems treat threads as processes for the purposes of load average calculation: each thread waiting to run will add 1 to the load. However, other systems, especially systems implementing so-called M:N threading, use different strategies such as counting the process exactly once for the purpose of load (regardless of the number of threads), or counting only threads currently exposed by the user-thread scheduler to the kernel, which may depend on the level of concurrency set on the process. Linux appears to count each thread separately as adding 1 to the load.[3]

CPU load vs CPU utilization

The comparative study of different load indices carried out by Ferrari et al.[4] reported that load information based on CPU queue length performs much better in load balancing than CPU utilization. The likely reason is that when a host is heavily loaded, its CPU utilization is pinned near 100% and can no longer indicate how far beyond saturation the host is. CPU queue length, in contrast, directly reflects the amount of outstanding work. As an example, two systems, one with 3 and the other with 6 processes in the queue, will both very likely have utilizations close to 100%, although they obviously differ in load.

Reckoning CPU load

On Linux systems, the load average is not calculated on each clock tick; instead, a counter derived from the HZ setting is decremented on each tick, and the average is recomputed only when the counter runs out. HZ defines the kernel clock tick rate in hertz (ticks per second) and historically defaulted to 100, giving 10 ms ticks; kernel activities use ticks to time themselves. Specifically, the timer.c::calc_load() function, which calculates the load average, runs every LOAD_FREQ = (5*HZ+1) ticks, or about every five seconds:

unsigned long avenrun[3];          /* fixed-point 1-, 5- and 15-minute averages */

static inline void calc_load(unsigned long ticks)
{
   unsigned long active_tasks;     /* fixed-point count of active tasks */
   static int count = LOAD_FREQ;

   count -= ticks;                 /* subtract the ticks that have elapsed */
   if (count < 0) {                /* LOAD_FREQ ticks have passed: resample */
      count += LOAD_FREQ;          /* rearm the countdown */
      active_tasks = count_active_tasks();
      CALC_LOAD(avenrun[0], EXP_1, active_tasks);   /* 1-minute average */
      CALC_LOAD(avenrun[1], EXP_5, active_tasks);   /* 5-minute average */
      CALC_LOAD(avenrun[2], EXP_15, active_tasks);  /* 15-minute average */
   }
}

The avenrun array contains the 1-minute, 5-minute, and 15-minute averages. The CALC_LOAD macro and its associated values are defined in sched.h:

#define FSHIFT   11		/* nr of bits of precision */
#define FIXED_1  (1<<FSHIFT)	/* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ+1)	/* 5 sec intervals */
#define EXP_1  1884		/* 1/exp(5sec/1min) as fixed-point */
#define EXP_5  2014		/* 1/exp(5sec/5min) */
#define EXP_15 2037		/* 1/exp(5sec/15min) */

#define CALC_LOAD(load,exp,n) \
   load *= exp; \
   load += n*(FIXED_1-exp); \
   load >>= FSHIFT;
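
To see the fixed-point arithmetic in action, the following stand-alone user-space sketch (an illustration, not kernel code; the constants are copied from above) replays the 1-minute update with a constant load of two runnable tasks. The printed value climbs toward 2.00, as avenrun[0] would:

#include <stdio.h>

#define FSHIFT   11                  /* bits of fixed-point precision */
#define FIXED_1  (1 << FSHIFT)       /* 1.0 in fixed point (2048) */
#define EXP_1    1884                /* 2048 * exp(-5/60): 1-minute decay */

/* Same arithmetic as the kernel's CALC_LOAD macro, as a function. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long n)
{
   load *= exp;                      /* decay the old average */
   load += n * (FIXED_1 - exp);      /* blend in the current task count */
   return load >> FSHIFT;            /* drop back to fixed-point scale */
}

int main(void)
{
   unsigned long avenrun = 0;         /* 1-minute average, fixed point */
   unsigned long tasks = 2 * FIXED_1; /* two runnable tasks, fixed point */

   /* 24 samples at 5-second spacing = two simulated minutes */
   for (int i = 0; i < 24; i++) {
      avenrun = calc_load(avenrun, EXP_1, tasks);
      printf("%3d s: %lu.%02lu\n", (i + 1) * 5,
             avenrun >> FSHIFT,
             ((avenrun & (FIXED_1 - 1)) * 100) >> FSHIFT);
   }
   return 0;
}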

The "sampled" calculation of load averages is a somewhat common behavior; FreeBSD, too, only refreshes the value every five seconds. The interval is usually taken to not be exact so that they do not collect processes that are scheduled to fire at a certain moment.[5]

A post on the Linux kernel mailing list argues that the +1 tick offset is insufficient to avoid Moiré artifacts from such sampling, and suggests an interval of 4.61 seconds instead.[6] This change is common among Android system kernels, although the exact expression used assumes an HZ of 100.[7]
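
The definition proposed in that patch is (as a sketch of the change; note that 4*HZ+61 equals 461 ticks, i.e. 4.61 seconds, only when HZ is 100):

#define LOAD_FREQ (4*HZ+61)	/* 4.61 sec intervals */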

Other system performance commands

Other commands for assessing system performance include:

  • uptime: the system uptime and load averages
  • top: for an overall system view
  • vmstat: reports information about runnable or blocked processes, memory, paging, block I/O, traps, and CPU activity
  • htop: interactive process viewer
  • dstat: helps correlate all existing resource data for processes, memory, paging, block I/O, traps, and CPU activity
  • iftop: interactive network traffic viewer per interface
  • nethogs: interactive network traffic viewer per process
  • iotop: interactive I/O viewer[8]
  • iostat: for storage I/O statistics
  • netstat: for network statistics
  • mpstat: for CPU statistics
  • tload: load average graph for the terminal
  • xload: load average graph for X
  • /proc/loadavg: text file containing the load averages

External links

  • Brendan Gregg (8 August 2017). "Linux Load Averages: Solving the Mystery". Retrieved 22 January 2018.
  • Neil J. Gunther. "UNIX Load Average, Part 1: How It Works" (PDF). TeamQuest. Retrieved 12 August 2009.
  • Andre Lewis (31 July 2009). "Understanding Linux CPU Load - when should you be worried?". Retrieved 21 July 2011. Explanation using an illustrated traffic analogy.
  • Ray Walker (1 December 2006). "Examining Load Average". Linux Journal. Retrieved 21 July 2011.
  • Karsten Becker. "Linux OSS load monitoring toolset". LoadAvg.

References

  1. "What exactly is load average?" (October 2008). linuxtechsupport.blogspot.com. http://linuxtechsupport.blogspot.com/2008/10/what-exactly-is-load-average.html
  2. Walker, Ray (1 December 2006). "Examining Load Average". Linux Journal. Retrieved 13 March 2012.
  3. Answer on Server Fault. http://serverfault.com/a/524818/27813
  4. Ferrari, Domenico; and Zhou, Songnian; "An Empirical Investigation of Load Indices For Load Balancing Applications", Proceedings of Performance '87, the 12th International Symposium on Computer Performance Modeling, Measurement, and Evaluation, North Holland Publishers, Amsterdam, The Netherlands, 1988, pp. 515–528
  5. "How is load average calculated on FreeBSD?". Unix & Linux Stack Exchange.
  6. Ripke, Klaus (2011). "Linux-Kernel Archive: LOAD_FREQ (4*HZ+61) avoids loadavg Moire". lkml.iu.edu. graph & patch
  7. "Patch kernel with the 4.61s load thing · Issue #2109 · AOSC-Dev/aosc-os-abbs". GitHub.
  8. "iotop(8)". Linux manual page, man7.org. http://man7.org/linux/man-pages/man8/iotop.8.html