This is a very brief user guide. A more extensive one will
be available soon.
RunQ is a computer performance management tool. It gives IT professionals a simple way to keep track of the usage of their computer resources, whether for daily performance data or for specific performance and capacity studies.
More information on how to use RunQ for specific tasks will be provided in a document that has not yet been published.
Currently RunQ is only available on the Sparc Solaris 2.6 and Intel Linux 2.4 platforms. Future versions are in development.
RunQ uses a two-phase approach. In the data gathering phase, performance
metrics are collected and stored in a binary file for further processing.
In the data processing phase, the collected metrics are analyzed and
presented. The data processing output can then be used directly or stored
in a Performance Database for historical follow-up.
This version of RunQ is command line driven. It is our belief that a good performance management tool does not need a good-looking GUI, but rather has to be a compact, quick, and functional utility. The output of RunQ can, however, be used directly by tools that read comma-separated value (CSV) files, such as spreadsheets, so you can produce any good-looking graph you would like to see. And once the data is stored in a database, many third-party tools are available to explore it.
RunQ is driven by a single executable for both data gathering and
data analysis. This reduces the overall disk space needed for the
tool. A trimmed-down, collection-only version is also available for
sites that want it.
To invoke the data collector, start RunQ as follows:
$ runq collect -m number_of_minutes [-d datafile] [-S span time] [-N]
or
$ rqcollect -m number_of_minutes [-d datafile] [-S span time] [-N]
Option | Default value | Type and format | Description |
-m | No default | integer | Number of minutes the data collection process has to run. |
-d | perf.dat | string | The name of file where the performance metrics of this collection will be stored. |
-S | 60 | integer | The number of seconds between successive sample records written to the performance data file. |
-N | No | - | Keeps collecting data without yielding the CPU. Use this option with care: in this mode the collector hogs the system and consumes all CPU cycles it can get. |
To get a brief report on global system usage during the data collection,
invoke RunQ with the report option.
RunQ will show one line for each sample found in the performance data
file.
$ runq report [-d datafile] [-s start time] [-e end time] [-C] [-S]
Option | Default Value | Type and format | Description |
-d | perf.dat | string | The name of the file where the performance metrics are stored. |
-C | No | - | Produce output in CSV format |
-S | No | - | Report only the summary for the interval in CSV mode. |
-s | 00:00 | HH:MM | Start time stamp for the report |
-e | 24:00 | HH:MM | End time stamp for the report (that time not included) |
You can list all processes which were present on the system during
the data collection period.
There will be one line for each process record in each sample.
The data shown in the listing are raw, and most of them are cumulative.
This means that to know the user-CPU consumption of a single process,
you need to calculate the delta between two samples.
$ runq procs [-d datafile] [-T] [-D] [-s start time] [-e end time]
Option | Default Value | Type and format | Description |
-d | perf.dat | string | The name of the file where the performance metrics are stored. |
-T | No | - | Output date and time in DD-MM-YY ; HH:MM:SS format. |
-D | No | - | Output time stamps as delta seconds from the start. This option is ignored if -T is also given. |
-s | 00:00 | HH:MM | Start time stamp for the report |
-e | 24:00 | HH:MM | End time stamp for the report (that time not included) |
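Since most of the listed values are cumulative, per-interval consumption has to be derived by differencing two samples. The sketch below illustrates that calculation in Python, assuming a hypothetical layout in which each sample maps a PID to its cumulative user-CPU seconds; the actual record layout printed by runq procs may differ.

```python
# Hypothetical sample layout: {pid: cumulative user-CPU seconds}.
# This only illustrates the delta calculation described above,
# it is not RunQ's internal data structure.
def user_cpu_delta(prev, curr):
    """User-CPU consumed per process between two cumulative samples."""
    # A PID absent from the previous sample started between samples,
    # so its whole cumulative value is counted for this interval.
    return {pid: curr[pid] - prev.get(pid, 0.0) for pid in curr}

prev = {101: 12.4, 102: 3.0}
curr = {101: 13.1, 102: 3.0, 103: 0.2}   # PID 103 appeared in between
deltas = user_cpu_delta(prev, curr)       # 101 used ~0.7 s, 102 was idle
```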
One of the key ideas behind RunQ is to analyze the gathered performance
metrics.
The purpose of analyzing the data is to reduce it to key metrics and
format it in a way the user understands. The outcome of the analysis
phase is a system model describing the system at a certain point in
time. Once a system model has been built, you can apply mathematics to
that model to answer many of the questions capacity planners typically
ask.
Currently RunQ does a basic job of analyzing and modeling the system. RunQ only reports CPU usage and metrics derived from it, such as service time and waiting time (which together give the response time). The waiting time is calculated according to the M/M/m queuing formula.
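For reference, the M/M/m waiting time mentioned above can be computed with the textbook Erlang C formula. The sketch below is an independent illustration of that formula, not RunQ's actual code; lambda is the arrival rate, the service time is 1/mu, and m is the number of servers (CPUs).

```python
from math import factorial

def erlang_c(m, a):
    """Erlang C: probability that an arriving job must wait (M/M/m),
    with m servers and offered load a = lambda/mu erlangs."""
    rho = a / m
    top = a**m / (factorial(m) * (1 - rho))
    bottom = sum(a**k / factorial(k) for k in range(m)) + top
    return top / bottom

def mm_m_times(lam, service_time, m):
    """Return (waiting time, response time) for an M/M/m queue."""
    mu = 1.0 / service_time
    a = lam / mu                       # offered load in erlangs
    assert a < m, "unstable queue: utilization >= 100%"
    wq = erlang_c(m, a) / (m * mu - lam)
    return wq, wq + service_time       # response = waiting + service
```

With one server this reduces to the familiar M/M/1 result: at 50% utilization the waiting time equals the service time, doubling the response time.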
The data reduction is done by grouping processes and reporting the performance metrics against those groups. The groups also serve to gather system processes into a single business unit. Furthermore, it is not uncommon that the execution of a task on a Unix system, for example, involves many little processes. When using conventional tools like top or ps, you don't see the aggregate performance of all those little processes.
During this analysis phase RunQ verifies the global CPU usage against
the per-process CPU usage. Due to the sampling technique, RunQ misses
some of the short-lived processes: processes that are created and die
between two samples. In certain situations (like compiling a lot of
small sources on a powerful machine) these can account for a major part
of the processes. RunQ has an option to activate an algorithm that
interpolates this data. This algorithm is currently very beta but has
already helped a lot. Also, when running the collector over longer
periods of time, this unaccounted data tends to become a smaller
portion of the activity.
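RunQ's interpolation algorithm is not documented here. One plausible way to attribute unaccounted CPU (purely illustrative, not RunQ's actual method) is to scale the sampled per-group usage so its total matches the measured global usage:

```python
# Illustrative sketch only: this is NOT RunQ's interpolation algorithm.
# The gap between measured global CPU and the sum of sampled per-group
# CPU is spread over the groups in proportion to their sampled usage.
def attribute_unaccounted(global_cpu, group_cpu):
    """Scale per-group CPU so the total matches the global measurement."""
    sampled = sum(group_cpu.values())
    if sampled == 0 or global_cpu <= sampled:
        return dict(group_cpu)          # nothing to attribute
    scale = global_cpu / sampled
    return {g: v * scale for g, v in group_cpu.items()}
```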
$ runq analyze [-d datafile] [-w workload_definition_file] [-s start time] [-e end time] [-C] [-F]
Option | Default Value | Type and format | Description |
-d | perf.dat | string | The name of the file where the performance metrics are stored. |
-w | workloads.wkl | string | The name of the workload definition file. |
-C | No | - | Produce output in CSV format |
-s | 00:00 | HH:MM | Start time stamp for the report |
-e | 24:00 | HH:MM | End time stamp for the report (that time not included) |
-F | No | - | Use an interpolation algorithm to fix unaccounted CPU usage. This option is very beta but can be useful, as it is able to recover most of the unaccounted CPU cycles. |
The workload definition files follow a very simple syntax.
For defining the process names RunQ uses the Unix regexp API.
By using "man regexp" or "man -s3 regexp" you can find more information
on using wild cards, repeaters, etc.
RunQ always prepends the expression with a "^" and appends a "$".
This enforces complete matching of the given expression.
The matching rules are evaluated in the order in which they appear
in the source file. When a match has been found by the include
rule or by PPID (in case of the with children clause), the process name
is checked against the exclude rule. If an exclude matches, RunQ doesn't
use that process group and carries on with the next process group.
If the optional argument is given in the regular expression, that argument is also taken into account.
Arguments are given by using a plus sign "+" followed by a regular expression.
For an even more precise selection, a user name and a group name can be specified.
This is done by appending a colon ":" followed by a regular expression for the user name, which
may be followed by an optional plus sign "+" and a regular expression for the group name.
Also keep in mind that RunQ will check the parent-child relationship
before starting to match the expressions, but if there is a matching
parent-child relationship it will always check the exclude list
to ensure that the match may be used.
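The implicit anchoring and the include-then-exclude evaluation order described above can be illustrated with a small Python sketch (an illustration of the matching rules, not RunQ's implementation):

```python
import re

def anchored(expr):
    """RunQ implicitly wraps every expression in ^ and $."""
    return re.compile("^" + expr + "$")

def matches_group(name, includes, excludes):
    """A process name belongs to a group if some include rule matches
    and no exclude rule matches (evaluated in that order)."""
    if not any(anchored(e).match(name) for e in includes):
        return False
    return not any(anchored(e).match(name) for e in excludes)

# Using the KDE group from the example below: include "k.*", exclude "k.*d".
matches_group("konsole", ["k.*"], ["k.*d"])   # included
matches_group("kswapd", ["k.*"], ["k.*d"])    # include matches, but exclude wins
matches_group("gcc2", ["gcc"], [])            # anchoring: "gcc" != "gcc2"
```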
Below you find the structure of a definition file. Mind that [with
children] means that the "with children" clause is optional and, when
used, has to be written without the brackets.
workload wkl-id
{
processgroup pg-id [with children]
include
{
"regular expression" [ + "regular expression" ] [ : "regular expression" [ + "regular expression" ] ]
...
}
exclude
{
"regular expression" [ + "regular expression" ] [ : "regular expression" [ + "regular expression" ] ]
}
....
}
....
Be sure to define an ending process group with the ".*" wild
card as a catch-all!
workload Development
{
processgroup compile with children
include
{
"gcc"
"g'++'"
"cpp"
}
processgroup tools
include
{
"make"
"vi"
}
}
workload Office
{
processgroup StarOffice with children
include
{
"soffice.*"
}
processgroup NetScape with children
include
{
"netscape.*"
}
}
workload System
{
processgroup runq
include
{
"runq"
}
processgroup KDE
include
{
"X.*"
"xfs"
"k.*"
}
exclude
{
"k.*d"
}
processgroup Postgres with children
include
{
"postmaster"
}
processgroup Network
include
{
"inetd"
"portmap"
"netserv"
".*ppp.*"
".*ftp.*"
}
processgroup System
include
{
"init"
".*logd"
"cardmgr"
"autom.*"
"lpd"
"cron"
"getty"
"gpm"
"sendmail"
"exim"
".*pkg.*"
"modprobe"
"rmmod"
"get_it"
}
processgroup OtherRoots
include
{
".*" : "root"
}
processgroup WildGroup
include
{
".*"
}
}