Copyright © 2001, 2002 by Douglas Gilbert
Revision History

| Revision | Date | Revised by | Changes |
|---|---|---|---|
| 1.2 | 2002-05-03 | dpg | ENOMEM, EPERM; DRIVER_SENSE->CHECK_CONDITION |
| 1.1 | 2002-01-26 | dpg | corrections, host_status, odd dxfer_len |
| 1.0 | 2001-12-21 | dpg | original, displace SCSI-PROGRAMMING-HOWTO |
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
For an online copy of the license see www.fsf.org/copyleft/fdl.html.
This is the third major version of the sg driver. A summary of the sg driver history is as follows:
sg version 1 (original) from 1992 to early 1999 (lk 2.2.5). A copy of the original HOWTO (in plain text) is at www.torque.net/sg/p/original/SCSI-Programming-HOWTO.txt
sg version 2 from lk 2.2.6 in the 2.2 series. Its documentation is available in abridged form [www.torque.net/sg/p/scsi-generic.txt] and a longer form [www.torque.net/sg/p/scsi-generic_long.txt].
sg version 3 in the linux kernel 2.4 series.
A more general description of the Linux SCSI subsystem of which sg is a part can be found in the SCSI-2.4-HOWTO.
This document was last modified on 3rd May 2002.
The sg driver permits user applications to send SCSI commands to devices that understand them. SCSI commands are 6, 10, 12 or 16 bytes long [1]. The SCSI disk driver (sd), once device initialization is complete, only sends SCSI READ and WRITE commands. There are several other interesting things one might want to do, for example, perform a low level format or turn on write caching.
Associated with some SCSI commands there is data to be written to the device. A SCSI WRITE command is one obvious example. When instructed, the sg driver arranges for data to be transferred to the device along with the SCSI command. It is possible that the lower level driver (often known as the "Host Bus Adapter" [HBA] or simply "adapter" driver) is unable to send the command to the device. An example of this occurs when the device does not respond, in which case a 'host_status' or 'driver_status' error will be conveyed back to the user application.
All going well the SCSI command (and optionally some data) are conveyed to the device. The device will respond with a single byte value called the 'scsi_status'. GOOD is the scsi status indicating everything has gone well. The most common other status is CHECK CONDITION. In this latter case, the SCSI mid level issues a REQUEST SENSE SCSI command. The response of the REQUEST SENSE is 18 bytes or more in length and is called the "sense buffer". It will indicate why the original command may not have been executed. It is important to realize that a CHECK CONDITION may vary in severity from informative (e.g. command needed to be retried before succeeding) to fatal (e.g. "medium error" which often indicates it is time to replace the disk).
So in all cases a user application should check the various status values. If necessary the "sense buffer" will be copied back to the user application. SCSI commands like READ convey data back to the user application (if they succeed). The sg driver arranges for this data transfer from the device to the user space, if necessary.
The description so far has concentrated on a disk device, but in reality the sg driver is not needed very often for disks because there already is a purpose built device driver for that: sd. The same is true of reading audio and data CDs (sr [scd]) and tapes (st). However scanners that understand the SCSI command set and CDR "burning" programs tend to use the sg driver. Other applications include tape "robots" and music CD "ripping".
To find out more about SCSI (draft) standards and resources visit www.t10.org. To use the sg device driver you should be familiar with the SCSI commands supported by the device that you wish to control. Getting hold of such information for devices like scanners can be quite challenging (if the vendor does not provide it).
The first SCSI command sent to a SCSI device when it is initialized is an INQUIRY. All SCSI devices should respond promptly to an INQUIRY supplying information such as the vendor, product designation and revision. Appendix C shows the sg driver being used to send an INQUIRY and print out some of the information in the response.
Earlier versions of the sg device driver either have no version number (e.g. the original driver) or a version number starting with "2". The drivers that support this new interface have a major version number of "3". The sg version numbers are of the form "x.y.z" and the single number given by the SG_GET_VERSION_NUM ioctl() is calculated by (x * 10000 + y * 100 + z). The sg driver discussed here will yield a number greater than or equal to 30000 from SG_GET_VERSION_NUM. The version number can also be seen using cat /proc/scsi/sg/version in the new driver. This document describes sg version 3.1.24 for the lk 2.4 series. Where some facility has been added during the lk 2.4 series (e.g. mmap-ed IO) and hence is not available in all versions of the lk 2.4 series, this is noted. [2]
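The packing rule above can be reversed to recover the three components of the version. The following is a minimal sketch; the names `sg_version`, `sg_decode_version` and `sg_has_v3_interface` are hypothetical helpers for illustration and are not part of sg.h.

```c
/* Hypothetical helper type: the three parts of an sg "x.y.z" version. */
struct sg_version {
    int x, y, z;
};

/* Reverse the packing (x * 10000 + y * 100 + z) used by SG_GET_VERSION_NUM. */
static struct sg_version sg_decode_version(int num)
{
    struct sg_version v;

    v.x = num / 10000;
    v.y = (num / 100) % 100;
    v.z = num % 100;
    return v;
}

/* Non-zero when the packed number identifies a version 3 (or later) driver. */
static int sg_has_v3_interface(int num)
{
    return num >= 30000;
}
```

In an application the argument would typically come from ioctl(sg_fd, SG_GET_VERSION_NUM, &num); for sg version 3.1.24 the packed value is 30124.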
Here is a list of sg versions that have appeared to date during the lk 2.4 series.
lk 2.4.0 : sg version 3.1.17
lk 2.4.7 : sg version 3.1.19 [see include/scsi/sg.h in that or a later version for the changelog]
lk 2.4.10 : sg version 3.1.20 [This version had several changes put into it by third parties over the next 6 release kernel versions.]
lk 2.4.17 : sg version 3.1.22
lk 2.4.19 : sg version 3.1.24 [lk 2.4.19 hasn't been released at the time of writing. It will most likely contain sg version 3.1.24.]
A user application accesses the sg driver by using the open() system call on a sg device file name. Each sg device file name corresponds to one (potentially) attached SCSI device. These are usually found in the /dev directory. Here are some sg device file names:
    $ ls -l /dev/sg[01]
    crw-rw----    1 root     disk      21,   0 Aug 30 16:30 /dev/sg0
    crw-rw----    1 root     disk      21,   1 Aug 30 16:30 /dev/sg1

    $ cd /dev/scsi/host1/bus0/target0/lun0
    $ ls -l generic
    crw-r-----    1 root     root      21,   1 Dec 31  1969 generic
A significant addition in sg v3 is an ioctl() called SG_IO which is functionally equivalent to a write() followed by a blocking read(). In certain contexts the write()/read() combination has advantages over SG_IO (e.g. command queuing) and continues to be supported.
The existing (and original) sg interface based on the sg_header structure is still available using a write()/read() sequence as before. The SG_IO ioctl will only accept the new interface based on the sg_io_hdr_t structure.
The sg v3 driver thus has a write() call that can accept either the older sg_header structure or the new sg_io_hdr_t structure. The write() call decides which interface is being used based on the second integer position of the passed header (i.e. sg_header::reply_len or sg_io_hdr_t::dxfer_direction). If it is a positive number then the old interface is assumed. If it is a negative number then the new interface is assumed. The direction constants placed in 'dxfer_direction' in the new interface have been chosen to have negative values.
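The sign test described above can be sketched in user space as follows. This is only an illustration of the rule, not the driver's own code, and the function name `sg_looks_like_v3_header` is hypothetical. It assumes a non-negative second integer means the old interface.

```c
#include <string.h>
#include <scsi/sg.h>

/*
 * Illustrative only: inspect the second int of a header that would be
 * passed to write().  A negative value (a SG_DXFER_* constant) marks the
 * sg_io_hdr_t interface; anything else is taken as sg_header::reply_len.
 */
static int sg_looks_like_v3_header(const void *hdr)
{
    int second;

    /* memcpy avoids alignment and aliasing assumptions about the buffer */
    memcpy(&second, (const char *)hdr + sizeof(int), sizeof(second));
    return second < 0;
}
```

Because every SG_DXFER_* constant is negative, a correctly filled sg_io_hdr_t can never be mistaken for an sg_header whose 'reply_len' is a positive byte count.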
If a request is sent to a write() with the sg_io_hdr_t interface then the corresponding read() that fetches the response must also use the sg_io_hdr_t interface. The same rule applies to the sg_header interface.
This document concentrates on the sg_io_hdr_t interface introduced in the sg version 3 driver. For the definition of the older sg_header interface see the sg version 2 documentation. A brief description is given in Appendix B.
The path of a request through the sg driver can be broken into 3 distinct stages:
For more information about normal (or indirect), direct and mmap-ed IO see Chapter 9 .
Currently the sg driver uses one Linux major device number (char 21) which in the lk 2.4 series limits it to handling 256 SCSI devices. Any attempt to attach more than this number will be rejected with a message being sent to the console and the log file. [3]
    typedef struct sg_io_hdr
    {
        int interface_id;           /* [i] 'S' (required) */
        int dxfer_direction;        /* [i] */
        unsigned char cmd_len;      /* [i] */
        unsigned char mx_sb_len;    /* [i] */
        unsigned short iovec_count; /* [i] */
        unsigned int dxfer_len;     /* [i] */
        void * dxferp;              /* [i], [*io] */
        unsigned char * cmdp;       /* [i], [*i]  */
        unsigned char * sbp;        /* [i], [*o]  */
        unsigned int timeout;       /* [i] unit: millisecs */
        unsigned int flags;         /* [i] */
        int pack_id;                /* [i->o] */
        void * usr_ptr;             /* [i->o] */
        unsigned char status;       /* [o] */
        unsigned char masked_status;/* [o] */
        unsigned char msg_status;   /* [o] */
        unsigned char sb_len_wr;    /* [o] */
        unsigned short host_status; /* [o] */
        unsigned short driver_status;/* [o] */
        int resid;                  /* [o] */
        unsigned int duration;      /* [o] */
        unsigned int info;          /* [o] */
    } sg_io_hdr_t;  /* 64 bytes long (on i386) */
The type of dxfer_direction is int. This is required to be one of the following:
The value SG_DXFER_TO_FROM_DEV is only relevant to indirect IO (otherwise it is treated like SG_DXFER_FROM_DEV). Data is moved from the user space to the kernel buffers. The command is then performed and most likely a READ-like command transfers data from the device into the kernel buffers. Finally the kernel buffers are copied back into the user space. This technique allows application writers to initialize the buffer and perhaps deduce the number of bytes actually read from the device (i.e. detect underrun). This is better done by using 'resid' if it is supported.
The value SG_DXFER_UNKNOWN is for those (rare) situations where the data direction is not known. It may be useful for backward compatibility of existing applications when the relevant direction information is not available in the sg interface layer. There is a (minor) performance "hit" associated with choosing this option (e.g. on the PCI bus). Some recent pseudo device drivers (e.g. USB mass storage) may have problems handling this value (especially on vendor-specific SCSI commands).
N.B. 'dxfer_direction' must have one of the five indicated values and cannot be uninitialized or zero.
If 'dxfer_len' is zero then all values are treated like SG_DXFER_NONE.
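As an illustration of filling in the input members, including a valid 'dxfer_direction', here is a minimal sketch that prepares a header for a standard INQUIRY. The helper name `sg_prep_inquiry`, the two macros, the 96 byte allocation length and the 20 second timeout are choices made for this example only.

```c
#include <string.h>
#include <scsi/sg.h>

#define INQ_CMD_LEN   6     /* INQUIRY is a 6 byte command */
#define INQ_REPLY_LEN 96    /* allocation length assumed for this example */

/*
 * Prepare an sg_io_hdr_t for a standard INQUIRY.  The caller owns all
 * buffers: cdb must hold INQ_CMD_LEN bytes, reply INQ_REPLY_LEN bytes and
 * sense at least sense_len bytes.
 */
static void sg_prep_inquiry(sg_io_hdr_t *hdr, unsigned char *cdb,
                            unsigned char *reply,
                            unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, INQ_CMD_LEN);
    cdb[0] = 0x12;                            /* INQUIRY opcode */
    cdb[4] = INQ_REPLY_LEN;                   /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';                  /* required */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV; /* must not be left as zero */
    hdr->cmd_len = INQ_CMD_LEN;
    hdr->cmdp = cdb;
    hdr->dxfer_len = INQ_REPLY_LEN;
    hdr->dxferp = reply;
    hdr->mx_sb_len = sense_len;
    hdr->sbp = sense;
    hdr->timeout = 20000;                     /* milliseconds */
}
```

The prepared header would then be handed to ioctl(sg_fd, SG_IO, &hdr), or to write() followed by read().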
This is the length in bytes of the SCSI command that 'cmdp' points to. As a SCSI command is expected, an EMSGSIZE error number is produced if the value is less than 6 or greater than 16. If the SCSI mid level imposes a tighter limit then EMSGSIZE is also produced when that limit is exceeded. [4] The type of cmd_len is unsigned char.
    typedef struct sg_iovec
    {
        void * iov_base;    /* starting address */
        size_t iov_len;     /* length in bytes */
    } sg_iovec_t;
This is the number of bytes to be moved in the data transfer associated with the command. The direction of the transfer is indicated by 'dxfer_direction'. If 'dxfer_len' is zero then no data transfer takes place. [5]
If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of iov_len lengths. If not, the minimum of the two is the transfer length. The type of dxfer_len is unsigned int.
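The "minimum of the two" rule can be written out as a short sketch. The function name `sg_effective_xfer_len` is hypothetical; it only mirrors the rule stated above and is not taken from the driver.

```c
#include <stddef.h>
#include <scsi/sg.h>

/*
 * With a scatter gather list, the usable transfer length is the smaller of
 * dxfer_len and the sum of the iov_len values.  Without one (iovec_count
 * is zero) dxfer_len is used as given.
 */
static size_t sg_effective_xfer_len(const sg_iovec_t *iov,
                                    unsigned short iovec_count,
                                    unsigned int dxfer_len)
{
    size_t sum = 0;
    unsigned short k;

    if (iovec_count == 0)
        return dxfer_len;
    for (k = 0; k < iovec_count; ++k)
        sum += iov[k].iov_len;
    return sum < dxfer_len ? sum : dxfer_len;
}
```

An application can use a check like this before submitting a request to confirm that 'dxfer_len' and its scatter gather list agree.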
a SCSI device reset is attempted
a SCSI bus reset is attempted. Note this may have an adverse effect on other devices sharing that SCSI bus.
a SCSI host (bus adapter) reset is attempted. This is an attempt to re-initialize the adapter card associated with the SCSI device that has the timed out command.
The two error statuses containing the word "TIME(_)OUT" are typically _not_ related to a command timing out. DID_TIME_OUT in the 'host_status' usually means an (unexpected) device selection timeout. DRIVER_TIMEOUT in the 'driver_status' byte means the SCSI adapter is unable to control the devices on its SCSI bus (and has given up).
The type of timeout is unsigned int (and it represents milliseconds).
These are single or multi-bit values that can be "or-ed" together:
This is the SCSI status byte as defined by the SCSI standard. Note that it can have vendor information set in bits 0, 6 and 7 (although this is uncommon). Further note that this 'status' data does _not_ match the definitions in <scsi/scsi.h> (e.g. CHECK_CONDITION). The following 'masked_status' does match those definitions. [7] The type of status is unsigned char.
Note that SCSI 3 defines some additional status codes. [8] The type of masked_status is unsigned char.
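The relationship between 'status' and 'masked_status' can be sketched as below, assuming the usual rule that the vendor bits (0, 6 and 7) are masked off and the remainder is shifted right by one. The macro and function names are hypothetical and chosen for this example.

```c
/* Example values: CHECK CONDITION as it appears in each member. */
#define EX_STATUS_CHECK_CONDITION  0x02  /* raw SCSI status byte */
#define EX_MASKED_CHECK_CONDITION  0x01  /* value after masking and shifting */

/* Drop the vendor bits (0, 6 and 7), then shift right by one. */
static unsigned char sg_mask_status(unsigned char status)
{
    return (unsigned char)((status & 0x3e) >> 1);
}
```

An application that compares against the CHECK_CONDITION style definitions should therefore test 'masked_status', not 'status'.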
These codes potentially come from the firmware on a host adapter or from one of several hosts that an adapter driver controls. The 'host_status' field has the following values whose #defines mimic those which are only visible within the kernel (with the "SG_ERR_" removed from the front of each define). A copy of these defines can be found in sg_err.h (see Appendix A):
SG_ERR_DID_OK [0x00] NO error
SG_ERR_DID_NO_CONNECT [0x01] Couldn't connect before timeout period
SG_ERR_DID_BUS_BUSY [0x02] BUS stayed busy through time out period
SG_ERR_DID_TIME_OUT [0x03] TIMED OUT for other reason (often this is an unexpected device selection timeout)
SG_ERR_DID_BAD_TARGET [0x04] BAD target, device not responding?
SG_ERR_DID_ABORT [0x05] Told to abort for some other reason. From lk 2.4.15 the SCSI subsystem supports 16 byte commands; however, few adapter drivers do. Those HBA drivers that don't support 16 byte commands will yield this error code if a 16 byte command is passed to a SCSI device they control.
SG_ERR_DID_PARITY [0x06] Parity error. Older SCSI parallel buses have a parity bit for error detection. This probably indicates a cable or termination problem.
SG_ERR_DID_ERROR [0x07] Internal error detected in the host adapter. This may not be fatal (and the command may have succeeded). The aic7xxx and sym53c8xx adapter drivers sometimes report this for data underruns or overruns. [9]
SG_ERR_DID_RESET [0x08] The SCSI bus (or this device) has been reset. Any SCSI device on a SCSI bus is capable of instigating a reset.
SG_ERR_DID_BAD_INTR [0x09] Got an interrupt we weren't expecting
SG_ERR_DID_PASSTHROUGH [0x0a] Force command past mid-layer
SG_ERR_DID_SOFT_ERROR [0x0b] The low level driver wants a retry
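For diagnostics it can be handy to turn a 'host_status' value into text. The following sketch uses a lookup table built from the list above; the table and function names are hypothetical and are not taken from sg_err.h.

```c
#include <stddef.h>

/* Short descriptions indexed by host_status value (0x00 to 0x0b). */
static const char *const sg_host_status_text[] = {
    "DID_OK: no error",
    "DID_NO_CONNECT: couldn't connect before timeout period",
    "DID_BUS_BUSY: bus stayed busy through timeout period",
    "DID_TIME_OUT: timed out for other reason",
    "DID_BAD_TARGET: bad target, device not responding?",
    "DID_ABORT: told to abort for some other reason",
    "DID_PARITY: parity error",
    "DID_ERROR: internal error detected in the host adapter",
    "DID_RESET: the SCSI bus (or this device) has been reset",
    "DID_BAD_INTR: got an interrupt we weren't expecting",
    "DID_PASSTHROUGH: force command past mid-layer",
    "DID_SOFT_ERROR: the low level driver wants a retry",
};

/* Return a description for host_status, or a fallback for unknown values. */
static const char *sg_host_status_str(unsigned short host_status)
{
    size_t n = sizeof(sg_host_status_text) / sizeof(sg_host_status_text[0]);

    return host_status < n ? sg_host_status_text[host_status]
                           : "unknown host_status";
}
```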
This is the residual count from the data transfer. It is 'dxfer_len' less the number of bytes actually transferred. In practice it only reports underruns (i.e. positive number) as data overruns should never happen. This value will be zero if there was no underrun or the SCSI adapter doesn't support this feature. [10] The type of resid is int.
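Since 'resid' is 'dxfer_len' less the bytes actually moved, the transferred byte count follows by subtraction. A small sketch, with a hypothetical function name and defensive handling of out-of-range values added for this example:

```c
/* Bytes actually moved, given the requested length and reported residual. */
static unsigned int sg_bytes_transferred(unsigned int dxfer_len, int resid)
{
    if (resid <= 0)
        return dxfer_len;   /* no underrun reported (or resid unsupported) */
    if ((unsigned int)resid > dxfer_len)
        return 0;           /* defensive: should not happen */
    return dxfer_len - (unsigned int)resid;
}
```

Note that a result equal to 'dxfer_len' does not prove a full transfer on adapters that do not support residual counts.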
System calls that can be used on sg devices are discussed in this chapter. The ioctl() system call is discussed in the following chapter [ see Chapter 8 ].
Successfully opening a sg device file name (e.g. /dev/sg0) establishes a link between a file descriptor and an attached SCSI device. The sg driver maintains state information and resources at both the SCSI device (e.g. exclusive lock) and the file descriptor (e.g. reserved buffer) levels.
A SCSI device can be detached while an application has a sg file descriptor open. An example of this is a "hotplug" device such as a USB mass storage device that has just been unplugged. Most subsequent system calls that attempt to access the detached SCSI device will yield ENODEV. The close() call will complete silently while the poll() call will "or" in POLLHUP to its result. A subsequent attempt to open() that device name will yield ENODEV.
open(const char * filename, int flags). The filename should be a sg device file name as discussed in Chapter 4. Flags can be a number of the following or-ed together:
O_RDONLY restricts operations to read()s and ioctl()s (i.e. can't use write() ).
O_RDWR permits all system calls to be executed.
O_EXCL waits for other opens on the associated SCSI device to be closed before proceeding. If O_NONBLOCK is set then yields EBUSY when someone else has the SCSI device open. The combination of O_RDONLY and O_EXCL is disallowed.
O_NONBLOCK Sets non-blocking mode. Calls that would otherwise block yield EAGAIN (e.g. read() ) or EBUSY (e.g. open() ). This flag is ignored by ioctl(SG_IO) .
Note that multiple file descriptors may be open to the same SCSI device. [This is a way of side stepping the SG_MAX_QUEUE limit.] At the sg level separate state information is maintained. This means that even if multiple file descriptors are open to a single SCSI device their write() read() sequences are essentially independent.
Open() calls may be blocked due to exclusive locks (i.e. O_EXCL). An exclusive lock applies to a single SCSI device and only to sg's use of that device (i.e. it has no effect on access via sd, sr or st to that device). If the O_NONBLOCK flag is used then open() calls that would have otherwise blocked, yield EBUSY. Applications that scan sg devices trying to determine their identity (e.g. whether one is a scanner) should use the O_NONBLOCK flag otherwise they run the risk of blocking.
The driver will attempt to reserve SG_DEF_RESERVED_SIZE bytes (32KBytes in the current sg.h) on open(). The size of this reserved buffer can subsequently be modified with the SG_SET_RESERVED_SIZE ioctl(). In both cases these are requests subject to various dynamic constraints. The actual amount of memory obtained can be found by the SG_GET_RESERVED_SIZE ioctl(). The reserved buffer will be used if:
it is not already in use (e.g. when command queuing is in use)
a write() or ioctl(SG_IO) requests a data transfer size that is less than or equal to the reserved buffer size.
Returns a file descriptor if >= 0 , otherwise -1 implies an error.
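The flag rules above can be wrapped in a small helper. This is a sketch with a hypothetical name, `sg_open_checked`; it rejects the O_RDONLY plus O_EXCL combination before calling open() (mirroring the EPERM the driver is documented to return) and reports failures as a negated errno value.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Open an sg device file.  Returns a file descriptor (>= 0) on success or
 * a negated errno value on failure (e.g. -EBUSY when O_NONBLOCK is set and
 * another process holds an O_EXCL lock).
 */
static int sg_open_checked(const char *path, int flags)
{
    int fd;

    if ((flags & O_ACCMODE) == O_RDONLY && (flags & O_EXCL))
        return -EPERM;      /* this combination is disallowed */
    fd = open(path, flags);
    return fd >= 0 ? fd : -errno;
}
```

A program scanning for a particular device might call sg_open_checked(name, O_RDONLY | O_NONBLOCK) for each candidate and skip those that yield -EBUSY.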
write(int sg_fd, const void * buffer, size_t count). The action of write() with a control block based on struct sg_header is discussed in the earlier document: www.torque.net/sg/p/scsi-generic.txt (i.e. the sg version 2 documentation). This section describes the action of write() when it is given a control block based on struct sg_io_hdr.
The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the write() call succeeds then the 'count' is returned as the result.
Up to SG_MAX_QUEUE (16) write()s can be queued up before any finished requests are completed by read(). An attempt to queue more than that will result in an EDOM error. [11] The write() command should return more or less immediately. [12]
The version 2 sg driver defaulted the maximum queue length to 1 (and made available the SG_SET_COMMAND_Q ioctl() to switch it to SG_MAX_QUEUE). So for backward compatibility a file descriptor that only receives sg_header structures in its write() will have a default "max" queue length of 1. As soon as a sg_io_hdr_t structure is seen by a write() then the maximum queue length is switched to SG_MAX_QUEUE on that file descriptor.
The "const" on the 'buffer' pointer is respected by the sg driver. Data is read in from the sg_io_hdr object that is pointed to. Significantly this is when the 'sbp' and the 'dxferp' are recorded internally (i.e. not from the sg_io_hdr object given to the corresponding read() ).
read(int sg_fd, void * buffer, size_t count). The action of read() with a control block based on struct sg_header is discussed in the earlier document: www.torque.net/sg/p/scsi-generic.txt (i.e. the sg version 2 documentation). This section describes the action of read() when it is given a control block based on struct sg_io_hdr.
The 'buffer' should point to an object of type sg_io_hdr_t and 'count' should be sizeof(sg_io_hdr_t) [it can be larger but the excess is ignored]. If the read() call succeeds then the 'count' is returned as the result.
By default, read() will return the oldest completed request that is queued up. A read() will not interfere with any request associated with the SG_IO ioctl() on this file descriptor except in a special case when a SG_IO ioctl() is interrupted by a signal.
If the SG_SET_FORCE_PACK_ID,1 ioctl() is active then read() will attempt to fetch the packet whose pack_id (given earlier to write()) matches the sg_io_hdr_t::pack_id given to this read(). If not available it will either wait or yield EAGAIN. As a special case, -1 in sg_io_hdr_t::pack_id given to read() will match the request whose response has been waiting for the longest time. Take care to also set 'dxfer_direction' to any valid value (e.g. SG_DXFER_NONE) when in this mode. The 'interface_id' member should also be set appropriately.
Apart from the SG_SET_FORCE_PACK_ID case (and then only for the 3 indicated fields), the sg_io_hdr_t object given to read() can be uninitialized. Note that the 'sbp' pointer value for optionally outputting a sense buffer was recorded from the earlier, corresponding write().
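The write() and read() halves of a request can be sketched as a pair of thin wrappers. The names `sg_submit` and `sg_fetch` are hypothetical. The fetch helper always sets the three members that matter when SG_SET_FORCE_PACK_ID is active, which is harmless otherwise.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Queue one request.  Returns 0, or a negated errno (e.g. -EDOM). */
static int sg_submit(int sg_fd, const sg_io_hdr_t *req)
{
    ssize_t n = write(sg_fd, req, sizeof(*req));

    return n < 0 ? -errno : 0;
}

/*
 * Fetch a completed response into *resp.  Returns 0, or a negated errno
 * (e.g. -EAGAIN on a non-blocking descriptor with nothing ready).
 */
static int sg_fetch(int sg_fd, sg_io_hdr_t *resp)
{
    ssize_t n;

    memset(resp, 0, sizeof(*resp));
    resp->interface_id = 'S';
    resp->dxfer_direction = SG_DXFER_NONE;  /* any valid value will do */
    resp->pack_id = -1;                     /* longest waiting response */
    n = read(sg_fd, resp, sizeof(*resp));
    return n < 0 ? -errno : 0;
}
```

The data and sense buffers are the ones named in the header given to sg_submit(), so they must remain valid until the matching sg_fetch() returns.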
When close() leaves outstanding SCSI commands still awaiting responses, the sg driver maintains its internal structures for the now defunct file descriptor. These internal structures are maintained until all outstanding responses (some might be timeouts) are received. When the sg driver is loaded as a module and has any open file descriptors or "defunct" file descriptors then it cannot be unloaded. An attempt to call rmmod sg will report the driver is busy. Defunct file descriptors that remain for some time, perhaps awaiting a timeout, can be observed with the cat /proc/scsi/sg/debug command. In this case "closed=1" will be set on the defunct file descriptor [see Section 11.1]. Defunct file descriptors do not impede attempts by applications to open() new file descriptors on the same SCSI device.
The kernel arranges for only the last close() on a file descriptor to be seen by a driver (and to emphasize this, the corresponding sg driver call is named sg_release() rather than sg_close()). This is only significant when an application uses fork() or dup().
Returns 0 if successful, otherwise -1 implies an error.
Mmap-ed IO is requested by setting (or or-ing in) the SG_FLAG_MMAP_IO constant into the 'flags' member of the sg_io_hdr structure prior to a call to write() or ioctl(SG_IO). The logic to do mmap-ed IO _assumes_ that an appropriate mmap() call has been made by the application. In other words it does not check. [13]
    sg_fd = open("/dev/sg0", O_RDONLY | O_NONBLOCK);
    /* check device, EBUSY means some other process has O_EXCL lock on it */
    /* when the device you want is found then ... */
    flags = fcntl(sg_fd, F_GETFL);
    fcntl(sg_fd, F_SETFL, flags & (~ O_NONBLOCK));
    /* since, with simple apps, it is easier to use normal blocked io */
    sigemptyset(&sig_set);
    sigaddset(&sig_set, SIGPOLL);
    sigaction(SIGPOLL, &s_action, 0);
    fcntl(sg_fd, F_SETOWN, getpid());
    flags = fcntl(sg_fd, F_GETFL);
    fcntl(sg_fd, F_SETFL, flags | O_ASYNC);
| errno | which_calls | Meaning |
|---|---|---|
| EACCES | <some ioctls> | Root permission (more precisely CAP_SYS_ADMIN or CAP_SYS_RAWIO) required. Also may occur during an attempted write to /proc/scsi/sg files. |
| EAGAIN | r | The file descriptor is non-blocking and the request has not been completed yet. |
| EAGAIN | w,SG_IO | SCSI sub-system has (temporarily) run out of command blocks. |
| EBADF | w | File descriptor was not open()ed O_RDWR. |
| EBUSY | o | Someone else has an O_EXCL lock on this device. |
| EBUSY | w | With mmap-ed IO, the reserved buffer already in use. |
| EBUSY | <some ioctls> | Attempt to change something (e.g. reserved buffer size) when the resource was in use. |
| EDOM | w,SG_IO | Too many requests queued against this file descriptor. Limit is SG_MAX_QUEUE active requests. If sg_header interface is being used then the default queue depth is 1. Use SG_SET_COMMAND_Q ioctl() to increase it. |
| EFAULT | w,r,SG_IO, <most ioctls> | Pointer to user space invalid. |
| EINVAL | w,r | Size given as 3rd argument not large enough for the sg_io_hdr_t structure. Both direct and mmap-ed IO selected. |
| EIO | w | Size given as 3rd argument less than size of old header structure (sg_header). Additionally a write() with the old header will yield this error for most detected malformed requests. |
| EIO | r | A read() with the older sg_header structure yields this value for some errors that it detects. |
| EINTR | o | While waiting for the O_EXCL lock to clear this call was interrupted by a signal. |
| EINTR | r,SG_IO | While waiting for the request to finish this call was interrupted by a signal. |
| EINTR | w | [Very unlikely] While waiting for an internal SCSI resource this call was interrupted by a signal. |
| EMSGSIZE | w,SG_IO | SCSI command size ('cmd_len') was too small (i.e. < 6) or too large |
| ENODEV | o | Tried to open() a file with no associated device. [Perhaps sg has not been built into the kernel or is not available as a module?] |
| ENODEV | o,w,r,SG_IO | SCSI device has detached, awaiting cleanup. User should close fd. Poll() will yield POLLHUP. |
| ENOENT | o | Given filename not found. |
| ENOMEM | o | [Very unlikely] Kernel was not even able to find enough memory for this file descriptor's context. |
| ENOMEM | w,SG_IO | Kernel unable to find memory for internal buffers. This is usually associated with indirect IO. For mmap-ed IO 'dxfer_len' greater than reserved buffer size. Lower level (adapter) driver does not support enough scatter gather elements for requested data transfer. |
| ENOSYS | w,SG_IO | 'interface_id' of a sg_io_hdr_t object was _not_ 'S'. |
| ENXIO | o | "remove-single-device" may have removed this device. |
| ENXIO | o,w,r,SG_IO | Internal error (including SCSI sub-system busy doing error processing - e.g. SCSI bus reset). When a SCSI device is offline, this is the response. This can be bypassed by opening O_NONBLOCK. |
| EPERM | o | Can't use O_EXCL when open()ing with O_RDONLY |
| EPERM | w,SG_IO, <some ioctls> | File descriptor open()-ed O_RDONLY but O_RDWR access mode needed for this operation. |
The ability of the SG_IO ioctl() to issue certain SCSI commands has led to some relaxation on file descriptors open()ed "read-only" compared with the version 2 sg driver. The open() call will now attempt to allocate a reserved buffer for all newly opened file descriptors. The ioctl(SG_SET_RESERVED_SIZE) will now work on "read-only" file descriptors.
SG_GET_NUM_WAITING 0x227d. Assumes 3rd argument points to an int and places the number of packets waiting to be read in it. Only those requests that have been issued by a write() and are now available to be read() are counted. In other words any ioctl(SG_IO) operations underway on this file descriptor will not affect this count [14].
    req_state    0 -> request not in use
                 1 -> request has been sent, but is not finished (i.e. it is
                      between stages 1 and 2 in the "theory of operation")
                 2 -> request is ready to be read() (i.e. it is between
                      stages 2 and 3 in the "theory of operation")
    orphan       0 -> normal request
                 1 -> request sent by SG_IO ioctl() which has been
                      interrupted by a signal
    sg_io_owned  0 -> request sent by a write()
                 1 -> request sent by a SG_IO ioctl()
    problem      0 -> no problem (or 1 == req_state)
                 1 -> req_state is 2 and either masked_status,
                      host_status or driver_status is non-zero
    duration     [if 1 == req_state] time since request was sent (in millisecs)
                 [if 2 == req_state] duration of request (in millisecs).
                 Clock is stopped when stage 2 in "theory of operation" is reached
    pack_id      these are user provided values in the sg_io_hdr_t
    usr_ptr      (or struct sg_header) that sent the request
    typedef struct sg_scsi_id {
        /* used by SG_GET_SCSI_ID ioctl() */
        int host_no;        /* as in "scsi<n>" where 'n' is one of 0, 1, 2 etc */
        int channel;
        int scsi_id;        /* scsi id of target device */
        int lun;
        int scsi_type;      /* TYPE_... defined in scsi/scsi.h */
        short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
        short d_queue_depth;/* device (or adapter) maximum queue length */
        int unused[2];      /* probably find a good use, set 0 for now */
    } sg_scsi_id_t;
Some seldom used ioctl()s introduced in the sg 2.x series drivers have been withdrawn. They are:
    typedef struct my_scsi_idlun {
        int four_in_one;    /* 4 separate bytes of info compacted into 1 int */
        int host_unique_id; /* distinguishes adapter cards from same supplier */
    } My_scsi_idlun;
    (scsi_device_id | (lun << 8) | (channel << 16) | (host_no << 24))
The advantage of this ioctl() is that it can be called on any SCSI file descriptor.
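The 'four_in_one' packing can be undone with shifts and masks. A minimal sketch follows; the struct and function names are hypothetical, and only the bit layout is taken from the expression above.

```c
/* The four values compacted into My_scsi_idlun::four_in_one. */
struct sg_idlun_parts {
    unsigned int scsi_device_id;
    unsigned int lun;
    unsigned int channel;
    unsigned int host_no;
};

/* Split four_in_one back into its four single byte fields. */
static struct sg_idlun_parts sg_unpack_four_in_one(int four_in_one)
{
    unsigned int v = (unsigned int)four_in_one;
    struct sg_idlun_parts p;

    p.scsi_device_id = v & 0xff;
    p.lun = (v >> 8) & 0xff;
    p.channel = (v >> 16) & 0xff;
    p.host_no = (v >> 24) & 0xff;
    return p;
}
```

Converting to unsigned before shifting avoids sign extension when 'host_no' is 128 or more.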
The structure that we are passed should look like:
    struct sdata {
        unsigned int inlen;     [i] Length of data written to device
        unsigned int outlen;    [i] Length of data read from device
        unsigned char cmd[x];   [i] SCSI command (6 <= x <= 16)
                                [o] Data read from device starts here
                                [o] On error, sense buffer starts here
        unsigned char wdata[y]; [i] Data written to device starts here
    };
The SCSI command length is determined by examining the 1st byte of the given command [15] . There is no way to override this.
Data transfers are limited to PAGE_SIZE (4K on i386, 8K on alpha).
The length (x + y) must be at least OMAX_SB_LEN bytes long to accommodate the sense buffer when an error occurs. The sense buffer is truncated to OMAX_SB_LEN (16) bytes so that old code will not be surprised.
If a Unix error occurs (e.g. ENOMEM) then the user will receive a negative return and the Unix error code in 'errno'. If the SCSI command succeeds then 0 is returned. Positive numbers returned are the compacted SCSI error codes (4 bytes in one int) where the lowest byte is the SCSI status. See the drivers/scsi/scsi.h file for more information on this.
The normal action of the sg driver for a read operation (from a device) is to request the lower level (adapter) driver to DMA [16] data into kernel buffers that the sg driver manages. The sg driver will then copy the contents of its buffers into the user space. [This sequence is reversed for a write operation (towards a device)]. While this double handling of data is obviously inefficient it does decouple some hardware issues from user applications. For these and historical reasons the "double-buffered" IO remains the default for the sg driver.
Both "direct" and "mmap-ed" IO are techniques that permit the data to be DMA-ed directly from the lower level (adapter) driver into the user application (vice versa for write operations). Both techniques result in faster speed, smaller latencies and lower CPU utilization but come at the expense of complexity (as always). For example the Linux kernel must not attempt to swap out pages in a user application that a SCSI adapter is busy DMA-ing data into.
Direct IO uses the kiobuf mechanism [see the Linux Device Drivers book] to manipulate memory allocated within the user space so that a lower level (adapter) driver can DMA directly to or from that user space memory. Since the user can give a different data buffer to each SCSI command passed through the sg interface, the kiobuf mechanism needs to set up its structures (and undo that setup) for each SCSI command. [17] Direct IO is available as an option in sg 3.1.18 (before that the sg driver needed to be recompiled with an altered define). Direct IO support is designed in such a way that if it is requested and cannot be performed then the command will still be performed using indirect IO. If direct IO is requested and has been performed then the SG_INFO_DIRECT_IO bit will be set in the 'info' member of the sg_io_hdr_t control structure after the request has been completed. Direct IO is not supported on ISA SCSI adapters since they can only address a 24 bit address space.
One limit on direct IO is that sg_io_hdr_t::iovec_count==0. So the user cannot (currently) use application level scatter gather and direct IO on the same request.
For direct IO to be worthwhile, a reasonable amount of data should be requested for data transfer. For transfers less than 8 KByte it is probably not worth the trouble. On the other hand "locking down" multiple 512 KB blocks of data for direct IO could adversely impact overall system performance. Remember that for the duration of a direct IO request, the data transfer buffer is mapped to a fixed memory location and locked in such a way that it won't be swapped out. This can "cramp the style" of the kernel if it is overdone.
Prior to sg 3.1.18 the direct IO code was commented out with the "SG_ALLOW_DIO" define. In sg 3.1.18 (available for lk 2.4.2 and later) the direct IO code is active but is defaulted off by a run time value. This value can be accessed via the "proc" file system at /proc/scsi/sg/allow_dio . Direct IO is enabled when a user with root permissions writes "1" to that file: echo 1 > /proc/scsi/sg/allow_dio . If SG_FLAG_DIRECT_IO is set in sg_io_hdr::flags but /proc/scsi/sg/allow_dio holds "0" then indirect IO will be performed (and this is indicated by ((sg_io_hdr::info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_INDIRECT_IO) after the request is completed).
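Requesting direct IO and then checking whether it was really used can be sketched as two small helpers. The function names are hypothetical; the constants are those from sg.h mentioned above.

```c
#include <scsi/sg.h>

/* Ask for direct IO on this request; the driver may still fall back. */
static void sg_request_direct_io(sg_io_hdr_t *hdr)
{
    hdr->flags |= SG_FLAG_DIRECT_IO;
}

/* After completion: non-zero only if the transfer really used direct IO. */
static int sg_used_direct_io(const sg_io_hdr_t *hdr)
{
    return (hdr->info & SG_INFO_DIRECT_IO_MASK) == SG_INFO_DIRECT_IO;
}
```

If sg_used_direct_io() reports zero for a request that asked for direct IO, one thing to check is whether /proc/scsi/sg/allow_dio holds "0".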
Memory-mapped IO takes a different approach from direct IO to removing the extra data copy performed by normal ("indirect") IO. With mmap-ed IO the application calls the mmap() system call to memory map sg's reserved buffer. The sg driver maintains one reserved buffer per file descriptor. The default size of the reserved buffer is 32 KB and it can be changed with the ioctl(SG_SET_RESERVED_SIZE). The mmap() system call only needs to be called once prior [18] to doing mmap-ed IO. For more details on mmap() see Section 7.6. An application indicates that it wants mmap-ed IO on a SCSI request by setting the SG_FLAG_MMAP_IO value in 'flags'.
Since there is only one reserved buffer per sg file descriptor, only one mmap-ed IO command can be active at one time. In order to perform command queuing with mmap-ed IO, an application will need to open() multiple file descriptors to the same SCSI device. With mmap-ed IO the various status values and the sense buffer (if required) are conveyed back to an application in the same fashion as normal ("indirect") IO.
Mmap-ed IO has very low per command latency since the reserved buffer mapping only needs to be done once per file descriptor. Also the reserved buffer is set up by the sg driver to aid the efficient construction of the internal scatter gather list used by the lower level (adapter) driver for DMA purposes. This tends to be more efficient than the user memory that direct IO requires the sg driver to process into an internal scatter gather list. So on both these counts, mmap-ed IO has the edge over direct IO.
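The setup sequence for mmap-ed IO can be sketched as follows. The helper names are hypothetical, and the fallback definition of SG_FLAG_MMAP_IO (value 4, as in the kernel's own sg.h) is only there because glibc's copy of <scsi/sg.h> may not define it. The reserved buffer is resized before mmap() is called because, once a file descriptor has been mapped, later calls to ioctl(SG_SET_RESERVED_SIZE) yield EBUSY. Leaving dxferp as NULL assumes the driver takes the data from the mapped buffer rather than from dxferp.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <scsi/sg.h>

/* Assumption: value taken from the kernel's sg.h, used only when the
 * installed header lacks the definition. */
#ifndef SG_FLAG_MMAP_IO
#define SG_FLAG_MMAP_IO 4
#endif

/* Resize the reserved buffer, then map it. Returns MAP_FAILED on
 * error. Call once per file descriptor, before any mmap-ed request. */
static void *sg_map_reserved(int sg_fd, int size)
{
    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &size) < 0)
        return MAP_FAILED;
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd, 0);
}

/* Fill in a header for a read-type command that uses mmap-ed IO. The
 * data is expected to arrive in the mapped reserved buffer. */
static void sg_prep_mmap_read(sg_io_hdr_t *hdr, unsigned char *cdb,
                              unsigned char cdb_len, unsigned int len,
                              unsigned char *sense, unsigned char sense_len)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmdp = cdb;
    hdr->cmd_len = cdb_len;
    hdr->dxfer_len = len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 20000;           /* milliseconds */
    hdr->flags = SG_FLAG_MMAP_IO;   /* dxferp stays NULL (see lead-in) */
}
```

A careful application would also read back the granted size with ioctl(SG_GET_RESERVED_SIZE) before mapping, since the driver may not be able to supply everything that was asked for.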
sg_def_reserved_size=<n> |
If sg is a module, it can be loaded with modprobe in either manner:
modprobe sg
modprobe sg def_reserved_size=<n>
If sg is a module, it can be unloaded with rmmod like this:
rmmod sg |
The sg driver provides information about the SCSI subsystem and the current internal state of the sg driver in the /proc/scsi/sg directory. Some sg driver defaults can be changed by the superuser writing values to these "pseudo" files [19].
The following files are readable by all:
allow_dio           0 indicates direct IO disabled, 1 for enabled
debug               debug information including active request data
def_reserved_size   default buffer size reserved for each file descriptor
devices             one line of numeric data per device
device_hdr          single line of column names corresponding to 'devices'
device_strs         one line of vendor, product and rev info per device
hosts               one line of numeric data per host
host_hdr            single line of column names corresponding to 'hosts'
host_strs           one line of host information (string) per host
version             sg version as a number followed by a string representation
Each line in 'devices' and 'device_strs' corresponds to an sg device. For example the first line corresponds to /dev/sg0. The line number (origin 0) also corresponds to the sg minor device number. This mapping is local to sg and is normally the same as given by the cat /proc/scsi/scsi command which is reported by the SCSI mid level driver. The two mappings may diverge when 'remove-single-device' and 'add-single-device' are used (see the SCSI-2.4-HOWTO for more information).
Each line in 'hosts' and 'host_strs' corresponds to a SCSI host. For example the first line corresponds to the host normally represented as "scsi0". This mapping is invariant across the SCSI sub system. [So these entries could arguably be migrated to the mid level.]
The column headers in 'device_hdr' are given below. If the device is not present (and one is present after it) then a line of "-1" entries is output. Each entry is separated by a whitespace (currently a tab):
host     host number (indexes 'hosts' table, origin 0)
chan     channel number of device
id       SCSI id of device
lun      Logical Unit number of device
type     SCSI type (e.g. 0->disk, 5->cdrom, 6->scanner)
opens    number of opens (by sd, sr, st and sg) at this time
depth    maximum queue depth supported by device
busy     number of commands being processed by host for this device
online   1 indicates device is in normal online state, 0->offline
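As a minimal sketch of how an application might consume one line of 'devices', the following parser reads the nine columns in the order given by 'device_hdr'. The struct and function names are hypothetical. A row of "-1" entries is reported separately so that the caller can keep its line count (and hence the sg minor number) in step.

```c
#include <stdio.h>

/* One row of /proc/scsi/sg/devices; field names follow 'device_hdr'. */
struct sg_dev_row {
    int host, chan, id, lun, type, opens, depth, busy, online;
};

/* Parse one line. Returns 1 for a present device, 0 for a "-1"
 * placeholder row (device absent), -1 if the line is malformed. */
static int sg_parse_dev_row(const char *line, struct sg_dev_row *r)
{
    /* A space in the format matches any run of whitespace, so this
     * works whether the separator is a tab or spaces. */
    int n = sscanf(line, "%d %d %d %d %d %d %d %d %d",
                   &r->host, &r->chan, &r->id, &r->lun, &r->type,
                   &r->opens, &r->depth, &r->busy, &r->online);

    if (n != 9)
        return -1;
    return (r->host == -1) ? 0 : 1;
}
```

Feeding the function each line returned by fgets() on /proc/scsi/sg/devices, with a counter starting at 0, gives the row for /dev/sg0 first.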
The column headers in 'host_hdr' are given below. Each entry is separated by a whitespace (currently a tab):
uid    unique id (non-zero if multiple hosts of same type)
busy   number of commands being processed for this host
cpl    maximum number of commands per lun (may be 0 if "device depth" is given)
sgat   maximum elements of scatter gather the adapter (pseudo) DMA can accommodate
isa    0 -> non-ISA adapter, 1 -> ISA adapter. ISA adapters are assumed to have a 24 bit address bus limit (16 MB).
emu    0 -> real SCSI adapter, 1 -> emulated SCSI adapter (e.g. ide-scsi device driver)
The 'def_reserved_size' is both readable and writable. It is only writable by root. It is initialized to the value of DEF_RESERVED_SIZE in the "sg.h" file. Values between 0 and 1048576 (which is 2 ** 20) are accepted and can be set from the command line with the following syntax:
$ echo "262144" > /proc/scsi/sg/def_reserved_size |
The 'allow_dio' is both readable and writable. It is only writable by root. When it is 0 (default) any request to do direct IO (i.e. by setting SG_FLAG_DIRECT_IO) will be ignored and indirect IO will be done instead.
$ cat /proc/scsi/sg/debug
dev_max(currently)=7 max_active_device=1 (origin 1)
 scsi_dma_free_sectors=416 sg_pool_secs_aval=320 def_reserved_size=32768
 >>> device=sg0 scsi0 chan=0 id=0 lun=0 em=0 sg_tablesize=255 excl=0
   FD(1): timeout=60000ms bufflen=65536 (res)sgat=2 low_dma=0
   cmd_q=1 f_packid=1 k_orphan=0 closed=0
     fin: id=3949312 blen=65536 dur=10ms sgat=2 op=0x28
     act: id=3949440 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28
     rb>> act: id=3949568 blen=65536 t_o/elap=60000/10ms sgat=2 op=0x28
     act: id=3949696 blen=65536 t_o/elap=60000/0ms sgat=2 op=0x28
Each line indented with 5 spaces represents a SCSI command. The state of the command is either:
If sg has lots of activity then the "debug" output may span many lines and in some cases appear to be corrupted. This occurs because procfs requests fixed buffer sizes of information and, if there is more data to output, returns later to get the remainder. The problem with this strategy is that sg's internal state may have changed. Rather than double buffering, the sg driver just continues from the same offset. While procfs is very useful, ioctl()s (such as SG_GET_REQUEST_TABLE) still have their place.
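For comparison with reading 'debug', here is a minimal sketch of the ioctl() route. It fetches the request table for one file descriptor in a single call, so it should not suffer from the multi-read problem described above. The two function names are hypothetical; sg_req_info_t, SG_MAX_QUEUE and SG_GET_REQUEST_TABLE come from <scsi/sg.h>.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Count the slots in a request table that hold a request. A
 * req_state of 0 marks an unused slot. */
static int sg_count_in_use(const sg_req_info_t *tbl, int n)
{
    int k, used = 0;

    for (k = 0; k < n; ++k)
        if (tbl[k].req_state != 0)
            ++used;
    return used;
}

/* Fetch the table for one sg file descriptor and return the number
 * of requests found in it, or -1 if the ioctl() fails. */
static int sg_outstanding(int sg_fd)
{
    sg_req_info_t tbl[SG_MAX_QUEUE];

    memset(tbl, 0, sizeof(tbl));
    if (ioctl(sg_fd, SG_GET_REQUEST_TABLE, tbl) < 0)
        return -1;
    return sg_count_in_use(tbl, SG_MAX_QUEUE);
}
```

Unlike the 'debug' file, this only reports on the file descriptor it is given, which is usually what an application wants when deciding whether it still has responses to collect.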
variants of the Unix dd command: sg_dd, sgp_dd, sgq_dd and sgm_dd,
scanning and mapping utilities: sg_scan, sg_map and scsi_devfs_scan,
SCSI support: sg_inq, scsi_inquiry, sginfo, sg_readcap, sg_start and sg_reset,
timing and testing: sg_rbuf, sg_test_rwbuf, sg_read, sg_turs and sg_debug,
The "dd" family of utilities take a sg device file name as input (i.e. if=<sg_dev_file_name>), as output, or as both. They can also take raw device file names [20] instead of sg device file names. One important difference from the standard dd command is that the value given to the block size (bs=) argument must be the exact block size of that device and not an integral multiple as allowed by dd. These "dd" variants are suitable for SCSI Direct Access Devices such as disks and CDROMs (but are not suitable for SCSI tape devices).
The sg3_utils package is designed to be used with the sg version 3 driver found in the lk 2.4 series. There is also a sg_utils package that supports a subset of these commands for the sg version 2 driver (with some support for the original sg driver) which is found in the lk 2.2 series (from and after lk 2.2.6). There are links to the most recent sg3_utils (and sg_utils) packages at the sg website at www.torque.net/sg. There are tarballs and both source and binary rpm packages. At the time of writing the latest sg3_utils tarball is at www.torque.net/sg/p/sg3_utils-0.97.tgz. There is a README file in that tarball that should be examined for up to date information. The more important utility commands (e.g. sg_dd) have "man" pages. [21]
Almost all of the sg device driver capabilities discussed in this document appear in code in one or more of these programs. For example the recently added mmap-ed IO can be found in sgm_dd, sg_read and sg_rbuf.
The sg3_utils package also provides some functions that may be useful for applications that use sg. The functions declared in sg_err.h and defined in sg_err.c categorize SCSI subsystem errors that are returned to an application in a read() or an ioctl(SG_IO). In the case of sense buffers, they are decoded into text messages (as per SCSI 2 definitions). There is also a function to do a 64 bit seek (llseek.h).
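To give a feel for what such categorization involves, here is a minimal sketch. It is not the sg_err.c code and does not use its function names; the enum, the function and the two MY_ constants are hypothetical. The value 0x08 for DRIVER_SENSE is assumed from the kernel's SCSI headers. Sense data is looked for in three places because some lower level drivers (e.g. ide-scsi) clear the status fields yet still set DRIVER_SENSE in 'driver_status'.

```c
#include <scsi/sg.h>

/* Assumption: DRIVER_SENSE is 0x08 in the low nibble of
 * driver_status, as in the kernel's SCSI headers. */
#define MY_DRIVER_SENSE 0x08
/* Masked (shifted) SCSI status value for CHECK CONDITION. */
#define MY_CHECK_CONDITION 0x01

enum sg_outcome { SG_OUT_OK, SG_OUT_SENSE, SG_OUT_TRANSPORT };

/* Coarse classification of a completed sg_io_hdr request. */
static enum sg_outcome sg_classify(const sg_io_hdr_t *hdr)
{
    if ((hdr->info & SG_INFO_OK_MASK) == SG_INFO_OK)
        return SG_OUT_OK;
    /* Any one of these indicates that the sense buffer is worth
     * decoding. */
    if (hdr->masked_status == MY_CHECK_CONDITION ||
        (hdr->driver_status & 0x0f) == MY_DRIVER_SENSE ||
        hdr->sb_len_wr > 0)
        return SG_OUT_SENSE;
    /* Otherwise the problem was reported via host_status or
     * driver_status without sense data. */
    return SG_OUT_TRANSPORT;
}
```

A real application would go on to decode the sense key and additional sense codes in the SG_OUT_SENSE case, which is where the sg3_utils functions are most useful.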
struct sg_header
{
    int pack_len;                    /* [o] */
    int reply_len;                   /* [i] */
    int pack_id;                     /* [i->o] */
    int result;                      /* [o] */
    unsigned int twelve_byte:1;      /* [i] */
    unsigned int target_status:5;    /* [o]+ */
    unsigned int host_status:8;      /* [o]+ */
    unsigned int driver_status:8;    /* [o]+ */
    unsigned int other_flags:10;     /* unused */
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] */
}; /* This structure is 36 bytes long on i386 */
This interface is fully described in the www.torque.net/sg/p/scsi-generic.txt file which documents the sg version 2 driver.
Since many Linux applications use this interface, it is still supported in this version (i.e. version 3) of the driver. Only its most perverse idiosyncrasies have been modified, and no problems have been reported when running major older applications atop this newer driver.
This appendix contains an example program. It is an abridged version of sg_simple2.c found in the sg3_utils package. It sends a SCSI INQUIRY command to the nominated sg device and prints out some of the response or outputs error information. Hopefully showing the error processing does not cloud what is being illustrated.
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <scsi/sg.h> /* take care: fetches glibc's /usr/include/scsi/sg.h */

/* This is a simple program executing a SCSI INQUIRY command using the
 * sg_io_hdr interface of the SCSI generic (sg) driver.
 *
 * Copyright (C) 2001 D. Gilbert
 * This program is free software. Version 1.01 (20020226)
 */

#define INQ_REPLY_LEN 96
#define INQ_CMD_CODE 0x12
#define INQ_CMD_LEN 6

int main(int argc, char * argv[])
{
    int sg_fd, k;
    unsigned char inqCmdBlk[INQ_CMD_LEN] =
                    {INQ_CMD_CODE, 0, 0, 0, INQ_REPLY_LEN, 0};
    /* This is a "standard" SCSI INQUIRY command. It is standard because the
     * CMDDT and EVPD bits (in the second byte) are zero. All SCSI targets
     * should respond promptly to a standard INQUIRY */
    unsigned char inqBuff[INQ_REPLY_LEN];
    unsigned char sense_buffer[32];
    sg_io_hdr_t io_hdr;

    if (2 != argc) {
        printf("Usage: 'sg_simple0 <sg_device>'\n");
        return 1;
    }

    if ((sg_fd = open(argv[1], O_RDONLY)) < 0) {
        /* Note that most SCSI commands require the O_RDWR flag to be set */
        perror("error opening given file name");
        return 1;
    }
    /* It is prudent to check we have a sg device by trying an ioctl */
    if ((ioctl(sg_fd, SG_GET_VERSION_NUM, &k) < 0) || (k < 30000)) {
        printf("%s is not an sg device, or old sg driver\n", argv[1]);
        return 1;
    }
    /* Prepare INQUIRY command */
    memset(&io_hdr, 0, sizeof(sg_io_hdr_t));
    io_hdr.interface_id = 'S';
    io_hdr.cmd_len = sizeof(inqCmdBlk);
    /* io_hdr.iovec_count = 0; */  /* memset takes care of this */
    io_hdr.mx_sb_len = sizeof(sense_buffer);
    io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    io_hdr.dxfer_len = INQ_REPLY_LEN;
    io_hdr.dxferp = inqBuff;
    io_hdr.cmdp = inqCmdBlk;
    io_hdr.sbp = sense_buffer;
    io_hdr.timeout = 20000;     /* 20000 millisecs == 20 seconds */
    /* io_hdr.flags = 0; */     /* take defaults: indirect IO, etc */
    /* io_hdr.pack_id = 0; */
    /* io_hdr.usr_ptr = NULL; */

    if (ioctl(sg_fd, SG_IO, &io_hdr) < 0) {
        perror("sg_simple0: Inquiry SG_IO ioctl error");
        return 1;
    }

    /* now for the error processing */
    if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK) {
        if (io_hdr.sb_len_wr > 0) {
            printf("INQUIRY sense data: ");
            for (k = 0; k < io_hdr.sb_len_wr; ++k) {
                if ((k > 0) && (0 == (k % 10)))
                    printf("\n ");
                printf("0x%02x ", sense_buffer[k]);
            }
            printf("\n");
        }
        if (io_hdr.masked_status)
            printf("INQUIRY SCSI status=0x%x\n", io_hdr.status);
        if (io_hdr.host_status)
            printf("INQUIRY host_status=0x%x\n", io_hdr.host_status);
        if (io_hdr.driver_status)
            printf("INQUIRY driver_status=0x%x\n", io_hdr.driver_status);
    }
    else {  /* assume INQUIRY response is present */
        char * p = (char *)inqBuff;
        printf("Some of the INQUIRY command's response:\n");
        printf(" %.8s %.16s %.4s\n", p + 8, p + 16, p + 32);
        printf("INQUIRY duration=%u millisecs, resid=%d\n",
               io_hdr.duration, io_hdr.resid);
    }
    close(sg_fd);
    return 0;
}
The sg_simple4.c program is an example of using mmap-ed IO in the sg3_utils package. An example of using direct IO can be found in sg_rbuf.c in the same package.
system("cat /proc/scsi/sg/debug"); |
$ echo "scsi log timeout 7" > /proc/scsi/scsi |
$ echo "scsi log timeout 0" > /proc/scsi/scsi |
The primary site for SCSI information, standards (draft and emerging) and related resources is www.t10.org.
The most recent news on the sg driver can be found at: www.torque.net/sg .
Some notes on the sg v3 driver can be found at: www.torque.net/sg/s_packet.html . For some timing (and CPU utilization) comparisons between direct and indirect IO see: www.torque.net/sg/rbuf_tbl.html
The Linux Documentation Project's SCSI-2.4-HOWTO may help to put this driver into perspective: linuxdoc.org/HOWTO/SCSI-2.4-HOWTO . The most recent version of that document can be found at www.torque.net/scsi/SCSI-2.4-HOWTO .
To understand the inner workings of device drivers there is a fine book called "Linux Device Drivers", second edition by Alessandro Rubini and Jonathan Corbet published by O'Reilly [ISBN 0-596-00008-1]. The authors and the publisher have unselfishly made this book available under the GNU Free Documentation License (version 1.1). It can be found in html at www.oreilly.com/catalog/linuxdrive2/chapter/book .
[1] | SCSI command opcode 0x7f does allow for variable length commands but that is not supported in Linux currently. | |
[2] | There is an sg version 3.0.19 which is an optional driver for the lk 2.2 series. It has the following limitations:
| |
[3] | Patches exist for sg to extend the number of SCSI devices past the 256 limit when the device file system (devfs) is being used. | |
[4] | Linux kernels prior to 2.4.15 limited SCSI commands to a length of 12 bytes. In lk 2.4.15 this was raised to 16 bytes. However unless lower level drivers (e.g. aic7xxx) indicate that they can handle 16 byte commands (and few currently do) then the command is aborted with a DID_ABORT host status. | |
[5] | Some HBA - SCSI device combinations have difficulties with an odd valued dxfer_len . In some cases the operation succeeds but a DID_ERROR host status is returned. So unless there is a good reason, applications that want maximum portability should avoid an odd valued dxfer_len . | |
[6] | Whether aborting individual commands is supported or not is left to the adapter. Many adapters are unable to abort SCSI commands "in flight" because these details are handled in silicon by embedded processors in hardware. SCSI device or bus resets are required. | |
[7] | Some lower level drivers (e.g. ide-scsi) clear this status field even when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred. However they do set DRIVER_SENSE in driver_status field. Also a (sb_len_wr > 0) indicates there is a sense buffer. | |
[8] | Some lower level drivers (e.g. ide-scsi) clear this masked_status field even when a CHECK_CONDITION or COMMAND_TERMINATED status has occurred. However they do set DRIVER_SENSE in driver_status field. Also a (sb_len_wr > 0) indicates there is a sense buffer. | |
[9] | In some cases the sym53cxx driver reports a DID_ERROR when it internally rounds up an odd transfer length by 1. This is an example of a "non-error". | |
[10] | Unfortunately some adapter drivers report an incorrect number for 'resid'. This is due to some "fuzziness" in the internal interface definitions within the Linux scsi subsystem concerning the _exact_ number of bytes to be transferred. Therefore only applications tied to a specific adapter that is known to give the correct figure should use this feature. Hopefully this will be cleared up in the near future. | |
[11] | The command queuing capabilities of the SCSI device and the adapter driver should also be taken into account. To this end the sg_scsi_id::h_cmd_per_lun and sg_scsi_id::d_queue_depth values returned by ioctl(SG_GET_SCSI_ID) may be useful. Also some devices that indicate in their INQUIRY response that they can accept command queuing react badly when queuing is actually attempted. | |
[12] | There is a small probability it will spend some time waiting for a command block to become available. In this case the wait is interruptible. If O_NONBLOCK is active then this scenario will cause an EAGAIN. | |
[13] | The sg driver does record that the mmap() system call has been invoked at least once on a file descriptor. This is not sufficient because the given 'length' may be too short for the current IO. Also the driver is unaware of munmap() calls so it could easily be tricked. | |
[14] | If ioctl(SG_SET_KEEP_ORPHAN) is set to 1 and an ioctl(SG_IO) operation is interrupted (e.g. by the user typing control-C), then when the response arrives "num_waiting" will be incremented to indicate that a read() can now pick up the response. | |
[15] | Here is the mapping from the SCSI opcode "group" (top 3 bits of opcode) to the assumed length (in lk 2.4.15):
| |
[16] | Older SCSI adapters and some pseudo adapter drivers don't have DMA capability in which case the CPU is used to copy the data. | |
[17] | Unfortunately that setup time is large enough in some versions of the lk 2.4 series to adversely impact direct IO performance. Also memory malloc()-ed in the user space tends to be made up of discontinuous pages seen from the SCSI adapter. This requires the sg driver to build heavily splintered scatter gather lists which is less than desirable. This limits the maximum transfer size to [(max_scsi_adapter_scatter_gather_elements - 1) * PAGE_SIZE]. [This is a _different_ scatter gather mechanism to that which the user sees in the sg interface based on iovec.] | |
[18] | When a write() or ioctl(SG_IO) attempts mmap-ed IO there is no check performed that a prior mmap() system call has been performed. If no mmap() has been issued then random data is written to the device, or data read from the device is inaccessible. Also once mmap() has been called on a file descriptor then all subsequent calls to ioctl(SG_SET_RESERVED_SIZE) will yield EBUSY. | |
[19] | One strange quirk is that the /proc/scsi/sg directory will not appear if there are no SCSI devices (or pseudo devices such as USB mass storage) attached to the system. The reason for this is that in the absence of SCSI devices, the SCSI mid level does not initialize the sg driver (even if it has been loaded as a module). When the sg driver is a module and rmmod sg is successfully executed then the /proc/scsi/sg directory and its contents are removed. | |
[20] | Raw device names are of the form /dev/raw/raw<n> and can be bound to block devices (e.g. an IDE disk partition such as /dev/hda3). The binding is done with the raw command (see "man raw"). | |
[21] | Although the author wrote most of these programs, initially to test facilities within the sg driver, some have been contributed by others. See www.torque.net/sg/u_index.html for more information. |