Other articles in this series:

A brief introduction to SCSI

Introduction. SCSI is an I/O subsystem architecture for attaching I/O peripherals to a system. Over the years it has been used to attach devices such as hard disk drives, SSDs, floppy drives, optical drives, tape drives, media changers (e.g. for optical or tape media), RAID controllers and storage enclosures containing other SCSI devices. In the past it was also used to attach scanners, printers and network interfaces. Today, it is primarily used to attach hard drives, SSDs, optical drives, tape drives and devices which support these (like medium changers for tape drives, RAID controllers and storage enclosures). Other applications, such as scanners, printers and network interfaces, have long since moved to interfaces like USB or PCI(e). SCSI itself evolved from, and is substantially influenced by, the channel I/O system of IBM System/360 mainframes.

Since SCSI is purely a logical architecture and command set and does not imply any specific physical interconnect, it has been adapted to as many interconnects as it has device types. Originally used with parallel ribbon cables, SCSI has subsequently transitioned to a serial interface (Serial Attached SCSI (SAS)), just as PATA transitioned to SATA. However, SCSI mappings have also been defined for countless other interconnects, including Fibre Channel, TCP/IP, FireWire, USB, PCIe, RDMA (e.g. InfiniBand) and even serial lines. Ironically, it is even tunneled over ATA (ATAPI) for accessing PATA and SATA optical drives, as the ATA command set itself is unsuitable for such applications.

SCSI logical architecture. Different I/O subsystems differ radically in the paradigm of I/O they expose. For example, interfaces like PCI or PCIe provide a memory-mapped I/O model, and interfaces like Ethernet send and receive frames of limited size asynchronously, whereas SCSI is command-based. A host submits commands to devices (more specifically, logical units (LUs)) by means of an SCSI subsystem connecting a host and some number of devices, and the LUs return responses to these commands.

An initiator is a device which submits SCSI commands, and a target is a device which receives them. Though the host will usually be an initiator, many SCSI HBAs can be configured in target mode, meaning that the host acts as a target rather than an initiator, allowing it to serve e.g. virtual block devices to other machines. Nowadays, a device is allowed to be both an initiator and a target, though this is uncommon.

Naming and numbering. A target is split into one or more logical units (LUs). This is similar to how PCI devices are split into one or more “functions” (such as an Ethernet card with multiple ports), or how USB devices can be composite devices which are e.g. both audio and video devices (such as a webcam with a built-in microphone). Each LU has one or more logical unit numbers (LUNs) and zero or more logical unit names (in practice, any normal device will have one). Commands are submitted to LUs by addressing them by their logical unit number (LUN), combined with an initiator port ID and target port ID, thereby denoting the route to be used. In modern SCSI, LUNs are usually 64 bits, though older variants may support only 16-bit or 3-bit LUNs. Somewhat like network addresses, modern 64-bit LUNs can have hierarchical substructure within their bits; LUs may contain LUs.
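To make the idea of substructure inside a 64-bit LUN more concrete, here is a minimal sketch that encodes a small LU number into the 8-byte wire form using two single-level addressing methods (peripheral device addressing and flat space addressing). The function names are invented for this example, and the bit layout is my reading of SAM; treat it as an illustration and check the standard before relying on it.

```python
def encode_lun_peripheral(lun: int) -> bytes:
    """Peripheral device addressing: method 00b, bus 0, LUN in byte 1."""
    if not 0 <= lun <= 0xFF:
        raise ValueError("peripheral device addressing holds 8-bit LUNs only")
    return bytes([0x00, lun]) + bytes(6)


def encode_lun_flat(lun: int) -> bytes:
    """Flat space addressing: method 01b followed by a 14-bit LUN."""
    if not 0 <= lun <= 0x3FFF:
        raise ValueError("flat space addressing holds 14-bit LUNs only")
    first_level = (0b01 << 14) | lun
    return first_level.to_bytes(2, "big") + bytes(6)


def decode_lun(raw: bytes) -> int:
    """Decode the first level of an 8-byte LUN for the two methods above."""
    if len(raw) != 8:
        raise ValueError("a LUN is 8 bytes on the wire")
    method = raw[0] >> 6  # top two bits select the addressing method
    if method == 0b00:
        return raw[1]
    if method == 0b01:
        return ((raw[0] & 0x3F) << 8) | raw[1]
    raise NotImplementedError(f"addressing method {method:#04b} not handled")
```

The remaining six bytes are zero here because only the first of the four possible hierarchy levels is used.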

Logical unit numbers (LUNs) and logical unit names should not be confused. Logical unit names are globally unique; LUNs are not. Logical unit names can be allocated from a number of globally-unique namespaces, but will generally be either a Network Address Authority (NAA) identifier or an EUI-64. An NAA identifier is similar in format to an EUI-48 (e.g. a MAC address) or EUI-64 in that it combines a three-byte IEEE OUI with an integer assigned beneath that OUI to form a globally unique identifier. Like a MAC address, it also supports use of locally assigned values, which might not be globally unique.
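As an illustration of how an OUI and a vendor-assigned integer are packed together, the following sketch splits an 8-byte NAA identifier of the "IEEE Registered" type (NAA value 5) into its parts. The class and function names are made up for this example, the sample value is arbitrary, and other NAA types use different layouts which are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Naa5:
    oui: int              # 24-bit IEEE OUI
    vendor_specific: int  # 36-bit value assigned by the OUI owner


def parse_naa5(raw: bytes) -> Naa5:
    """Split a 64-bit NAA type 5 identifier into OUI and vendor-assigned part."""
    if len(raw) != 8:
        raise ValueError("expected an 8-byte identifier")
    value = int.from_bytes(raw, "big")
    naa = value >> 60  # the top four bits give the NAA type
    if naa != 5:
        raise ValueError(f"NAA type {naa} is not handled by this sketch")
    return Naa5(oui=(value >> 36) & 0xFFFFFF,
                vendor_specific=value & ((1 << 36) - 1))


# Arbitrary sample value, not a real device's name.
example = parse_naa5(bytes.fromhex("5000c500a1b2c3d4"))
```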

Commands. An SCSI command consists primarily of a Command Descriptor Block (CDB), which is a binary structure usually no longer than 16 or 32 bytes. (Older versions of SCSI didn't support CDBs longer than 16 bytes. Modern versions of SCSI support CDBs of arbitrary length, referred to as variable-length CDBs.) The first byte is the major opcode and indicates the operation to perform. (Some commands provide multiple functions, selected via an additional minor opcode field called “Service Action”.)

The final byte in any fixed-length CDB is the Control byte (variable-length CDBs carry it in the second byte instead). This contains flags, all but one of which are now obsolete. The Control byte is often set to zero.
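A small sketch may help show how the first and last bytes of a CDB are used. Traditionally, the top three bits of the opcode (the "group code") imply the length of a fixed-length CDB; the table below reflects that convention as I understand it, and the function names are invented for this example.

```python
# Group code (top three bits of the opcode) -> fixed CDB length in bytes.
# Group 3 contains the variable-length opcode 7Fh; groups 6 and 7 are
# vendor specific, so no length can be inferred for them.
GROUP_CDB_LENGTH = {0: 6, 1: 10, 2: 10, 4: 16, 5: 12}

VARIABLE_LENGTH_OPCODE = 0x7F


def fixed_cdb_length(opcode: int):
    """Return the CDB length implied by the opcode's group code, or None."""
    return GROUP_CDB_LENGTH.get(opcode >> 5)


def control_byte(cdb: bytes) -> int:
    """Return the Control byte of a CDB."""
    if cdb[0] == VARIABLE_LENGTH_OPCODE:
        return cdb[1]   # variable-length CDBs keep it in byte 1
    return cdb[-1]      # fixed-length CDBs keep it in the final byte
```

For example, opcode 28h falls in group 1 and therefore denotes a 10-byte CDB, while 88h falls in group 4 and denotes a 16-byte CDB.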

Every SCSI command can (optionally) have data associated with it. Similarly to ATA, the architecture of SCSI logically separates commands and data; the two will typically be transported via logically separate means, even if ultimately over the same wiring.

“Data In” always refers to data transfer from a device to the host; “Data Out” always refers to data transfer from a host to a device. Older SCSI implementations only allowed a command to have Data In or Data Out, or neither, but not both; support for bidirectional commands came later.

SCSI consists of a base command set which all SCSI devices must support (the SCSI Primary Commands (SPC)), and various further command sets specific to different types of peripherals. For example, the SCSI Block Commands (SBC) command set is implemented by hard disk drives, SSDs and floppy drives; the SCSI Multimedia Commands (MMC) command set is implemented by optical drives.

Commands can take a potentially unbounded amount of time; for example, the REWIND command on a tape drive. There is no requirement for fast completion, and multiple commands can be queued simultaneously.

Commands are identified by a command ID or tag, an integer assigned by the initiator submitting the command. Commands can be aborted while in progress. This is a significant advantage SCSI has over ATA. Commands are aborted by invoking a task management function, one of which is “ABORT TASK”, with the command ID as an argument. Task management functions are not SCSI commands but a separate mechanism provided parallel to the mechanism which a given SCSI subsystem provides for submitting commands; an SCSI subsystem must have a way to execute commands and to invoke task management functions. Other task management functions include “LOGICAL UNIT RESET”, which allows a LU to be reset, and “TARGET RESET”, which resets an entire target.

The names of commands are written in uppercase by convention. Some commands have multiple variants, each using a different length of CDB, and are disambiguated by writing the CDB length in parentheses. For example, you can use READ (6), READ (10), READ (12), READ (16) or READ (32) to read blocks from a block device using a 6, 10, 12, 16 or 32-byte CDB respectively. Variants using larger CDBs generally offer greater functionality; for the READ command, for example, READ (16) offers support for 64-bit LBAs, whereas READ (10) and READ (12) offer support only for 32-bit LBAs, and READ (6) only supports 21-bit LBAs. Variants using larger CDBs may be less well-supported as these commands tend to be newer.
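To see how the CDB variants differ in practice, here is a minimal sketch which builds READ (6), READ (10) and READ (16) CDBs. The helper names are invented for this example; flag bits and the group number field are simply left at zero, and READ (12) and READ (32) are omitted for brevity.

```python
import struct


def read6_cdb(lba: int, blocks: int, control: int = 0) -> bytes:
    """READ (6): 21-bit LBA, 1..256 blocks (a length byte of 0 means 256)."""
    if not 0 <= lba < (1 << 21):
        raise ValueError("READ (6) only carries 21-bit LBAs")
    if not 1 <= blocks <= 256:
        raise ValueError("READ (6) transfers 1 to 256 blocks")
    return struct.pack(">BBHBB", 0x08, lba >> 16, lba & 0xFFFF,
                       blocks & 0xFF, control)


def read10_cdb(lba: int, blocks: int, control: int = 0) -> bytes:
    """READ (10): 32-bit LBA, 16-bit transfer length."""
    if not 0 <= lba < (1 << 32) or not 0 <= blocks < (1 << 16):
        raise ValueError("out of range for READ (10)")
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, control)


def read16_cdb(lba: int, blocks: int, control: int = 0) -> bytes:
    """READ (16): 64-bit LBA, 32-bit transfer length."""
    if not 0 <= lba < (1 << 64) or not 0 <= blocks < (1 << 32):
        raise ValueError("out of range for READ (16)")
    return struct.pack(">BBQIBB", 0x88, 0, lba, blocks, 0, control)


def smallest_read_cdb(lba: int, blocks: int, control: int = 0) -> bytes:
    """Prefer READ (10), falling back to READ (16) for large requests."""
    if lba < (1 << 32) and blocks < (1 << 16):
        return read10_cdb(lba, blocks, control)
    return read16_cdb(lba, blocks, control)
```

Preferring the smaller variant where it suffices is one plausible policy, given that the variants using larger CDBs may be less widely supported.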

Multiple initiators, multiple ports. One of the most “enterprise” features of SCSI is that a single LU can have commands submitted to it from multiple initiators concurrently. This is not so relevant when an SCSI device is connected directly to a machine, but if an SCSI device, such as a hard disk, is attached to an iSCSI or Fibre Channel SAN, it could be used by multiple initiators simultaneously.

Furthermore, SAS hard drives contain two PHYs, exposed on their connector (comparable terminology in PCIe would be two links, each of one lane). This allows a single hard drive to be connected to two different SCSI subsystems for redundancy purposes (a multiport configuration). Thus, a failure of one SCSI subsystem still allows the drive to be accessed via the other subsystem. Alternatively, the two PHYs can be ganged as a single x2-wide link to a single SCSI subsystem, or only one PHY may be used.

Since SCSI devices are often designed to be accessed concurrently from multiple initiators, the SCSI Primary Commands (SPC) define commands allowing reservations to be acquired and released on devices. This is essentially a locking mechanism to synchronise concurrent access by multiple initiators. Devices keep track of the reservations they have given out. Some devices may even support persistent reservations which are retained over power loss.

Command results. When a command completes, a status byte is returned. There are only about a dozen possible values, the most common of which are:

However, the status byte alone is not very informative. A LU maintains sense information, which is a binary structure describing the current status of the device, generally reflecting any errors which occurred during the execution of the last command. Originally this sense information had to be requested manually (after a command completed unsuccessfully, in other words with a status of CHECK CONDITION) by executing the command REQUEST SENSE, which would return the sense information as Data In. Since this was inconvenient, the notion of autosense was introduced. If an initiator requested autosense for a command, sense information would be returned along with the status byte if the command completed with CHECK CONDITION. (Autosense is now so normal that the ability to disable it has been removed from the most recent versions of the standard.)

Within the sense data, the most important fields are the 4-bit sense key, the one-byte additional sense code (ASC) and the one-byte additional sense code qualifier (ASCQ). The sense key provides very coarse information on the kind of error; the ASC is much more detailed and is then further qualified by the ASCQ, the meaning of which depends on (i.e., is scoped under) the ASC.

The most common sense keys are:

Here is a random sample of some of the (ASC, ASCQ) values which have been assigned, to give an idea of what information is communicated:

The sense data will also contain other information about the nature and cause of the condition. For example, for “INVALID FIELD IN CDB”, it may contain a field indicating the offset of the invalid field in the CDB which was submitted.
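The following sketch shows roughly how an initiator might pull the sense key, ASC, ASCQ and field pointer out of fixed-format sense data. The names are invented for this example, only the fixed format (response codes 70h and 71h) is handled, and the table of sense keys is a partial list as I understand it from SPC.

```python
from dataclasses import dataclass
from typing import Optional

# Partial table of sense keys.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}


@dataclass(frozen=True)
class Sense:
    sense_key: int
    asc: int
    ascq: int
    field_pointer: Optional[int]  # offset of the offending field, if reported


def parse_fixed_sense(sense: bytes) -> Sense:
    """Parse fixed-format sense data; descriptor format is not handled."""
    if len(sense) < 14 or (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    field_pointer = None
    # Bytes 15..17 hold sense-key-specific data; bit 7 of byte 15 marks it valid.
    if len(sense) >= 18 and sense[15] & 0x80:
        field_pointer = int.from_bytes(sense[16:18], "big")
    return Sense(sense_key=sense[2] & 0x0F, asc=sense[12], ascq=sense[13],
                 field_pointer=field_pointer)


# Hand-built sample: ILLEGAL REQUEST, INVALID FIELD IN CDB (24h/00h),
# with the field pointer indicating byte 2 of the CDB.
sample = bytes([0x70, 0x00, 0x05, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0,
                0x24, 0x00, 0x00, 0xC0, 0x00, 0x02])
parsed = parse_fixed_sense(sample)
```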

Unit attentions. A LU can generate a unit attention condition. This can be vaguely thought of as the virtual equivalent of a blinking problem light, and provides a means for a device to notify an initiator of a condition which has arisen and which should interrupt normal command processing. Examples of circumstances which might cause a unit attention condition include ejecting a disc from an optical drive, or inserting a disc.

When a unit attention condition occurs, any command other than INQUIRY, REPORT LUNS or REQUEST SENSE will fail with a status code of CHECK CONDITION and a sense key of UNIT ATTENTION until the unit attention condition is cleared. This ensures that commands are not executed that an initiator might not want to execute if it were aware of the unit attention condition. For example, suppose that an initiator is reading data from an optical disc via a series of READ commands when the media in the drive is changed; if the initiator were to continue issuing READs oblivious to this, it would receive data which is a mixture of that of the old media and the new media. The initiator could check the status of the device by issuing commands beforehand, but this creates a race condition; the status of the device might change between commands, or after a command is issued but before it is executed. The unit attention condition avoids this issue.

Unit attention conditions can generally be cleared by issuing REQUEST SENSE; an initiator may then (if desired) resubmit whatever commands it was previously trying to execute.
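The behaviour described above can be sketched with a toy model. Everything here is hypothetical: `FakeOpticalLu` is not a real device model, commands are represented by their names rather than CDBs, and the unit attention is held until REQUEST SENSE is issued, following the simplified description in the text.

```python
GOOD = 0x00
CHECK_CONDITION = 0x02
NO_SENSE = 0x0
UNIT_ATTENTION = 0x6


class FakeOpticalLu:
    """Toy LU which raises a unit attention when its media is changed."""

    def __init__(self) -> None:
        self.media = "disc A"
        self.pending_unit_attention = False

    def change_media(self, label: str) -> None:
        self.media = label
        self.pending_unit_attention = True

    def execute(self, command: str):
        """Return a (status, sense key, data) tuple for a named command."""
        if command == "REQUEST SENSE":
            key = UNIT_ATTENTION if self.pending_unit_attention else NO_SENSE
            self.pending_unit_attention = False  # reporting clears it
            return GOOD, key, None
        if self.pending_unit_attention and command not in ("INQUIRY", "REPORT LUNS"):
            return CHECK_CONDITION, UNIT_ATTENTION, None
        return GOOD, NO_SENSE, self.media


def read_handling_unit_attention(lu: FakeOpticalLu, on_attention):
    """Issue a READ; on unit attention, clear it, notify, and retry once."""
    status, key, data = lu.execute("READ")
    if status == CHECK_CONDITION and key == UNIT_ATTENTION:
        lu.execute("REQUEST SENSE")
        on_attention()  # e.g. discard cached data about the old media
        status, key, data = lu.execute("READ")
    return status, data
```

The point of the callback is that the initiator gets a chance to react to the media change before any further READ is executed, which is exactly the race the unit attention mechanism is meant to close.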

Auto contingent allegiance. Commands may fail with errors, but not all errors during command execution are due to the generation of a unit attention condition. For example, reading a specific block from a hard disk might fail due to a bad block. Since an initiator might want to queue multiple commands at once, to be executed in sequence, this raises the question of how the target device should process subsequent commands should one of the commands in the sequence fail.

There are two modes of processing which can be selected: Contingent Allegiance (CA) and Auto Contingent Allegiance (ACA). These are probably the most confusing and bizarre pieces of terminology in the entire SCSI standard; they would probably be better named something like “Implicit Command Unblocking” and “Explicit Command Unblocking” respectively.

CA or ACA are selected via bit 2 in the Control byte of a command's CDB (which is mysteriously named “Normal ACA” or NACA). If the bit is set, ACA is used, otherwise CA is used.

In short, when a command fails with CHECK CONDITION, any other currently queued commands are temporarily blocked. For CA, this condition lasts until the initiator retrieves sense data using REQUEST SENSE or autosense (or submits any other command), after which point all other queued commands become unblocked. (If autosense is used, note that this means the blocked condition is cleared as soon as it occurs.) This gives the initiator the opportunity to retrieve sense data for the failed command before subsequent commands execute.

For ACA, blocked commands are not unblocked automatically upon receiving a subsequent command. They remain blocked until the ACA condition is explicitly cleared by invoking the CLEAR ACA task management function. Attempts to execute new commands also fail with a status value of ACA ACTIVE (unless the ACA task attribute is used, see below).

This gives the initiator the opportunity to interrogate the device arbitrarily about a failure before allowing execution of subsequent commands to continue, or to decide to abort subsequent commands using task management functions without executing them.

Task attributes. When commands are submitted by an initiator, they are marked with a task attribute, which is either simple, ordered, head of queue (HOQ), or auto contingent allegiance (ACA). An ordered command executes after all previously submitted tasks have completed; in other words, ordered commands complete in sequence. A head of queue command is executed next, even if other commands are queued. A simple command executes after all queued ordered and head of queue commands, but may execute in parallel with other simple commands.

The usage of the ACA task attribute is as follows: normally, when an ACA condition exists, attempts to execute any new commands will fail with the status of ACA ACTIVE (see above); using the ACA task attribute bypasses this and allows commands to be executed anyway. This allows an initiator to retrieve sense data using REQUEST SENSE and otherwise interrogate the device before clearing an ACA condition.
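The interaction between the NACA bit, the ACA condition and the ACA task attribute can be sketched with a toy model. `ToyLu` and its methods are hypothetical and ignore queueing entirely; the sketch only shows which status a newly submitted command receives in each state.

```python
GOOD = 0x00
CHECK_CONDITION = 0x02
ACA_ACTIVE = 0x30

NACA = 0x04  # bit 2 of the Control byte


class ToyLu:
    """Toy LU in which a fixed set of named commands always fail."""

    def __init__(self, failing_commands) -> None:
        self.failing_commands = set(failing_commands)
        self.aca_condition = False

    def execute(self, name: str, control: int = 0,
                task_attribute: str = "simple") -> int:
        # While an ACA condition exists, only ACA-attributed commands run.
        if self.aca_condition and task_attribute != "aca":
            return ACA_ACTIVE
        if name in self.failing_commands:
            if control & NACA:
                self.aca_condition = True  # stays until CLEAR ACA
            return CHECK_CONDITION
        return GOOD

    def clear_aca(self) -> None:
        """Stand-in for the CLEAR ACA task management function."""
        self.aca_condition = False
```

A typical sequence would be: a command with NACA set fails, ordinary commands are then refused with ACA ACTIVE, the initiator interrogates the device using commands marked with the ACA task attribute, and finally invokes CLEAR ACA to resume normal operation.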

The default attribute is simple.

SCSI as a lingua franca. SCSI is sometimes used as a lingua franca to internally represent commands for I/O devices, especially block devices, which do not directly support SCSI. For example, FreeBSD represents all block device commands internally as SCSI commands, and simply translates these to ATA commands when dispatching to ATA block devices. The SCSI/ATA Translation (SAT) standard defines the mapping between SCSI and ATA commands. SCSI commands can likewise be mapped to NVMe.

Most important commands. The only commands which must be implemented by all devices of all types are INQUIRY, REPORT LUNS and TEST UNIT READY. In practice, REQUEST SENSE is also universally available.

INQUIRY provides VPD-type information on a device as Data-In, including its device type, vendor ID, model number, etc., and will usually be one of the first commands issued to a LU. It serves a similar purpose to the read-only fields in PCI Configuration Space (vendor ID, product ID, etc.) or USB device descriptors.
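As a rough illustration, the sketch below builds an INQUIRY CDB and decodes the first 36 bytes of standard INQUIRY data. The function names are invented for this example, only a handful of fields are decoded, and the sample response is fabricated purely to exercise the parser.

```python
import struct


def inquiry_cdb(allocation_length: int = 36, control: int = 0) -> bytes:
    """Build a 6-byte INQUIRY CDB requesting standard INQUIRY data."""
    # opcode 12h, EVPD = 0, page code = 0, 16-bit allocation length
    return struct.pack(">BBBHB", 0x12, 0, 0, allocation_length, control)


def parse_standard_inquiry(data: bytes) -> dict:
    """Decode a few fields from standard INQUIRY data."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")
    return {
        "peripheral_qualifier": data[0] >> 5,
        "peripheral_device_type": data[0] & 0x1F,  # 00h block, 05h optical
        "removable_media": bool(data[1] & 0x80),
        "vendor": data[8:16].decode("ascii").strip(),
        "product": data[16:32].decode("ascii").strip(),
        "revision": data[32:36].decode("ascii").strip(),
    }


# Fabricated response for a hypothetical optical drive.
sample_inquiry = (bytes([0x05, 0x80, 0x06, 0x02, 31, 0, 0, 0])
                  + b"ACME".ljust(8) + b"Example Drive".ljust(16)
                  + b"1.0".ljust(4))
info = parse_standard_inquiry(sample_inquiry)
```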

REPORT LUNS returns a list of the LUNs of the LUs contained within a target, returned as Data-In.
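A minimal sketch of the REPORT LUNS exchange follows. The helper names are invented; the response layout (a four-byte list length, four reserved bytes, then one 8-byte LUN per entry) is as I understand it from SPC, and the sample response is fabricated.

```python
import struct


def report_luns_cdb(allocation_length: int, control: int = 0) -> bytes:
    """Build a 12-byte REPORT LUNS CDB (opcode A0h, select report 0)."""
    return struct.pack(">BBBBBBIBB", 0xA0, 0, 0, 0, 0, 0,
                       allocation_length, 0, control)


def parse_report_luns(data: bytes) -> list:
    """Return the 8-byte LUNs contained in REPORT LUNS Data-In."""
    if len(data) < 8:
        raise ValueError("response shorter than its 8-byte header")
    (list_length,) = struct.unpack_from(">I", data, 0)
    # The device may report more LUNs than fit in the buffer we offered.
    available = min(list_length, len(data) - 8)
    return [data[8 + i:16 + i] for i in range(0, available - available % 8, 8)]


# Fabricated response listing two LUNs (0 and 1).
sample_response = (struct.pack(">I4x", 16)
                   + bytes.fromhex("0000000000000000")
                   + bytes.fromhex("0001000000000000"))
luns = parse_report_luns(sample_response)
```

If the reported list length exceeds what was returned, an initiator would normally reissue the command with a larger allocation length.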

TEST UNIT READY takes no operands and no data and simply completes successfully if the device is “ready” (for a block device, this means ready to read blocks). If a device is not ready, it returns CHECK CONDITION.

Other (optional) commands common to all device types include:

SCSI in IDL pseudocode

Service definition: command execution. We can now look at the core mechanic of SCSI — command submission — as it is formally defined in the standard. The SCSI standard defines command execution in terms of the following procedure call:

    Service Response = Execute Command(
      IN (I_T_L_Q Nexus,
          CDB,
          Task Attribute,
          [Data-In Buffer Size],
          [Data-Out Buffer],
          [Data-Out Buffer Size],
          [Command Priority]),
      OUT ([Data-In Buffer],
           [Sense Data],
           [Sense Data Length],
           Status,
           [Status Qualifier]))

Service Response is either "Command Complete" or "Service Delivery or Target Failure".

Parameters in brackets are optional. Note that the Service Response indicates whether the SCSI command could be executed at all (even if unsuccessfully). For example, executing a READ command on a block device with an LBA which is out of range would result in the command completing with a status of CHECK CONDITION, a sense key of ILLEGAL REQUEST and an (ASC, ASCQ) of LOGICAL BLOCK ADDRESS OUT OF RANGE; whereas if the SCSI subsystem connecting initiator and target were to fail when submitting a command, this would instead be a service delivery failure. (An SCSI subsystem connecting initiators and targets, interconnects and all, is properly known in the SCSI standard as a Service Delivery Subsystem.)
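The distinction between the Service Response and the command's own status can be sketched as follows. The types and the `describe` function are invented for this example and are only a loose model of the procedure call above, not an API defined by the standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

GOOD = 0x00
CHECK_CONDITION = 0x02


class ServiceResponse(Enum):
    COMMAND_COMPLETE = "Command Complete"
    SERVICE_DELIVERY_OR_TARGET_FAILURE = "Service Delivery or Target Failure"


@dataclass
class CommandResult:
    service_response: ServiceResponse
    status: Optional[int] = None      # only meaningful on Command Complete
    sense_key: Optional[int] = None   # only meaningful on CHECK CONDITION
    data_in: bytes = b""


def describe(result: CommandResult) -> str:
    """Explain a result, checking the Service Response before the status."""
    if result.service_response is not ServiceResponse.COMMAND_COMPLETE:
        return "delivery failure: the command's outcome is unknown"
    if result.status == GOOD:
        return "command succeeded"
    if result.status == CHECK_CONDITION:
        return f"command failed, sense key {result.sense_key:#x}"
    return f"command completed with status {result.status:#04x}"
```

The ordering of the checks mirrors the layering: a status byte only exists if the command was actually delivered and completed.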

The parameters in the above procedure call are as follows:

Service definition: task management functions. The execution of task management functions is defined in terms of the following procedure call:

    Service Response = {Function Name}(
      IN (Nexus,
          Command ID),
      OUT ([Additional Response Information]))

The Service Response is one of:

Obsolete features

Linked commands. SCSI used to support a now-obsolete feature called linked commands; setting bit 0 of the Control byte in a CDB denoted a linked command. A sequence of linked commands all form part of the same task, all have the same command ID, and constitute a single I/O operation.

When linked commands were supported, the normal Service Response for command execution was called “Task Complete” instead of “Command Complete”, and execution of a linked command, other than the final command in the chain, would result in an alternate Service Response called “Linked Command Complete”. Moreover, the status byte would not contain “GOOD” but instead contain a value of “INTERMEDIATE” or “INTERMEDIATE — CONDITION MET”.

Since linked commands have been obsoleted, the distinction between commands and tasks is no longer as important as it once was.

Summary of command sets

Universal Commands

SPC: SCSI Primary Commands. Commands not specific to a device type, and which may or must be implemented by all devices. Also previously included commands for communicating with network adapter and processor devices (now disused).

Normal Devices

SBC: SCSI Block Commands. Commands for accessing block devices, such as hard drives (including drive-managed SMR hard drives), ordinary SSDs, and floppy drives. [1]
MMC: Multimedia Commands. Commands for accessing optical drives (usually used over ATA via ATAPI).
ZBC: Zoned Block Commands. Commands for “zoned” block devices which expose more of their internal implementation details to the host; used by host-managed SMR hard drives (relatively new).
SSC: SCSI Stream Commands. Commands for accessing stream devices, such as tape drives (and, long ago, printers).
RBC: Reduced Block Commands. A subset of the SBC command set, for use when the full SBC command set is too complex (current usage: dubious).
OSD: Object-Based Storage Device Commands. An unusual command set which allows an SCSI storage device to be accessed more like a filesystem or key-value store, rather than as a block device (unclear what implementations ever existed, other than some software simulators).
SGC: SCSI Graphic Commands. Commands for accessing scanners (current usage: totally disused).

Supporting Devices

SES: SCSI Enclosure Services. Commands which can be implemented by storage controllers (such as SAS expanders) in order to allow chassis management (that is, an entire standard to let you blink the ident LED on a drive bay, amongst other intense banalities).
SCC: SCSI Controller Commands. Commands which can be implemented by RAID controllers appearing as SCSI devices, allowing for standardised control of hardware RAID controllers (current usage: dubious?).
SMC: SCSI Media Changer Commands. Commands which can be used to control media (i.e., tape) autochangers.
ADC: Automation/Drive Interface Commands. Commands which can be used by a tape autochanger to talk to a tape drive over a special interconnect called ADT (which uses either TCP/IP or a serial line) to tell it when to eject.

Summary of interconnects

General-Purpose Interfaces

iSCSI: SCSI over TCP/IP. Somewhat unusual amongst SCSI interconnects for its extensive use of strings as identifiers rather than integers.
SPI: SCSI Parallel Interface; i.e., the old ribbon cables. (Note: Earlier versions of the SPI standard must be read together with the SCSI Interlocked Protocol (SIP) standard, which defines the higher-level protocol; later versions of the SPI standard merge these into one document.)
SAS: Serial Attached SCSI; the replacement for SPI. Electrically related to SATA but uses higher voltage levels. SAS hosts are backwards compatible with SATA devices, though not vice versa. (Note: The SAS specification is mainly taken up by analog and electrical concerns, and pictures of the absurdly large number of different physical connectors SAS supports. For the SAS protocol which runs on top of the physical interconnect, see the SAS Protocol Layer (SPL) specification.)
FCP: Fibre Channel Protocol; allows SCSI to be transported over Fibre Channel SANs.
UAS: USB-Attached SCSI; allows SCSI to be transported over USB. [2]
SRP: SCSI RDMA Protocol; allows SCSI to be transported over RDMA interconnects (in practice, InfiniBand).
SBP: Serial Bus Protocol; allows SCSI to be transported over IEEE 1394 (FireWire). (Note that the official title of the IEEE 1394 standard is “High-Performance Serial Bus”. Note also that “bus” in this context appears to mean a bus not in the electrical sense, but in the sense of an interconnect implementing a memory-mapped load/store paradigm; IEEE 1394 is an example of such a system.)
SOP-PQI: SCSI over PCIe (SOP) and PCIe Queueing Interface (PQI); a standardised protocol for SCSI HBAs. The PQI specification defines a generic queueing interface over PCIe, and SOP defines a message-passing protocol built on top of it for carrying SCSI. Sadly never became popular, as it would have eliminated the need for myriad different drivers for SAS HBAs. Unclear if this was ever implemented. (This was being proposed by a number of companies around 2012 as a new way to connect SSDs directly to PCIe; another set of companies simultaneously backed NVMe, an alternative host controller interface designed for SSDs only, and which does not use the SCSI command set. NVMe won.)
SSA: Serial Storage Architecture; a proposed replacement for SPI defined by (and, as far as I am aware, only ever implemented by) IBM. You might still be able to find SSA drives in old 90s IBM big iron. Long obsolete in favour of SAS.
UFS: Universal Flash Storage; a standard for NAND flash defined by JEDEC in 2011 and designed around the SCSI architecture. Not an “official” interconnect.

Special-Purpose Interfaces

ATAPI: ATA Packet Interface; allows SCSI commands to be tunneled over ATA interconnects. Used exclusively to communicate with optical drives using the MMC command set. Limited to 16-byte CDBs in practice.
ADT: Automation/Drive Interface - Transport Protocol; allows SCSI commands to be sent over either TCP/IP, or over RS-422 serial lines (this is effectively SCSI over a UART). RS-422 is similar to RS-232 but uses differential signalling. Used for ADC; see above.

Hypervisors often define their own interfaces for virtual machines to act as SCSI initiators or targets. For example, the virtio-scsi standard (as implemented by qemu/KVM) defines a standardised virtual PCI interface which can be implemented by any hypervisor.

Further reading

For further reading, the essential first document to read is the SCSI Architecture Model (SAM) standard. The document is not nearly as boring as it sounds and provides an essential architectural overview of all aspects of SCSI. SCSI is unusually well-architected, which comes across fully in SAM; reading a properly thought out architecture document is something of a welcome change. Any version will do (the current version is SAM-6); sometimes it can help to read older versions as they are often less complex.

I often find it helps to understand things by getting away from abstract notions of objects and looking at finite byte representations. For this, the UAS2 standards and the iSCSI RFCs are probably the most readable specifications which give SCSI a concrete representation. (I don't recommend trying to read the SAS standards; you will not have a fun time.)

For a more hands on approach, you can also try opening iSCSI packet captures in Wireshark, which has a complete iSCSI dissector. You can find some sample iSCSI captures on the Wireshark wiki. Note that Wireshark can also capture USB traffic, so observing UAS traffic is also easy.

See also Data Storage on Unix for an introduction to storage architectures from a Unix perspective.

The author welcomes improvements or comments to this article.

1. The SBC command set includes support for removable media, which is usually used for floppy drives. However, this does mean it is theoretically possible to have a hard drive with removable media(!) and express this via the SBC command set.

2. Note that the USB-Attached SCSI (UAS) standard is split into two documents: USB-Attached SCSI Protocol (UASP), published by USB-IF and freely available, and the USB-Attached SCSI (UAS) standard itself, which is maintained by the SCSI committee... and thus behind the ISO paywall. Both of these documents are required to make any sense of either; the meat of the UAS standard is in the ISO document. Search engines are your friend.
