The Linux 2.6 kernel was released by the Linux community via the kernel.org website in late 2003, and is now part of a growing number of distributions including SUSE/Novell (SLES 9) and Red Hat (RHEL 4). This kernel introduced significant changes that affected all I/O drivers including Fibre Channel transports.
Of particular interest are a reworked SCSI layer, an enhanced I/O scheduler and queue management, a new sysfs interface for status and event collection, and driver-agnostic persistent naming modules.
The Linux 2.6 kernel introduces a number of changes, including SCSI command scheduling and queuing, and automatic LUN scanning. These changes, and a number of other 2.6 kernel innovations, provide significant improvement over the 2.4 kernel, allowing the Emulex 8.x driver implementation to evolve into a robust driver focused on providing a solid Fibre Channel transport underneath the 2.6 kernel's enhanced SCSI subsystem.

Operating systems are becoming more storage competent

Most operating systems, including Linux, were initially designed with direct-attached SCSI or IDE/ATA storage in mind. The role of a Fibre Channel driver was to manage all the complexities brought by FC-based storage networking and sophisticated storage arrays, and to present the operating system with a simplified view of a number of SCSI strings. The driver had to provide five main functions:

• Interface with the physical adapter (which manages the physical and lower FC protocol layers).
• Timing and flow control management, which is very different for networked FC arrays than for local SCSI drives. This includes queue management and timers.
• Hiding FC discovery and temporary device loss. The driver must make the FC fabric appear as stable and static as the older parallel SCSI bus.
• Management of a number of error cases.
• Reporting of a number of administrative parameters (configuration, runtime events).

The remarkable growth in FC adoption has resulted in the development of more sophisticated storage management by most operating systems, including Linux with the 2.6 kernel and the addition of the Fibre Channel transport in the SCSI subsystem.
This is leading to more functions being handled by the SCSI midlayer, and to thinner, more compact FC drivers.

Linux 2.6 Kernel Changes and Emulex 8.x Driver | Technology Brief | www.emulex.com

Linux 2.6 kernel changes

The kernel changes summarized at the opening of this brief affect four of the driver's five functions identified in the previous section:

• Physical adapter interface: this is the least affected function, apart from PHP (the PCI Hot Plug capability, which Emulex supports on all current adapters).
• Timing and flow control management: this driver function is drastically scaled back because these functions are properly handled in the SCSI subsystem.
• Error management: all Fibre Channel transport error cases remain handled by the driver, but SCSI error cases (discussed later in this document) are now handled by the SCSI subsystem.
• Administrative reporting/management interface: the events and parameters reported are largely unchanged, but they are now exposed through a new Linux sysfs interface, making them more accessible to third-party applications as well as user-developed scripts and tools.

The positive aspects of this change are very visible:

• Unified architecture spanning distributions and specific drivers.
• Convergence of vendor-qualified enterprise drivers and kernel.org and distribution drivers (subject to additional enterprise kitting for utilities and libraries).
• Consistent interface for third-party applications.
• Reduced testing and qualification complexity for each individual driver or device.

Emulex has taken a leadership role in Linux FC implementation:

• Emulex sponsored the Fibre Channel transport patch that provided target block/unblock, fc_host, fc_rport (remote port), and target rescan capabilities for lower layer device drivers (LLDD) that implement a Fibre Channel transport; the transport is vendor agnostic and is currently implemented in the QLogic and IBM kernel drivers. The Fibre Channel transport is part of kernels 2.6.11 and higher.
• Emulex is implementing the Common HBA API (SNIA API) as well as its own command line and graphical management tools on top of the Linux sysfs; this community-recommended method eliminates the need for proprietary IOCTLs, makes the driver easier to review and maintain, and allows software vendors and end users to easily access Fibre Channel configuration and status information with the tools of their choice.
• Emulex is actively testing and providing production-enhancing requests to the developers of Device Mapper and its multipath agent.

Emulex 2.6 kernel driver strategy

Emulex has worked in cooperation with the Linux/SCSI community to deliver:

• Optimized interaction between the driver and the SCSI midlayer.
• Insertion of the 8.x driver into the 2.6 Linux reference kernel as maintained by the Linux community and made available via kernel.org, effective with kernel version 2.6.12-rc3.
• Insertion of the latest driver updates into the enterprise distributions we currently support (RHEL, SLES, Red Flag).

The 8.x driver forms the basis of the Emulex Fibre Channel driver going forward with 2.6 kernel based distributions.
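
Because the Fibre Channel transport publishes its information as ordinary sysfs files, configuration and status can be read with a few lines of script. The sketch below is illustrative only: the function name is hypothetical, and the attribute names (port_name, node_name, port_state, speed) are assumed to be exposed under class/fc_host, which may vary by kernel version. Attributes that are missing are simply skipped.

```python
from pathlib import Path


def read_fc_host_attributes(sysfs_root="/sys",
                            names=("port_name", "node_name", "port_state", "speed")):
    """Return {host: {attribute: value}} for each entry under class/fc_host.

    The attribute names are assumptions; any that are absent are skipped.
    """
    class_dir = Path(sysfs_root) / "class" / "fc_host"
    hosts = {}
    if not class_dir.is_dir():
        return hosts  # no FC transport class present on this system
    for host_dir in sorted(class_dir.iterdir()):
        values = {}
        for name in names:
            try:
                values[name] = (host_dir / name).read_text().strip()
            except OSError:
                continue  # attribute absent or unreadable on this kernel
        hosts[host_dir.name] = values
    return hosts
```

Calling read_fc_host_attributes() with no arguments inspects the live /sys tree; passing a different sysfs_root lets the same function be exercised against a copied or mocked directory.
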
Emulex will continue its policy of providing close direct support to its tier 1 OEMs, and of providing kits extending and completing our product beyond the confines of standard kernel components: HBA API library, adapter diagnostic utility, graphical management application, etc.

Emulex, in working with maintainers of the Linux operating system and SCSI midlayer, has made modifications designed to create consistent, vendor-agnostic behavior in many areas, to leverage the advances within the 2.6 kernel and, in the end, to create a more robust FCP driver that takes full advantage of 2.6 kernel capabilities. As a result of these changes, much functionality has been transferred out of the device driver, which defers instead to kernel facilities that provide similar or identical functions. These differences are described below. Note that the Emulex 8.x series driver continues to provide:

• SCSI/FCP support.
• Boot support on all supported architectures (x86, x64, ia64 and ppc64).
• Driver support for the Emulex HBA API (v1) library.
• Driver support for Emulex utilities including lputil, HBAnyware, and AutoPilot Manager.

Lastly, the Emulex Linux driver and host bus adapters support the enhanced 2.6 kernel PHP (PCI Hot Plug) capability.

Detailed review of 2.6 interface and Emulex 8.x driver changes:

1. The driver defers all LUN handling to the SCSI midlayer.

In the 2.6 kernel, the SCSI midlayer handles management of all storage devices, including SCSI LUNs. The philosophy of LUN management is different, with user-controlled driver parameters now replaced by system-controlled dynamic variables:

• lpfc_lun_skip: Retired from driver implementation; the driver now defers to the midlayer's whitelist/blacklist for devices and the BLIST_SPARSELUN flag; target devices not completing SCSI device discovery properly because of compliance issues with SCSI-2 or SCSI-3 are now required to use the BLIST database to correct SCSI device discovery.
• lpfc_max_luns: Specifies the maximum number of LUNs per target supported by the driver; this parameter no longer sizes tables internal to the driver; the value is provided to the midlayer, where it is used as a maximum value for probing and adding LUNs during REPORT LUNS and sequential scan; the default value is 256; this parameter is global, thus it affects all ports, and is not writable after driver load.
• lpfc_dqfull_throttle_up_inc, lpfc_dqfull_throttle_up_time: These little-used variables have been retired at SCSI maintainer request; the kernel does export a routine to alter the lun_queue_depth based on queue full return values from the target, but the driver instead defers to the SCSI subsystem to manage the queue full responses.
• lpfc_lun_queue_depth: Specifies the maximum number of outstanding SCSI commands per LUN; this number defaults to 30 in the driver and is communicated to the midlayer for its LUN queue management; this parameter is global, thus it affects all ports, and is not writable after driver load; in addition, the Emulex 2.6 kernel driver now takes advantage of all the midlayer queuing and queue full handling policy. This provides more consistent behavior among SCSI products.

2. The driver no longer sets per target management parameters:

• lpfc_max_target: Retired from driver implementation; used to specify the maximum number of targets supported per HBA; this number has become a fixed value of 256 in the driver; the choice of 256 is to accommodate JBODs as opposed to target arrays, since production environments of 10 or more target arrays are uncommon.
• lpfc_tgt_queue_depth: Retired from the driver implementation; the 2.6 midlayer's LUN queue depth handling and queue full handling made this parameter obsolete.

3. Per adapter settings are not supported via modprobe.conf

Modprobe.conf is an operating system provided file, as opposed to an Emulex driver file. Driver configuration at boot time, when defined by modprobe.conf, is global for all Emulex HBAs.
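
As a rough illustration of this global, load-time style of configuration, the sketch below assembles a single options line of the kind modprobe.conf accepts. It assumes the driver module is named lpfc and uses two parameter names from the parameter list in Appendix 1 with the default values quoted above; the helper function itself is hypothetical and not part of any Emulex kit.

```python
def lpfc_options_line(params):
    """Build one modprobe.conf 'options' line for the lpfc module.

    The line applies to every adapter the module drives, which is why
    per-adapter values cannot be expressed this way.
    """
    if not params:
        raise ValueError("at least one parameter is required")
    parts = []
    for name, value in sorted(params.items()):
        if not name.startswith("lpfc_"):
            raise ValueError(f"unexpected parameter name: {name}")
        parts.append(f"{name}={int(value)}")
    return "options lpfc " + " ".join(parts)


# Defaults quoted in the text: 30 commands per LUN, 256 LUNs per target.
print(lpfc_options_line({"lpfc_lun_queue_depth": 30, "lpfc_max_luns": 256}))
# options lpfc lpfc_lun_queue_depth=30 lpfc_max_luns=256
```
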
The user retains the option of changing per port settings for dynamic parameters after initial boot. The Emulex default global configuration for boot purposes continues to support auto topology and auto speed configuration as a default, thus accommodating the most common per-adapter differences automatically.

4. The SCSI midlayer performs all SCSI error recovery.

This provides consistent error handling across all transports binding to the SCSI layer and consistent error messages provided to relevant applications. As a result, the following configuration variables are no longer supported in the 8.x driver:

• lpfc_check_cond_err: Specifies the treatment of certain check conditions as SCSI-FCP response (RSP) errors; Emulex does not believe this parameter is currently in use.
• lpfc_delay_rsp_err: Specifies whether the driver delays SCSI-FCP response errors and certain check conditions before returning them to the SCSI layer; Emulex is not aware of any vendor currently making use of this parameter.
• lpfc_extra_io_tmo: Specifies an extra timeout value added to the individual SCSI-FCP command's operating system assigned timeout; retry logic and Fibre Channel block semantics in the SCSI midlayer make this parameter obsolete.
• lpfc_no_device_delay: Specifies the time interval in seconds between detecting a failed I/O and actually failing the I/O resulting from a Fibre Channel target device loss event; retry logic and Fibre Channel block semantics in the SCSI midlayer make this parameter obsolete.
• lpfc_nodev_holdio: Specifies whether the driver holds I/O errors if a device disappears; the hold is cancelled when the device returns; retries and the Fibre Channel block semantics in the SCSI midlayer make this parameter obsolete.

With the removal of these parameters and the midlayer's error handling comes some new behavior in the 2.6 Linux kernel with SCSI device (LUN) state.
Distributions based on the 2.6.9 kernel introduced a SCSI device attribute called state that represents that device's current operational state. For example, a device could be running, blocked, or offline. These states are new to 2.6.9+ in particular and represent enhanced error handling by the SCSI midlayer. Of all the new states, the offline state has caused the most confusion and concern for internal qualification efforts and for external customers. A SCSI device (LUN) transitions to the offline state whenever an error has occurred on a LUN from the midlayer's point of view, but only after various retry attempts have been made (including basic abort/retry, LUN resets, and bus resets). When this decision is made, the midlayer's error handling takes the device offline, forcing the involvement of a system administrator. Once the problem is cleared, the system administrator can put that device back online either manually or by using an Emulex script called lun_change_state.sh.

Distributions using kernels preceding 2.6.9 have very similar behavior, but don't have this state attribute. These distributions are also correctable via the lun_change_state.sh script.

5. Persistent Naming (formerly called Persistent Binding) is managed by the 2.6 Linux hotplug subsystem.

The 2.6 kernel includes two primary mechanisms for persistent naming:

• udev (described at http://www.kroah.com/linux/talks/ols_2003_udev_paper/Reprint-Kroah-Hartman-OLS2003.pdf).
• devlabel (described at http://linux.dell.com/devlabel/devlabel.html).

Emulex recommends udev for operation with the 2.6 kernel because it is the mainstream choice, although the Emulex driver interoperates with both udev and devlabel. The introduction of udev into the 2.6 kernel stream represents a major functional change to the Emulex 8.x driver. The driver no longer provides persistent bindings as it did in 7.x because it defers to the distribution's udev implementation.
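
The value of naming by the device itself rather than by discovery position can be seen in a toy model. The sketch below is not udev and uses no udev interfaces; the identifiers, names, and function names are all hypothetical. It only contrasts names assigned in discovery order with names looked up from a table keyed by each device's own identifier.

```python
def names_by_discovery_order(discovered):
    """Name devices by discovery position (sda, sdb, ...)."""
    return {f"sd{chr(ord('a') + i)}": dev_id for i, dev_id in enumerate(discovered)}


def names_by_identifier(discovered, bindings):
    """Name devices from a table keyed by each device's unique identifier."""
    return {bindings[dev_id]: dev_id for dev_id in discovered if dev_id in bindings}


# Hypothetical identifiers and names, for illustration only.
bindings = {"id-A": "database", "id-B": "logs"}
boot1 = ["id-A", "id-B"]
boot2 = ["id-B", "id-A"]  # same devices, discovered in the opposite order

# Position-based names differ between the two boots.
print(names_by_discovery_order(boot1) == names_by_discovery_order(boot2))  # False
# Identifier-based names are the same regardless of discovery order.
print(names_by_identifier(boot1, bindings) == names_by_identifier(boot2, bindings))  # True
```
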
Udev can run at boot time for boot from SAN configurations or during OS load time. Either configuration provides persistent naming for SCSI devices in the SAN.

Given a SCSI device with a unique identifier, udev provides system administrators with the ability to name that SCSI device and have it persist across reboots.

The udev mechanism provides binding at the device/LUN level. This resolves the issue that LUN reordering posed to previous target ID binding methods.

6. The 8.x driver no longer handles Peripheral Set Addressing (PSA) or Volume Set Addressing (VSA) for LUN addresses

The SCSI midlayer now handles these addressing modes during the post REPORT LUNS processing. This provides a common, vendor-agnostic mechanism for management of this function. Reference for VSA addressing: SCSI Architecture Model-2, section 4.9.3, Single Level Logical Unit Numbers (address methods 00b and 01b).

7. Link Event Management

While the driver still responds to and provides messaging for link events, the driver no longer manages a link down timeout value. Therefore, the driver has the following change:

• lpfc_linkdown_tmo: Retired from driver implementation; the link down functionality has been moved into a per target timeout value; the new parameter is called lpfc_nodev_tmo and is manageable at driver load time and during runtime.

8. Discovery Changes

With the retiring of the lpfc_linkdown_tmo parameter mentioned above, the driver focused its event processing around the disappearance and reappearance of targets.
The driver now provides a timeout value for targets to disappear and recover before returning connect errors to the SCSI midlayer:

• lpfc_nodev_tmo: Specifies the amount of time in seconds a discovered target can disappear before the driver treats it as no connection; the current default is 30 seconds; the timeout value is global to all targets managed on a particular host instance, but is not global to all ports in a system.

To aid in faster initial discovery times and larger SAN events involving multiple targets, the driver modified the discovery threads parameter:

• lpfc_discovery_threads: Specifies the number of outstanding ELS events the driver allows at any time; the new default is 32; earlier driver versions such as 7.x defaulted this value to 1; with the default setting of 32, it is likely that targets will be discovered in a different order from boot to boot; as a result, using udev to maintain target/LUN persistent naming is highly encouraged. The user can revert to the former, slower discovery by setting this variable down to 1.

To simplify the type of binding to nodes, the driver condensed the many bind methods into a single parameter:

• lpfc_fcp_bind_method: Specifies WWNN, WWPN, DID, or ALPA method of binding for each nport; consistent bindings and mapped bindings use this method.

9. Boot from SAN

Much of the work required to build a boot from SAN configuration remains the same as in previous driver versions. What is different is the use of udev in the initial ram disk to provide the persistent name, mapping the unique LUN identifier on a storage device to a name that persists across reboots in the initiator. The use of udev in the initial ram disk caused problems for Linux users on some Linux distributions. The problem was tracked down to an incomplete file set used to compile the initial ram disk: specifically, some of the more important udev files, such as some rules files, and the applications necessary to execute the rules were missing. Also, particular to Fibre Channel, the new Fibre Channel transport module was missing, causing driver load errors in boot from SAN configurations.

The distributions that had this problem have been repaired with later updates.

The Emulex 8.0 driver is an Open Source driver published under the GNU General Public License (GPL) under the same terms as other driver versions. Applications and libraries are published under the Emulex license.

8.0 driver source code is published and available for inspection on the sourceforge web site (http://sourceforge.net/projects/lpfcxxxx). Versions of the 8.0 driver are also included in the Linux kernel (starting with 2.6.12-rc3) and distributed as part of Red Hat Enterprise Linux 4 and SUSE Linux Enterprise Server 9.

APPENDIX 1: 8.0 Driver Parameter List

PARAMETER
lpfc_fdmi_on
lpfc_log_verbose
lpfc_nodev_tmo
lpfc_use_adisc
lpfc_ack0
lpfc_fcp_class
lpfc_link_speed
lpfc_lun_queue_depth
lpfc_max_luns
lpfc_scan_down
lpfc_fcp_bind_method
lpfc_topology
lpfc_cr_count
lpfc_cr_delay
lpfc_discovery_threads
lpfc_scsi_req_tmo

ACCESS RIGHT | AVAILABLE VIA SYSFS | AVAILABLE AS MODULE PARAMETER | RUNTIME/LOAD-TIME CONFIGURABLE
R W          | Yes                 | Yes                           | Runtime
R            | Yes                 | Yes                           | Load-time
R            | No                  | Yes                           | Load-time
R            | No                  | Yes                           | Load-time via lpfcdfc

1: Accessible via sysfs; runtime (dynamic) configurable
2: Accessible via sysfs; load-time (static) configurable only
3: Not accessible via sysfs; load-time (static) configurable only

APPENDIX 2: Driver Attributes Removed from 8.x Driver

The following parameters, available with the current 7.x drivers for the 2.4 kernel, have been removed from the 8.x drivers for the 2.6 kernel.

7.x PARAMETER                | 2.6 KERNEL HANDLING
lpfc_check_cond_err          | Managed by midlayer
lpfc_delay_rsp_err           | Managed by midlayer
lpfc_dqfull_throttle_up_time | Managed by midlayer
lpfc_dqfull_throttle_up_inc  | Managed by midlayer
lpfc_extra_io_tmo            | Managed by midlayer
lpfc_lun_skip                | Managed by midlayer
lpfc_no_device_delay         | Managed by midlayer
lpfc_nodev_holdio            | Managed by midlayer
lpfc_fcp_bind_DID            | Replaced by one parameter: lpfc_fcp_bind_method
lpfc_fcp_bind_WWNN           | Replaced by one parameter: lpfc_fcp_bind_method
lpfc_fcp_bind_WWPN           | Replaced by one parameter: lpfc_fcp_bind_method
lpfc_linkdown_tmo            | [Driver] lpfc_nodev_tmo
lpfc_automap                 | Automap always on
lpfc_inq_pqb_filter          | Was workaround for bug in a distribution
lpfc_network_on              | IP/FC removed
lpfc_ip_class                | IP/FC removed
lpfc_post_ip_buf             | IP/FC removed
lpfc_xmt_que_size            | IP/FC removed
lpfc_max_target              | N/A
lpfc_tgt_queue_depth         | N/A

This report is the property of Emulex Corporation and may not be duplicated without permission from the Company. | September, 2005
www.emulex.com
Corporate HQ 3333 Susan Street Costa Mesa CA 92626 714 662.5600 | Wokingham U.K. (44) 118 977 2929 | Paris France (33) 41 91 19 90 | Beijing China (86 10) 6849 9547
06-078 9/05