Lotus Domino 7 on Linux for IBM System z

Capacity planning and performance updates

Category: Servers

Date: , 14:00

Company: IBM

About 18 months ago, the Scalability Center for Linux on zSeries (SCL) in Poughkeepsie performed a study on Domino 6.5 for Linux on IBM System z. During that study, capacity planning and performance data was collected using an industry-standard benchmarking tool to drive both NRPC (Notes) and Domino Web Access (HTTP) e-mail users. The results of this benchmark showed two bottlenecks that prevented linear scalability of throughput for Domino 6.5 for Linux on System z.

The first bottleneck was due to the fact that Domino 6.5 does not support the 64-bit Linux kernel on either zSeries or Intel®. To scale Domino within an OS image, multiple Domino servers or partitions (also called DPARs) are typically deployed to extend Domino scalability. Each DPAR has its own set of virtual address spaces to support its users. The SCL’s testing showed that a Linux instance with multiple DPARs and many users per DPAR caused the Linux kernel to swap heavily as the number of users was increased past a certain threshold because of the 2 GB real memory limitation of the 31-bit kernel. To prevent any one kernel from swapping heavily, multiple Linux images, each with only a single production Domino 6.5 server, must be deployed for production environments.

ibm.com/redbooks

Redpaper

Lotus Domino 7 on Linux for IBM System z
Capacity Planning and Performance Updates

Don Corbett
Juergen Doelle
Barbara Filippi
Wu Huang
Mike Wojton

Scalability under Linux
Performance under z/VM
Multiple Domino servers

International Technical Support Organization

Domino 7 for IBM System z: Capacity Planning and Performance Updates

December 2006

Note: Before using this information and the product it supports, read the information in "Notices".

First Edition (December 2006)

This edition applies to Release 7 of IBM Lotus Domino for System z.

Copyright International Business Machines Corporation 2006. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
    Trademarks
Preface
    The team that wrote this Redpaper
    Become a published author
    Comments welcome
Chapter 1. Introduction
    1.1 Background for Domino 7.0 capacity and performance study
    1.2 Study objectives
    1.3 How to use this paper
Chapter 2. Domino and the benchmarking tool used
    2.1 Benchmark driver and workloads
    2.2 Benchmark versus production workloads
Chapter 3. Test environments and scenarios
    3.1 Hardware used to test
    3.2 Software used to test
    3.3 Domino network configuration
        3.3.1 DPARs within a single Linux LPAR or guest
        3.3.2 DPARs within multiple z/VM Linux guests
Chapter 4. Background information and terminology for interpreting data runs
    4.1 ETR data
    4.2 CPU data
    4.3 ITR data
Chapter 5. Study results
    5.1 Domino workload processor scalability
        5.1.1 NRPC
        5.1.2 DWA
    5.2 CPU cost of single versus multiple DPARs
        5.2.1 NRPC
        5.2.2 DWA
    5.3 z/VM Linux guests versus Linux LPARs cost
        5.3.1 NRPC single guest
        5.3.2 NRPC multiple guests
        5.3.3 DWA single guest
        5.3.4 DWA multiple guests
    5.4 Impact of virtual-to-real CP ratios for z/VM guests
    5.5 Maximum number of NRPC users in Linux LPAR
Chapter 6. Conclusions and recommendations
    6.1 Linux memory and z/VM bottlenecks removed
    6.2 Domino scalability
    6.3 Running with multiple Domino servers (DPARs)
    6.4 Domino and z/VM
Glossary
Related publications
    IBM Redbooks
    How to get IBM Redbooks
    Help from IBM
Index

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents.
You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

Domino, Informix, IBM, Lotus, Lotusphere, Redbooks, Redbooks (logo), System z, System z9, z/OS, z/VM, zSeries, z9

The following terms are trademarks of other companies:

SAP R/3, SAP, and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries.

Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
Preface

IBM Lotus Domino 7, which first became available in the Fall of 2005, includes many enhancements in the areas of capacity and performance. Foremost among them is Domino 7 support of a 64-bit Linux kernel on IBM System z, which improves Domino's throughput and offers more vertical scaling than previous releases. Additionally, some I/O performance enhancements became available with z/VM 5.2, which also benefit customers who wish to implement Domino to run in z/VM Linux guests. Testing has shown that Domino 7 and z/VM 5.2 together have some clear advantages over their predecessor releases, and greatly improve the total cost of ownership for running Domino. This IBM Redpaper discusses the results of those tests and provides some recommendations.

The team that wrote this Redpaper

This IBM Redpaper was produced by a team of specialists from around the world.

Don Corbett is a Senior Software Engineer in the System z9 Development Software Performance Department in Poughkeepsie, New York. Don has more than 40 years of experience in IT working at IBM. His areas of expertise include operating system performance analysis and design for Linux on System z9 and z/OS. He has led multiple performance benchmarking and capacity planning projects, including Domino for Linux on System z9 and other middleware products.

Dr. Juergen Doelle is the project leader of the Linux end-to-end performance project, which examines the performance of various middleware products and applications on Linux on System z. Since 1994, he has worked as a developer for products on z/OS and USS, and since 2001 as performance analyst for Linux on System z. His areas of performance experience are SAP R/3, Informix, and Oracle databases, in addition to Linux kernel performance with the focus on disk I/O, storage servers, and the FCP attachment for SCSI.

Barbara Filippi is a Consulting IT Specialist with the Domino for System z9 Team in the Washington Systems Center.
She has worked at IBM for 27 years and has been involved with Domino on System z9 since it initially became available on that platform. Her areas of expertise include Domino installation and administration, capacity planning, performance analysis, and migration to System z9 from other Domino platforms.

Wu Huang is a member of the Lotus Domino Performance team in Poughkeepsie. His main focus is on Domino performance for Linux on System z9, z/OS, and RHEL 4. He joined the IBM System z9 Domino Performance Team in 1998. He is a member of the NotesBench Consortium.

Mike Wojton is a Certified Senior IT Specialist. He has more than 23 years of experience in the IT industry working at IBM. He currently works at the Washington System Center supporting Domino for System z9. He has been involved with Domino for System z9 since the first Beta release with 4.5.1 in 1997 and performed the first install with a Beta customer. He has presented on and written about Domino installation, benchmarking, performance, capacity planning, problem determination and problem source identification.
He has also participated in the Performance Zone at Lotusphere since 2000.

Thanks to the following people for their contributions to this project:

Mike Ebbers
International Technical Support Organization, Poughkeepsie Center, IBM

The team that did the system setup and performed the benchmark measurements:

Stephen McGarril
IBM System z Benchmark Center, Poughkeepsie

Eugene Ong
IBM System z Benchmark Center, Poughkeepsie

Judy Viccica
IBM System z Benchmark Center, Poughkeepsie

Chris Williams
IBM Project Manager for zSeries Benchmark Center, Poughkeepsie

Other people who supported our efforts:

Cherie Barnes
IBM Endicott z/VM development performance team - z/VM sizing tool

Evanne Bernardo
IBM Size390 team Gaithersburg - sizing tools

Bill Bitner
IBM Endicott z/VM development performance team - z/VM sizing tool

John Campbell
IBM Size390 team Gaithersburg - sizing tools

Jon Entwistle
IBM System z Performance, Poughkeepsie

Clark Goodrich
Former Domino for IBM System z Development team leader, Poughkeepsie

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You will team with IBM technical professionals, Business Partners, and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you will develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our papers to be as helpful as possible.
Send us your comments about this Redpaper or other Redbooks in one of the following ways:

- Use the online Contact us review redbook form found at:
  ibm.com/redbooks
- Send your comments in an e-mail to:
  redbook@us.ibm.com
- Mail your comments to:
  IBM Corporation, International Technical Support Organization
  Dept. HYJA Mail Station P099
  2455 South Road
  Poughkeepsie, NY 12601-5400

Chapter 1. Introduction

This chapter describes the background behind this testing study, the objectives, and how to put the results to use in your installation.

1.1 Background for Domino 7.0 capacity and performance study

About 18 months ago, the Scalability Center for Linux on zSeries (SCL) in Poughkeepsie performed a study on Domino 6.5 for Linux on IBM System z. During that study, capacity planning and performance data was collected using an industry-standard benchmarking tool to drive both NRPC (Notes) and Domino Web Access (HTTP) e-mail users. The results of this benchmark showed two bottlenecks that prevented linear scalability of throughput for Domino 6.5 for Linux on System z.

The first bottleneck was due to the fact that Domino 6.5 does not support the 64-bit Linux kernel on either zSeries or Intel. To scale Domino within an OS image, multiple Domino servers or partitions (also called DPARs) are typically deployed to extend Domino scalability. Each DPAR has its own set of virtual address spaces to support its users. The SCL's testing showed that a Linux instance with multiple DPARs and many users per DPAR caused the Linux kernel to swap heavily as the number of users was increased past a certain threshold because of the 2 GB real memory limitation of the 31-bit kernel. To prevent any one kernel from swapping heavily, multiple Linux images, each with only a single production Domino 6.5 server, must be deployed for production environments.
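The arithmetic behind the 2 GB limit can be sketched in a few lines of Python. This is an illustration only: the per-user memory figure and the user and DPAR counts are hypothetical values chosen to show the effect, not measurements from the SCL study.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def memory_demand_bytes(dpars: int, users_per_dpar: int, bytes_per_user: int) -> int:
    """Rough estimate that charges a fixed amount of real memory per active user."""
    return dpars * users_per_dpar * bytes_per_user


# A 31-bit kernel can address 2**31 bytes, which is the 2 GB limit in the text.
limit_31_bit = addressable_bytes(31)
print(limit_31_bit // GIB)  # 2

# Hypothetical figures for illustration only (not from the study).
BYTES_PER_USER = 512 * 1024  # assume 512 KiB of real memory per active user
one_dpar = memory_demand_bytes(1, 3000, BYTES_PER_USER)
three_dpars = memory_demand_bytes(3, 3000, BYTES_PER_USER)

print(one_dpar <= limit_31_bit)     # True: a single DPAR fits
print(three_dpars <= limit_31_bit)  # False: demand exceeds real memory, so the kernel swaps
```

Under these assumed numbers a single DPAR stays inside the limit while three do not, which mirrors the one-DPAR-per-Linux-image guidance for Domino 6.5.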
The second bottleneck was tied to z/VM 4.4, the release that was used to test Domino 6.5 running in Linux guests. The I/O support modules for z/VM 5.1 and earlier releases still use 31-bit addressing, which means that any I/O done by a guest running above the 2 GB line must be moved below the 2 GB line to be completed. Because the storage available for I/O below the 2 GB line is limited in size, this is a major issue. When Linux guests drive enough I/O, z/VM has to page this area, and under high disk I/O demand it can page heavily. During the SCL's testing, Domino 6.5 could not be scaled beyond 13,000 active NRPC users (spread across multiple z/VM guests) running under a single z/VM LPAR. For larger Domino 6.5 production deployments with robust I/O requirements, this implies that multiple z/VM LPARs might have to be deployed to support Domino in order to avoid z/VM paging issues.

Domino 7.0 and z/VM 5.2, which both became generally available in the latter half of 2005, offer relief for the two bottlenecks that are associated with their predecessor products. Domino 7 supports a 64-bit kernel, which allows a Linux instance to support multiple Domino DPARs as long as enough memory is available to Linux to minimize swapping. With z/VM 5.2, I/O processing has been changed so that the target page for I/O is no longer moved below the 2 GB line. With this release of z/VM, disk I/O no longer contributes to z/VM paging. For more information about z/VM 5.2 changes that improve performance, see:

http://www.vm.ibm.com/perf/reports/zvm/html/

A few months after Domino 7.0 became available, a study was organized by the Linux End-to-End Performance Group in Germany to verify the extent to which the bottlenecks had been removed and to provide updated capacity and performance information for Domino 7 for IBM System z running on Linux.
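As a rough illustration of what the 13,000-user ceiling reported above for Domino 6.5 means for planning, the following Python sketch estimates how many z/VM LPARs a deployment would need if that ceiling were treated as a hard per-LPAR limit. The deployment sizes are hypothetical, and the function name is introduced here only for the example.

```python
import math

# Ceiling reported above for Domino 6.5 under a single z/VM LPAR.
MAX_ACTIVE_NRPC_USERS_PER_ZVM_LPAR = 13_000


def zvm_lpars_needed(active_users: int,
                     per_lpar_limit: int = MAX_ACTIVE_NRPC_USERS_PER_ZVM_LPAR) -> int:
    """Smallest number of z/VM LPARs that keeps each one at or below the limit."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    return math.ceil(active_users / per_lpar_limit)


# Hypothetical deployment sizes, for illustration only.
for users in (10_000, 13_000, 30_000):
    print(users, zvm_lpars_needed(users))
```

A real sizing exercise would also weigh I/O intensity and memory, so this is a lower bound under a single assumption rather than a sizing rule.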
1.2 Study objectives

The primary focus of the Linux End-to-End Performance Group's study was to show improvements and differences between running Domino 7.0 and Domino 6.5 in a Linux for System z environment. Although all testing was done at the Domino 7.0 level, the results that were reported for this study would be applicable to follow-on maintenance releases for Domino 7.

The high-level objectives of the study were as follows:

- Show that prior Linux memory and z/VM I/O bottlenecks have been removed.
- Determine how linearly Domino scales as the number of CPs and users is increased.
- Measure the overhead of running multiple DPARs versus a single DPAR in a Linux LPAR.
- Measure the overhead of running Domino within Linux z/VM guests versus Linux LPARs.
- Measure the impact of virtual-to-real CP ratios for Domino in a z/VM Linux guest environment.
- Define new guidelines for running Domino 7 in Linux images:
  - Number of active 15-minute users per DPAR (This means users who have been active in a 15-minute period, which should not be confused with connected users who are not necessarily active during each 15-minute period.)
  - Number of DPARs per Linux image
  - Memory usage per DPAR and Linux image
  - Any other performance and tuning recommendations for Linux, z/VM, and Domino

1.3 How to use this paper

This Redpaper presents the latest capacity and performance guidelines for Domino 7 and is intended to be used in conjunction with the IBM Redbook IBM Lotus Domino 6.5 for Linux on zSeries Implementation, SG24-7021. This book has valuable implementation information applicable to both Domino 6.5 and 7 on Linux.
It covers a broad range of topics for deploying a Domino for IBM System z server on Linux, including:

- Planning the operating system, disk, and network environment
- Installing and administering Linux and Domino
- Capacity planning and performance

The capacity and performance discussion in the Domino 6.5 implementation book has two components: how to monitor capacity and performance, and capacity and performance configuration guidelines for running Domino 6.5 on Linux. The discussion about monitoring capacity and performance is valid for both Domino 6.5 and 7 releases. In this paper, we cover the new capacity and performance guidelines specific to Domino 7.

Another IBM Redbook that has valuable information about tuning Domino is Best Practices for Lotus Domino on System z: z9 and zSeries, SG24-7209. Again, the recommendations documented in this book are applicable to both Domino 6.5 and 7, unless otherwise stated.

When installing a new release of Domino, always review the latest installation documentation and release notes for the most up-to-date procedures and recommended configuration settings.

Chapter 2. Domino and the benchmarking tool used

This chapter discusses the benchmarking tool and specific workloads that were used to test Domino 7.0. It also describes the differences between production and benchmark workloads, and offers some guidelines for interpreting data derived from benchmark testing.

2.1 Benchmark driver and workloads

The same industry-standard benchmarking tool and specific workloads were used for testing Domino 7.0 as were used for Domino 6.5 to guarantee valid comparisons between the two releases.
For each run, this tool generated a transactions-per-minute (TPM) metric, not to be confused with the Domino transaction statistic, along with a value for the maximum capacity (number of active users) supported and their average response time.

From a tool perspective, a workload is a defined script that is used to simulate user activity through specific applications. Workloads typically cover a variety of protocols such as IMAP, NRPC, and HTTP. Workload scripts provide a common method to apply a consistent, repeatable load against the Domino server in order to assess the effects of various operating systems, hardware, and configuration changes. For this study, only NRPC and HTTP workloads were considered.

For the NRPC client testing, a workload script called R6Mail was used, which emulates Notes users with their own mail file systems. It repeats a set of user actions every 90 minutes. For the DWA testing, a script called R6iNotes was used to emulate HTTP users with their own mail file systems. Again, this script repeats a series of actions every 90 minutes. Table 2-1 lists the user actions that were invoked by both workloads. The R6Mail and R6iNotes scripts were also used in previous testing for Domino 6.5.

Table 2-1 User actions invoked by workloads

Actions every 90 minutes                    R6Mail NRPC workload    R6iNotes DWA workload
Open Inbox                                  6                       6
Read Message                                30                      30
Delete Message                              12                      12
Add Message to Inbox                        2 (50 KB)               2 (100 KB average)
Send Message to 3 Recipients                1 (100 KB average)      1 (100 KB average)
Send Calendar Invitation to 3 Recipients    1                       1
Send RSVP                                   1                       1
Close Inbox                                 6                       6

For these scripted workloads, the size of each inbox is around 20 MB. Keep in mind that the scripted actions and inbox and message sizes reflect benchmark workloads. Production workloads would be quite different with much larger inboxes and message sizes, movement of documents from the inboxes to folders, and an increased frequency of creating memos and replying to received mail.

Before each run for this study, all new copies of the mail files were allocated and initialized to minimize variability. Then users were ramped up (logged in) at 1000 to 1500 at a time until the desired number of users was attained for a particular test case. After steady-state had been reached (all users actively executing the test scripts/workloads and no more users logging in), the test runs were allowed to execute for a period of time during which performance and capacity data were collected. Runs with average response times of greater than one second were discarded as unacceptable and not included as input into the results of the study.

2.2 Benchmark versus production workloads

The user workloads that were simulated through the tool showed that Domino servers running in Linux LPARs and z/VM guests can support several thousand users if properly configured. However, as discussed earlier, simulated users are not equivalent to production users. The processing scenarios for the simulated clients did not include all the functions, features, and third-party software that production users might access, such as Domino administration tasks, view/full-text indexing, anti-virus, and others. Also, simulated users are constantly accessing the Domino server with a steady amount of work, which is not comparable to the ebb and flow of a production workload.

Domino servers that are used for benchmark testing are typically brought up with a minimal configuration of two to four Domino tasks (Server, Router and HTTP tasks, and in the tests under discussion Logasio for Domino transaction logging). Most Domino production servers will be running at least 10 to 20 tasks or more.
Besides the tasks brought up on benchmark servers, production environments might include Update to index views and databases, AMgr to invoke event-driven or scheduled processing, AdminP to perform administrative tasks on the Domino server, Collect to collect Domino statistics, and many others.

Production users have different CPU needs, ranging from very light users in, for example, a manufacturing facility, to very heavy users at a corporate headquarters. Although benchmark users frequently have lighter CPU requirements than production users, this does not negate the value of benchmark testing. Because benchmark testing guarantees a standard workload, it is a valid way of driving CPU utilization for measuring things such as workload scalability, differences in CPU costs when configuration changes are made, and some CPU improvements between Domino releases.

Chapter 3. Test environments and scenarios

This chapter describes the hardware, software, and network configuration (LPAR and z/VM guests) used to test Domino 7.0. Additionally, it shows how we were able to configure the Linux and network environments to run with multiple Domino partitions or servers, a benefit of the new support for a 64-bit Linux kernel in Domino 7.

3.1 Hardware used to test

The following hardware was used for this study:

- z9 system (2094-S18) with up to eight CPs for most of the measurements
  - Four additional CPs as needed.
  - The number of available CPs varied according to workload and test scenario.
- 42 GB memory
  The amount of memory used varied according to workload and test scenario.
- 1 DS 8000 DASD unit with 6 TB of space
- Workload driver workstations (2-way Intel Xeon boxes at 3.06 GHz with 4 GB RAM)
  - Nine x-345 Linux workstations, including one spare, to drive benchmark clients
  - One x-345 Windows workstation to act as the master controller
- Cisco 6500 series switch between workstations and z9 server
- One OSA-Express2 1000Base-T Ethernet card for the z9 server

Figure 3-1 depicts the hardware configuration and client network connectivity to the z9.

Figure 3-1 Hardware configuration and connectivity (diagram: a z9 processor with 8 to 12 CPs and 42 GB memory, connected through a switch to the x-345 Linux client workstations and one x-345 Windows master controller)

3.2 Software used to test

The following products and software levels were installed:

- Domino 7.0
- Linux SUSE SLES 9 SP 2 Kernel without Fixed I/O Buffers patch on the Linux server
- SUSE SLES 9 SP 2 on driver workstations
- EXT3 file systems
- z/VM 5.2

3.3 Domino network configuration

A host system in the System z environment has the ability to run multiple copies of a Domino server. Domino partitions (DPARs) can run in a native logical partition (LPAR) or in a guest machine inside a z/VM LPAR. This section gives an example of each.

3.3.1 DPARs within a single Linux LPAR or guest

Figure 3-2 shows the network setup for the Domino Linux LPAR tests. Using this configuration, multiple Domino partitions were created to share the same Domino set of executables. Each DPAR had its own set of processes and each process had a 2 GB virtual address space, which were accessed by a predefined number of users.

Figure 3-2 Network setup for Linux LPARs with multiple DPARs (diagram: clients 1 to n connect through a switch and the OSA card to the TCP/IP stack of the Linux kernel in a Linux LPAR that hosts Domino DPARs 1 to n)

Figure 3-3 shows the configuration for multiple DPARs in a single z/VM guest.
Again, the guest had a single set of Domino executables shared by multiple Domino partitions.

Figure 3-3 Network setup for z/VM guests with multiple DPARs
[Diagram labels: Client 1 to Client n, Switch, OSA, z/VM LPAR, z/VM Guest, Linux Kernel TCP/IP, Domino DPAR 1 to DPAR n]

3.3.2 DPARs within multiple z/VM Linux guests

For the z/VM multi-guest runs, OSA routing by IP address was implemented. Figure 3-4 shows the configuration that was used with multiple z/VM guests. Each z/VM guest had a dedicated connection to the OSA card. Only one OSA card was required to handle all of the network traffic that was generated during the benchmarks. In a production environment, you might want to use multiple OSA cards if the throughput exceeds what a single OSA card can support. If multiple OSA cards are deployed, consider using VIPAs (virtual IP addresses) to allow load balancing of the network traffic. With multiple cards available, VIPA also provides the added benefit of failover: if one of the cards fails, traffic can be rerouted automatically through the other card.

Figure 3-4 OSA routing for a multi-guest environment
[Diagram labels: Client 1 to Client n, Switch, OSA, OSA Routing, z/VM zSeries LPAR, Guest 1 to Guest n, each with a Linux Kernel and Domino DPAR 1 to DPAR n]

Chapter 4. Background information and terminology for interpreting data runs

This chapter provides definitions and background discussions about External Transaction Rates (ETRs), CPU data, and Internal Transaction Rates (ITRs), as preparation for understanding the test run data in Chapter 5. It is important to understand what these values represent and how to apply them. A sample scenario, outside the scope of the study under discussion, is used to illustrate what these measurements mean and how to interpret them.
This will lay a foundation for the discussion of actual results in the following chapter.

4.1 ETR data

ETR is the measurement of some external workload that is being performed during a benchmark. In Figure 4-1, the amount of workload (ETR) that was performed during a benchmark is plotted at five different sample times.

Figure 4-1 ETR values
[Chart "Sample ETR Values": x-axis Sample Point (1 to 5); y-axis ETR]

At each successive sample point (each point could come from a single benchmark run or from multiple runs), the amount of work increased by 100% of the amount at the initial point. In the case of a Domino benchmark, this might indicate that twice as much mail was being sent at point two as at point one, and five times as much at point five. Figure 4-1 indicates that the system was able to handle the increase in workload over these samples, which is a good trend.

Note: The discussion in this chapter is for background information. See Chapter 5, "Study results" on page 19, for actual data runs.

4.2 CPU data

However, ETR growth is only part of the picture. Another factor to consider is CPU. How much CPU does the workload cost? Figure 4-2 shows the amount of CPU that was consumed for the same sample points plotted in Figure 4-1 on page 16. The ETR measurements are also included in this chart to facilitate the comparison of trends between ETR and CPU.

Figure 4-2 CPU values

Figure 4-2 clearly shows that the amount of CPU used for the workload did not grow in the same linear fashion as the workload driving the system.
The workload on the right side of the chart is taking proportionally more cycles than the workload on the left side of the chart.

[Chart "Sample CPU Used": x-axis Sample Point (1 to 5); series CPU and ETR]

4.3 ITR data

ITR expresses how much CPU is required to run specific benchmark workloads. It is calculated by dividing the workload (ETR) by the CPU utilization percentage and multiplying the result by 100, which is the same as dividing the ETR by the CPU utilization expressed as a fraction. Figure 4-3 shows the ITR values for the same sample points plotted in the charts shown in Figure 4-1 on page 16 and Figure 4-2 on page 17.

Figure 4-3 ITR values

The downward ITR trend shows that as the workload was growing, it was getting more expensive to run. An upward trend would mean that as the workload was growing, processing it was becoming more efficient: less cost for the same types of transactions.

All of the Domino 7.0 benchmark results were analyzed in terms of ETR and ITR. The ETR was calculated from the benchmark TPM (transactions per minute) divided by 60 to get transactions per second. (TPM is described in 2.1, "Benchmark driver and workloads" on page 6.) We used the ETR by itself to compare workload consistency between test runs where factors such as number of DPARs, CPs, memory, and other configuration options might be varied. For example, if the ETRs are very close in value between runs, or they scale almost linearly when capacity is increased between runs, then the workload is very consistent. This means that users accessed the same functions at the same rate with very comparable response times.

We calculated the ITR as the ETR divided by the CPU percent utilization, multiplied by 100. ITRs were used to do all of the analysis for the Domino 7.0 measurements, which included things like determining the scalability factor of workloads as CPs were added, the cost of multiple DPARs compared to a single DPAR, the impact of virtual-to-real CP ratios when running with z/VM, and so on.
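The two calculations described in 4.3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the sample TPM and CPU figures are hypothetical and are not measurements from this study.

```python
def etr_from_tpm(tpm: float) -> float:
    """External Transaction Rate: benchmark transactions per second."""
    return tpm / 60.0


def itr(etr: float, cpu_percent: float) -> float:
    """Internal Transaction Rate: (ETR / CPU percent) * 100.

    Equivalent to dividing ETR by CPU utilization expressed as a fraction.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    return etr / cpu_percent * 100.0


# Hypothetical sample values (TPM, CPU %), not data from the study.
samples = [(6000, 40.0), (12000, 90.0)]
for tpm, cpu in samples:
    etr = etr_from_tpm(tpm)
    print(f"TPM={tpm} ETR={etr:.1f} CPU={cpu:.0f}% ITR={itr(etr, cpu):.1f}")
```

In this made-up pair of runs the ETR doubles while the ITR falls, which is the downward trend that 4.3 describes as a workload getting more expensive to run.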
In summary, although ETR gives you the overall view of how much workload is being performed in benchmarks, ITR lets you know how efficiently that workload is being processed.

[Chart "Sample ITR Values": x-axis Sample Point (1 to 5); y-axis ITR]

Chapter 5. Study results

This chapter describes the various test scenarios that were run with Domino 7.0 in the general areas of:
- Domino workload processor scalability
- Capacity requirements of single versus multiple Domino partitions
- Capacity requirements of running Domino in z/VM Linux guests versus Linux LPARs
- Impact of virtual-to-real CP ratios for Domino in a z/VM environment
- Maximum number of Domino NRPC users supported by a single Linux LPAR

Each section contains the details and results of running a specific test case, and a comparison of Domino 7.0 results to results from a comparable Domino 6.5 test run, if such a run was executed in the past. Compared to Domino 6.5, Domino 7.0 showed significant improvements in its ability to support multiple Domino partitions and up to 50,000 NRPC users within a single Linux image.

5.1 Domino workload processor scalability

IBM publishes the Large Systems Performance Reference (LSPR) to compare capacity on the various zSeries and z9 processor models. A variety of standard workload mixes running on z/OS and Linux are defined to describe application scalability. A series of tests was run with Domino 7.0 to see how it compared to some standard LSPR workloads.

Because the behavior of NRPC and DWA workloads is quite different, tests were run for each type of client to determine Domino's workload behavior. DWA is a much more CPU-intensive workload than NRPC because all of the processing is placed on the Domino server, causing it to consume approximately four times or more as many cycles as NRPC.
Consequently, DWA supports significantly fewer users than NRPC for a given number of CPs. DWA clients in general are more memory intensive, which also limits the number of supported users on a Domino server.

Within a single LPAR, several scenarios were run with NRPC and DWA clients to determine how linearly the Domino Notes and browser workloads scale as the number of users and CPs is increased. All tests were executed in a single LPAR running four Domino DPARs. A single 3390 mod 9 volume was defined for Linux swap space for all runs that were executed for this study (Linux LPARs and z/VM guests). However, little or no swapping was seen on any of the runs because enough main memory was allocated to the Linux images.

5.1.1 NRPC

On zSeries, scalability is stated in terms of the transaction rate (not equivalent to Domino transaction rate!) that can be obtained with a given workload and how that transaction rate behaves when the number of processors is increased. Perfect scalability would have the transaction rate double when the number of CPs is doubled; however, due to a number of factors (among them the overhead associated with managing more CPs), this does not happen. Among the LSPR curves, two are of interest here: CB-L (Commercial Batch Long Job Steps), which characterizes workloads with very good and efficient scalability, and OLTP-T (Traditional On-Line Workload), which characterizes workloads with less efficient scalability. For this study, these workloads were used as the baseline to show how well Domino scales.

Measurements

To determine how Domino 7 compares to the LSPR workloads, tests were run on a two-way and four-way processor.
These scenarios were specifically designed to meet the following requirements for workload scalability testing:
- The number of users per CP must be the same across all runs. In other words, the number of NRPC users was doubled between the two-way and four-way runs.
- The overall processor CPU must be driven to somewhere between 85% and 95% for all runs.

There was also an attempt to triple the workload of the two-CP scenario on a six-CP processor. However, this run was unsuccessful because of a dramatic increase in CPU utilization, and consequently was not included in the NRPC scalability analysis. For most applications, there normally would also be a one-CP run. This scenario was not run because of Domino's architecture, which requires access to at least two CPs for optimum performance; this is a cross-platform recommendation, not one specific to zSeries and System z.

Figure 5-1 shows the scalability behavior for NRPC clients distributed across four DPARs in a single LPAR with two and four CPs. When increasing from two to four CPs, the ITRs varied by a factor of 1.7, indicating that processing efficiency decreased when the workload increased by a factor of two. (See Chapter 4, "Background information and terminology for interpreting data runs" on page 15, for a description of ITR or Internal Transaction Rate.) From an LSPR point of view, this is not a good scaling factor. Ideally, workloads scaled between a two-way and four-way processor should fall somewhere between 1.89 (OLTP-T) for those workloads with less scalability and 1.95 (CB-L) for those that scale very well. Perfect scalability would be a factor of 2.00, but that is not possible because there is associated overhead in managing multiple CPs.

Figure 5-1 NRPC scalability for Domino 7

Given these results, the conclusion is that this type of NRPC workload, a very large number of light users (low CPU usage per client), does not scale very well at four CPs.
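As a rough illustration of how a measured scaling factor is read against the LSPR reference points quoted for the two-way to four-way comparison, consider the following Python sketch. The helper names and the two ITR inputs are hypothetical; the inputs were chosen only so that their ratio reproduces the reported factor of 1.7.

```python
# LSPR reference scaling factors for a two-way to four-way comparison,
# as quoted in the text.
OLTP_T = 1.89   # reference workload with less scalability
CB_L = 1.95     # reference workload that scales very well
IDEAL = 2.00    # perfect scaling, not reachable in practice


def scaling_factor(itr_base: float, itr_scaled: float) -> float:
    """Ratio of the ITR on the larger configuration to the ITR on the base one."""
    return itr_scaled / itr_base


def classify(factor: float) -> str:
    """Place a two-way to four-way scaling factor relative to the LSPR range."""
    if factor >= CB_L:
        return "at or above CB-L"
    if factor >= OLTP_T:
        return "within the LSPR range"
    return "below OLTP-T"


# Hypothetical ITRs chosen only to reproduce the reported 1.7 ratio.
factor = scaling_factor(itr_base=100.0, itr_scaled=170.0)
print(f"{factor:.2f} of an ideal {IDEAL:.2f}: {classify(factor)}")
```

A factor of 1.7 falls below the OLTP-T reference of 1.89, which is why the NRPC benchmark result is described as not a good scaling factor from an LSPR point of view.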
However, we do not believe the NRPC scalability results to be a problem in production environments because production NRPC users, even light ones, will have heavier CPU requirements. In the Domino 7.0 testing, more CPU-intensive workloads, such as DWA, showed a scaling factor very close to 1.95 on a four-way. See 5.1.2, "DWA" on page 22, in which the results of the DWA testing are described. So, we believe that a more CPU-intensive NRPC workload will scale very well up to at least four CPs, and possibly more, depending on the CPU intensity of the users.

Similar tests, documented in Figure 5-2 on page 22, were run for Domino 6.5. At that time, Domino was compared to the LSPR workloads CB-L and CB-S (Commercial Batch Short Job Steps, by definition a workload with less scalability), and had a scaling factor of just over 1.5. The results for Domino 7.0 were somewhat better at 1.7. Also, keep in mind that the Domino 6.5 runs were done with a single DPAR in an LPAR, which meant fewer users than the Domino 7.0 runs. The lower number of users made a third test possible on six CPs.

[Chart "Domino 7.0 NRPC Scalability": x-axis # CPs (2, 4, 6, 8); series CB-L, OLTP-T, Domino 7.0]

Figure 5-2 NRPC scalability for Domino 6.5

Summary

The NRPC benchmark workload is characterized by low CPU utilization per user, but high numbers of users, and high I/O rates. Based on LSPR workload ratios, this workload does not scale well on four CPs. Compared to production environments, however, the NRPC benchmark workload is very light. Consequently, we see no issues with recommending up to four CPs (possibly more for more CPU-intensive workloads) for NRPC users in a single LPAR. Although the six-CP tests showed that scalability dropped off significantly after four CPs on a single LPAR, good scalability can be achieved with multiple four-CP LPARs for larger user deployments.
Furthermore, Lotus recognizes that the current benchmark workload is very light and is developing new NRPC benchmark workloads more in line with production workloads for the next release of Domino.

[Chart title for Figure 5-2: "Domino 6.5 NRPC Scalability"]

5.1.2 DWA

To measure the DWA workload scalability, runs similar to those for NRPC, but with fewer users to accommodate heavier CPU demands, were executed on two-way, four-way, and eight-way processors.

Measurements

Using the two-way run as a base, the number of DWA clients was doubled on the four-way run and quadrupled on the eight-way run. From an LSPR point of view, the DWA workload scaled very well: nearly the same as CB-L on the four-way and slightly better than CB-L on the eight-way, as shown in Figure 5-3 on page 23.

Figure 5-3 DWA scalability for Domino 7
[Chart "Domino 7.0 DWA Scalability": x-axis # CPs (2, 4, 6, 8); series CB-L, OLTP-T, Domino 7.0]

Domino 6.5 also showed good scalability for the DWA workload, as shown in Figure 5-4. It closely followed the scalability curve for CB-L, even slightly exceeding it at the upper end.

Figure 5-4 DWA scalability for Domino 6.5
[Chart "Domino 6.5 DWA Scalability": x-axis # CPs (2, 4, 6, 8); series CB-L, CB-S, Single Image Domino]

Summary

The DWA workload is characterized by a high CPU utilization per user and fewer active users when compared to NRPC. Based on Domino 7.0 testing, it scales very well up to eight CPs, and we expect that it would have scaled well beyond eight CPs if more hardware had been available for additional testing.

5.2 CPU cost of single versus multiple DPARs

Domino as an application is architected to run with multiple tasks: some in the foreground to satisfy end-user requests (such as Server, HTTP, and POP3), others in the background to perform tasks such as view indexing, event-driven processing, mail routing, anti-virus, and so on.
Unlike some applications, Domino continuously polls for work; even an idle Domino server uses some number of cycles. This set of runs was designed to measure the CPU overhead of running a given NRPC and DWA workload in one large versus several smaller DPARs executing in a single LPAR, and to determine the most CPU-efficient number of DPARs for Linux images. Although all runs were executed in Linux native mode, these results would also be applicable to Domino running in z/VM guests.

5.2.1 NRPC

For this set of tests, 12,000 NRPC users were used to drive one, two, three, and four DPARs. For the multi-DPAR runs, the 12,000 users were evenly distributed across the DPARs; in other words, for the two-DPAR run, half of the users were assigned to each DPAR; for the three-DPAR run, one-third of the users were assigned to each DPAR; and so on. Transaction logging and mail routing were enabled on all of the servers.

Measurements

Figure 5-5 and Figure 5-6 on page 25 show the results of these tests.

Figure 5-5 NRPC DPAR CPU utilization for Domino 7
[Chart "Domino 7.0 NRPC DPAR CPU": x-axis # DPARs (1 to 4); y-axis scale 0 to 80]

Figure 5-6 NRPC DPAR cost for Domino 7

The CPU utilization for the single-DPAR run was higher than initially expected when compared to the two-DPAR, three-DPAR, and four-DPAR runs. But given that previous Development testing has shown that 12,000 active NRPC users is the scalability limit for a single Domino server running on zSeries Linux, this anomaly was not surprising. As Domino approaches this upper limit, it incurs more and more overhead to manage the large number of NRPC clients. The CPU utilization in Figure 5-5 on page 24 shows that when going from two to three DPARs, CPU load increased 3.7%, and when going from two to four DPARs, 12.9%.
It should also be noted that the ETR (transactions per second) values were almost the same regardless of the number of DPARs, which showed that the workload was consistent across all four runs. Again and as expected, the ITR (throughput by cost) value was lower for the one-DPAR run, which was constrained by the large number of active users. Compared to the two-DPAR run, the ITR decreased 3.6% with the third DPAR and 11.3% with the fourth DPAR.

In the past, a similar set of tests was run with Domino 6.5. As shown in Figure 5-7 on page 26, the Domino 6.5 test results were significantly worse than those obtained with 7.0. Domino 6.5 was tested with 4,200 NRPC users on one, two, and three DPARs. Again, the DPAR costs are expressed in terms of changes to ITR values. Comparing the one-DPAR run with the two-DPAR run, the ITR degraded by 7.6%; comparing the one-DPAR run with the three-DPAR run, the ITR degraded by 25.8%. The large cost of the third DPAR was due to the fact that Linux was experiencing very heavy swapping, which also prevented additional testing with more users and a fourth DPAR. Compared to the one-DPAR scenario, CPU utilization increased 8.3% with the second DPAR and 35.2% with the third. The large discrepancy between the change in CPU and the change in ITR on the third DPAR shows the value of going to a cost-based metric such as ITR.

[Chart "Domino 7.0 NRPC DPAR Cost": x-axis # DPARs (1 to 4); series ETR and ITR]

Figure 5-7 NRPC DPAR cost for Domino 6.5

Summary

Based on the low overhead of the Domino 7.0 three-DPAR run compared to the two-DPAR run, a single Linux image would be able to efficiently support up to three DPARs, provided that sufficient memory is allocated to the image (see 5.2.2, "DWA" on page 26, for Domino 7 memory recommendations). This number of DPARs is recommended for an NRPC workload characterized by users with light CPU requirements and high active rates.
With an NRPC workload characterized by more CPU-intensive clients and a lower activity rate, a Linux image should be able to efficiently support more DPARs. However, production DPARs typically run with more Domino tasks and third-party tools than what was implemented during the testing for this study. Therefore, the costs of running multiple smaller DPARs versus a large DPAR may be higher in production environments than what benchmark results indicated. Consolidation of smaller DPARs into larger ones will in most cases result in measurable CPU savings.

5.2.2 DWA

In this set of tests, one to four DPARs were driven with 8,000 DWA users. As was the case for the NRPC tests, the 8,000 users were evenly distributed across the DPARs in each multi-DPAR run. Additionally, some testing was done to determine the optimal memory configuration for DPARs. Memory was allocated using the following guideline: (number of DPARs) * 2 GB + 2 GB (for the kernel). Little or no paging was seen with these memory allocations.

Measurements

Because the DWA workload scales very well and there were no resource constraints during these runs, varying the number of DPARs had minimal impact on the test results, as shown in Figure 5-8 on page 27 and Figure 5-9 on page 27.

[Chart for Figure 5-7, "Domino 6.5 NRPC DPAR Overhead": bar values 171.6 (1 DPAR, 55.63% CPU), 158.5 (2 DPAR, 60.23% CPU, 7.6% DPAR cost), 127.4 (3 DPAR, 75.21% CPU, 25.8% DPAR cost)]

Figure 5-8 DWA CPU utilization for Domino 7

Figure 5-9 DWA DPAR cost for Domino 7

Again, the CPU utilization was somewhat inflated on the one-DPAR run because 8,000 users is close to the DWA user scalability limit for a single DPAR. The CPU increased by 3.7% and the ITR degraded by 3.3% between two and four DPARs. The ETR was flat for all of the runs, showing workload consistency.
The small amount of overhead associated with distributing the DWA workload across several DPARs contrasts sharply with the much larger overhead for NRPC users discussed in the previous section; the NRPC overhead can be correlated with the larger number of less CPU-intensive users. In fact, we would expect DPARs with more CPU-intensive NRPC clients to behave more like DPARs driven by DWA users.

For this set of tests, we cannot make comparisons with Domino 6.5 because no multi-DPAR DWA runs were executed for this release.

[Chart "Domino 7.0 DWA DPAR CPU": x-axis # DPARs (1 to 4); y-axis scale 0 to 80]
[Chart "Domino 7.0 DWA DPAR Cost": x-axis # of DPARs (1 to 4); series ETR and ITR]
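The DPAR cost percentages quoted in 5.2 are relative changes in ITR and in CPU utilization between runs. As a minimal sketch, the Python below recomputes the Domino 6.5 NRPC figures from the values that appear in Figure 5-7. Treating the three bar values as ITRs is an assumption, although the quoted 7.6% and 25.8% costs are consistent with it; the helper name pct_change is illustrative only.

```python
def pct_change(base: float, value: float) -> float:
    """Percentage change of value relative to base (positive means an increase)."""
    return (value - base) / base * 100.0


# Values read from Figure 5-7 (Domino 6.5, NRPC), keyed by number of DPARs.
# Treating the bar values as ITRs is an assumption.
itr = {1: 171.6, 2: 158.5, 3: 127.4}
cpu = {1: 55.63, 2: 60.23, 3: 75.21}   # CPU utilization in percent

for dpars in (2, 3):
    itr_cost = -pct_change(itr[1], itr[dpars])    # ITR degradation vs. one DPAR
    cpu_growth = pct_change(cpu[1], cpu[dpars])   # CPU increase vs. one DPAR
    print(f"{dpars} DPARs: ITR -{itr_cost:.1f}%, CPU +{cpu_growth:.1f}%")
```

Rounded to one decimal place, this gives an ITR degradation of 7.6% and 25.8% and a CPU increase of 8.3% and 35.2% for the second and third DPAR, matching the percentages reported for Domino 6.5 in 5.2.1.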
