Vertically Scaling InterSystems IRIS
Scaling a system vertically by increasing its capacity and resources is a common, well-understood practice. Recognizing this, InterSystems IRIS includes a number of built-in capabilities that help you leverage these gains. Some operate transparently, while others require specific adjustments on your part to take full advantage.
This chapter discusses how to calculate the memory and CPU requirements of a server hosting an InterSystems IRIS instance and application, both initially and after collecting benchmarking and load testing results and information from existing sites. It also explains how to take the best advantage of vertical scaling by increasing system memory or the CPU core count. In some cases, you may use these guidelines to evaluate whether a system chosen based on other criteria (such as corporate standards or cloud budget limits) is roughly sufficient to handle your workload requirements; in others, you may use them to plan the system you need based on those requirements. Additional actions that may improve performance are also discussed.
Memory Planning and Management for InterSystems IRIS
The goal of memory planning and management is to provide enough memory on each host for all of the entities running on the host under all normal operating circumstances. This is a critical factor in both performance and availability.
Generally, there are four main consumers of memory on a server hosting an InterSystems IRIS instance, as follows:
The operating system, including the file system cache and the page file or swap space.
Running applications, services, and processes other than InterSystems IRIS and the application based on it.
InterSystems IRIS and application processes.
InterSystems IRIS is process-based. If you look at the operating system statistics while your application is running, you can see numerous processes running as part of InterSystems IRIS.
InterSystems IRIS shared memory, which includes:
The database cache (also known as the global buffer pool), in which data is cached to minimize disk reads; its size is a major factor in performance (see Vertically Scaling for Memory).
The routine cache, in which code is cached to minimize disk reads.
The generic memory heap, out of which shared memory is allocated automatically and manually for various instance purposes.
Other shared memory structures.
For the best possible performance, all of these consumers of InterSystems IRIS shared memory should be maintained in physical (system) memory under all normal operating conditions.
Important:
Virtual memory and mechanisms for using it, such as swap space and paging, are important because they enable the system to continue operating during a transient memory capacity problem, but the highest priority (if resources allow) is to include enough physical memory to avoid the use of virtual memory altogether under normal operating circumstances.
Achieving this involves these three steps:
Estimating memory requirements before deployment.
Allocating shared memory during or after InterSystems IRIS deployment.
Reviewing actual memory usage in operation and making adjustments as needed.
Every application is different and any given system may require a series of adjustments to optimize memory use. The following two sections provide general guidelines to use as a first approximation for sizing system memory and allocating shared memory within InterSystems IRIS. Benchmarking and performance load testing the application will further influence your estimate of the ideal memory sizing and parameters.
If you have not configured sufficient physical memory on a Linux system and thus regularly come close to capacity, you run the risk that the out of memory killer may misidentify long-running InterSystems IRIS processes that touch a lot of memory in normal operation, such as the write daemon and CSP server processes, as the source of the problem and terminate them. This will result in an outage of the InterSystems IRIS instance and require crash recovery at the subsequent startup. Disabling the out of memory killer is not recommended, however, as this safety mechanism keeps your operating system from crashing when memory runs short, giving you a chance to intervene and restore InterSystems IRIS to normal operation. The recommended way to avoid this problem is to configure enough physical memory to avoid any chance of the out of memory killer coming into play. (For a detailed discussion of process memory in InterSystems IRIS, see Process Memory in InterSystems Products.)
Estimating Memory Requirements
As a very general guideline, InterSystems recommends at least 4 to 8 GB of system memory per CPU core for systems hosting InterSystems IRIS-based applications. For example, a 16-core system should have a minimum of 64 GB of RAM, and preferably up to 128 GB.
This core count should not include any threads such as Intel HyperThreading (HT) or IBM Simultaneous Multi-Threading (SMT) (see General Performance Enhancement on InterSystems IRIS Platforms). So, for example, if you have an IBM AIX logical partition (LPAR) with 8 cores allocated, the calculation would be 4-8 GB * 8 = 32 to 64 GB of total RAM allocated to that LPAR, even with SMT-4 enabled and appearing as 32 logical processors.
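As a rough illustration of this guideline, the calculation can be sketched as follows (the function name is invented for illustration and is not part of InterSystems IRIS):

```python
def base_memory_estimate_gb(physical_cores, gb_per_core_low=4, gb_per_core_high=8):
    """Return the (low, high) system memory estimate in GB for a host,
    based on the general guideline of 4-8 GB of RAM per physical core.
    Only physical cores count: SMT/Hyper-Threading logical processors
    are excluded from the calculation."""
    return (physical_cores * gb_per_core_low, physical_cores * gb_per_core_high)

# The 8-core AIX LPAR example from the text: 32-64 GB, regardless of
# how many logical processors SMT makes visible.
print(base_memory_estimate_gb(8))   # (32, 64)
# The 16-core example: 64 GB minimum, preferably up to 128 GB.
print(base_memory_estimate_gb(16))  # (64, 128)
```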
You can arrive at a more specific estimate by adding up approximations of the following component memory requirements:
The amount needed for the operating system (including the file system cache and the page file or swap space) and all installed programs other than InterSystems IRIS.
The memory needs of other entities and processes running on the system can vary widely. If possible, make realistic estimates of the memory to be consumed by the software that will be cohosted with InterSystems IRIS.
For swap space or the page file, as a general guideline, plan on configuring 2 GB of virtual memory, or 25-50% of your physical memory if the total is less than 4 GB. As stated earlier, swapping and paging degrade performance and should come into play only when transient memory capacity problems (such as the failure of a memory card) require it. Further, you should configure alerts to notify operators when the system uses virtual memory so they can take immediate action to avoid more severe consequences.
Note:
When large and huge pages are configured, as is highly recommended, InterSystems shared memory segments are pinned in physical memory and never swapped out; for more information, see Configuring Large and Huge Pages.
The amount needed for InterSystems IRIS processes.
In general, there are no more than 1000 InterSystems IRIS processes running in production, typically consuming 12-16 MB each. Therefore 12-16 GB will be sufficient in most cases. However, the number of InterSystems IRIS processes running and their memory needs can vary significantly, and your evaluation of your application’s requirements for the overall number of processes and memory partition size per process may indicate that you need a larger estimate for this component.
The amount needed for the InterSystems IRIS instance’s shared memory.
Most of an instance’s shared memory allocation is determined by the configuration parameters listed in the following table, which provides general guidelines for the sizing of each:
Parameter   Determines the size of the            If system memory ≤ 64 GB, allocate   If system memory > 64 GB, allocate
globals     database cache (global buffer pool)   50% of total system memory           70% of total system memory
routines    routine cache                         256 MB minimum                       512 MB minimum
gmheap      generic memory heap *                 256 MB minimum                       384 MB minimum
jrnbufs     journal buffers                       64 MB (default)                      64 MB (default)
* The gmheap setting is specified in KB, but is provided here in MB for ease of comparison.
Once you have determined an initial setting for each of these parameters, you can estimate the instance’s total shared memory requirement using the following formula. (MaxServers + MaxServerConn specify the maximum number of ECP connections allowed from the instance and to the instance respectively, with defaults of 2 for the former and 1 for the latter.)
globals*1.08 + routines*1.02 + gmheap (in MB) + (number of cores)*2 + jrnbufs + (MaxServers + MaxServerConn)*2 + 300
For example, on a system with 128 GB of RAM and 16 cores, for an instance on which the ECP connections parameters will remain at the default settings (because it is not part of a distributed cache cluster), the total shared memory needed according to the provided sizing guidelines would be as follows:
(128 GB * 0.7 * 1.08 = 97 GB) + (522 + 384 + 32 + 64 + 6 + 300 = 1308 MB) ≈ 98 GB total
On a 128 GB system, the InterSystems IRIS estimates of 16 GB for processes and 97 GB for shared memory come to 113 GB, leaving 15 GB for other systems purposes.
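The formula and worked example above can be sketched in a few lines of code; this is an illustrative calculation only, and the function name is invented for illustration:

```python
def shared_memory_mb(globals_mb, routines_mb, gmheap_mb, cores,
                     jrnbufs_mb=64, max_servers=2, max_server_conn=1):
    """Estimate an instance's total shared memory requirement in MB,
    using the formula from the text:
      globals*1.08 + routines*1.02 + gmheap + cores*2 + jrnbufs
        + (MaxServers + MaxServerConn)*2 + 300
    The 1.08 globals multiplier assumes the default 8 KB block size."""
    return (globals_mb * 1.08 + routines_mb * 1.02 + gmheap_mb
            + cores * 2 + jrnbufs_mb
            + (max_servers + max_server_conn) * 2 + 300)

# The 128 GB / 16-core example: globals = 70% of 128 GB, routines = 512 MB,
# gmheap = 384 MB, default journal buffers and ECP connection settings.
total_mb = shared_memory_mb(globals_mb=0.7 * 128 * 1024, routines_mb=512,
                            gmheap_mb=384, cores=16)
print(round(total_mb / 1024))  # 98 (GB)
```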
Actual process and shared memory use may differ from the above estimates. In particular:
The multiplier of 1.08 for the database cache (globals) applies to the default database block size of 8 KB. This factor is lower for larger block sizes; if you allow and allocate database caches to multiple block sizes, the smaller the proportion of the total allocated to the 8 KB block size, the smaller the effective multiplier will be.
The size of the shared memory heap may be automatically increased to reflect the host’s CPU count.
Reviewing Memory Usage describes how to review the instance’s actual use of process and shared memory.
Allocating InterSystems IRIS Shared Memory
The shared memory allocations discussed in the previous section can be made during deployment using the configuration merge feature by including the parameters cited above in a configuration merge file, as shown in the following example:
[config]
globals=0,0,5000,0,0,0
gmheap=165888
jrnbufs=64
routines=151
The gmheap value is specified in KB, the others in MB. The multiple fields in the globals value specify the size of the database cache for different database block sizes; typically only the cache for the 8 KB block size is specified, as shown here.
The settings in the file are merged into the default configuration parameter file (CPF) during deployment, so when the instance starts up for the first time it will do so with the specified parameter values in place and its shared memory allocated accordingly. (Configuration merge is very useful in automated deployment, allowing you to deploy differently-configured instances from the same source by applying different merge files.)
If these allocations are not made using configuration merge at deployment, they can be specified immediately after deployment, or at any time, in the following ways:
Use configuration merge by executing the iris merge command.
Use the Management Portal; procedures are documented in the following locations:
Database cache and routine cache — Allocating Memory to the Database and Routine Caches in the System Administration Guide
Generic memory heap — Configuring the Generic Memory Heap in the System Administration Guide
Journal buffers — Configuring Journal Settings in the Data Integrity Guide
Use the appropriate ObjectScript class for the purpose, as described in the class reference and indicated in the parameter’s entry in the Configuration Parameter File Reference; for the memory parameters discussed here, this would be the Config.config class.
Edit the instance’s iris.cpf file (which is located in the install-dir/mgr directory) and change the values of the parameters described in the previous section.
When you have made all the desired changes by any of these methods, restart the instance so they can take effect.
Reviewing Memory Usage
At the end of instance startup, messages summarizing the instance’s shared memory allocations are written to the messages log, similar to this example, which incorporates the values from the example in Allocating InterSystems IRIS Shared Memory:
11/06/21-10:59:37:513 (91515) 0 [Generic.Event] Allocated 5682MB shared memory using Huge Pages
11/06/21-10:59:37:514 (91515) 0 [Generic.Event] 5000MB global buffers, 151MB routine buffers, 64MB journal buffers, 289MB buffer descriptors, 162MB heap, 6MB ECP, 9MB miscellaneous
The term buffer descriptors refers to the control structures associated with the global, routine, and journal buffers. As noted in Estimating Memory Requirements, the heap figure may be more than you specified for gmheap due to an automatic adjustment.
If you use configuration merge to deploy with your desired shared memory allocations, you can confirm that you have the allocation you intended by reviewing these messages. If you allocate shared memory following deployment, you can review them following the restart you initiate after making changes.
Once the system is operating, in testing or production, you can review actual memory usage within InterSystems IRIS as follows:
To view the instance’s shared memory usage, go to Management Portal’s System Usage Monitor page (System Operation > System Usage), then click the Shared Memory Heap Usage button to display the Shared Memory Heap Usage page; for a description of the information displayed on that page, see Generic (Shared) Memory Heap Usage in the Monitoring Guide.
To roughly estimate the maximum memory usage by InterSystems IRIS processes, multiply the peak number of running processes by the default Maximum Per-Process Memory (bbsiz) setting of 262.144 MB. However, if this setting has been changed to -1 for “unlimited” (see Setting the Maximum Per-Process Memory in the System Administration Guide), which is recommended by InterSystems for most production systems, a more detailed analysis is required to estimate the maximum memory usage by these processes. To learn more about memory use by InterSystems IRIS processes, see Process Memory in InterSystems Products.
If System Monitor (described in Using System Monitor in the Monitoring Guide) generates the alert Updates may become suspended due to low available buffers or the warning Available buffers are getting low (25% above the threshold for suspending updates) while the system is under normal production workload, the database cache (global buffer pool) is not large enough and should be increased to optimize performance.
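The rough upper-bound process memory calculation described above (peak process count times the bbsiz limit) can be sketched as follows; the function name is invented for illustration:

```python
def peak_process_memory_mb(peak_processes, bbsiz_mb=262.144):
    """Rough upper bound on total InterSystems IRIS process memory in MB:
    peak number of running processes times the default Maximum Per-Process
    Memory (bbsiz) setting of 262.144 MB. Only meaningful while bbsiz is a
    fixed limit; if bbsiz is -1 ("unlimited"), a more detailed analysis
    is needed instead."""
    return peak_processes * bbsiz_mb

# For example, 500 concurrent processes at the default per-process limit:
print(round(peak_process_memory_mb(500) / 1024, 1))  # 128.0 (GB, worst case)
```

Note that this is a worst-case bound; actual processes typically consume far less than the bbsiz limit, as the 12-16 MB per process figure in Estimating Memory Requirements suggests.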
Vertically Scaling for Memory
Performance problems in production systems are often due to insufficient memory for application needs. Adding memory to the server hosting one or more InterSystems IRIS instances lets you allocate more to the database cache, the routine cache, generic memory, or some combination. A database cache that is too small to hold the workload’s working set forces queries to fall back to disk, greatly increasing the number of disk reads required and creating a major performance problem, so this is often a primary reason to add memory. Increases in generic memory and the routine cache may also be helpful under certain circumstances.
Configuring Large and Huge Pages
Where supported, the use of large and huge memory pages can be of significant performance benefit and is highly recommended, as described in the following:
IBM AIX® — The use of large pages is highly recommended, especially when configuring over 16GB of shared memory (the sum of the database cache, the routine cache, and the generic memory heaps, as discussed in Estimating Memory Requirements).
By default, when large pages are configured, the system automatically uses them in memory allocation. If shared memory cannot be allocated in large pages, it is allocated in standard (small) pages. However, you can use the memlock parameter for finer-grained control over large pages.
For more information, see Configuring Large Pages on IBM AIX® in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
Linux (all distributions) — The use of static huge pages (2MB) when available is highly recommended for either physical (bare metal) servers or virtualized servers. Using static huge pages for the InterSystems IRIS shared memory segments yields an average CPU utilization reduction of approximately 10-15% depending on the application.
By default, when huge pages are configured, InterSystems IRIS attempts to provision shared memory in huge pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages and orphans the allocated huge page space, potentially causing system paging. However, you can use the memlock parameter to control this behavior and fail at startup if huge page allocation fails.
For more information, see Configuring Huge Pages on Linux in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
Microsoft Windows — The use of large pages is recommended to reduce page table entry (PTE) overhead.
By default, when large pages are configured, InterSystems IRIS attempts to provision shared memory in large pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages. However, you can use the memlock parameter to control this behavior and fail at startup if large page allocation fails.
For more information, see Configuring Large Pages on Windows in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
CPU Sizing and Scaling for InterSystems IRIS
InterSystems IRIS is designed to make the most of a system’s total CPU capacity. Keep in mind that not all processors or processor cores are alike. There are obvious variations, such as clock speed, number of threads per core, and processor architecture, as well as the varying impact of virtualization.
Basic CPU Sizing
Applications vary significantly from one to another, and there is no better measurement of CPU resource requirements than benchmarking and load testing your application, together with performance statistics collected from existing sites. If neither benchmarking nor existing customer performance data is available, start with one of the following calculations:
1-2 processor cores per 100 users.
1 processor core for every 200,000 global references per second.
These recommendations are only starting points when application-specific data is not available, and may not be appropriate for your application. It is very important to benchmark and load test your application to verify its exact CPU requirements.
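The two starting-point rules of thumb above can be sketched as follows; the function name is invented for illustration, and these figures are only first approximations to be validated by benchmarking:

```python
def cpu_core_estimates(users=None, global_refs_per_sec=None):
    """Starting-point core count estimates from the two rules of thumb:
    1-2 processor cores per 100 users, and 1 processor core for every
    200,000 global references per second. Returns whichever estimates
    the supplied inputs allow."""
    estimates = {}
    if users is not None:
        # (low, high) range at 1 and 2 cores per 100 users respectively
        estimates["by_users"] = (users / 100 * 1, users / 100 * 2)
    if global_refs_per_sec is not None:
        estimates["by_glorefs"] = global_refs_per_sec / 200_000
    return estimates

# For example, 800 users generating 1.6 million global references/second:
print(cpu_core_estimates(users=800, global_refs_per_sec=1_600_000))
# {'by_users': (8.0, 16.0), 'by_glorefs': 8.0}
```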
Balancing Core Count and Speed
Given a choice between faster CPU cores and more CPU cores, consider the following:
The more processes your application uses, the greater the benefit of raising the core count to increase concurrency and overall throughput.
The fewer processes your application uses, the greater the benefit of the fastest possible cores.
For example, an application with a great many users concurrently running simple queries will benefit from a higher core count, while one with relatively fewer users executing compute-intensive queries would benefit from faster but fewer cores. In theory, both applications would benefit from many fast cores, assuming there is no resource contention when multiple processes are running in all those cores simultaneously. As noted in Estimating Memory Requirements, the number of processor cores is a factor in estimating the memory to provision for a server, so increasing the core count may require additional memory.
Virtualization Considerations for CPU
Production systems are sized based on benchmarks and measurements at live customer sites. Virtualization using shared storage adds very little CPU overhead compared to bare metal, so it is valid to size virtual CPU requirements from bare metal monitoring.
For hyper-converged infrastructure (HCI) deployments, add 10% to your estimated host-level CPU requirements to cover the overhead of HCI storage agents or appliances.
In determining the best core count for individual VMs, strike a balance between the number of hosts required for availability and minimizing costs and host management overhead; by increasing core counts, you may be able to satisfy the former requirement without violating the latter.
The following best practices should be applied to virtual CPU allocation:
Production systems, especially database servers, are assumed to be highly utilized and should therefore be initially sized based on assumed equivalence between a physical CPU and its virtual counterpart. If you need six physical CPUs, assume you need six virtual CPUs.
Do not allocate more vCPUs than required to optimize performance. Although large numbers of vCPUs can be allocated to a virtual machine, there can be a (usually small) performance overhead for managing unused vCPUs. The key here is to monitor your systems regularly to ensure that vCPUs are correctly allocated.
Leveraging Core Count with Parallel Query Execution
When you upgrade by adding CPU cores, an InterSystems IRIS feature called parallel query execution helps you take the most effective advantage of the increased capacity.
Parallel query execution is built on a flexible infrastructure for maximizing CPU usage that spawns one process per CPU core, and is most effective with large data volumes, such as analytical workloads that perform large aggregations.
For more information on parallel query processing, see Parallel Query Processing in the “Optimizing Query Performance” chapter of the SQL Optimization Guide.
General Performance Enhancement on InterSystems IRIS Platforms
The following information may be helpful in improving the performance of your InterSystems IRIS deployment.
In most situations, the use of Intel Hyper-Threading or AMD Simultaneous Multithreading (SMT) is recommended for improved performance, either within a physical server or at the hypervisor layer in virtualized environments. There may be situations in a virtualized environment in which disabling Hyper-Threading or SMT is warranted; however, those are exceptional cases specific to a given application.
In the case of IBM AIX®, IBM Power processors offer multiple levels of SMT at 2, 4, and 8 threads per core. With the latest IBM Power9 processors, SMT-8 is the level most commonly used with InterSystems IRIS. There may be cases, however, especially with previous generation Power7 and Power8 processors, in which SMT-2 or SMT-4 is more appropriate for a given application. Benchmarking the application is the best approach to determining the ideal SMT level for a specific deployment.
By default, InterSystems IRIS allocates the minimum number of semaphore sets by maximizing the number of semaphores per set (see Semaphores in InterSystems Products). However, there is some evidence that this is not ideal for performance on Linux systems with non-uniform memory access (NUMA) architecture.
To address this, the semsperset parameter in the configuration parameter file (CPF) can be used to specify a lower number of semaphores per set. By default, semsperset is set to 0, which specifies the default behavior. Determining the most favorable setting will likely require some experimentation; if you have InterSystems IRIS deployed on a Linux/NUMA system, InterSystems recommends that you try an initial value of 250.