VMM page replacement tuning
Source: IBM
The memory management algorithm tries to keep the size of the free list and the percentage of real memory occupied by persistent segment pages within specified bounds.
These bounds, discussed in Real-memory management, can be altered with the vmo command, which can only be run by the root user. Changes made by this tool remain in effect until the next reboot of the system. To determine whether the vmo command is installed and available, run the following command:
# lslpp -lI bos.perf.tune
Executing the vmo command with the -a option displays the current parameter settings. For example:
# vmo -a
    cpu_scale_memp = 8
    data_stagger_interval = 161
    defps = 1
    force_relalias_lite = 0
    framesets = 2
    htabscale = -1
    kernel_heap_psize = 4096
    large_page_heap_size = 0
    lgpg_regions = 0
    lgpg_size = 0
    low_ps_handling = 1
    lru_file_repage = 1
    lru_poll_interval = 0
    lrubucket = 131072
    maxclient% = 80
    maxfree = 1088
    maxperm = 3118677
    maxperm% = 80
    maxpin = 3355444
    maxpin% = 80
    mbuf_heap_psize = 4096
    memory_affinity = 1
    memory_frames = 4194304
    memplace_data = 2
    memplace_mapped_file = 2
    memplace_shm_anonymous = 2
    memplace_shm_named = 2
    memplace_stack = 2
    memplace_text = 2
    memplace_unmapped_file = 2
    mempools = 1
    minfree = 960
    minperm = 779669
    minperm% = 20
    nokilluid = 0
    npskill = 1536
    npsrpgmax = 12288
    npsrpgmin = 9216
    npsscrubmax = 12288
    npsscrubmin = 9216
    npswarn = 6144
    num_spec_dataseg = 0
    numpsblks = 196608
    page_steal_method = 0
    pagecoloring = n/a
    pinnable_frames = 3868256
    pta_balance_threshold = n/a
    relalias_percentage = 0
    rpgclean = 0
    rpgcontrol = 2
    scrub = 0
    scrubclean = 0
    soft_min_lgpgs_vmpool = 0
    spec_dataseg_int = 512
    strict_maxclient = 1
    strict_maxperm = 0
    v_pinshm = 0
    vm_modlist_threshold = -1
    vmm_fork_policy = 1
Values for minfree and maxfree parameters
The purpose of the free list is to keep track of real-memory page frames released by terminating processes and to supply page frames to requestors immediately, without forcing them to wait for page steals and the accompanying I/O to complete.
The minfree limit specifies the free-list size below which page stealing to replenish the free list begins. The maxfree parameter is the size above which stealing ends. When strict file cache limits are enabled through the strict_maxperm or strict_maxclient parameters, the minfree value also determines when page stealing starts: with the strict_maxperm parameter enabled, page stealing starts when the number of persistent pages comes within minfree pages of the maxperm limit, and with the strict_maxclient parameter enabled, it starts when the number of client pages comes within minfree pages of the maxclient limit.
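For example, using the values from the vmo -a output above (maxperm = 3118677 and minfree = 960), page stealing with the strict_maxperm parameter enabled would begin once the number of persistent pages reaches 3118677 - 960 = 3117717 frames.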
The objectives in tuning these limits are to ensure the following:
- Any activity that has critical response-time objectives can always get the page frames it needs from the free list.
- The system does not experience unnecessarily high levels of I/O because of premature stealing of pages to expand the free list.
The default values of the minfree and maxfree parameters depend on the memory size of the machine. The difference between the maxfree and minfree parameters should always be equal to or greater than the value of the maxpgahead parameter, if you are using JFS. For Enhanced JFS, the difference between the maxfree and minfree parameters should always be equal to or greater than the value of the j2_maxPageReadAhead parameter. If you are using both JFS and Enhanced JFS, you should set the value of the minfree parameter to a number that is greater than or equal to the larger pageahead value of the two file systems.
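You can display the current read-ahead settings with the ioo command before choosing minfree and maxfree values. The following is a minimal sketch, assuming the default values of 8 for maxpgahead and 128 for j2_maxPageReadAhead; under that assumption, a minfree value of 960 calls for a maxfree value of at least 1088 (960 + 128):
# ioo -o maxpgahead -o j2_maxPageReadAhead
# vmo -o minfree=960 -o maxfree=1088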
The minfree and maxfree parameter values are different if there is more than one memory pool. Memory pools were introduced in AIX® 4.3.3 for MP systems with large amounts of RAM. Each memory pool has its own minfree and maxfree values. Prior to AIX 5.3, the minfree and maxfree values shown by the vmo command are the sum of the minfree and maxfree values for all memory pools. Starting with AIX 5.3, the values shown by the vmo command are per memory pool. The number of memory pools can be displayed with the vmo -L mempools command. A less precise but more comprehensive tool for investigating an appropriate size for minfree is the vmstat command. The following is a portion of vmstat command output on a system where the minfree value is being reached:
# vmstat 1
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in    sy   cs us sy id wa
 2  0 70668   414   0   0   0   0    0   0 178  7364  257 35 14  0 51
 1  0 70669   755   0   0   0   0    0   0 196 19119  272 40 20  0 41
 1  0 70704   707   0   0   0   0    0   0 190  8506  272 37  8  0 55
 1  0 70670   725   0   0   0   0    0   0 205  8821  313 41 10  0 49
 6  4 73362   123   0   5  36 313 1646   0 361 16256  863 47 53  0  0
 5  3 73547   126   0   6  26 152  614   0 324 18243 1248 39 61  0  0
 4  4 73591   124   0   3  11  90  372   0 307 19741 1287 39 61  0  0
 6  4 73540   127   0   4  30 122  358   0 340 20097  970 44 56  0  0
 8  3 73825   116   0  18  22 220  781   0 324 16012  934 51 49  0  0
 8  4 74309    26   0  45  62 291 1079   0 352 14674  972 44 56  0  0
 2  9 75322     0   0  41  87 283  943   0 403 16950 1071 44 56  0  0
 5  7 75020    74   0  23 119 410 1611   0 353 15908  854 49 51  0  0
In the above example output, you can see that the minfree value of 120 is constantly being reached. Therefore, page replacement occurs and in this particular case, the free list even reaches 0 at one point. When that happens, threads needing free frames get blocked and cannot run until page replacement frees up some pages. To prevent this situation, you might consider increasing the minfree and maxfree values. If you conclude that you should always have at least 1000 pages free per memory pool, run the following command:
# vmo -o minfree=1000 -o maxfree=1008
To make this a permanent change, include the -p flag:
# vmo -o minfree=1000 -o maxfree=1008 -p
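To verify the new settings, you can display the tunables without assigning values:
# vmo -o minfree -o maxfree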
Starting with AIX 5.3, the default value of the minfree parameter is increased to 960 per memory pool and the default value of the maxfree parameter is increased to 1088 per memory pool.
Memory pools
The vmo -o mempools=number_of_memory_pools command allows you to change the number of memory pools that are configured at system boot time.
Because the number of memory pools is set at boot time, the mempools option is not a dynamic change. Do not change this value without a good understanding of the behavior of the system and the VMM algorithms. You cannot change the mempools value on a UP kernel; on an MP kernel, the change is written to the kernel file.
This tunable should only be adjusted when advised by an IBM® service representative.
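If an IBM service representative does advise a change, the procedure might look like the following sketch; the pool count of 2 and the boot device /dev/ipldevice are placeholders, and because mempools takes effect at boot time, the boot image must be rebuilt and the system rebooted:
# vmo -r -o mempools=2
# bosboot -ad /dev/ipldevice
# shutdown -r now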
List-based LRU
In AIX® 5.3, the LRU algorithm can use either lists or the page frame table. Prior to AIX 5.3, the page frame table method was the only method available. The list-based algorithm provides a list of pages to scan for each type of segment.
The following is a list of the types of segments:
- Working
- Persistent
- Client
- Compressed
If WLM is enabled, there are lists for classes as well. You can disable the list-based LRU feature and enable the original physical-address-based scanning with the page_steal_method parameter of the vmo command. The default value for the page_steal_method parameter is 0, which means that the list-based LRU feature is enabled and lists are used to scan pages. If the page_steal_method parameter is set to 1, the physical-address-based scanning is used. The value for the page_steal_method parameter takes effect after a bosboot and reboot.
Note: With list-based scanning, buckets that are specified with the lrubucket parameter are still used, but buckets can overlap on multiple lists and include a count of the number of pages that were scanned.
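For example, a sketch of switching to physical-address-based scanning, assuming /dev/ipldevice as the boot device:
# vmo -r -o page_steal_method=1
# bosboot -ad /dev/ipldevice
# shutdown -r now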
Reduce memory scanning overhead with the lrubucket parameter
Tuning with the lrubucket parameter can reduce scanning overhead on large memory systems.
The page-replacement algorithm scans memory frames looking for a free frame. During this scan, reference bits of pages are reset, and if a free frame has not been found, a second scan is done. In the second scan, if the reference bit is still off, the frame will be used for a new page (page replacement).
On large memory systems, there may be too many frames to scan, so memory is divided into buckets of frames. The page-replacement algorithm scans the frames in a bucket and then starts over on that bucket for the second scan before moving on to the next bucket. The default bucket size is 131072 frames (512 MB of RAM). The number of frames is tunable with the vmo -o lrubucket=<new_value> command, where the value is in 4 KB frames.
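For example, to double the bucket size to 262144 frames (262144 x 4 KB = 1 GB of RAM), you could run:
# vmo -o lrubucket=262144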
Values for minperm and maxperm parameters
The operating system takes advantage of the varying requirements for real memory by leaving in memory pages of files that have been read or written.
If the file pages are requested again before their page frames are reassigned, this technique saves an I/O operation. These file pages may be from local or remote (for example, NFS) file systems.
The ratio of page frames used for files versus those used for computational (working or program text) segments is loosely controlled by the minperm and maxperm values:
- If the percentage of RAM occupied by file pages rises above the maxperm value, page replacement steals only file pages.
- If the percentage of RAM occupied by file pages falls below the minperm value, page replacement steals both file and computational pages.
- If the percentage of RAM occupied by file pages is between the minperm and maxperm values, page replacement steals only file pages unless the number of file repages is higher than the number of computational repages.
In a particular workload, it might be worthwhile to emphasize the avoidance of file I/O. In another workload, keeping computational segment pages in memory might be more important. To understand what the ratio is in the untuned state, use the vmstat command with the -v option.
# vmstat -v
              1048576 memory pages
              1002054 lruable pages
               478136 free pages
                    1 memory pools
                95342 pinned pages
                 80.1 maxpin percentage
                 20.0 minperm percentage
                 80.0 maxperm percentage
                 36.1 numperm percentage
               362570 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 35.0 numclient percentage
                 80.0 maxclient percentage
               350782 client pages
                    0 remote pageouts scheduled
                   80 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 3312 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
               474178 external pager filesystem I/Os blocked with no fsbuf
The numperm value gives the number of file pages in memory, 362570, which is 36.1 percent of the 1002054 lruable pages.
If you notice that the system is paging out to paging space even though the number of file pages in memory is below the maxperm value, the file repaging rate could be higher than the computational repaging rate. In this case, you can prevent computational pages from being paged out by lowering the maxperm value below the numperm value. Because the numperm value is approximately 36 percent, you could lower the maxperm value to 30 percent, so that the page replacement algorithm steals only file pages. If the lru_file_repage parameter is set to 0, only file pages are stolen whenever the number of file pages in memory is greater than the value of the minperm parameter.
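For example, to lower the maxperm value to 30 percent and make the change persistent across reboots, you might run the following; the 30 percent figure is illustrative, chosen because numperm is about 36 percent on this system:
# vmo -p -o maxperm%=30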
Persistent file cache limit with the strict_maxperm option
The strict_maxperm option of the vmo command, when set to 1, places a hard limit on how much memory is used for the persistent file cache by making the maxperm value the upper limit for this file cache.
When the upper limit is reached, least-recently-used (LRU) page replacement is performed on persistent pages.
Attention: The strict_maxperm option should only be enabled for those cases that require a hard limit on the persistent file cache. Improper use of the strict_maxperm option can cause unexpected system behavior because it changes the VMM method of page replacement.
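Subject to the caution above, a minimal example of enabling the hard limit:
# vmo -o strict_maxperm=1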
Enhanced JFS file system cache limit with the maxclient parameter
The maxclient parameter represents the maximum number of client pages that can be used for buffer cache if the strict_maxclient parameter is set to 1, which is the default value.
The enhanced JFS file system uses client pages for its buffer cache. The limit on client pages in real memory is enforced using the maxclient parameter, which is tunable. If the value of the strict_maxclient parameter is set to 0, the maxclient parameter acts as a soft limit. This means that the number of client pages can exceed the value of the maxclient parameter, and if that happens, only client file pages are stolen rather than computational pages when the client LRU daemon runs.
The LRU daemon begins to run when the number of client pages is within the number of minfree pages of the maxclient parameter's threshold. The LRU daemon attempts to steal client pages that have not been referenced recently. If the number of client pages is lower than the value of the maxclient parameter but higher than the value of the minperm parameter, and the value of the lru_file_repage parameter is set to 1, the LRU daemon references the repage counters.
If the value of the file repage counter is higher than the value of the computational repage counter, computational pages, which are the working storage, are selected for replacement. If the value of the computational repage counter exceeds the value of the file repage counter, file pages are selected for replacement.
If the value of the lru_file_repage parameter is set to 0 and the number of file pages exceeds the value of the minperm parameter, file pages are selected for replacement. If the number of file pages is lower than the value of the minperm parameter, any page that has not been referenced can be selected for replacement.
If the number of client pages exceeds the value of the maxclient parameter, which is possible if the value of the strict_maxclient parameter equals 0, file pages are selected for replacement.
The maxclient parameter also affects NFS clients and compressed pages. Note that the maxclient parameter should generally be set to a value that is less than or equal to the maxperm parameter, particularly when the strict_maxperm parameter is enabled (set to 1).
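As an illustration of that rule, the following sketch lowers both limits together so that maxclient% stays at or below maxperm%; the 30 percent values are placeholders:
# vmo -p -o maxperm%=30 -o maxclient%=30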