MMLIB(3)                  Linux Programmer's Manual                  MMLIB(3)

NAME
       m_getpageinfos, m_hashpages, m_merge, m_set_hash_default - identify
       and merge virtual memory pages of same contents

SYNOPSIS
       #include <mmlib.h>

       ssize_t m_getpageinfos(
              pid_t pid,
              const void * start,
              const void * end,
              MPAGEINFO * pageinfos,
              size_t nel,
              const void * * contp );

       ssize_t m_hashpages(
              pid_t pid,
              MPAGEINFO * pageinfos,
              size_t nel,
              unsigned long (*hashfunction) (void * addr, size_t size) );

       ssize_t m_merge(
              MEQUALPAIR * pagepairs,
              size_t nel );

       For fine tuning and testing:

       ssize_t m_set_hash_default(
              unsigned long (*hashfunction) (const void * addr, size_t size) );

       unsigned long m_hash_addrothalf(const void *, size_t);
       unsigned long m_hash_const(const void *, size_t);

DESCRIPTION
       mmlib lets a program indicate pages of identical contents to the
       virtual memory system, thereby reducing physical memory usage.
       Various strategies for detecting and sharing pages can be
       implemented safely with mmlib in user space. Using mmlib cannot
       affect the integrity of the memory system. For example, attempts to
       merge pages of different contents will fail.

       Programs using mmlib will typically proceed as follows. Information
       about each page (virtual address, physical address, and reference
       count) is retrieved with m_getpageinfos. To find pages of the same
       content, m_hashpages is provided. Finally, potential candidates are
       merged with m_merge.

       You can fine-tune m_hashpages with your own hash function, either by
       passing it to each call or globally with m_set_hash_default. Please
       contribute any function faster and more significant than the current
       one!

       Information about pages is represented with the structure MPAGEINFO.
Linux 2.0                      12 January 1999                              1

           typedef struct {
               const void *  v_addr;  /* virtual address */
               const void *  p_addr;  /* physical address */
               size_t        count;   /* reference count */
               unsigned long hash;    /* hash */
           } MPAGEINFO;

       m_getpageinfos writes into the array pageinfos information about the
       pages of process pid that are used to represent the region from
       start to end. The array pageinfos must provide room for at least nel
       elements. At most nel entries are written into pageinfos in
       ascending order of v_addr. Swapped pages and those that cannot be
       merged will not be considered. For every page the fields v_addr
       (virtual address), p_addr (physical address), and count (reference
       count) are set. The field hash is ignored and left unchanged. The
       argument contp is either NULL or the address of a valid location.

       On success, m_getpageinfos returns the number of written entries,
       which may be smaller than nel if there are fewer entries in the
       region. In this case *contp is set to NULL. If more than nel entries
       are found, *contp is set to the first page that has not yet been
       examined. On error, -1 is returned.

       EXAMPLE
              m_getpageinfos(pid, (void *)0, (void *)-1, pageinfos, nel,
              &cont) determines the first nel pages in process pid.
              m_getpageinfos(pid, (void *)-1, (void *)-1, pageinfos, 1,
              &cont) gets information about the last page in memory (if
              present) and sets cont to NULL.

       m_hashpages determines the hash values of pages in process pid. For
       each of the nel elements in the array pageinfos, the page containing
       the virtual address found in field v_addr is examined and the fields
       hash, p_addr, and count are set to their current values. If the page
       is not available, count is set to zero and p_addr is set to NULL.

       On success, m_hashpages returns the number of successfully hashed
       pages, which may be smaller than nel if some of the pages are not
       available. On error, -1 is returned, and errno is set appropriately.
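       The hashfunction argument can point at a user-supplied function. As
       a minimal sketch of what such a function might look like, the
       following applies FNV-1a to a few short runs of bytes spread across
       the page instead of to every byte. The name sampled_fnv1a and the
       constants SAMPLE_STRIDE and SAMPLE_LEN are assumptions made for this
       illustration and are not part of mmlib. The function reads single
       bytes and takes the length from its size argument, so it does not
       rely on a particular page size or alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning constants, not part of mmlib:
 * hash SAMPLE_LEN bytes out of every SAMPLE_STRIDE bytes. */
enum { SAMPLE_STRIDE = 256, SAMPLE_LEN = 16 };

/* FNV-1a (32-bit) over the sampled bytes of the object at addr.
 * Uses the (const void *, size_t) signature shown for
 * m_set_hash_default and the predefined m_hash_ functions. */
unsigned long sampled_fnv1a(const void *addr, size_t size)
{
    const unsigned char *p = addr;
    uint32_t h = 2166136261u;           /* FNV offset basis */
    size_t off, i;

    assert(addr != NULL || size == 0);
    for (off = 0; off < size; off += SAMPLE_STRIDE) {
        /* The last run may be shorter than SAMPLE_LEN. */
        size_t len = size - off < SAMPLE_LEN ? size - off : SAMPLE_LEN;
        for (i = 0; i < len; i++) {
            h ^= p[off + i];
            h *= 16777619u;             /* FNV prime */
        }
    }
    return (unsigned long)h;
}
```

       Such a function would presumably be handed to m_set_hash_default or
       passed as the last argument of m_hashpages; because the SYNOPSIS
       shows a non-const addr parameter for m_hashpages, a cast may be
       needed there. Bytes outside the sampled runs do not influence the
       result, so pages that differ only in those bytes receive the same
       hash. That is the price of reading only part of each page.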
       The hash method is specified with hashfunction. If NULL, the current
       system default is taken. Otherwise, hashfunction must be a pointer
       to a function that returns a hash value for the size bytes long
       object starting at addr. The value that m_hashpages writes into the
       field hash may be different from the corresponding return value of
       hashfunction.

       With m_set_hash_default the default hash function is changed.

       Predefined hash functions start with the prefix m_hash_. Alternate
       versions with the prefix m_libhash_ allow performance comparisons
       with user defined functions, since they are executed the same way as
       user defined functions.

       m_hash_addrothalf and m_libhash_addrothalf compute a simple hash
       function based on the first half of a page.

       m_hash_const and m_libhash_const are constant functions. They are
       useful for measuring calling overheads.

       m_merge indicates to the virtual memory system that the first nel
       entries in pagepairs should be merged by replacing each page spid,
       sv_addr by the page dpid, dv_addr. If possible, the physical page of
       spid, sv_addr will be freed. The result of the attempted merge is
       returned in the status field, which is zero if the pages have been
       merged. On success, m_merge returns the number of merged pages. On
       error, -1 is returned and errno is set appropriately.

           typedef struct {
               pid_t         spid;     /* source pid */
               const void *  sv_addr;  /* source virtual address */
               pid_t         dpid;     /* destination pid */
               const void *  dv_addr;  /* destination virtual address */
               int           status;   /* return code */
           } MEQUALPAIR;

ERRORS
       ESRCH  The process whose ID is pid could not be found.

       EPERM  The calling process does not have appropriate privileges. The
              effective userid of the calling process must be equal to the
              effective userid of pid, or be the superuser.
       EFAULT The pointer pageinfos, hashfunction, or contp is outside your
              accessible address space.

       EINVAL The parameters do not make sense.

       ENOPKG The kernel module mergemod has not been installed.

       EPROTO The version of the current mmlib uses a different protocol
              than the kernel module mergemod. A different library or
              kernel module must be installed.

       EOVERFLOW
              Internal error, library/kernel probably corrupt.

BUGS
       Currently, only root can use this library. Other users will receive
       an EPERM error. Not all cases of EFAULT are detected. All calls
       could be faster. The operating system may currently overcommit
       memory due to m_merge.

CAVEATS
       The information provided by m_getpageinfos and m_hashpages may not
       be accurate due to system operation during or after the calls are
       performed.

HOW TO CONTRIBUTE
       mmlib has been designed to enable people on many levels of expertise
       to contribute. You do not need to be a kernel hacker to contribute.

       1.  Better hash functions. The ideal hash function should only
           consider a few lines of a memory page and still be more
           significant than the current one.

       2.  Better merging strategies. Currently only two are available:
           mergemem(8) and mergeall(8).

       3.  Improved security schemes. See section SECURITY for potential
           current problems.

       4.  Make memory redundancy in processes sharable. Quite often, pages
           cannot be merged just because of a small memory offset caused by
           superfluous details like command lines of different length. You
           may improve sharing of large areas by using valloc(3) rather
           than malloc(3). Maybe malloc's library can be adapted for larger
           areas.

       5.  Improve the kernel module mergemod, if you are a devoted colonel
           hacker. In particular, more efficient support for SMP is needed.

PORTABILITY GUIDE
       Many system dependent aspects are hidden by the mmlib library.
       Please consider the following to avoid unnecessary system dependence
       in your code.

       1.  Page size.
           The actual system's page size is not needed. Do not assume a
           particular size and don't call getpagesize(2). The hashfunction
           needed for m_hashpages receives the current page size as a
           separate argument dynamically.

       2.  Alignment. Do not assume that pages are page aligned. For
           example, the address handed over to hashfunction is only word
           aligned.

       3.  Do not assume that all pages are of the same size. Some memory
           systems allow different page sizes. Therefore, getpagesize(2)
           should be avoided. mmlib is link-compatible even in such
           situations.

       4.  Cache architecture. The provided interface should perform well
           on all architectures (virtually and physically mapped caches).
           For example, on virtually mapped caches, pages with the same
           content but incompatible virtual memory location receive
           different hash values.

       5.  Do not assume that the field hash carries the same value as
           returned by hashfunction. For the reasons mentioned in 4, a
           different value may be present.

SECURITY
       Calls that refer to processes of different effective userids can
       only be performed by the superuser for security reasons. Otherwise
       the content of a page might be revealed to unauthorized processes.

       In a hostile environment only select processes should be merged
       whose purpose and operation is well known. Arbitrary processes of
       untrusted users should not be merged.

       Sharing pages of unrelated user processes might provide an indirect
       hint about the existence of other users' pages of the same content.
       Statistical information about sharing and memory usage might be
       exploited by unauthorized processes to this end.

       Even without such information the following scenario is possible. A
       hostile process tries to guess the content of a confidential page by
       creating a set of arbitrary pages containing some guesses. An
       authorized process merges one of those pages with the confidential
       page.
       The sharing of the two pages is not directly visible to the hostile
       process. But modifying a shared page takes much longer, because it
       causes a copy-on-write page fault.

VERSION
       Last modified 1999-01-18.

SEE ALSO
       mergemem(1), mergeall(1)
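
EXAMPLE
       The step between m_hashpages and m_merge, turning hashed MPAGEINFO
       entries into MEQUALPAIR candidates, can be sketched as follows.
       This is an illustration under stated assumptions, not the strategy
       used by any existing tool. The helper names build_pairs and by_hash
       are hypothetical, all pages are assumed to belong to one process,
       and the two structures are repeated locally only so that the sketch
       compiles without mmlib.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Local copies of the structures documented in this page, so that the
 * sketch compiles without mmlib.h.  Real code would include <mmlib.h>. */
typedef struct {
    const void *v_addr;     /* virtual address */
    const void *p_addr;     /* physical address */
    size_t count;           /* reference count */
    unsigned long hash;     /* hash */
} MPAGEINFO;

typedef struct {
    pid_t spid;             /* source pid */
    const void *sv_addr;    /* source virtual address */
    pid_t dpid;             /* destination pid */
    const void *dv_addr;    /* destination virtual address */
    int status;             /* return code */
} MEQUALPAIR;

/* qsort comparator: ascending order of the hash field. */
static int by_hash(const void *a, const void *b)
{
    const MPAGEINFO *x = a, *y = b;
    return (x->hash > y->hash) - (x->hash < y->hash);
}

/* Hypothetical helper: sorts infos by hash and writes at most max merge
 * candidates for process pid into pairs.  Returns the number written. */
size_t build_pairs(pid_t pid, MPAGEINFO *infos, size_t nel,
                   MEQUALPAIR *pairs, size_t max)
{
    size_t i, n = 0, dst = 0;
    int have_dst = 0;

    assert(infos != NULL && pairs != NULL);
    qsort(infos, nel, sizeof *infos, by_hash);
    for (i = 0; i < nel && n < max; i++) {
        if (infos[i].count == 0 || infos[i].p_addr == NULL)
            continue;               /* page was not available */
        if (!have_dst || infos[i].hash != infos[dst].hash) {
            dst = i;                /* first usable page of a new hash run */
            have_dst = 1;
            continue;
        }
        if (infos[i].p_addr == infos[dst].p_addr)
            continue;               /* already the same physical page */
        pairs[n].spid = pid;
        pairs[n].sv_addr = infos[i].v_addr;
        pairs[n].dpid = pid;
        pairs[n].dv_addr = infos[dst].v_addr;
        pairs[n].status = -1;       /* placeholder, to be overwritten by m_merge */
        n++;
    }
    return n;
}
```

       The resulting array would then be passed to m_merge together with
       the returned count, and the status field of each entry inspected
       afterwards. Equal hash values only suggest equal contents; since
       attempts to merge pages of different contents fail, m_merge remains
       the final check.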