The mergemem program reduces the memory consumption of processes under the Linux operating system. Many programs contain memory areas of identical content that go undetected by the operating system. Typically, these areas contain data that were generated on startup and remain unchanged for longer periods. mergemem detects such areas and shares them. The sharing is performed at the operating-system level and is invisible to user-level programs.
mergemem is particularly useful if you run many instances of interpreters and emulators (like Java or Prolog) that keep their code in private data areas. Other programs can also benefit, albeit to a lesser degree.
mergemem was realized in a student project by Philipp Richter and Philipp Reisner at TU Wien.
Please refer to one of the following sites via HTTP:
Alternatively, mergemem can be obtained via anonymous FTP:
% ftp ftp.complang.tuwien.ac.at
login: anonymous
password: my@email.address
ftp> cd pub/ulrich/mergemem/versions
ftp> ls -ltr
ftp> binary
ftp> get XXX
ftp> quit
In a hostile environment only select processes should be merged whose purpose and operation are well known. Arbitrary processes of untrusted users should not be merged. Sharing pages of unrelated user processes might provide an indirect hint about the existence of other users' pages with the same content. Statistical information about sharing and memory usage might be exploited by unauthorized processes to this end. Even without such information the following scenario is possible.
A hostile process tries to guess the content of a confidential page by creating a set of pages containing its guesses. An authorized process then merges one of those pages with the confidential page. The sharing of the two pages is not directly visible to the hostile process, but modifying a shared page takes much longer because it causes a copy-on-write page fault, so by timing its own writes the hostile process can detect which guess was correct.
To install, simply uncompress and untar the distribution in a convenient directory and follow the instructions in the INSTALL file.
Current tentative documentation: the mmlib(3) manual page and the source (based on 0.13, pre-0.14).
Worst-case scenario: pages are merged and immediately split again afterwards.
The following programs were run under i386-Linux 2.0.33. Memory requirements/savings are indicated on a per-instance basis. Numbers should be multiplied by the actual number of instances (i.e., large savings are only realized when running many instances).
Depending on what the processes do afterwards, the amount of sharing may decrease again, but most shared pages seem to remain shared.
1st instance ................ 2216 kB
further instance ............ +964 kB
further instance merged ..... +84 kB    i.e. 880 kB saved

I.e., any further instance initially requires only 84 kB instead of 964 kB.
Saved state with lots of code and libraries
1st instance ................ 7608 kB
further instance ............ +3616 kB
further instance merged ..... +84 kB    i.e. 3532 kB saved
bin/appletviewer demo/name/example1.html
SpreadSheet
1st instance ................ 12488 kB
further instance ............ +7612 kB
further instance merged ..... +2452 kB    i.e. 5160 kB saved

MoleculeViewer
1st instance ................ 13476 kB
further instance ............ +8612 kB
further instance merged ..... +3112 kB    i.e. 5500 kB saved

MoleculeViewer+SpreadSheet
MoleculeViewer .............. 13476 kB
MoleculeViewer+SpreadSheet .. +7744 kB
merged ...................... +2508 kB    i.e. 5236 kB saved
1st instance ................ 3888 kB
further instance ............ +1284 kB
further instance merged ..... +476 kB    i.e. 808 kB saved
Viewing two different documents:
1st document ................ 3536 kB
1st and 2nd document ........ +2724 kB
after merging ............... +2163 kB    i.e. 561 kB saved
1st instance ................ 1004 kB
further instance ............ +516 kB
further instance merged ..... +160 kB    i.e. 356 kB saved
The measurements were performed as follows: first, it was ensured that file buffers etc. were flushed, by running a process that allocates and modifies slightly more memory than is physically available. Then, free was used to measure the available memory. (For this reason the first instance uses a lot of memory; part of it is due to file buffering.)