2. MTS Basics
2.1 Background
MTS works by replacing the system memory manager with a highly tuned memory manager designed for single-threaded and multithreaded applications that dynamically allocate memory. The basic interface to MTS is identical to that of the system memory manager. This allows most applications to use MTS without source code changes. On some platforms, dynamically linked applications can use MTS without relinking.
The initial version of MTS was developed in 1988 for a C++ CAD/CAM program running on a 4 MB Xenix/386 machine. At the time, we found that although the 300k LOC program initially ran well, its performance inevitably degraded because of excessive paging.
This experience resulted in a specialized heap subsystem which has proven to be highly durable. MTS stands for Memory Tuning System.
MTS also offers an extended interface to allow an application to tune the memory manager for the specific needs of the application and the host system. This chapter presents an overview of the concepts involved in using MTS. For platform-specific information, see the chapter that discusses your platform.
In our experience, if space management becomes an issue, then the application is a good candidate for using MTS pools.
MTS has always been a very fast allocator, but more importantly it scales for apps that use gigabytes of memory. Currently, it isn’t unusual for customers to manage 256GB within a single application.
2.2 Installing MTS
Installing MTS is simply a matter of unpacking the archive for your platform. For dynamically-linked applications, MTS shared libraries must be installed in a directory that the operating system uses for loading shared libraries or DLLs. We suggest that you consult your system administrator and follow your local policy for installing shared libraries. The chapter on each specific platform lists the files included with MTS for that platform.
2.3 Using MTS
To use MTS on UNIX or Linux, or with statically linked Windows applications, link the application with the appropriate MTS library. The MTS library must appear on the link line before any other library. To use MTS in a dynamically linked application on Windows, compile the application normally, and then use the provided setmts utility to set the application to use the MTS allocator (see Chapter 4, Using MTS on Microsoft Windows, for details).
Systems that support library preloading can use MTS for dynamically linked applications without relinking the application. See the following chapters for platform-specific details on this feature. Appendix B presents the command line syntax for setting environment variables.
For details on linking MTS on a specific platform, see the chapter for that platform and the Makefile included with the MTS distribution.
2.4 MTS Design
MTS aggressively manages memory, without checking for consistency. This provides larger performance gains, but may cause problems for applications that rely on the more conservative behavior of the system libraries.
MTS may reveal subtle memory management problems that the system libraries do not trigger. We recommend using memory leak detection software to identify and correct memory management problems within programs used with MTS. For example, MTS will reuse freed memory faster than many system libraries do. Code that inadvertently relies on accessing freed memory may appear to run correctly under the system memory manager, but crash or produce incorrect results under MTS. The robust version detects many, but not all, of these problems.
MTS is built much like a micro-kernel, with multiple allocators built on top of basic services that bind them together. The current architecture provides an individual virtual heap per thread that handles allocation sizes up to 4 KB (4,096 bytes). As pages become empty, they are recycled among all of these thread-bound heaps.
The set of specialized allocators is organized by allocation size:
Allocator | Size |
---|---|
Small | <= 4,096 bytes |
Middle | 4,097 - 61,440 bytes |
Large | 61,441 bytes - 2 MB |
Block | >= 2 MB |
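The size classes in the table can be read as a simple lookup on the requested size. The following is a minimal sketch only: the names SizeClass and classify are hypothetical and not part of the MTS API, and it assumes that "2 MB" means 2 * 1024 * 1024 bytes and that a request of exactly 2 MB goes to the Block allocator, which the table leaves ambiguous.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not part of the MTS API.
enum class SizeClass { Small, Middle, Large, Block };

constexpr std::size_t kSmallMax  = 4096;             // 4,096 bytes
constexpr std::size_t kMiddleMax = 61440;            // 61,440 bytes
constexpr std::size_t kBlockMin  = 2 * 1024 * 1024;  // assumption: 2 MB = 2 MiB

// Maps a request size to the allocator class listed in the table.
// Assumption: exactly 2 MB is treated as Block.
constexpr SizeClass classify(std::size_t bytes) {
    return bytes <= kSmallMax  ? SizeClass::Small
         : bytes <= kMiddleMax ? SizeClass::Middle
         : bytes <  kBlockMin  ? SizeClass::Large
                               : SizeClass::Block;
}

// Boundary checks taken directly from the table.
static_assert(classify(4096) == SizeClass::Small, "upper edge of Small");
static_assert(classify(4097) == SizeClass::Middle, "lower edge of Middle");
static_assert(classify(61441) == SizeClass::Large, "lower edge of Large");
```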
2.5 C++ operator new and operator delete
Most C++ implementations implement operator new and operator delete using the system-provided malloc and free. On these implementations, no source code changes are necessary for operator new and operator delete to use MTS. On other implementations, you may need to reimplement operator new and operator delete using malloc and free to take full advantage of MTS.
If a C++ class overrides operator new and operator delete with implementations that rely on direct system calls rather than malloc and free, those operators will not use the MTS allocator. To get the greatest performance benefit from MTS, we recommend that you rewrite any such operators to use malloc and free.
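As a rough illustration of that recommendation, the sketch below shows class-specific operators written in terms of malloc and free, using only standard C++. The class Node and the helper roundtrip are hypothetical names; nothing here is specific to MTS, and the allocator that actually serves the request is whichever malloc the application is linked against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class used only for illustration.
class Node {
public:
    explicit Node(int v) : value(v) {}
    int value;

    // Class-specific allocation routed through malloc, so a replacement
    // malloc linked into the program serves these requests.
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    // Matching deallocation through free; free(nullptr) is a no-op.
    static void operator delete(void* p) noexcept {
        std::free(p);
    }
};

// Ordinary new/delete expressions pick up the class-specific operators.
inline int roundtrip(int v) {
    Node* n = new Node(v);   // calls Node::operator new
    assert(n != nullptr);    // operator new throws instead of returning null
    int out = n->value;
    delete n;                // calls Node::operator delete
    return out;
}
```

Call sites do not change: code that already writes new Node(...) and delete continues to compile as before.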
2.6 MTS Allocator Versions
The MTS allocator is provided as both a static library and a dynamic library, allowing you to make the optimal choice for your application and build pipeline. The file extensions of these libraries are platform dependent. The names of the library files for your platform are listed in the chapter of this manual for that platform.
Your MTS release provides 64-bit libraries, 32-bit libraries, or both. The names of both the static and dynamic libraries carry the suffix 64 or 32 to denote the bit width.
For example, libmts64.so is the 64-bit shared library version of MTS on UNIX platforms.
For dynamically linked applications, MTS includes a passthrough library in the “disabled” subfolder of the release package. This library provides all the symbols declared in mts.h, but simply falls through to the system malloc and free. This allows you to easily test or benchmark your application without relinking or recompiling. For example, on Linux you can use the LD_PRELOAD environment variable to load the disabled libmts shared library instead of the normal libmts shared library, and your application will use the system malloc and free regardless of any MTS-specific API calls it makes.
Note: The 7.13.8 release of MTS is a major milestone in terms of performance, testing, and feature enhancements. At the same time, older allocator modes such as the single-threaded mode and robust mode are no longer supported. Please send feedback on priorities for the single-threaded and robust modes to support@crankuptheamps.com.
2.7 Multithreaded Applications
The MTS library is transparently threadsafe. In other words, MTS can be used with any existing multithreaded application without changing the source code of the application. MTS permits multiple threads to allocate and free memory without the need for any application-level locking. MTS automatically performs the necessary locking to safely access its internal control structures.
MTS uses multiple internal heaps. The internal control structures of each heap are completely independent from the control structures of the other heaps. Therefore, heaps may safely be used in parallel by different threads. On multiprocessor systems, MTS allows multiple threads to simultaneously allocate and free memory. This design allows for highly efficient multithreading. An application that uses MTS has a minimum of 4 heaps and a maximum of 128 heaps. When the environment variable MTS_INIT_THREAD_HEAPS is a number, MTS uses that number as the initial number of heaps. Otherwise, the initial number of heaps depends on the number of processors in the system. For systems with 1-4 CPUs, the initial number of heaps is 4. For systems with 5-16 CPUs, the initial number of heaps is equal to the number of CPUs. For systems with more than 16 CPUs, the initial number of heaps is 16.
For example, in a system with 2 CPUs, MTS starts with 4 heaps unless MTS_INIT_THREAD_HEAPS is set to a greater number. In a system with 10 CPUs, MTS starts with 10 heaps unless MTS_INIT_THREAD_HEAPS is greater than 10. In a system with 32 CPUs, MTS starts with 16 heaps unless MTS_INIT_THREAD_HEAPS is greater than 16.
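The CPU-based default amounts to clamping the CPU count to the range 4 to 16. The sketch below is a model of that rule only, with a hypothetical function name; it is not MTS code, and it deliberately leaves out the MTS_INIT_THREAD_HEAPS override.

```cpp
// Hypothetical helper modeling the documented default; not part of the MTS API.
// Covers only the case where MTS_INIT_THREAD_HEAPS is not set.
constexpr int default_initial_heaps(int cpus) {
    return cpus < 4 ? 4          // 1-4 CPUs: 4 heaps
         : cpus > 16 ? 16        // more than 16 CPUs: 16 heaps
                     : cpus;     // otherwise: one heap per CPU
}

// The three examples from the text.
static_assert(default_initial_heaps(2) == 4, "2 CPUs start with 4 heaps");
static_assert(default_initial_heaps(10) == 10, "10 CPUs start with 10 heaps");
static_assert(default_initial_heaps(32) == 16, "32 CPUs start with 16 heaps");
```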
To increase the number of heaps during program execution, call the function mts_init_thread_heaps, which is defined in mts.h. The sample below shows the signature of the function:
void mts_init_thread_heaps(int number_of_heaps);
Note that mts_init_thread_heaps ignores values lower than the current number of heaps. In practical terms, this means that the number of heaps in a process never decreases: it is always at least the initial number of heaps and at least the result of any earlier call.
Threads are assigned to heaps on a round-robin basis in order to evenly distribute threads among available heaps. A given thread always allocates from the same heap, but may efficiently free memory that was allocated from another heap. For example, in an application with 4 heaps, the first thread to start is assigned to the first heap, the second thread to start is assigned to the second heap, and so on up to four threads. The fifth thread to start is assigned to the first heap.
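Round-robin assignment of this kind is equivalent to taking the thread's start order modulo the number of heaps. The sketch below uses zero-based indices and a hypothetical function name; it illustrates the mapping described above and makes no claim about how MTS tracks thread start order internally.

```cpp
// Hypothetical helper; not part of the MTS API.
// thread_ordinal: zero-based start order of the thread (0 = first to start).
// heap_count: number of heaps, assumed to be greater than zero.
// Returns the zero-based index of the heap the thread allocates from.
constexpr int heap_for_thread(int thread_ordinal, int heap_count) {
    return thread_ordinal % heap_count;
}

// With 4 heaps, the first thread uses the first heap, and the fifth
// thread wraps around to the first heap again.
static_assert(heap_for_thread(0, 4) == 0, "first thread, first heap");
static_assert(heap_for_thread(1, 4) == 1, "second thread, second heap");
static_assert(heap_for_thread(4, 4) == 0, "fifth thread, first heap");
```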
The optimal number of heaps varies from application to application. In general, an application should be run with the minimum number of heaps that allows maximum parallel operation. Typically, that number is equal to the number of available CPUs. To determine the optimal number of heaps for a specific application, test the application with different numbers of heaps and compare the results.
2.8 Determining Multithreaded Performance Characteristics
You can determine the general multithreaded performance characteristics of MTS on your system by using the demo executable demoF. See Appendix A for instructions on building the demo.
First, run the program with one thread. Use the command below on Windows systems:
prompt> demoF 1
Use the command below on UNIX or Linux systems:
prompt> time demoF 1
Note the time reported in the output of the demoF executable. This shows how long a single thread takes to perform the sequence of calls to malloc and free.
Now run the same program with two threads and two heaps. Use the command below on Windows systems:
prompt> demoF 2
Use the command below on UNIX or Linux systems:
prompt> time demoF 2
On a multiprocessor system where each thread is assigned to a different processor, running the program with two threads should take only slightly longer than running the program with one thread, because the two threads are able to use two heaps in parallel.
You can experiment with running demoF with larger numbers of threads and heaps. Note, however, that since the program is memory intensive, large values may cause the program to run out of memory and fail.
2.9 The MTSPool Interface
MTSPool is a pooling interface to the MTS library. The general idea is that an operation can be applied to an entire named set of memory allocations, the pool. The most important operation is deleting an entire pool, which frees all of its memory for subsequent allocations. Memory allocated from a pool can be treated identically to memory returned from malloc.
MTSPool references can be passed among threads, so that one thread can do allocations, while a different thread can conveniently clear all of its memory without having access to individual memory pointers. Pools grow adaptively in size and can include allocations across multiple heaps generated by different threads. They also improve locality of reference by using dedicated memory pages.
The MTSPool interface is documented in Appendix D, The MTS API, as well as in mts.h.
MTS also provides a C++ class that acts as a wrapper around the MTSPool interface. This class is compatible with the C++ std::allocator interface and allows seamless use of MTS with C++ standard library containers. This class is defined in MTSPoolObject.H.
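To show what compatibility with the std::allocator interface involves, the sketch below defines a minimal allocator on top of malloc and free and plugs it into std::vector. It is a stand-in written for illustration only: MallocAllocator and fill_and_count are hypothetical names, this is not the class defined in MTSPoolObject.H, and the real class's name and constructor arguments should be taken from that header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in, NOT the class from MTSPoolObject.H.
// Suitable only for types whose alignment malloc already satisfies.
template <typename T>
struct MallocAllocator {
    using value_type = T;

    MallocAllocator() noexcept = default;
    // Converting constructor required so containers can rebind the allocator.
    template <typename U>
    MallocAllocator(const MallocAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n == 0) return nullptr;
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = std::malloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MallocAllocator<T>&, const MallocAllocator<U>&) noexcept { return false; }

// The allocator is supplied as the container's second template argument.
inline std::size_t fill_and_count(int n) {
    std::vector<int, MallocAllocator<int>> v;
    for (int i = 0; i < n; ++i) v.push_back(i);
    assert(v.empty() || v.back() == n - 1);
    return v.size();
}
```

A pool-backed allocator would typically differ in carrying a reference to its pool, so that every allocation made by the container can be released together when the pool is deleted.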