Berkeley DB Java Edition
version 5.0.34

com.sleepycat.je.util
Class DbCacheSize

java.lang.Object
  extended by com.sleepycat.je.util.DbCacheSize

public class DbCacheSize
extends Object

Estimates the in-memory cache size needed to hold a specified data set. To get an estimate of the in-memory footprint for a given database, specify the number of records and the database characteristics, and DbCacheSize will return a minimum and maximum estimate of the cache size required to hold the database in memory. Based on this information, a JE cache size can be chosen and then configured using EnvironmentMutableConfig.setCacheSize(long) or the EnvironmentConfig.MAX_MEMORY property.

Importance of the JE Cache

The JE cache is not an optional cache. It is used to hold the metadata for accessing JE data. In fact, the JE cache size is probably the most critical factor in JE performance, since Btree nodes must be fetched during a database read or write operation if they are not in cache. During a single read or write operation, a fetch may be necessary at each level of the Btree, and each fetch may require an IO at a different disk location. In addition, if internal nodes (INs) are not in cache, then write operations will cause additional copies of the INs to be written to storage, as modified INs are moved out of the cache to make room for other parts of the Btree during subsequent operations. This additional fetching and writing means that sizing the cache too small to hold the INs will result in lower operation performance.

For best performance, all Btree nodes should fit in the JE cache, including leaf nodes (LNs), which hold the record data, and INs, which hold record keys and other metadata. However, because system memory is limited, it is sometimes necessary to size the cache to hold all or at least most INs, but not the LNs. This utility estimates the size necessary to hold only INs, and the size to hold INs and LNs.

When most or all LNs do not fit in cache, using CacheMode.EVICT_LN can be beneficial to reduce the Java GC cost of collecting the LNs as they are moved out of cache. A recommended approach is to size the JE cache to hold all INs and size the Java heap to hold that amount plus the amount needed for GC working space and application objects, leaving any additional memory for use by the file system cache to hold LNs. Tests show this approach results in low GC overhead and predictable latency.

Estimating the JE Cache Size

Estimating JE in-memory sizes is not straightforward for several reasons. There is some fixed overhead for each Btree internal node, so fanout (maximum number of child entries per parent node) and degree of node sparseness both impact memory consumption. In addition, JE uses various compact in-memory representations that depend on key sizes, key prefixing, how many child nodes are resident, etc. The physical proximity of node children also allows compaction of child physical address values.

Therefore, when running this utility it is important to specify all EnvironmentConfig and DatabaseConfig settings that will be used in a production system. The EnvironmentConfig settings are specified by command line options for each property, using the same names as the EnvironmentConfig parameter name values. For example, EnvironmentConfig.LOG_FILE_MAX, which influences the amount of memory used to store physical record addresses, can be specified on the command line as:

-je.log.fileMax LENGTH
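
The naming rule above can be sketched as a small helper that turns EnvironmentConfig parameter names into DbCacheSize options by prepending a hyphen. The class name, the method name and the value 100000000 are hypothetical and chosen only for illustration; only the parameter name je.log.fileMax comes from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: build "-ENV_PARAM_NAME VALUE" option pairs from parameter names.
// This helper is illustrative and is not part of JE.
class EnvParamOptions {

    // Each parameter becomes two arguments: "-" + name, then the value.
    static List<String> toOptions(Map<String, String> envParams) {
        List<String> options = new ArrayList<>();
        for (Map.Entry<String, String> entry : envParams.entrySet()) {
            options.add("-" + entry.getKey());
            options.add(entry.getValue());
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> envParams = new LinkedHashMap<>();
        envParams.put("je.log.fileMax", "100000000"); // hypothetical value
        System.out.println(String.join(" ", toOptions(envParams)));
        // prints: -je.log.fileMax 100000000
    }
}
```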

To be sure that this utility takes into account all relevant settings, especially as the utility is enhanced in future versions, it is best to specify all EnvironmentConfig settings used by the application.

The DatabaseConfig settings are specified using command line options defined by this utility.

This utility estimates the JE cache size by creating an in-memory Environment and Database. In addition to the size of the Database, the minimum overhead for the Environment is output. The Environment overhead shown is likely to be smaller than actually needed because it doesn't take into account use of memory by JE daemon threads (cleaner, checkpointer, etc) or the memory used for locks that are held by application operations and transactions. An additional amount should be added to account for these factors.

This utility estimates the cache size for a single JE Database. To estimate the size for multiple Databases with different configuration parameters or different key and data sizes, run this utility for each Database and sum the sizes. If you are summing multiple runs for multiple Databases that are opened in a single Environment, the overhead size for the Environment should only be added once.
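
The summing rule above can be sketched as follows. The database names and byte counts are hypothetical stand-ins for values read from separate DbCacheSize runs; only the rule itself (sum the Database sizes, add the Environment overhead once) comes from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: combine several DbCacheSize runs for Databases that share one
// Environment. All names and byte counts are hypothetical, not real output.
class CacheSizeSum {

    // Sums the per-Database sizes and adds the Environment overhead exactly once.
    static long totalCacheBytes(Map<String, Long> databaseBytes, long envOverheadBytes) {
        long total = envOverheadBytes;
        for (long bytes : databaseBytes.values()) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("users", 20_000_000L);      // hypothetical primary database
        runs.put("usersByZip", 8_000_000L);  // hypothetical secondary index
        long envOverhead = 3_000_000L;       // hypothetical, counted once
        System.out.println(totalCacheBytes(runs, envOverhead) + " bytes");
        // prints: 31000000 bytes
    }
}
```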

Key Prefixing and Compaction

Key prefixing deserves special consideration. It can significantly reduce the size of the cache and is generally recommended; however, the benefit can be difficult to predict. Key prefixing, in turn, impacts the benefits of key compaction, and the use of the EnvironmentConfig.TREE_COMPACT_MAX_KEY_LENGTH parameter.

For a given data set, the impact of key prefixing is determined by how many leading bytes are in common for the keys in a single bottom internal node (BIN). For example, if keys are assigned sequentially as long (8 byte) integers, and the maximum entries per node is 128 (the default value), then 6 or 7 of the 8 bytes of each key will form a common prefix in each BIN. Of course, when records are deleted, the number of prefixed bytes may be reduced because the range of key values in a BIN will be larger. For this example we will assume that, on average, 5 bytes in each BIN are a common prefix, leaving 3 bytes per key that are unprefixed.
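
The "6 or 7 of the 8 bytes" figure can be checked with a short sketch. It assumes non-negative keys stored as plain big-endian 8-byte values and a BIN that holds 128 consecutive keys; the class and method names are hypothetical, and the encoding is a simplification rather than a description of JE's own key bindings.

```java
import java.nio.ByteBuffer;

// Sketch: count the leading bytes shared by every key in one BIN when keys
// are sequential non-negative longs in big-endian form (an assumption).
class SequentialKeyPrefix {

    // Returns the number of leading bytes common to all keys in
    // [first, first + count).
    static int commonPrefixBytes(long first, int count) {
        // The keys are sorted, so a prefix shared by the first and last key
        // is shared by every key between them.
        byte[] lo = ByteBuffer.allocate(Long.BYTES).putLong(first).array();
        byte[] hi = ByteBuffer.allocate(Long.BYTES).putLong(first + count - 1).array();
        int n = 0;
        while (n < Long.BYTES && lo[n] == hi[n]) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // BIN holding keys 0..127: only the last byte varies.
        System.out.println(commonPrefixBytes(0, 128));   // prints 7
        // BIN holding keys 200..327: the range crosses a 256 boundary.
        System.out.println(commonPrefixBytes(200, 128)); // prints 6
    }
}
```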

Key compaction is applied when the number of unprefixed bytes is less than a configured value; see EnvironmentConfig.TREE_COMPACT_MAX_KEY_LENGTH. In the example, the 3 unprefixed bytes per key are fewer than the default used for key compaction (16 bytes). This means that each key will use 16 bytes of memory, in addition to the amount used for the prefix for each BIN. The per-key overhead could be reduced by changing the TREE_COMPACT_MAX_KEY_LENGTH parameter to a smaller value, but care should be taken to ensure the compaction will be effective as keys are inserted and deleted over time.

Because key prefixing depends so much on the application key format and the way keys are assigned, the number of expected prefix bytes must be estimated by the user and specified to DbCacheSize using the -keyprefix argument.

Key Prefixing and Duplicates

When duplicates are configured for a Database (including DPL MANY_TO_ONE and MANY_TO_MANY secondary indices), key prefixing is always used. This is because the internal key in a duplicates database BIN is formed by concatenating the user-specified key and data. In secondary databases with duplicates configured, the data is the primary key, so the internal key is the concatenation of the secondary key and the primary key.

Key prefixing is always used for duplicates databases because prefixing is necessary to store keys efficiently. When the number of duplicates per unique user-specified key is more than the number of entries per BIN, the entire user-specified key will be the common prefix.

For example, a database that stores user information may use email address as the primary key and zip code as a secondary key. The secondary index database will be a duplicates database, and the internal key stored in the BINs will be a two part key containing zip code followed by email address. If on average there are more users per zip code than the number of entries in a BIN, then the key prefix will normally be at least as long as the zip code key. If there are fewer (more than one zip code appears in each BIN), then the prefix will be shorter than the zip code key.

It is also possible for the key prefix to be larger than the secondary key. If for one secondary key value (one zip code) there are a large number of primary keys (email addresses), then a single BIN may contain concatenated keys that all have the same secondary key (same zip code) and have primary keys (email addresses) that all have some number of prefix bytes in common. Therefore, when duplicates are specified it is possible to specify a prefix size that is larger than the key size.
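
The two cases above can be illustrated with a small sketch that measures the prefix shared by the internal keys of one BIN. The sample zip codes and email addresses are invented, and the internal key is modelled as a plain concatenation of UTF-8 bytes, which is an assumption made for illustration rather than JE's actual internal key format.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: prefix shared by the internal keys of one duplicates-database BIN.
// Internal key = secondary key (zip code) followed by primary key (email).
class DuplicatesPrefix {

    // Plain concatenation is an assumed, simplified key layout.
    static byte[] internalKey(String zip, String email) {
        return (zip + email).getBytes(StandardCharsets.UTF_8);
    }

    // Length in bytes of the longest prefix shared by all keys in the list.
    static int commonPrefixBytes(List<byte[]> keys) {
        byte[] first = keys.get(0);
        int n = first.length;
        for (byte[] key : keys) {
            int i = 0;
            while (i < n && i < key.length && key[i] == first[i]) {
                i++;
            }
            n = i;
        }
        return n;
    }

    public static void main(String[] args) {
        // One zip code fills the BIN and the emails share "user.a",
        // so the prefix is longer than the 5-byte zip code.
        List<byte[]> oneZip = List.of(
            internalKey("94105", "user.alice@example.com"),
            internalKey("94105", "user.amy@example.com"),
            internalKey("94105", "user.ann@example.com"));
        System.out.println(commonPrefixBytes(oneZip));  // prints 11

        // Two zip codes in the same BIN: the prefix is shorter than the zip code.
        List<byte[]> twoZips = List.of(
            internalKey("94105", "zoe@example.com"),
            internalKey("94107", "bob@example.com"));
        System.out.println(commonPrefixBytes(twoZips)); // prints 4
    }
}
```

In the first case a -keyprefix value of 11 would exceed the 5-byte secondary key size, which is the situation the paragraph above describes.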

DbCacheSize requires that -keyprefix is specified whenever -duplicates is specified.

Running the DbCacheSize utility

Usage:
 java { com.sleepycat.je.util.DbCacheSize |
        -jar je-<version>.jar DbCacheSize }
  -records <count>
      # Total records (key/data pairs); required
  -key <bytes>
      # Average key bytes per record; required
  [-data <bytes>]
      # Average data bytes per record; if omitted no leaf
      # node sizes are included in the output; required with
      # -duplicates, and specifies the primary key length
  [-keyprefix <bytes>]
      # Expected size of the prefix for the keys in each
      # BIN; default: key prefixing is not configured;
      # required with -duplicates
  [-nodemax <entries>]
      # Number of entries per Btree node; default: 128
  [-orderedinsertion]
      # Assume ordered insertions and no deletions, so BINs
      # are 100% full; default: unordered insertions and/or
      # deletions, BINs are 70% full
  [-duplicates]
      # Indicates that sorted duplicates are used, including
      # MANY_TO_ONE and MANY_TO_MANY secondary indices
  [-replicated]
      # Use a ReplicatedEnvironment; default: false
  [-ENV_PARAM_NAME VALUE]...
      # Any number of EnvironmentConfig parameters and
      # ReplicationConfig parameters (if -replicated)
 

You should run DbCacheSize on the same target platform and JVM for which you are sizing the cache, as cache sizes will vary. You may also need to specify -d32 or -d64 depending on your target, if the default JVM mode is not the same as the mode to be used in production.

To take full advantage of JE cache memory, it is strongly recommended that compressed oops (-XX:+UseCompressedOops) is specified when a 64-bit JVM is used and the maximum heap size is less than 32 GB. As described in the referenced documentation, compressed oops is sometimes the default JVM mode even when it is not explicitly specified in the Java command. However, if compressed oops is desired then it must be explicitly specified in the Java command when running DbCacheSize or a JE application. If it is not explicitly specified then JE will not be aware of it, even if it is the JVM default setting, and will not take it into account when calculating cache memory sizes.
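
Because the flag must appear explicitly on the command line, an application may want to check for it at startup. The following is a minimal sketch of such a check using the standard java.lang.management API; the class and method names are hypothetical, and this is an application-side guard, not a description of how JE itself detects the setting.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: warn when compressed oops was not requested explicitly on the
// java command line (it may still be enabled by JVM default).
class CompressedOopsCheck {

    static final String FLAG = "-XX:+UseCompressedOops";

    // True only if the flag appears among the JVM input arguments.
    static boolean explicitlyEnabled(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!explicitlyEnabled(jvmArgs)) {
            System.err.println("Warning: " + FLAG + " was not specified explicitly");
        }
    }
}
```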

For example:

 $ java -jar je-X.Y.Z.jar DbCacheSize -records 554719 -key 16 -data 100

  === Environment Cache Overhead ===

  3,161,086 minimum bytes

  To account for JE daemon operation and record locks,
  a significantly larger amount is needed in practice.

  === Database Cache Size ===

   Minimum Bytes    Maximum Bytes   Description
  ---------------  ---------------  -----------
       19,891,344       23,118,992  Internal nodes only
      115,331,040      118,558,688  Internal nodes and leaf nodes

  === Internal Node Usage by Btree Level ===

   Minimum Bytes    Maximum Bytes      Nodes    Level
  ---------------  ---------------  ----------  -----
       19,596,552       22,787,848       6,233    1
          290,640          326,480          70    2
            4,152            4,664           1    3
 

This indicates that the minimum memory size to hold only the internal nodes of the Database Btree is approximately 20MB. The maximum size to hold the entire database, both internal nodes and data records, is approximately 119MB. To this amount, at least 3MB (plus more for locks and daemons) should be added to account for the environment overhead.
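
The arithmetic behind these figures can be reproduced from the sample output. The sketch below uses only numbers that appear in that output; the class and method names are hypothetical, and the per-record leaf figure is a derived average, not a value reported by DbCacheSize.

```java
// Sketch: arithmetic on the sample DbCacheSize output shown above.
class SampleOutputMath {

    static final long ENV_OVERHEAD = 3_161_086L;
    static final long INS_ONLY_MIN = 19_891_344L;
    static final long INS_AND_LNS_MIN = 115_331_040L;
    static final long INS_AND_LNS_MAX = 118_558_688L;
    static final long RECORDS = 554_719L;

    // Database size plus the minimum Environment overhead.
    static long withEnvOverhead(long databaseBytes) {
        return databaseBytes + ENV_OVERHEAD;
    }

    // Average bytes added per record when leaf nodes are cached too
    // (integer division, so the result is rounded down).
    static long leafBytesPerRecord() {
        return (INS_AND_LNS_MIN - INS_ONLY_MIN) / RECORDS;
    }

    public static void main(String[] args) {
        // The three per-level minimums add up to the "Internal nodes only" row.
        System.out.println(19_596_552L + 290_640L + 4_152L);   // prints 19891344
        System.out.println(withEnvOverhead(INS_ONLY_MIN));     // prints 23052430
        System.out.println(withEnvOverhead(INS_AND_LNS_MAX));  // prints 121719774
        System.out.println(leafBytesPerRecord());              // prints 172
    }
}
```

These totals still exclude memory for locks and daemon threads, so they are lower bounds for the configured cache size.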


Method Summary
static void main(String[] args)
          Runs DbCacheSize as a command line utility.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
Runs DbCacheSize as a command line utility. For command usage, see class description.



Copyright (c) 2004-2011 Oracle. All rights reserved.