Java Binding of Tkrzw

Introduction

DBM (Database Manager) is a concept to store an associative array on a permanent storage. In other words, DBM allows an application program to store key-value pairs in a file and reuse them later. Each of keys and values is a string or a sequence of bytes. A key must be unique within the database and a value is associated to it. You can retrieve a stored record with its key very quickly. Thanks to simple structure of DBM, its performance can be extremely high.

Tkrzw is a library implementing DBM with various algorithms. It features high degrees of performance, concurrency, scalability and durability. The following data structures are provided.

  • HashDBM : File datatabase manager implementation based on hash table.
  • TreeDBM : File datatabase manager implementation based on B+ tree.
  • SkipDBM : File datatabase manager implementation based on skip list.
  • TinyDBM : On-memory datatabase manager implementation based on hash table.
  • BabyDBM : On-memory datatabase manager implementation based on B+ tree.
  • CacheDBM : On-memory datatabase manager implementation with LRU deletion.
  • StdHashDBM : On-memory DBM implementations using std::unordered_map.
  • StdTreeDBM : On-memory DBM implementations using std::map.

Whereas Tkrzw is C++ library, this package provides its Java interface. All above data structures are available via one adapter class "DBM". Read the homepage for details.

DBM stores key-value pairs of strings. Each string is represented as a byte array in Java. Although you can also use methods with string arguments and return values, their internal representations are byte arrays.

All classes are defined under the package "tkrzw", which can be imported in source files of application programs.

import tkrzw.Utility;          // Library utilities
import tkrzw.Status;           // Status of operations
import tkrzw.StatusException;  // Exception to convey the status of operations
import tkrzw.DBM;              // Polymorphic database manager
import tkrzw.RecordProcessor;  // Interface of processor for a record
import tkrzw.Iterator;         // Iterator for each record
import tkrzw.Future;           // Future containing a status object and extra data
import tkrzw.AsyncDBM;         // Asynchronous database manager adapter
import tkrzw.File;             // Generic file implementation
import tkrzw.Index;            // Secondary index interface
import tkrzw.IndexIterator;    // Iterator for each record of the secondary index

An instance of the class "DBM" is used in order to handle a database. You can store, delete, and retrieve records with the instance. The result status of each operation is represented by an object of the class "Status". Iterator to access each record is implemented by the class "Iterator".

Installation

Install the latest version of Tkrzw beforehand and get the package of the Python binding of Tkrzw. JDK 9.0 or later is required to use this package.

Enter the directory of the extracted package then perform installation. The environment variable JAVA_HOME must be set properly.

./configure
make
make check
sudo make install

When a series of work finishes, the JAR file "tkrzw.jar" is installed under "/usr/local/share/java". The shared object files "libjtkrzw.so" and so on are installed under "/usr/local/lib". If you use a standard binary package on your system, read "/usr/local" as "/usr".

Let the class search path include "/usr/local/share/java/tkrzw.jar" and let the library search path include "/usr/local/lib".

CLASSPATH="$CLASSPATH:/usr/local/share/java/tkrzw.jar"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
export CLASSPATH LD_LIBRARY_PATH

The above settings can be specified by options of the compiler and runtime command.

javac -cp .:/usr/local/share/java/tkrzw.jar FooBarBaz.java ...
java -cp .:/usr/local/share/java/tkrzw.jar -Djava.library.path=.:/usr/local/lib FooBarBaz ...

Example

The following code is a simple example to use a database, without checking errors. Many methods accept both byte arrays and strings. If strings are given, they are converted implicitly into byte arrays.

import tkrzw.*;

public class Example1 {
  public static void main(String[] args) {
    // Prepares the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true);
    
    // Sets records.
    // Keys and values are implicitly converted into byte arrays.
    dbm.set("first", "hop");
    dbm.set("second", "step");
    dbm.set("third", "jump");

    // Retrieves record values.
    // If the operation fails, null is returned.
    // If the class of the key is String, the value is converted into String.
    System.out.println(dbm.get("first"));
    System.out.println(dbm.get("second"));
    System.out.println(dbm.get("third"));
    System.out.println(dbm.get("fourth"));

    // Checks and deletes a record.
    if (dbm.contains("first")) {
      dbm.remove("first");
    }

    // Traverses records.
    // After using the iterator, it should be destructed explicitly.
    Iterator iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(record[0] + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close();
  }
}

The following code is a typical example to use a database, checking errors. Usually, objects of DBM and Iterator should be destructed in "finally" blocks to avoid memory leak. Even if the database is not closed, the destructor closes it implicitly. The method "orDie" throws an exception on failure so it is useful for checking errors.

import tkrzw.*;

public class Example2 {
  public static void main(String[] args) {
    DBM dbm = new DBM();
    try {
      // Prepares the database, giving tuning parameters.
      Status status = dbm.open(
          "casket.tkh", true, "truncate=True,num_buckets=100");
      // Checks the status explicitly.
      if (!status.isOK()) {
        throw new StatusException(status);
      }
    
      // Sets records.
      // Throws an exception on failure.
      dbm.set("first", "hop").orDie();
      dbm.set("second", "step").orDie();
      dbm.set("third", "jump").orDie();

      // Retrieves record values.
      String[] keys = {"first", "second", "third", "fourth"};
      for (String key : keys) {
        // Gives a status object to check.
        String value = dbm.get(key, status);
        if (status.isOK()) {
          System.out.println(value);
        } else {
          System.err.println(status);
          if (!status.equals(Status.NOT_FOUND_ERROR)) {
            throw new StatusException(status);
          }
        }
      }

      // Traverses records.
      Iterator iter = dbm.makeIterator();
      try {
        iter.first();
        while (true) {
          String[] record = iter.getString(status);
          if (!status.isOK()) {
            if (!status.equals(Status.NOT_FOUND_ERROR)) {
              throw new StatusException(status);
            }
            break;
          }
          System.out.println(record[0] + ": " + record[1]);
          iter.next();
        }
      } finally {
        // Releases the resources.
        iter.destruct();
      }

      // Closes the database.
      dbm.close().orDie();
    } finally {
      // Releases the resources.
      dbm.destruct();
    }
  }
}

The following code is examples to use a tree database which handles numeric keys with various key comparators. For integers, there are two ways: 1) to use the default comparator and serialize integer keys into byte sequences, or 2) to use the decimal integer comparator and represent keys as decimal integers like "123". For real numbers, there are also two ways: 1) to use the decimal real number comparator and represents keys as decimal real numbers like "123.45", or 2) to use the big-endian floating-point numbers comparator and serialize floating-point numbers into byte sequences.

import tkrzw.*;

public class Example3 {
  public static void main(String[] args) {
    DBM dbm = new DBM();

    // Opens a new database with the default key comparator (LexicalKeyComparator).
    dbm.open("casket.tkt", true, "truncate=True").orDie();

    // Sets records with the key being a big-endian binary of an integer.
    // e.g: "\x00\x00\x00\x00\x00\x00\x00\x31" -> "hop"
    dbm.set(Utility.serializeInt(1), "hop".getBytes()).orDie();
    dbm.set(Utility.serializeInt(256), "step".getBytes()).orDie();
    dbm.set(Utility.serializeInt(32), "jump".getBytes()).orDie();

    // Gets records with the key being a big-endian binary of an integer.
    System.out.println(new String(dbm.get(Utility.serializeInt(1))));
    System.out.println(new String(dbm.get(Utility.serializeInt(256))));
    System.out.println(new String(dbm.get(Utility.serializeInt(32))));

    // Lists up all records, restoring keys into integers.
    Iterator iter = dbm.makeIterator();
    iter.first();
    while (true) {
      byte[][] record = iter.get();
      if (record == null) {
        break;
      }
      System.out.println(Utility.deserializeInt(record[0]) + ": " + new String(record[1]));
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();

    // Opens a new database with the decimal integer comparator.
    dbm.open("casket.tkt", true, "truncate=True,key_comparator=Decimal").orDie();

    // Sets records with the key being a decimal string of an integer.
    // e.g: "1" -> "hop"
    dbm.set("1", "hop").orDie();
    dbm.set("256", "step").orDie();
    dbm.set("32", "jump").orDie();

    // Gets records with the key being a decimal string of an integer.
    System.out.println(dbm.get("1"));
    System.out.println(dbm.get("256"));
    System.out.println(dbm.get("32"));

    // Lists up all records, restoring keys into integers.
    iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(Long.parseLong(record[0]) + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();

    // Opens a new database with the decimal real number comparator.
    dbm.open("casket.tkt", true, "truncate=True,key_comparator=RealNumber").orDie();

    // Sets records with the key being a decimal string of a real number.
    // e.g: "1.5" -> "hop"
    dbm.set("1.5", "hop").orDie();
    dbm.set("256.5", "step").orDie();
    dbm.set("32.5", "jump").orDie();

    // Gets records with the key being a decimal string of a real number.
    System.out.println(dbm.get("1.5"));
    System.out.println(dbm.get("256.5"));
    System.out.println(dbm.get("32.5"));

    // Lists up all records, restoring keys into floating-point numbers.
    iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(Double.parseDouble(record[0]) + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();

    // Opens a new database with the big-endian signed integers comparator.
    dbm.open("casket.tkt", true, "truncate=True,key_comparator=SignedBigEndian").orDie();

    // Sets records with the key being a big-endian binary of a signed integer.
    // e.g: "\x00\x00\x00\x00\x00\x00\x00\x31" -> "hop"
    dbm.set(Utility.serializeInt(-1), "hop".getBytes()).orDie();
    dbm.set(Utility.serializeInt(-256), "step".getBytes()).orDie();
    dbm.set(Utility.serializeInt(-32), "jump".getBytes()).orDie();

    // Gets records with the key being a big-endian binary of a signed integer.
    System.out.println(new String(dbm.get(Utility.serializeInt(-1))));
    System.out.println(new String(dbm.get(Utility.serializeInt(-256))));
    System.out.println(new String(dbm.get(Utility.serializeInt(-32))));

    // Lists up all records, restoring keys into signed integers.
    iter = dbm.makeIterator();
    iter.first();
    while (true) {
      byte[][] record = iter.get();
      if (record == null) {
        break;
      }
      System.out.println(Utility.deserializeInt(record[0]) + ": " + new String(record[1]));
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();

    // Opens a new database with the big-endian floating-point numbers comparator.
    dbm.open("casket.tkt", true, "truncate=True,key_comparator=FloatBigEndian").orDie();

    // Sets records with the key being a big-endian binary of a floating-point number.
    // e.g: "\x3F\xF8\x00\x00\x00\x00\x00\x00" -> "hop"
    dbm.set(Utility.serializeFloat(1.5), "hop".getBytes()).orDie();
    dbm.set(Utility.serializeFloat(256.5), "step".getBytes()).orDie();
    dbm.set(Utility.serializeFloat(32.5), "jump".getBytes()).orDie();

    // Gets records with the key being a big-endian binary of a floating-point number.
    System.out.println(new String(dbm.get(Utility.serializeFloat(1.5))));
    System.out.println(new String(dbm.get(Utility.serializeFloat(256.5))));
    System.out.println(new String(dbm.get(Utility.serializeFloat(32.5))));

    // Lists up all records, restoring keys into floating-point numbers.
    iter = dbm.makeIterator();
    iter.first();
    while (true) {
      byte[][] record = iter.get();
      if (record == null) {
        break;
      }
      System.out.println(Utility.deserializeFloat(record[0]) + ": " + new String(record[1]));
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();
  }
}

The following code is a typical example of the asynchronous API. The AsyncDBM class manages a thread pool and handles database operations in the background in parallel. Each Method of AsyncDBM returns a Future object to monitor the result.

import tkrzw.*;

public class Example4 {
  public static void main(String[] args) {
    // Prepares the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true, "truncate=True,num_buckets=100");

    // Prepares the asynchronous adapter with 4 worker threads.
    AsyncDBM async= new AsyncDBM(dbm, 4);

    // Executes the Set method asynchronously.
    Future<Status> set_future = async.set("hello", "world");
    // Does something in the foreground.
    System.out.println("Setting a record");
    // Checks the result after awaiting the set operation.
    Status status = set_future.get();
    if (!status.isOK()) {
      System.out.println("ERROR: " + status.toString());
    }

    // Executes the get method asynchronously.
    Future<Status.And<String>> get_future = async.get("hello");
    // Does something in the foreground.
    System.out.println("Getting a record");
    // Checks the result after awaiting the get operation.
    Status.And<String> get_result = get_future.get();
    if (get_result.status.isOK()) {
      System.out.println("VALUE: " + get_result.value);
    }

    // Releases the asynchronous adapter.
    async.destruct();

    // Closes the database.
    dbm.close();
  }
}

The following code uses process, processMulti, and processEach methods which take callback functions to process the record efficiently. process is useful to update a record atomically according to the current value. processEach is useful to access every record in the most efficient way.

import java.util.HashMap;
import java.util.Map;
import tkrzw.*;

public class Example5 {
  public static void main(String[] args) {
    // Opens the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true, "truncate=True,num_buckets=100");

    // Sets records with lambda functions.
    dbm.process("doc-1", (k, v)->"Tokyo is the capital city of Japan.".getBytes(), true);
    dbm.process("doc-2", (k, v)->"Is she living in Tokyo, Japan?".getBytes(), true);
    dbm.process("doc-3", (k, v)->"She must leave Tokyo!".getBytes(), true);

    // Lowers record values.
    tkrzw.RecordProcessor lower = (key, value) -> {
      // If no matching record, None is given as the value.
      if (value == null) return null;
      // Sets the new value.
      return new String(value).toLowerCase().getBytes();
    };
    dbm.process("doc-1", lower, true);
    dbm.process("doc-2", lower, true);
    dbm.process("doc-3", lower, true);
    dbm.process("non-existent", lower, true);

    // Adds multiple records at once.
    RecordProcessor.WithKey[] ops1 = {
      new RecordProcessor.WithKey("doc-4", (k, v)->"Tokyo Go!".getBytes()),
      new RecordProcessor.WithKey("doc-5", (k, v)->"Japan Go!".getBytes()),
    };
    dbm.processMulti(ops1, true);

    // Modifies multiple records at once.
    RecordProcessor.WithKey[] ops2 = {
      new RecordProcessor.WithKey("doc-4", lower),
      new RecordProcessor.WithKey("doc-5", lower),
    };
    dbm.processMulti(ops2, true);

    // Checks the whole content.
    // This uses an external iterator and is relavively slow.
    Iterator iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(record[0] + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Opertion for word counting.
    Map<String, Integer> word_counts = new HashMap<String, Integer>();
    RecordProcessor wordCounter = (key, value) -> {
      if (key == null) return null;
      String[] words = new String(value).split("\\b");
      for (String word : words) {
        if (word.length() < 1) continue;
        char c = word.charAt(0);
        if (c < 'a' || c > 'z') continue;
        int old_count = word_counts.getOrDefault(word, 0);
        word_counts.put(word, old_count + 1);
      }
      return null;
    };

    // The second parameter should be false if the value is not updated.
    dbm.processEach(wordCounter, false);
    for(Map.Entry<String, Integer> entry : word_counts.entrySet()) {
      System.out.println(entry.getKey() + ":" + entry.getValue());
    }

    // Returning RecordProcessor.REMOVE by the callbacks removes the record.
    dbm.process("doc-1", (k, v)->RecordProcessor.REMOVE, true);
    System.out.println(dbm.count());
    RecordProcessor.WithKey[] ops3 = {
      new RecordProcessor.WithKey("doc-2", (k, v)->RecordProcessor.REMOVE),
      new RecordProcessor.WithKey("doc-3", (k, v)->RecordProcessor.REMOVE),
    };
    dbm.processMulti(ops3, true);
    System.out.println(dbm.count());
    dbm.processEach((k, v)->RecordProcessor.REMOVE, true);
    System.out.println(dbm.count());

    // Closes the database.
    dbm.close().orDie();
  }
}

The following code is an example to use a secondary index, which is useful to organize records by non-primary keys.

import java.util.HashMap;
import java.util.Map;
import tkrzw.*;

public class Example6 {
  public static void main(String[] args) {
    // Opens the index.
    Index index = new Index();
    index.open("casket.tkt", true, "truncate=True,num_buckets=100");

    // Adds records to the index.
    // The key is a division name and the value is person name.
    index.add("general", "anne").orDie();
    index.add("general", "matthew").orDie();
    index.add("general", "marilla").orDie();
    index.add("sales", "gilbert").orDie();

    // Anne moves to the sales division.
    index.remove("general", "anne").orDie();
    index.add("sales", "anne").orDie();

    // Prints all members for each division.
    String[] divisions = {"general", "sales"};
    for (String division : divisions) {
      System.out.println(division);
      String[] members = index.getValues(division, 0);
      for (String member : members) {
        System.out.println(" -- " + member);
      }
    }

    // Prints every record by iterator.
    IndexIterator iter = index.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) break;
      System.out.println(record[0] + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    index.close().orDie();
  }
}
Packages
Package
Description