Java Binding of Tkrzw

Introduction

DBM (Database Manager) is a concept for storing an associative array in persistent storage. In other words, DBM allows an application program to store key-value pairs in a file and reuse them later. Each key and each value is a string or a sequence of bytes. A key must be unique within the database, and a value is associated with it. You can retrieve a stored record by its key very quickly. Thanks to the simple structure of DBM, its performance can be extremely high.

Tkrzw is a library implementing DBM with various algorithms. It features high degrees of performance, concurrency, scalability and durability. The following data structures are provided.

  • HashDBM : File database manager implementation based on hash table.
  • TreeDBM : File database manager implementation based on B+ tree.
  • SkipDBM : File database manager implementation based on skip list.
  • TinyDBM : On-memory database manager implementation based on hash table.
  • BabyDBM : On-memory database manager implementation based on B+ tree.
  • CacheDBM : On-memory database manager implementation with LRU deletion.
  • StdHashDBM : On-memory DBM implementation using std::unordered_map.
  • StdTreeDBM : On-memory DBM implementation using std::map.

Whereas Tkrzw is a C++ library, this package provides its Java interface. All of the above data structures are available via a single adapter class, "DBM". Read the homepage for details.

DBM stores key-value pairs of strings. Each string is represented as a byte array in Java. Although you can also use methods with string arguments and return values, their internal representations are byte arrays.
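
The relation between strings and byte arrays can be illustrated with the standard library alone. The following is a minimal sketch that does not use any Tkrzw class. The class name "ByteStringSketch" and its helper methods are hypothetical, and UTF-8 is an assumption of this sketch because the encoding is not stated above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-alone sketch: no Tkrzw classes are used here.
final class ByteStringSketch {
  // Encodes a string into a byte array. UTF-8 is an assumption of this sketch.
  static byte[] encode(String text) {
    return text.getBytes(StandardCharsets.UTF_8);
  }

  // Decodes a byte array back into a string with the same assumed encoding.
  static String decode(byte[] data) {
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Textual keys and values survive the round trip through byte arrays.
    byte[] key = encode("first");
    byte[] value = encode("hop");
    System.out.println(key.length + " bytes: " + Arrays.toString(key));
    System.out.println(decode(key) + " -> " + decode(value));

    // Arbitrary binary data is not necessarily valid text, so decoding and
    // re-encoding it can change the bytes. This comparison prints "false".
    byte[] binary = {(byte) 0xff, 0x00, (byte) 0xfe};
    System.out.println(Arrays.equals(encode(decode(binary)), binary));
  }
}
```

For this reason, records that hold binary data are safer to handle through the byte array variants of the methods, whereas the string variants are convenient for text.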

All classes are defined under the package "tkrzw", which can be imported in source files of application programs.

import tkrzw.Status;
import tkrzw.StatusException;
import tkrzw.DBM;
import tkrzw.Iterator;
import tkrzw.Utility;

An instance of the class "DBM" is used to handle a database. You can store, delete, and retrieve records with the instance. The result status of each operation is represented by an object of the class "Status". An iterator for accessing each record is provided by the class "Iterator".

Installation

Install the latest version of Tkrzw beforehand and get the package of the Java binding of Tkrzw. JDK 9.0 or later is required to use this package.

Enter the directory of the extracted package, then perform the installation. The environment variable JAVA_HOME must be set properly.

./configure
make
make check
sudo make install

When these steps finish, the JAR file "tkrzw.jar" is installed under "/usr/local/share/java". The shared object files "libjtkrzw.so" and so on are installed under "/usr/local/lib". If you use a standard binary package on your system, read "/usr/local" as "/usr".

Let the class search path include "/usr/local/share/java/tkrzw.jar" and let the library search path include "/usr/local/lib".

CLASSPATH="$CLASSPATH:/usr/local/share/java/tkrzw.jar"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
export CLASSPATH LD_LIBRARY_PATH

The above settings can also be specified with options of the compiler and the runtime command.

javac -cp .:/usr/local/share/java/tkrzw.jar FooBarBaz.java ...
java -cp .:/usr/local/share/java/tkrzw.jar -Djava.library.path=.:/usr/local/lib FooBarBaz ...
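
To confirm that the JVM actually sees these settings, a small check program can help. The following is a minimal sketch using only standard system properties and class loading. The class name "PathCheck" and its helper methods are hypothetical. The class name "tkrzw.DBM" is taken from the package described above, and the check only looks for the class on the class search path without loading the native library.

```java
// Stand-alone sketch for checking the search paths seen by the JVM.
final class PathCheck {
  // Formats a system property for display, or a marker if it is not set.
  static String describe(String property) {
    String value = System.getProperty(property);
    return property + " = " + (value == null ? "(not set)" : value);
  }

  // Reports whether a class can be found without initializing it,
  // so no native library is loaded by this check.
  static boolean isClassAvailable(String name) {
    try {
      Class.forName(name, false, PathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("java.class.path"));
    System.out.println(describe("java.library.path"));
    System.out.println("tkrzw.DBM found: " + isClassAvailable("tkrzw.DBM"));
  }
}
```

If the last line reports "false", the class search path probably does not include "tkrzw.jar".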

Example

The following code is a simple example of using a database, without checking errors. Many methods accept both byte arrays and strings. If strings are given, they are converted implicitly into byte arrays.

import tkrzw.*;

public class Example1 {
  public static void main(String[] args) {
    // Prepares the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true);
    
    // Sets records.
    // Keys and values are implicitly converted into byte arrays.
    dbm.set("first", "hop");
    dbm.set("second", "step");
    dbm.set("third", "jump");

    // Retrieves record values.
    // If the operation fails, null is returned.
    // If the class of the key is String, the value is converted into String.
    System.out.println(dbm.get("first"));
    System.out.println(dbm.get("second"));
    System.out.println(dbm.get("third"));
    System.out.println(dbm.get("fourth"));

    // Checks and deletes a record.
    if (dbm.contains("first")) {
      dbm.remove("first");
    }

    // Traverses records.
    // After using the iterator, it should be destructed explicitly.
    Iterator iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(record[0] + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Closes the database.
    // After using the database, it should be destructed explicitly.
    dbm.close();
    dbm.destruct();
  }
}

The following code is a typical example of using a database, with error checking. Usually, objects of DBM and Iterator should be destructed in "finally" blocks to avoid memory leaks. Even if the database is not closed, the destructor closes it implicitly. The method "orDie" throws an exception on failure, so it is useful for checking errors.

import tkrzw.*;

public class Example2 {
  public static void main(String[] args) {
    DBM dbm = new DBM();
    try {
      // Prepares the database, giving tuning parameters.
      Status status = dbm.open(
          "casket.tkh", true, "truncate=True,num_buckets=100");
      // Checks the status explicitly.
      if (!status.isOK()) {
        throw new StatusException(status);
      }
    
      // Sets records.
      // Throws an exception on failure.
      dbm.set("first", "hop").orDie();
      dbm.set("second", "step").orDie();
      dbm.set("third", "jump").orDie();

      // Retrieves record values.
      String[] keys = {"first", "second", "third", "fourth"};
      for (String key : keys) {
        // Gives a status object to check.
        String value = dbm.get(key, status);
        if (status.isOK()) {
          System.out.println(value);
        } else {
          System.err.println(status);
          if (!status.equals(Status.NOT_FOUND_ERROR)) {
            throw new StatusException(status);
          }
        }
      }

      // Traverses records.
      Iterator iter = dbm.makeIterator();
      try {
        iter.first();
        while (true) {
          String[] record = iter.getString(status);
          if (!status.isOK()) {
            if (!status.equals(Status.NOT_FOUND_ERROR)) {
              throw new StatusException(status);
            }
            break;
          }
          System.out.println(record[0] + ": " + record[1]);
          iter.next();
        }
      } finally {
        // Releases the resources.
        iter.destruct();
      }

      // Closes the database.
      dbm.close().orDie();
    } finally {
      // Releases the resources.
      dbm.destruct();
    }
  }
}
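
The nesting of "finally" blocks in the example above determines the order in which resources are released. The following is a minimal sketch of that order which does not use any Tkrzw class. The classes "CleanupSketch" and "Handle" are hypothetical stand-ins for objects that need an explicit "destruct" call.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: Handle is a hypothetical stand-in, not a Tkrzw class.
final class CleanupSketch {
  // Imitates an object holding resources that must be released explicitly.
  static final class Handle {
    private final String name;
    private final List<String> log;

    Handle(String name, List<String> log) {
      this.name = name;
      this.log = log;
      log.add("create " + name);
    }

    void destruct() {
      log.add("destruct " + name);
    }
  }

  // Runs the nested try/finally pattern and returns the recorded events.
  static List<String> run(boolean fail) {
    List<String> log = new ArrayList<>();
    Handle dbm = new Handle("dbm", log);
    try {
      Handle iter = new Handle("iter", log);
      try {
        if (fail) {
          throw new IllegalStateException("simulated failure");
        }
        log.add("work done");
      } finally {
        // The inner object is released first, even on failure.
        iter.destruct();
      }
    } catch (IllegalStateException e) {
      log.add("caught " + e.getMessage());
    } finally {
      // The outer object is released last, even on failure.
      dbm.destruct();
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run(false));
    System.out.println(run(true));
  }
}
```

In both runs, the inner object is released before the outer one, which mirrors releasing the iterator before the database.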

The following code is a typical example of the asynchronous API. The AsyncDBM class manages a thread pool and handles database operations in the background in parallel. Each method of AsyncDBM returns a Future object to monitor the result.

import tkrzw.*;

public class Example3 {
  public static void main(String[] args) {
    // Prepares the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true, "truncate=True,num_buckets=100");

    // Prepares the asynchronous adapter with 4 worker threads.
    AsyncDBM async = new AsyncDBM(dbm, 4);

    // Executes the set method asynchronously.
    Future<Status> set_future = async.set("hello", "world");
    // Does something in the foreground.
    System.out.println("Setting a record");
    // Checks the result after awaiting the set operation.
    Status status = set_future.get();
    if (!status.isOK()) {
      System.out.println("ERROR: " + status.toString());
    }

    // Executes the get method asynchronously.
    Future<Status.And<String>> get_future = async.get("hello");
    // Does something in the foreground.
    System.out.println("Getting a record");
    // Checks the result after awaiting the get operation.
    Status.And<String> get_result = get_future.get();
    if (get_result.status.isOK()) {
      System.out.println("VALUE: " + get_result.value);
    }

    // Releases the asynchronous adapter.
    async.destruct();

    // Closes the database and releases its resources.
    dbm.close();
    dbm.destruct();
  }
}
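
The idea of a thread pool that runs operations in the background and hands back objects to await can be illustrated with the standard library alone. The following is only an analogy and not the implementation of AsyncDBM. It uses java.util.concurrent.Future, which is a different class from the Future of the package "tkrzw", and an in-memory map as a stand-in for the database. The class name "AsyncSketch" and the method "setThenGet" are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-alone analogy: an in-memory map stands in for the database.
final class AsyncSketch {
  // Stores a value in the background, awaits it, then reads it back.
  static String setThenGet(String key, String value) throws Exception {
    Map<String, String> store = new ConcurrentHashMap<>();
    // A pool with 4 worker threads, like the adapter in the example above.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      Callable<Boolean> setTask = () -> {
        store.put(key, value);
        return true;
      };
      Future<Boolean> setFuture = pool.submit(setTask);
      // The foreground thread is free to do other work here.
      // Awaits the set operation before reading the record.
      setFuture.get();

      Callable<String> getTask = () -> store.get(key);
      Future<String> getFuture = pool.submit(getTask);
      return getFuture.get();
    } finally {
      // Stops the worker threads so that the program can exit.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("VALUE: " + setThenGet("hello", "world"));
  }
}
```

Awaiting the first result before submitting the second task keeps the two operations ordered even though they may run on different worker threads.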

The following code uses the process, processMulti, and processEach methods, which take callback functions to process records efficiently. process is useful for updating a record atomically according to its current value. processEach is useful for accessing every record in the most efficient way.

import java.util.HashMap;
import java.util.Map;
import tkrzw.*;

public class Example4 {
  public static void main(String[] args) {
    // Opens the database.
    DBM dbm = new DBM();
    dbm.open("casket.tkh", true, "truncate=True,num_buckets=100");

    // Sets records with lambda functions.
    dbm.process("doc-1", (k, v)->"Tokyo is the capital city of Japan.".getBytes(), true);
    dbm.process("doc-2", (k, v)->"Is she living in Tokyo, Japan?".getBytes(), true);
    dbm.process("doc-3", (k, v)->"She must leave Tokyo!".getBytes(), true);

    // Lowers record values.
    RecordProcessor lower = (key, value) -> {
      // If there is no matching record, null is given as the value.
      if (value == null) return null;
      // Sets the new value.
      return new String(value).toLowerCase().getBytes();
    };
    dbm.process("doc-1", lower, true);
    dbm.process("doc-2", lower, true);
    dbm.process("doc-3", lower, true);
    dbm.process("non-existent", lower, true);

    // Adds multiple records at once.
    RecordProcessor.WithKey[] ops1 = {
      new RecordProcessor.WithKey("doc-4", (k, v)->"Tokyo Go!".getBytes()),
      new RecordProcessor.WithKey("doc-5", (k, v)->"Japan Go!".getBytes()),
    };
    dbm.processMulti(ops1, true);

    // Modifies multiple records at once.
    RecordProcessor.WithKey[] ops2 = {
      new RecordProcessor.WithKey("doc-4", lower),
      new RecordProcessor.WithKey("doc-5", lower),
    };
    dbm.processMulti(ops2, true);

    // Checks the whole content.
    // This uses an external iterator and is relatively slow.
    Iterator iter = dbm.makeIterator();
    iter.first();
    while (true) {
      String[] record = iter.getString();
      if (record == null) {
        break;
      }
      System.out.println(record[0] + ": " + record[1]);
      iter.next();
    }
    iter.destruct();

    // Operation for word counting.
    Map<String, Integer> word_counts = new HashMap<String, Integer>();
    RecordProcessor wordCounter = (key, value) -> {
      // Skips calls made without a record, where the key is null.
      if (key == null) return null;
      String[] words = new String(value).split("\\b");
      for (String word : words) {
        if (word.length() < 1) continue;
        char c = word.charAt(0);
        if (c < 'a' || c > 'z') continue;
        int old_count = word_counts.getOrDefault(word, 0);
        word_counts.put(word, old_count + 1);
      }
      return null;
    };

    // The second parameter should be false if the value is not updated.
    dbm.processEach(wordCounter, false);
    for (Map.Entry<String, Integer> entry : word_counts.entrySet()) {
      System.out.println(entry.getKey() + ":" + entry.getValue());
    }

    // Returning RecordProcessor.REMOVE from a callback removes the record.
    dbm.process("doc-1", (k, v)->RecordProcessor.REMOVE, true);
    System.out.println(dbm.count());
    RecordProcessor.WithKey[] ops3 = {
      new RecordProcessor.WithKey("doc-2", (k, v)->RecordProcessor.REMOVE),
      new RecordProcessor.WithKey("doc-3", (k, v)->RecordProcessor.REMOVE),
    };
    dbm.processMulti(ops3, true);
    System.out.println(dbm.count());
    dbm.processEach((k, v)->RecordProcessor.REMOVE, true);
    System.out.println(dbm.count());

    // Closes the database.
    dbm.close().orDie();
    dbm.destruct();
  }
}
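
The word counting logic in the callback above can be tried without a database. The following is a minimal sketch that applies the same splitting and filtering rules to plain strings. The class name "WordCountSketch" and the method "countWords" are hypothetical, and the input texts are the lowercased values of the first three records of the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-alone sketch of the word counting rules, without any Tkrzw class.
final class WordCountSketch {
  // Counts tokens that start with a lowercase ASCII letter.
  static Map<String, Integer> countWords(String... texts) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String text : texts) {
      // Splitting at word boundaries yields words and the separators between them.
      for (String word : text.split("\\b")) {
        if (word.isEmpty()) continue;
        char c = word.charAt(0);
        // Skips separators such as spaces and punctuation.
        if (c < 'a' || c > 'z') continue;
        counts.merge(word, 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countWords(
        "tokyo is the capital city of japan.",
        "is she living in tokyo, japan?",
        "she must leave tokyo!");
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      System.out.println(entry.getKey() + ":" + entry.getValue());
    }
  }
}
```

A TreeMap is used here only to make the output order stable. The example above uses a HashMap, whose iteration order is unspecified.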