Ruby Binding of Tkrzw

Tkrzw: a set of implementations of DBM

Introduction

DBM (Database Manager) is a concept to store an associative array on a permanent storage. In other words, DBM allows an application program to store key-value pairs in a file and reuse them later. Each of keys and values is a string or a sequence of bytes. A key must be unique within the database and a value is associated to it. You can retrieve a stored record with its key very quickly. Thanks to simple structure of DBM, its performance can be extremely high.

Tkrzw is a library implementing DBM with various algorithms. It features high degrees of performance, concurrency, scalability and durability. The following data structures are provided.

Whereas Tkrzw is C++ library, this package provides its Ruby interface. All above data structures are available via one adapter class “DBM”. Read the homepage for details.

The following classes are mainly used.

DBM stores key-value pairs of strings. You can specify any type of objects as keys and values if they can be converted into strings. When you retreive keys and values, they are always represented as strings.

The module file is “tkrzw”, which defines the module “Tkrzw”.

require 'tkrzw'

An instance of the class “DBM” is used in order to handle a database. You can store, delete, and retrieve records with the instance. The result status of each operation is represented by an object of the class “Status”. Iterator to access access each record is implemented by the class “Iterator”.

Installation

Install the latest version of Tkrzw beforehand and get the package of the Python binding of Tkrzw. Ruby 2.5 or later is required to use this package.

Enter the directory of the extracted package then perform installation.

ruby extconf.rb
make
ruby check.rb
sudo make install

Example

The following code is a typical example to use a database. A DBM object can be used like a Hash object. The “each” iterator is useful to access each record in the database.

require 'tkrzw'

# Prepares the database.
dbm = Tkrzw::DBM.new
dbm.open("casket.tkh", true, truncate: true,num_buckets: 100)

# Sets records.
dbm["first"] = "hop"
dbm["second"] = "step"
dbm["third"] = "jump"

# Retrieves record values.
# If the operation fails, nil is returned.
p dbm["first"]
p dbm["second"]
p dbm["third"]
p dbm["fourth"]

# Checks and deletes a record.
if dbm.include?("first")
  dbm.remove("first")
end

# Traverses records.
dbm.each do |key, value|
  p key + ": " + value
end

# Closes and the database.
dbm.close

The following code is a more complex example. You should use “ensure” clauses to destruct instances of DBM and Iterator, in order to release unused resources. Even if the database is not closed, the destructor closes it implicitly. The method “or_die” throws an exception on failure so it is useful for checking errors.

require 'tkrzw'

dbm = Tkrzw::DBM.new
begin
  # Prepares the database.
  # Options are given as a Hash object.
  open_params = {
    max_page_size: 4080,
    max_branches: 256,
    key_comparator: "decimal",
    concurrent: true,
    encoding: "UTF-8",
    truncate: true,
  }
  status = dbm.open("casket.tkt", true, open_params)
  if not status.ok?
    raise Tkrzw::StatusExceptio.new(status)
  end

  # Sets records.
  # The method OrDie raises a runtime error on failure.
  dbm.set(1, "hop").or_die
  dbm.set(2, "step").or_die
  dbm.set(3, "jump").or_die

  # Retrieves records without checking errors.
  p dbm.get(1)
  p dbm.get(2)
  p dbm.get(3)
  p dbm.get(4)

  # To know the status of retrieval, give a status object to "get".
  # You can compare a status object and a status code directly.
  status = Tkrzw::Status.new
  value = dbm.get(1, status)
  printf("status: %s\n", status)
  if status == Tkrzw::Status::SUCCESS
    printf("value: %s\n", value)
  end

  # Rebuilds the database.
  # Almost the same options as the "open" method can be given.
  dbm.rebuild(align_pow: 0, max_page_size: 1024).or_die

  # Traverses records with an iterator.
  begin
    iter = dbm.make_iterator
    iter.first
    while true do
      status = Tkrzw::Status.new
      record = iter.get(status)
      break if not status.ok?
      printf("%s: %s\n", record[0], record[1])
      iter.next
    end
  ensure
    # Releases the resources.
    iter.destruct
  end

  # Closes the database.
  dbm.close.or_die
ensure
  # Releases the resources.
  dbm.destruct
end

The following code is examples to use a tree database which handles numeric keys with various key comparators. For integers, there are two ways: 1) to use the default comparator and serialize integer keys into byte sequences, or 2) to use the decimal integer comparator and represent keys as decimal integers like “123”. For real numbers, there are also two ways: 1) to use the decimal real number comparator and represents keys as decimal real numbers like “123.45”, or 2) to use the big-endian floating-point numbers comparator and serialize floating-point numbers into byte sequences.

require 'tkrzw'

# Opens a new database with the default key comparator (LexicalKeyComparator).
dbm = Tkrzw::DBM.new
open_params = {
  truncate: true,
}
status = dbm.open("casket.tkt", true, open_params).or_die

# Sets records with the key being a big-endian binary of an integer.
# e.g: "\x00\x00\x00\x00\x00\x00\x00\x31" -> "hop"
dbm.set(Tkrzw::Utility::serialize_int(1), "hop").or_die
dbm.set(Tkrzw::Utility::serialize_int(256), "step").or_die
dbm.set(Tkrzw::Utility::serialize_int(32), "jump").or_die

# Gets records with the key being a big-endian binary of an integer.
p dbm.get(Tkrzw::Utility::serialize_int(1))
p dbm.get(Tkrzw::Utility::serialize_int(256))
p dbm.get(Tkrzw::Utility::serialize_int(32))

# Lists up all records, restoring keys into integers.
iter = dbm.make_iterator
iter.first
while true do
  record = iter.get(status)
  break if not record
  printf("%d: %s\n", Tkrzw::Utility::deserialize_int(record[0]), record[1])
  iter.next
end
iter.destruct

# Closes the database.
dbm.close.or_die
dbm.destruct

# Opens a new database with the decimal integer comparator.
dbm = Tkrzw::DBM.new
open_params = {
  truncate: true,
  key_comparator: "Decimal",
}
status = dbm.open("casket.tkt", true, open_params).or_die

# Sets records with the key being a decimal string of an integer.
# e.g: "1" -> "hop"
dbm.set("1", "hop").or_die
dbm.set("256", "step").or_die
dbm.set("32", "jump").or_die

# Gets records with the key being a decimal string of an integer.
p dbm.get("1")
p dbm.get("256")
p dbm.get("32")

# Lists up all records, restoring keys into integers.
iter = dbm.make_iterator
iter.first
while true do
  record = iter.get(status)
  break if not record
  printf("%d: %s\n", record[0].to_i, record[1])
  iter.next
end
iter.destruct

# Closes the database.
dbm.close.or_die
dbm.destruct

# Opens a new database with the decimal real number comparator.
dbm = Tkrzw::DBM.new
open_params = {
  truncate: true,
  key_comparator: "RealNumber",
}
status = dbm.open("casket.tkt", true, open_params).or_die

# Sets records with the key being a decimal string of a real number.
# e.g: "1.5" -> "hop"
dbm.set("1.5", "hop").or_die
dbm.set("256.5", "step").or_die
dbm.set("32.5", "jump").or_die

# Gets records with the key being a decimal string of a real number.
p dbm.get("1.5")
p dbm.get("256.5")
p dbm.get("32.5")

# Lists up all records, restoring keys into floating-point numbers.
iter = dbm.make_iterator
iter.first
while true do
  record = iter.get(status)
  break if not record
  printf("%.3f: %s\n", record[0].to_f, record[1])
  iter.next
end
iter.destruct

# Closes the database.
dbm.close.or_die
dbm.destruct

# Opens a new database with the big-endian signed integers comparator.
dbm = Tkrzw::DBM.new
open_params = {
  truncate: true,
  key_comparator: "SignedBigEndian",
}
status = dbm.open("casket.tkt", true, open_params).or_die

# Sets records with the key being a big-endian binary of a signed integer.
# e.g: "\x00\x00\x00\x00\x00\x00\x00\x31" -> "hop"
dbm.set(Tkrzw::Utility::serialize_int(-1), "hop").or_die
dbm.set(Tkrzw::Utility::serialize_int(-256), "step").or_die
dbm.set(Tkrzw::Utility::serialize_int(-32), "jump").or_die

# Gets records with the key being a big-endian binary of a signed integer.
p dbm.get(Tkrzw::Utility::serialize_int(-1))
p dbm.get(Tkrzw::Utility::serialize_int(-256))
p dbm.get(Tkrzw::Utility::serialize_int(-32))

# Lists up all records, restoring keys into signed integers.
iter = dbm.make_iterator
iter.first
while true do
  record = iter.get(status)
  break if not record
  printf("%d: %s\n", Tkrzw::Utility::deserialize_int(record[0]), record[1])
  iter.next
end
iter.destruct

# Closes the database.
dbm.close.or_die
dbm.destruct

# Opens a new database with the big-endian floating-point numbers comparator.
dbm = Tkrzw::DBM.new
open_params = {
  truncate: true,
  key_comparator: "FloatBigEndian",
}
status = dbm.open("casket.tkt", true, open_params).or_die

# Sets records with the key being a big-endian binary of a floating-point number.
# e.g: "\x3F\xF8\x00\x00\x00\x00\x00\x00" -> "hop"
dbm.set(Tkrzw::Utility::serialize_float(1.5), "hop").or_die
dbm.set(Tkrzw::Utility::serialize_float(256.5), "step").or_die
dbm.set(Tkrzw::Utility::serialize_float(32.5), "jump").or_die

# Gets records with the key being a big-endian binary of a floating-point number.
p dbm.get(Tkrzw::Utility::serialize_float(1.5))
p dbm.get(Tkrzw::Utility::serialize_float(256.5))
p dbm.get(Tkrzw::Utility::serialize_float(32.5))

# Lists up all records, restoring keys into floating-point numbers.
iter = dbm.make_iterator
iter.first
while true do
  record = iter.get(status)
  break if not record
  printf("%f: %s\n", Tkrzw::Utility::deserialize_float(record[0]), record[1])
  iter.next
end
iter.destruct

# Closes the database.
dbm.close.or_die
dbm.destruct

The following code is a typical example of the asynchronous API. The AsyncDBM class manages a thread pool and handles database operations in the background in parallel. Each Method of AsyncDBM returns a Future object to monitor the result.

require 'tkrzw'

dbm = Tkrzw::DBM.new
dbm.open("casket.tkh", true, truncate: true, num_buckets: 100)

# Prepares the asynchronous adapter with 4 worker threads.
async = Tkrzw::AsyncDBM.new(dbm, 4)

# Executes the Set method asynchronously.
future = async.set("hello", "world")
# Does something in the foreground.
until future.wait(0)
  puts("Setting a record")
end
# Checks the result after awaiting the Set operation.
status = future.get
if status != Tkrzw::Status::SUCCESS
  puts("ERROR: " + status.to_s)
end

# Executes the Get method asynchronously.
future = async.get("hello")
# Does something in the foreground.
puts("Getting a record")
# Awaits the operation and get the result.
status, value = future.get()
if status == Tkrzw::Status::SUCCESS
  puts("VALUE: " + value)
end

# Releases the asynchronous adapter.
async.destruct

# Closes the database.
dbm.close

The following code uses process, process_multi, and process_each functions which take blocks to process the record efficiently. process is useful to update a record atomically according to the current value. process_each is useful to access every record in the most efficient way.

require 'tkrzw'

# Opens the database.
dbm = Tkrzw::DBM.new
dbm.open("casket.tkh", true, truncate: true, num_buckets: 1000)

# Sets records with blocks.
dbm.process("doc-1", true) {|key, value| "Tokyo is the capital city of Japan."}
dbm.process("doc-2", true) {|key, value| "Is she living in Tokyo, Japan?"}
dbm.process("doc-3", true) {|key, value| "She must leave Tokyo!"}

# Lowers record values.
def lower(key, value)
  # If no matching record, nil is given as the value.
  return nil if not value
  # Sets the new value.
  return value.downcase
end
dbm.process("doc-1", true) {|k, v| lower(k, v)}
dbm.process("doc-2", true) {|k, v| lower(k, v)}
dbm.process("doc-3", true) {|k, v| lower(k, v)}
dbm.process("non-existent", true){|k, v| lower(k, v)}

# If you don't update the record, set the second parameter to false.
dbm.process("doc-3", false) {|k, v| p k, v}

# Adds multiple records at once.
records = {"doc-4"=>"Tokyo Go!", "doc-5"=>"Japan Go!"}
dbm.process_multi(["doc-4", "doc-5"], true) {|k, v|
  records[k]
}

# Modifies multiple records at once.
dbm.process_multi(["doc-4", "doc-5"], true) {|k, v| lower(k, v)}

# Checks the whole content.
# This uses an external iterator and is relavively slow.
dbm.each do |key, value|
  p key + ": " + value
end

# Function for word counting.
def word_counter(key, value, counts)
  return if not key
  value.split(/\W+/).each {|word|
    counts[word] = (counts[word] or 0) + 1
  }
end
word_counts = {}

# The parameter should be false if the value is not updated.
dbm.process_each(false) {|k, v| word_counter(k, v, word_counts)}
p word_counts

# Returning False by the callbacks removes the record.
dbm.process("doc-1", true) {|k, v| false}
p dbm.count
dbm.process_multi(["doc-2", "doc-3"], true) {|k, v| false}
p dbm.count
dbm.process_each(true) {|k, v| false}
p dbm.count

# Closes the database.
dbm.close

The following code is an example to use a secondary index, which is useful to organize records by non-primary keys.

require 'tkrzw'

# Opens the incex.
index = Tkrzw::Index.new
index.open("casket.tkt", true, truncate: true, num_buckets: 100).or_die

# Adds records to the index.
# The key is a division name and the value is person name.
index.add("general", "anne").or_die
index.add("general", "matthew").or_die
index.add("general", "marilla").or_die
index.add("sales", "gilbert").or_die

# Anne moves to the sales division.
index.remove("general", "anne").or_die
index.add("sales", "anne").or_die

# Prints all members for each division.
["general", "sales"].each do |division|
  puts(division)
  members = index.get_values(division)
  members.each do |member|
    puts(" -- " + member)
  end
end

# Prints every record by iterator.
iter = index.make_iterator
iter.first
loop do
  record = iter.get
  break if not record
  puts(record[0] + ": " + record[1])
  iter.next
end
iter.destruct

# Prints every record by the iterator block
index.each do |key, value|
  puts(key + ": " + value)
end

# Closes the index.
index.close.or_die