tkrzw module

class tkrzw.Utility

Bases: object

Library utilities.

VERSION = '0.0.0'

The package version numbers.

OS_NAME = 'unknown'

The recognized OS name.

PAGE_SIZE = 4096

The size of a memory page on the OS.

INT32MIN = -2147483648

The minimum value of int32.

INT32MAX = 2147483647

The maximum value of int32.

UINT32MAX = 4294967295

The maximum value of uint32.

INT64MIN = -9223372036854775808

The minimum value of int64.

INT64MAX = 9223372036854775807

The maximum value of int64.

UINT64MAX = 18446744073709551615

The maximum value of uint64.

classmethod GetMemoryCapacity()

Gets the memory capacity of the platform.

Returns

The memory capacity of the platform in bytes, or -1 on failure.

classmethod GetMemoryUsage()

Gets the current memory usage of the process.

Returns

The current memory usage of the process in bytes, or -1 on failure.

classmethod PrimaryHash(data, num_buckets=None)

Primary hash function for the hash database.

Parameters
  • data – The data to calculate the hash value for.

  • num_buckets – The number of buckets of the hash table. If it is omitted, UINT64MAX is set.

Returns

The hash value.

classmethod SecondaryHash(data, num_shards=None)

Secondary hash function for sharding.

Parameters
  • data – The data to calculate the hash value for.

  • num_shards – The number of shards. If it is omitted, UINT64MAX is set.

Returns

The hash value.

classmethod EditDistanceLev(a, b)

Gets the Levenshtein edit distance of two Unicode strings.

Parameters
  • a – A Unicode string.

  • b – The other Unicode string.

Returns

The Levenshtein edit distance of the two strings.
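
As a point of reference for what the returned number means, the following is a minimal pure-Python sketch of the classic Levenshtein distance. It is not the library's implementation, and the function name levenshtein is hypothetical. It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.

```python
def levenshtein(a: str, b: str) -> int:
    """Reference Levenshtein distance over Unicode code points (illustrative only)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```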

class tkrzw.Status(code=0, message='')

Bases: object

Status of operations.

SUCCESS = 0

Success.

UNKNOWN_ERROR = 1

Generic error whose cause is unknown.

SYSTEM_ERROR = 2

Generic error from underlying systems.

NOT_IMPLEMENTED_ERROR = 3

Error that the feature is not implemented.

PRECONDITION_ERROR = 4

Error that a precondition is not met.

INVALID_ARGUMENT_ERROR = 5

Error that a given argument is invalid.

CANCELED_ERROR = 6

Error that the operation is canceled.

NOT_FOUND_ERROR = 7

Error that a specific resource is not found.

PERMISSION_ERROR = 8

Error that the operation is not permitted.

INFEASIBLE_ERROR = 9

Error that the operation is infeasible.

DUPLICATION_ERROR = 10

Error that a specific resource is duplicated.

BROKEN_DATA_ERROR = 11

Error that internal data are broken.

NETWORK_ERROR = 12

Error caused by networking failure.

APPLICATION_ERROR = 13

Generic error caused by the application logic.

__init__(code=0, message='')

Sets the code and the message.

Parameters
  • code – The status code. This can be omitted and then SUCCESS is set.

  • message – An arbitrary status message. This can be omitted and then an empty string is set.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

__eq__(rhs)

Returns true if the given object is equivalent to this object.

Returns

True if the given object is equivalent to this object.

This supports comparison between a status object and a status code number.

Set(code=0, message='')

Sets the code and the message.

Parameters
  • code – The status code. This can be omitted and then SUCCESS is set.

  • message – An arbitrary status message. This can be omitted and then an empty string is set.

Join(rht)

Assigns the internal state from another status object only if the current state is success.

Parameters

rht – The status object.
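
The effect of joining is that the first failure in a chain of operations is the one that survives. A minimal sketch of that rule, using a hypothetical stand-in class (SketchStatus is not tkrzw.Status, and its join method only models the behavior described above):

```python
class SketchStatus:
    """Hypothetical stand-in for a status object; not the tkrzw.Status class."""
    SUCCESS = 0
    NOT_FOUND_ERROR = 7
    DUPLICATION_ERROR = 10

    def __init__(self, code=0, message=""):
        self.code = code
        self.message = message

    def join(self, other):
        # Copy the other state only while this one is still a success,
        # so the first failure in a chain is the one that is kept.
        if self.code == self.SUCCESS:
            self.code = other.code
            self.message = other.message


total = SketchStatus()
for step in (SketchStatus(),
             SketchStatus(SketchStatus.NOT_FOUND_ERROR, "first failure"),
             SketchStatus(SketchStatus.DUPLICATION_ERROR, "second failure")):
    total.join(step)
```

After the loop, total still carries the first failure, because later joins are ignored once the state is no longer a success.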

GetCode()

Gets the status code.

Returns

The status code.

GetMessage()

Gets the status message.

Returns

The status message.

IsOK()

Returns true if the status is success.

Returns

True if the status is success, or False on failure.

OrDie()

Raises an exception if the status is not success.

Raises

StatusException – An exception containing the status object.

classmethod CodeName(code)

Gets the string name of a status code.

Parameters

code – The status code.

Returns

The name of the status code.

__hash__ = None

class tkrzw.Future

Bases: object

Future containing a status object and extra data.

Future objects are made by methods of AsyncDBM. Every future object should be destroyed by the “Destruct” method or the “Get” method to free resources. This class implements the awaitable protocol, so an instance is usable with the “await” expression.

__init__()

The constructor cannot be called directly. Use methods of AsyncDBM.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

__await__()

Waits for the operation to be done and returns an iterator.

Returns

The iterator which stops immediately.
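
To make the awaitable protocol concrete, here is a minimal sketch of an object whose __await__ returns an iterator that stops immediately and hands its result to “await”. The class ReadyResult is hypothetical and only mimics the protocol; it is not how the library implements Future.

```python
import asyncio


class ReadyResult:
    """Hypothetical awaitable; it only mimics the protocol described above."""

    def __init__(self, result):
        self._result = result

    def __await__(self):
        # A real future would block here until the operation is done.
        return self._finished()

    def _finished(self):
        # The unreachable yield makes this a generator, so the iterator
        # stops on its first step and hands the result to "await".
        return self._result
        yield


async def demo():
    return await ReadyResult(("status", b"value"))


awaited = asyncio.run(demo())
```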

Wait(timeout=-1)

Waits for the operation to be done.

Parameters

timeout – The waiting time in seconds. If it is negative, no timeout is set.

Returns

True if the operation is done, or False if the timeout occurs.

Get()

Waits for the operation to be done and gets the result status.

Returns

The result status and extra data if any. The existence and the type of extra data depend on the operation which makes the future. For DBM#Get, a tuple of the status and the retrieved value is returned. For DBM#Set and DBM#Remove, the status object itself is returned.

The internal resource is released by this method. “Wait” and “Get” cannot be called after calling this method.

exception tkrzw.StatusException(status)

Bases: RuntimeError

Exception to convey the status of operations.

__init__(status)

Sets the status.

Parameters

status – The status object.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

GetStatus()

Gets the status object.

Returns

The status object.

class tkrzw.DBM

Bases: object

Polymorphic database manager.

All operations except for Open and Close are thread-safe; multiple threads can access the same database concurrently. You can specify a data structure when you call the Open method. Every opened database must be closed explicitly by the Close method to avoid data corruption. This class implements the iterable protocol, so an instance is usable with a “for-in” loop.

ANY_DATA = b'\x00[ANY]\x00'

The special bytes value for no-operation or any data.

__init__()

Does nothing in particular.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

__len__()

Gets the number of records, to enable the len operator.

Returns

The number of records on success, or 0 on failure.

__getitem__(key)

Gets the value of a record, to enable the [] operator.

Parameters

key – The key of the record.

Returns

The value of the matching record. An exception is raised for missing records. If the given key is a string, the returned value is also a string. Otherwise, the return value is bytes.

Raises

StatusException – An exception containing the status object.

__contains__(key)

Checks if a record exists or not, to enable the in operator.

Parameters

key – The key of the record.

Returns

True if the record exists, or False if not. No exception is raised for missing records.

__setitem__(key, value)

Sets a record of a key and a value, to enable the []= operator.

Parameters
  • key – The key of the record.

  • value – The value of the record.

Raises

StatusException – An exception containing the status object.

__delitem__(key)

Removes a record of a key, to enable the del [] operator.

Parameters

key – The key of the record.

Raises

StatusException – An exception containing the status object.

__iter__()

Makes an iterator and initializes it, to comply with the iterator protocol.

Returns

The iterator for each record.

Open(path, writable, **params)

Opens a database file.

Parameters
  • path – A path of the file.

  • writable – If true, the file is writable. If false, it is read-only.

  • params – Optional keyword parameters.

Returns

The result status.

The extension of the path indicates the type of the database.
  • .tkh : File hash database (HashDBM)

  • .tkt : File tree database (TreeDBM)

  • .tks : File skip database (SkipDBM)

  • .tkmt : On-memory hash database (TinyDBM)

  • .tkmb : On-memory tree database (BabyDBM)

  • .tkmc : On-memory cache database (CacheDBM)

  • .tksh : On-memory STL hash database (StdHashDBM)

  • .tkst : On-memory STL tree database (StdTreeDBM)

The optional parameters can include an option for concurrency tuning. By default, database operations are done under the GIL (Global Interpreter Lock), which means that database operations are not done concurrently even if you use multiple threads. If the “concurrent” parameter is true, database operations are done outside the GIL, which means that database operations can be done concurrently if you use multiple threads. However, the downside is that swapping thread data is costly, so the actual throughput is often worse in the concurrent mode than in the normal mode. Therefore, the concurrent mode should be used only if the database is huge and it can cause blocking of threads in multi-thread usage.

The optional parameters can include options for the file opening operation.
  • truncate (bool): True to truncate the file.

  • no_create (bool): True to omit file creation.

  • no_wait (bool): True to fail if the file is locked by another process.

  • no_lock (bool): True to omit file locking.

  • sync_hard (bool): True to do physical synchronization when closing.

The optional parameter “dbm” supersedes the decision of the database type by the extension. The value is the type name: “HashDBM”, “TreeDBM”, “SkipDBM”, “TinyDBM”, “BabyDBM”, “CacheDBM”, “StdHashDBM”, “StdTreeDBM”.
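
The selection rule described above (the extension decides the type unless “dbm” is given) can be sketched as a lookup table. The helper guess_dbm_class is hypothetical and not part of the library; returning None for an unknown extension is an assumption, since the text does not say what happens in that case.

```python
import os

# Extension table copied from the list above.
EXTENSION_TO_CLASS = {
    ".tkh": "HashDBM",
    ".tkt": "TreeDBM",
    ".tks": "SkipDBM",
    ".tkmt": "TinyDBM",
    ".tkmb": "BabyDBM",
    ".tkmc": "CacheDBM",
    ".tksh": "StdHashDBM",
    ".tkst": "StdTreeDBM",
}


def guess_dbm_class(path, dbm=None):
    """Hypothetical helper: an explicit "dbm" value wins over the extension."""
    if dbm:
        return dbm
    ext = os.path.splitext(path)[1]
    return EXTENSION_TO_CLASS.get(ext)  # None for an unknown extension (assumption)
```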

The optional parameter “file” specifies the internal file implementation class. The default file class is “MemoryMapAtomicFile”. The other supported classes are “StdFile”, “MemoryMapAtomicFile”, “PositionalParallelFile”, and “PositionalAtomicFile”.

For HashDBM, these optional parameters are supported.
  • update_mode (string): How to update the database file: “UPDATE_IN_PLACE” for the in-place mode or “UPDATE_APPENDING” for the appending mode.

  • record_crc_mode (string): How to add the CRC data to the record: “RECORD_CRC_NONE” to add no CRC to each record, “RECORD_CRC_8” to add CRC-8 to each record, “RECORD_CRC_16” to add CRC-16 to each record, or “RECORD_CRC_32” to add CRC-32 to each record.

  • record_comp_mode (string): How to compress the record data: “RECORD_COMP_NONE” to do no compression, “RECORD_COMP_ZLIB” to compress with ZLib, “RECORD_COMP_ZSTD” to compress with ZStd, “RECORD_COMP_LZ4” to compress with LZ4, “RECORD_COMP_LZMA” to compress with LZMA, “RECORD_COMP_RC4” to cipher with RC4, “RECORD_COMP_AES” to cipher with AES.

  • offset_width (int): The width to represent the offset of records.

  • align_pow (int): The power to align records.

  • num_buckets (int): The number of buckets for hashing.

  • restore_mode (string): How to restore the database file: “RESTORE_SYNC” to restore to the last synchronized state, “RESTORE_READ_ONLY” to make the database read-only, or “RESTORE_NOOP” to do nothing. By default, as many records as possible are restored.

  • fbp_capacity (int): The capacity of the free block pool.

  • min_read_size (int): The minimum reading size to read a record.

  • cache_buckets (bool): True to cache the hash buckets on memory.

  • cipher_key (string): The encryption key for cipher compressors.

For TreeDBM, all optional parameters for HashDBM are available. In addition, these optional parameters are supported.
  • max_page_size (int): The maximum size of a page.

  • max_branches (int): The maximum number of branches each inner node can have.

  • max_cached_pages (int): The maximum number of cached pages.

  • page_update_mode (string): What to do when each page is updated: “PAGE_UPDATE_NONE” to do no operation or “PAGE_UPDATE_WRITE” to write immediately.

  • key_comparator (string): The comparator of record keys: “LexicalKeyComparator” for the lexical order, “LexicalCaseKeyComparator” for the lexical order ignoring case, “DecimalKeyComparator” for the order of the decimal integer numeric expressions, “HexadecimalKeyComparator” for the order of the hexadecimal integer numeric expressions, “RealNumberKeyComparator” for the order of the decimal real number expressions.

For SkipDBM, these optional parameters are supported.
  • offset_width (int): The width to represent the offset of records.

  • step_unit (int): The step unit of the skip list.

  • max_level (int): The maximum level of the skip list.

  • restore_mode (string): How to restore the database file: “RESTORE_SYNC” to restore to the last synchronized state, “RESTORE_READ_ONLY” to make the database read-only, or “RESTORE_NOOP” to do nothing. By default, as many records as possible are restored.

  • sort_mem_size (int): The memory size used for sorting to build the database in the at-random mode.

  • insert_in_order (bool): If true, records are assumed to be inserted in ascending order of the key.

  • max_cached_records (int): The maximum number of cached records.

For TinyDBM, these optional parameters are supported.
  • num_buckets (int): The number of buckets for hashing.

For BabyDBM, these optional parameters are supported.
  • key_comparator (string): The comparator of record keys. The same ones as TreeDBM.

For CacheDBM, these optional parameters are supported.
  • cap_rec_num (int): The maximum number of records.

  • cap_mem_size (int): The total memory size to use.

All databases support taking update logs into files. It is enabled by setting the prefix of update log files.
  • ulog_prefix (str): The prefix of the update log files.

  • ulog_max_file_size (num): The maximum file size of each update log file. By default, it is 1GiB.

  • ulog_server_id (num): The server ID attached to each log. By default, it is 0.

  • ulog_dbm_index (num): The DBM index attached to each log. By default, it is 0.

For the file “PositionalParallelFile” and “PositionalAtomicFile”, these optional parameters are supported.
  • block_size (int): The block size to which all blocks should be aligned.

  • access_options (str): Values separated by colon. “direct” for direct I/O, “sync” for synchronizing I/O, “padding” for file size alignment by padding, “pagecache” for the mini page cache in the process.

If the optional parameter “num_shards” is set, the database is sharded into multiple shard files. Each file has a suffix like “-00003-of-00015”. If the value is 0, the number of shards is set by patterns of the existing files, or 1 if they don’t exist.
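
The shard suffix pattern can be illustrated with a small formatting helper. This is a sketch under assumptions: the helper shard_paths is hypothetical, and zero-based indexing with five-digit zero padding is inferred from the single example suffix above rather than stated by the text.

```python
def shard_paths(base_path, num_shards):
    """Hypothetical helper listing shard file names in the documented pattern."""
    # Assumes zero-based shard indices and five-digit zero padding.
    return [f"{base_path}-{index:05d}-of-{num_shards:05d}"
            for index in range(num_shards)]
```

For example, shard_paths("casket.tkh", 15)[3] yields "casket.tkh-00003-of-00015".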

Close()

Closes the database file.

Returns

The result status.

Process(key, func, writable)

Processes a record with an arbitrary function.

Parameters
  • key – The key of the record.

  • func – The function to process a record. The first parameter is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.

  • writable – True if the processor can edit the record.

Returns

The result status.

This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
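
The callback contract is easier to see with a toy model. The sketch below uses a plain dict in place of a database; the function process is a hypothetical stand-in, not tkrzw code. Ignoring the return value when writable is false is an assumption about the read-only case.

```python
def process(store, key, func, writable):
    """Hypothetical dict-based model of the callback contract; not tkrzw code."""
    result = func(key, store.get(key))  # value is None for a missing record
    if not writable or result is None:
        return                          # record is left untouched
    if result is False:
        store.pop(key, None)            # the False object requests removal
    else:
        store[key] = result             # str or bytes becomes the new value


store = {b"a": b"1", b"b": b"2"}
process(store, b"a", lambda k, v: v + b"!", True)   # update
process(store, b"b", lambda k, v: False, True)      # remove
process(store, b"c", lambda k, v: None, True)       # no change
process(store, b"a", lambda k, v: b"zzz", False)    # read-only: ignored
```

Note that the removal check uses identity with False, which is why a merely falsy return value such as an empty string would still count as an update.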

Get(key, status=None)

Gets the value of a record of a key.

Parameters
  • key – The key of the record.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The bytes value of the matching record or None on failure.

GetStr(key, status=None)

Gets the value of a record of a key, as a string.

Parameters
  • key – The key of the record.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The string value of the matching record or None on failure.

GetMulti(*keys)

Gets the values of multiple records of keys.

Parameters

keys – The keys of records to retrieve.

Returns

A map of retrieved records. Keys which don’t match existing records are ignored.

GetMultiStr(*keys)

Gets the values of multiple records of keys, as strings.

Parameters

keys – The keys of records to retrieve.

Returns

A map of retrieved records. Keys which don’t match existing records are ignored.

Set(key, value, overwrite=True)

Sets a record of a key and a value.

Parameters
  • key – The key of the record.

  • value – The value of the record.

  • overwrite – Whether to overwrite the existing value. It can be omitted and then true is set.

Returns

The result status. If overwriting is abandoned, DUPLICATION_ERROR is returned.

SetMulti(overwrite=True, **records)

Sets multiple records of the keyword arguments.

Parameters
  • overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.

  • records – Records to store, specified as keyword parameters.

Returns

The result status. If overwriting is abandoned for any record, DUPLICATION_ERROR is returned.

SetAndGet(key, value, overwrite=True)

Sets a record and gets the old value.

Parameters
  • key – The key of the record.

  • value – The value of the record.

  • overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.

Returns

A pair of the result status and the old value. If no record existed when the new record was inserted, None is assigned as the value. If not None, the type of the returned old value is the same as the parameter value.

Remove(key)

Removes a record of a key.

Parameters

key – The key of the record.

Returns

The result status. If there’s no matching record, NOT_FOUND_ERROR is returned.

RemoveMulti(keys)

Removes records of keys.

Parameters

keys – The keys of the records.

Returns

The result status. If there are missing records, NOT_FOUND_ERROR is returned.

RemoveAndGet(key)

Removes a record and gets the value.

Parameters

key – The key of the record.

Returns

A pair of the result status and the record value. If the record does not exist, None is assigned as the value. If not None, the type of the returned value is the same as the parameter key.

Append(key, value, delim='')

Appends data at the end of a record of a key.

Parameters
  • key – The key of the record.

  • value – The value to append.

  • delim – The delimiter to put after the existing record.

Returns

The result status.

If there’s no existing record, the value is set without the delimiter.
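
The delimiter rule can be modeled with a dict. The helper append below is a hypothetical stand-in for illustration, not the library's code: the delimiter is placed between the existing value and the new value, and is skipped on the first write.

```python
def append(store, key, value, delim=b""):
    """Hypothetical dict-based model of the append rule described above."""
    if key in store:
        store[key] = store[key] + delim + value
    else:
        store[key] = value  # first write: no delimiter is added


log = {}
append(log, b"fruits", b"apple", b",")
append(log, b"fruits", b"banana", b",")
```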

AppendMulti(delim='', **records)

Appends data to multiple records of the keyword arguments.

Parameters
  • delim – The delimiter to put after the existing record.

  • records – Records to append, specified as keyword parameters.

Returns

The result status.

If there’s no existing record, the value is set without the delimiter.

CompareExchange(key, expected, desired)

Compares the value of a record and exchanges it if the condition is met.

Parameters
  • key – The key of the record.

  • expected – The expected value. If it is None, no existing record is expected. If it is ANY_DATA, an existing record with any value is expected.

  • desired – The desired value. If it is None, the record is to be removed. If it is ANY_DATA, no update is done.

Returns

The result status. If the condition isn’t met, INFEASIBLE_ERROR is returned.
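
The roles of None and ANY_DATA in the two arguments can be summarized with a dict-based model. Everything here is a hypothetical stand-in: the sentinel ANY_DATA is a local object rather than DBM.ANY_DATA, and the function returns a boolean where the real method returns a status.

```python
ANY_DATA = object()  # hypothetical sentinel standing in for DBM.ANY_DATA


def compare_exchange(store, key, expected, desired):
    """Hypothetical dict-based model; returns True where SUCCESS would be."""
    current = store.get(key)
    if expected is None:
        matched = current is None          # no existing record is expected
    elif expected is ANY_DATA:
        matched = current is not None      # any existing value is accepted
    else:
        matched = current == expected
    if not matched:
        return False                       # corresponds to INFEASIBLE_ERROR
    if desired is None:
        store.pop(key, None)               # None removes the record
    elif desired is not ANY_DATA:
        store[key] = desired               # ANY_DATA leaves the value as it is
    return True


store = {}
created = compare_exchange(store, "lock", None, "owner-1")   # create if absent
stolen = compare_exchange(store, "lock", None, "owner-2")    # fails: record exists
released = compare_exchange(store, "lock", "owner-1", None)  # remove if equal
```

This create-if-absent and remove-if-equal pairing is the usual way such a primitive is used for a simple lock record.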

CompareExchangeAndGet(key, expected, desired)

Does compare-and-exchange and/or gets the old value of the record.

Parameters
  • key – The key of the record.

  • expected – The expected value. If it is None, no existing record is expected. If it is ANY_DATA, an existing record with any value is expected.

  • desired – The desired value. If it is None, the record is to be removed. If it is ANY_DATA, no update is done.

Returns

A pair of the result status and the old value of the record. If the condition isn’t met, the status is INFEASIBLE_ERROR. If there’s no existing record, the value is None. If not None, the type of the returned old value is the same as the expected or desired value.

Increment(key, inc=1, init=0, status=None)

Increments the numeric value of a record.

Parameters
  • key – The key of the record.

  • inc – The incremental value. If it is Utility.INT64MIN, the current value is not changed and a new record is not created.

  • init – The initial value.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The current value, or None on failure.

The record value is stored as an 8-byte big-endian integer. Negative values are also supported.
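
If a counter record has to be read or written as raw bytes, the stored form can be reproduced with the standard struct module. This is a sketch under one assumption: negative values are taken to use the signed two's-complement layout (format code “q”), which the text does not spell out. The helper names are hypothetical.

```python
import struct


def encode_counter(number):
    """Packs a signed integer as 8 big-endian bytes."""
    return struct.pack(">q", number)


def decode_counter(data):
    """Unpacks 8 big-endian bytes into a signed integer."""
    return struct.unpack(">q", data)[0]
```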

CompareExchangeMulti(expected, desired)

Compares the values of records and exchanges them if the condition is met.

Parameters
  • expected – A sequence of pairs of the record keys and their expected values. If the value is None, no existing record is expected. If the value is ANY_DATA, an existing record with any value is expected.

  • desired – A sequence of pairs of the record keys and their desired values. If the value is None, the record is to be removed.

Returns

The result status. If the condition isn’t met, INFEASIBLE_ERROR is returned.

Rekey(old_key, new_key, overwrite=True, copying=False)

Changes the key of a record.

Parameters
  • old_key – The old key of the record.

  • new_key – The new key of the record.

  • overwrite – Whether to overwrite the existing record of the new key.

  • copying – Whether to retain the record of the old key.

Returns

The result status. If there’s no matching record to the old key, NOT_FOUND_ERROR is returned. If the overwrite flag is false and there is an existing record of the new key, DUPLICATION_ERROR is returned.

This method is done atomically. The other threads observe that the record has either the old key or the new key. No intermediate states are observed.

PopFirst(status=None)

Gets the first record and removes it.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the bytes key and the bytes value of the first record. On failure, None is returned.

PopFirstStr(status=None)

Gets the first record as strings and removes it.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the string key and the string value of the first record. On failure, None is returned.

PushLast(value, wtime=None)

Adds a record with a key of the current timestamp.

Parameters
  • value – The value of the record.

  • wtime – The current wall time used to generate the key. If it is None, the system clock is used.

Returns

The result status.

The key is generated as an 8-byte big-endian binary string of the timestamp. If there is an existing record matching the generated key, the key is regenerated and the attempt is repeated until it succeeds.
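
The key generation loop can be modeled as follows. This is only a sketch: the helper make_timestamp_key is hypothetical, and both the timestamp resolution (ticks_per_second) and the way a colliding key is regenerated (adding one tick) are assumptions, since the text states neither.

```python
import struct
import time


def make_timestamp_key(store, wtime=None, ticks_per_second=1000):
    """Hypothetical model of key generation; resolution is an assumption."""
    if wtime is None:
        wtime = time.time()               # fall back to the system clock
    stamp = int(wtime * ticks_per_second)
    key = struct.pack(">Q", stamp)        # 8-byte big-endian binary string
    while key in store:                   # regenerate until the key is unused
        stamp += 1
        key = struct.pack(">Q", stamp)
    return key


queue = {}
first = make_timestamp_key(queue, wtime=1.0)
queue[first] = b"job-1"
second = make_timestamp_key(queue, wtime=1.0)
queue[second] = b"job-2"
```

Because the keys are big-endian, their byte order matches their numeric order, so an ordered database keeps the records in insertion order.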

ProcessEach(func, writable)

Processes each and every record in the database with an arbitrary function.

Parameters
  • func – The function to process a record. The first parameter is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.

  • writable – True if the processor can edit the record.

Returns

The result status.

The given function is called repeatedly for each record. It is also called once before the iteration and once after the iteration with both the key and the value being None. This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
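
The call sequence, including the two extra calls with None, can be shown with a dict-based model. The function process_each is a hypothetical stand-in and not the library's code; it only demonstrates that a callback has to handle a None key.

```python
def process_each(store, func, writable):
    """Hypothetical dict-based model of the call sequence; not tkrzw code."""
    func(None, None)                      # signals the start of the iteration
    for key in list(store):
        result = func(key, store[key])
        if not writable or result is None:
            continue
        if result is False:
            del store[key]
        else:
            store[key] = result
    func(None, None)                      # signals the end of the iteration


calls = []


def upper_values(key, value):
    calls.append(key)
    if key is None:
        return None                       # nothing to do for the marker calls
    return value.upper()


store = {"a": "x", "b": "y"}
process_each(store, upper_values, True)
```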

Count()

Gets the number of records.

Returns

The number of records on success, or None on failure.

GetFileSize()

Gets the current file size of the database.

Returns

The current file size of the database, or None on failure.

GetFilePath()

Gets the path of the database file.

Returns

The file path of the database, or None on failure.

GetTimestamp()

Gets the timestamp in seconds of the last modified time.

Returns

The timestamp of the last modified time, or None on failure.

Clear()

Removes all records.

Returns

The result status.

Rebuild(**params)

Rebuilds the entire database.

Parameters

params – Optional keyword parameters.

Returns

The result status.

The optional parameters are the same as the Open method. Omitted tuning parameters are kept the same or implicitly optimized.

In addition, HashDBM, TreeDBM, and SkipDBM support the following parameters.
  • skip_broken_records (bool): If true, the operation continues even if there are broken records which can be skipped.

  • sync_hard (bool): If true, physical synchronization with the hardware is done before finishing the rebuilt file.

ShouldBeRebuilt()

Checks whether the database should be rebuilt.

Returns

True if the database should be rebuilt, or false if there is no need.

Synchronize(hard, **params)

Synchronizes the content of the database to the file system.

Parameters
  • hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.

  • params – Optional keyword parameters.

Returns

The result status.

Only SkipDBM uses the optional parameters. The “merge” parameter specifies paths of databases to merge, separated by colon. The “reducer” parameter specifies the reducer to apply to records of the same key. “ReduceToFirst”, “ReduceToSecond”, “ReduceToLast”, etc. are supported.

CopyFileData(dest_path, sync_hard=False)

Copies the content of the database file to another file.

Parameters
  • dest_path – A path to the destination file.

  • sync_hard – True to do physical synchronization with the hardware.

Returns

The result status.

Export(dest_dbm)

Exports all records to another database.

Parameters

dest_dbm – The destination database.

Returns

The result status.

ExportToFlatRecords(dest_file)

Exports all records of a database to a flat record file.

Parameters

dest_file – The file object to write records in.

Returns

The result status.

A flat record file contains a sequence of binary records without any high-level structure, so it is useful as an intermediate file for data migration.

ImportFromFlatRecords(src_file)

Imports records to a database from a flat record file.

Parameters

src_file – The file object to read records from.

Returns

The result status.

ExportKeysAsLines(dest_file)

Exports the keys of all records as lines to a text file.

Parameters

dest_file – The file object to write keys in.

Returns

The result status.

As the exported text file is smaller than the database file, scanning the text file by the search method is often faster than scanning the whole database.

Inspect()

Inspects the database.

Returns

A map of property names and their values.

IsOpen()

Checks whether the database is open.

Returns

True if the database is open, or false if not.

IsWritable()

Checks whether the database is writable.

Returns

True if the database is writable, or false if not.

IsHealthy()

Checks whether the database condition is healthy.

Returns

True if the database condition is healthy, or false if not.

IsOrdered()

Checks whether ordered operations are supported.

Returns

True if ordered operations are supported, or false if not.

Search(mode, pattern, capacity=0)

Searches the database and gets keys which match a pattern.

Parameters
  • mode – The search mode. “contain” extracts keys containing the pattern. “begin” extracts keys beginning with the pattern. “end” extracts keys ending with the pattern. “regex” extracts keys which partially match the pattern of a regular expression. “edit” extracts keys whose edit distance to the UTF-8 pattern is the least. “editbin” extracts keys whose edit distance to the binary pattern is the least. “containcase”, “containword”, and “containcaseword” extract keys considering case and word boundary. Ordered databases support “upper” and “lower” which extract keys whose positions are upper/lower than the pattern. “upperinc” and “lowerinc” are their inclusive versions.

  • pattern – The pattern for matching.

  • capacity – The maximum records to obtain. 0 means unlimited.

Returns

A list of string keys matching the condition.
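
Four of the modes and the capacity limit can be modeled over an in-memory list of keys. The function search_keys is a hypothetical stand-in, not the library's search. It covers only “contain”, “begin”, “end”, and “regex”, and it uses Python's re module, whose regular expression syntax may differ from the one the library uses.

```python
import re


def search_keys(keys, mode, pattern, capacity=0):
    """Hypothetical in-memory model of four of the search modes."""
    matchers = {
        "contain": lambda key: pattern in key,
        "begin": lambda key: key.startswith(pattern),
        "end": lambda key: key.endswith(pattern),
        "regex": lambda key: re.search(pattern, key) is not None,  # partial match
    }
    matches = matchers[mode]
    found = []
    for key in keys:
        if matches(key):
            found.append(key)
            if capacity > 0 and len(found) >= capacity:
                break                     # 0 means unlimited
    return found


keys = ["apple", "pineapple", "grape", "apricot"]
```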

MakeIterator()

Makes an iterator for each record.

Returns

The iterator for each record.

classmethod RestoreDatabase(old_file_path, new_file_path, class_name='', end_offset=-1, cipher_key='')

Restores a broken database as a new healthy database.

Parameters
  • old_file_path – The path of the broken database.

  • new_file_path – The path of the new database to be created.

  • class_name – The name of the database class. If it is None or empty, the class is guessed from the file extension.

  • end_offset – The exclusive end offset of records to read. Negative means unlimited. 0 means the size when the database is synched or closed properly. Using a positive value is not meaningful if the number of shards is more than one.

  • cipher_key – The encryption key for cipher compressors.

Returns

The result status.

class tkrzw.Iterator(dbm)

Bases: object

Iterator for each record.

__init__(dbm)

Initializes the iterator.

Parameters

dbm – The database to scan.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

__next__()

Moves the iterator to the next record, to comply with the iterator protocol.

Returns

A tuple of the key and the value of the current record.

First()

Initializes the iterator to indicate the first record.

Returns

The result status.

Even if there’s no record, the operation doesn’t fail.

Last()

Initializes the iterator to indicate the last record.

Returns

The result status.

Even if there’s no record, the operation doesn’t fail. This method is supported only by ordered databases.

Jump(key)

Initializes the iterator to indicate a specific record.

Parameters

key – The key of the record to look for.

Returns

The result status.

Ordered databases can support “lower bound” jump: if there’s no record with the same key, the iterator refers to the first record whose key is greater than the given key. The operation fails with unordered databases if there’s no record with the same key.
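
The “lower bound” behavior on an ordered database corresponds to a binary search for the first key that is not less than the given key. A minimal sketch over a sorted list, with the hypothetical helper jump returning an index (or None when no such record exists) rather than positioning a real iterator:

```python
import bisect


def jump(sorted_keys, key):
    """Hypothetical model: index of the record the iterator would refer to."""
    index = bisect.bisect_left(sorted_keys, key)  # first key >= the given key
    return index if index < len(sorted_keys) else None


ordered = [b"apple", b"cherry", b"melon"]
```

Here jump(ordered, b"banana") lands on b"cherry", the first key greater than the missing one.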

JumpLower(key, inclusive=False)

Initializes the iterator to indicate the last record whose key is lower than a given key.

Parameters
  • key – The key to compare with.

  • inclusive – If true, the condition is inclusive: equal to or lower than the key.

Returns

The result status.

Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.

JumpUpper(key, inclusive=False)

Initializes the iterator to indicate the first record whose key is greater than a given key.

Parameters
  • key – The key to compare with.

  • inclusive – If true, the condition is inclusive: equal to or greater than the key.

Returns

The result status.

Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.
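
The difference between JumpLower and JumpUpper, and the effect of the inclusive flag, can be modeled with binary searches over a sorted list. The helpers jump_lower and jump_upper are hypothetical and return an index (or None when no record qualifies) instead of positioning a real iterator.

```python
import bisect


def jump_lower(sorted_keys, key, inclusive=False):
    """Index of the last key lower than (or equal to) key, or None."""
    if inclusive:
        cut = bisect.bisect_right(sorted_keys, key)
    else:
        cut = bisect.bisect_left(sorted_keys, key)
    return cut - 1 if cut > 0 else None


def jump_upper(sorted_keys, key, inclusive=False):
    """Index of the first key greater than (or equal to) key, or None."""
    if inclusive:
        cut = bisect.bisect_left(sorted_keys, key)
    else:
        cut = bisect.bisect_right(sorted_keys, key)
    return cut if cut < len(sorted_keys) else None


ordered_keys = ["a", "c", "e"]
```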

Next()

Moves the iterator to the next record.

Returns

The result status.

If the current record is missing, the operation fails. Even if there’s no next record, the operation doesn’t fail.

Previous()

Moves the iterator to the previous record.

Returns

The result status.

If the current record is missing, the operation fails. Even if there’s no previous record, the operation doesn’t fail. This method is supported only by ordered databases.

Get(status=None)

Gets the key and the value of the current record of the iterator.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.

GetStr(status=None)

Gets the key and the value of the current record of the iterator, as strings.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the string key and the string value of the current record. On failure, None is returned.

GetKey(status=None)

Gets the key of the current record.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

The bytes key of the current record or None on failure.

GetKeyStr(status=None)

Gets the key of the current record, as a string.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

The string key of the current record or None on failure.

GetValue(status=None)

Gets the value of the current record.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

The bytes value of the current record or None on failure.

GetValueStr(status=None)

Gets the value of the current record, as a string.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

The string value of the current record or None on failure.

Set(value)

Sets the value of the current record.

Parameters

value – The value of the record.

Returns

The result status.

Remove()

Removes the current record.

Returns

The result status.

Step(status=None)

Gets the current record and moves the iterator to the next record.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.

StepStr(status=None)

Gets the current record and moves the iterator to the next record, as strings.

Parameters

status – A status object to which the result status is assigned. It can be omitted.

Returns

A tuple of the string key and the string value of the current record. On failure, None is returned.
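A typical use of First and StepStr is a loop that reads every record. The sketch below assumes that StepStr returns None once no current record is left; ListIterator is a hypothetical in-memory stand-in, included only so that the example runs without a database.

```python
def read_all(iterator):
    """Collect every record by stepping from the first record onward.

    `iterator` is any object offering the First() and StepStr() methods.
    """
    iterator.First()
    records = []
    while True:
        record = iterator.StepStr()
        if record is None:  # assumed: no current record left to read
            break
        records.append(record)
    return records

class ListIterator:
    """Hypothetical stand-in used only to exercise read_all()."""

    def __init__(self, records):
        self._records = list(records)
        self._index = None

    def First(self):
        self._index = 0

    def StepStr(self, status=None):
        if self._index is None or self._index >= len(self._records):
            return None
        record = self._records[self._index]
        self._index += 1
        return record

collected = read_all(ListIterator([("one", "1"), ("two", "2")]))
```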

class tkrzw.AsyncDBM(dbm, num_worker_threads)

Bases: object

Asynchronous database manager adapter.

This class is a wrapper of DBM for asynchronous operations. A task queue with a thread pool is used inside. Every method except for the constructor and the destructor is run by a thread in the thread pool and the result is set in the future object of the return value. The caller can ignore the future object if it is not necessary. The Destruct method waits for all tasks to be done. Therefore, the destructor should be called before the database is closed.
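The task-queue pattern described here can be modeled with the standard concurrent.futures module. AsyncAdapterSketch is a hypothetical class that wraps a plain dict rather than a database; it only illustrates how operations are queued, how futures carry results, and why the adapter is drained before the store goes away.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncAdapterSketch:
    """Hypothetical model: run dict operations on a thread pool, return futures."""

    def __init__(self, store, num_worker_threads):
        self._store = store
        self._pool = ThreadPoolExecutor(max_workers=num_worker_threads)

    def set(self, key, value):
        return self._pool.submit(self._store.__setitem__, key, value)

    def get(self, key):
        return self._pool.submit(self._store.get, key)

    def destruct(self):
        # Blocks until every queued task has finished.
        self._pool.shutdown(wait=True)

store = {}
adapter = AsyncAdapterSketch(store, num_worker_threads=1)
adapter.set("hello", "world")  # the returned future can be ignored
future = adapter.get("hello")  # queued after the set on the single worker
value = future.result()        # waits for the task and returns its result
adapter.destruct()             # drain the queue before discarding the store
```

A single worker is used so that the two tasks run in submission order; with more workers, the caller would wait on the first future before issuing a dependent operation.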

__init__(dbm, num_worker_threads)

Sets up the task queue.

Parameters
  • dbm – A database object which has been opened.

  • num_worker_threads – The number of threads in the internal thread pool.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

Destruct()

Destructs the asynchronous database adapter.

This method waits for all tasks to be done.

Get(key)

Gets the value of a record of a key.

Parameters

key – The key of the record.

Returns

The future for the result status and the bytes value of the matching record.

GetStr(key)

Gets the value of a record of a key, as a string.

Parameters

key – The key of the record.

Returns

The future for the result status and the string value of the matching record.

GetMulti(*keys)

Gets the values of multiple records of keys.

Parameters

keys – The keys of records to retrieve.

Returns

The future for the result status and a map of retrieved records. Keys which don’t match existing records are ignored.

GetMultiStr(*keys)

Gets the values of multiple records of keys, as strings.

Parameters

keys – The keys of records to retrieve.

Returns

The future for the result status and a map of retrieved records. Keys which don’t match existing records are ignored.

Set(key, value, overwrite=True)

Sets a record of a key and a value.

Parameters
  • key – The key of the record.

  • value – The value of the record.

  • overwrite – Whether to overwrite the existing value. It can be omitted, in which case true is used.

Returns

The future for the result status. If overwriting is abandoned, DUPLICATION_ERROR is set.

SetMulti(overwrite=True, **records)

Sets multiple records of the keyword arguments.

Parameters
  • overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.

  • records – Records to store, specified as keyword parameters.

Returns

The future for the result status. If overwriting is abandoned, DUPLICATION_ERROR is set.

Append(key, value, delim='')

Appends data at the end of a record of a key.

Parameters
  • key – The key of the record.

  • value – The value to append.

  • delim – The delimiter to put after the existing record.

Returns

The future for the result status.

If there’s no existing record, the value is set without the delimiter.

AppendMulti(delim='', **records)

Appends data to multiple records of the keyword arguments.

Parameters
  • delim – The delimiter to put after the existing record.

  • records – Records to append, specified as keyword parameters.

Returns

The future for the result status.

If there’s no existing record, the value is set without the delimiter.

CompareExchange(key, expected, desired)

Compares the value of a record and exchanges it if the condition is met.

Parameters
  • key – The key of the record.

  • expected – The expected value. If it is None, no existing record is expected. If it is DBM.ANY_DATA, an existing record with any value is expected.

  • desired – The desired value. If it is None, the record is to be removed. If it is DBM.ANY_DATA, no update is done.

Returns

The future for the result status. If the condition isn’t met, INFEASIBLE_ERROR is set.
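The rules for the expected and desired values can be modeled on a plain dict. In this sketch, ANY_DATA is a hypothetical stand-in for DBM.ANY_DATA, and the function returns True or False where the library would report success or INFEASIBLE_ERROR.

```python
ANY_DATA = object()  # hypothetical stand-in for DBM.ANY_DATA

def compare_exchange(store, key, expected, desired):
    """Apply the documented rules to a dict; return True if the condition was met."""
    current = store.get(key)
    if expected is None:
        matched = current is None          # no existing record is expected
    elif expected is ANY_DATA:
        matched = current is not None      # any existing value is accepted
    else:
        matched = current == expected
    if not matched:
        return False
    if desired is None:
        store.pop(key, None)               # None removes the record
    elif desired is not ANY_DATA:
        store[key] = desired               # ANY_DATA would leave it unchanged
    return True

store = {}
created = compare_exchange(store, "lock", None, "held")   # create if absent
rejected = compare_exchange(store, "lock", None, "held")  # already exists
released = compare_exchange(store, "lock", "held", None)  # remove on match
```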

Increment(key, inc=1, init=0)

Increments the numeric value of a record.

Parameters
  • key – The key of the record.

  • inc – The incremental value. If it is Utility.INT64MIN, the current value is not changed and a new record is not created.

  • init – The initial value.

Returns

The future for the result status and the current value.

The record value is stored as an 8-byte big-endian integer. Negative values are also supported.
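A value in this layout can be produced or decoded with the standard struct module. The helper names are hypothetical, and the sketch assumes that negative numbers use the usual two’s complement form of a signed 64-bit integer.

```python
import struct

def encode_counter(value):
    """Pack a signed integer as 8 big-endian bytes."""
    return struct.pack(">q", value)

def decode_counter(data):
    """Unpack 8 big-endian bytes into a signed integer."""
    return struct.unpack(">q", data)[0]

one = encode_counter(1)
minus_one = encode_counter(-1)
round_trip = decode_counter(encode_counter(-42))
```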

ProcessMulti(key_func_pairs, writable)

Processes multiple records with arbitrary functions.

Parameters
  • key_func_pairs – A list of pairs of keys and their functions. The first parameter of the function is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.

  • writable – True if the processors can edit the record.

Returns

The result status.

This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
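How the return value of a processing function is interpreted can be sketched on a plain dict with string values. apply_processor is a hypothetical helper that models the three documented outcomes; it is not how the library dispatches the functions.

```python
def apply_processor(store, key, func):
    """Apply `func` to one record of a dict, following the documented rules."""
    result = func(key, store.get(key))
    if result is None:
        return                    # None leaves the record untouched
    if result is False:
        store.pop(key, None)      # the False object removes the record
    else:
        store[key] = result       # a string or bytes replaces the value

store = {"visits": "41", "stale": "x"}
apply_processor(store, "visits", lambda key, value: str(int(value) + 1))
apply_processor(store, "stale", lambda key, value: False)
apply_processor(store, "missing", lambda key, value: None)
```

Because the removal check uses identity with False, a function can still store an empty string, which is a false value but not the False object.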

CompareExchangeMulti(expected, desired)

Compares the values of records and exchanges them if the condition is met.

Parameters
  • expected – A sequence of pairs of the record keys and their expected values. If the value is None, no existing record is expected. If the value is DBM.ANY_DATA, an existing record with any value is expected.

  • desired – A sequence of pairs of the record keys and their desired values. If the value is None, the record is to be removed.

Returns

The future for the result status. If the condition isn’t met, INFEASIBLE_ERROR is set.

Rekey(old_key, new_key, overwrite=True, copying=False)

Changes the key of a record.

Parameters
  • old_key – The old key of the record.

  • new_key – The new key of the record.

  • overwrite – Whether to overwrite the existing record of the new key.

  • copying – Whether to retain the record of the old key.

Returns

The future for the result status. If there’s no matching record to the old key, NOT_FOUND_ERROR is set. If the overwrite flag is false and there is an existing record of the new key, DUPLICATION_ERROR is set.

This method is done atomically. The other threads observe that the record has either the old key or the new key. No intermediate states are observed.

PopFirst()

Gets the first record and removes it.

Returns

The future for a tuple of the result status, the bytes key, and the bytes value of the first record.

PopFirstStr()

Gets the first record as strings and removes it.

Returns

The future for a tuple of the result status, the string key, and the string value of the first record.

PushLast(value, wtime=None)

Adds a record with a key of the current timestamp.

Parameters
  • value – The value of the record.

  • wtime – The current wall time used to generate the key. If it is None, the system clock is used.

Returns

The future for the result status.

The key is generated as an 8-byte big-endian binary string of the timestamp. If there is an existing record matching the generated key, the key is regenerated and the attempt is repeated until it succeeds.
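The key generation and retry loop can be modeled on a plain dict. Two details below are assumptions made for illustration, since they are not specified here: the timestamp is taken in integer microseconds, and a colliding key is regenerated by advancing the timestamp by one tick.

```python
import struct
import time

def push_last(store, value, wtime=None):
    """Store `value` under a fresh 8-byte big-endian timestamp key; return the key."""
    if wtime is None:
        wtime = time.time()          # fall back to the system clock
    stamp = int(wtime * 1_000_000)   # assumed unit: microseconds
    while True:
        key = struct.pack(">Q", stamp)
        if key not in store:
            store[key] = value
            return key
        stamp += 1                   # assumed regeneration rule: next tick

store = {}
first = push_last(store, b"a", wtime=1.0)
second = push_last(store, b"b", wtime=1.0)  # same time: the key is regenerated
```

Big-endian keys sort in timestamp order as byte strings, so in an ordered database the most recently pushed record is the last one.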

Clear()

Removes all records.

Returns

The future for the result status.

Rebuild(**params)

Rebuilds the entire database.

Parameters

params – Optional keyword parameters.

Returns

The future for the result status.

The parameters work in the same way as with DBM.Rebuild.

Synchronize(hard, **params)

Synchronizes the content of the database to the file system.

Parameters
  • hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.

  • params – Optional keyword parameters.

Returns

The future for the result status.

The parameters work in the same way as with DBM.Synchronize.

CopyFileData(dest_path, sync_hard=False)

Copies the content of the database file to another file.

Parameters
  • dest_path – A path to the destination file.

  • sync_hard – True to do physical synchronization with the hardware.

Returns

The future for the result status.

Export(dest_dbm)

Exports all records to another database.

Parameters

dest_dbm – The destination database. The lifetime of the database object must last until the task finishes.

Returns

The future for the result status.

ExportToFlatRecords(dest_file)

Exports all records of a database to a flat record file.

Parameters

dest_file – The file object to write records in. The lifetime of the file object must last until the task finishes.

Returns

The future for the result status.

A flat record file contains a sequence of binary records without any high-level structure, so it is useful as an intermediate file for data migration.

ImportFromFlatRecords(src_file)

Imports records to a database from a flat record file.

Parameters

src_file – The file object to read records from. The lifetime of the file object must last until the task finishes.

Returns

The future for the result status.

Search(mode, pattern, capacity=0)

Searches the database and gets keys which match a pattern.

Parameters
  • mode – The search mode. “contain” extracts keys containing the pattern. “begin” extracts keys beginning with the pattern. “end” extracts keys ending with the pattern. “regex” extracts keys that partially match the pattern of a regular expression. “edit” extracts keys whose edit distance to the UTF-8 pattern is the least. “editbin” extracts keys whose edit distance to the binary pattern is the least.

  • pattern – The pattern for matching.

  • capacity – The maximum number of records to obtain. 0 means unlimited.

Returns

The future for the result status and a list of keys matching the condition.
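The four pattern-matching modes and the capacity limit can be modeled over a list of string keys. search_keys is a hypothetical helper that mirrors the documented meaning of each mode; the edit-distance modes are left out of this sketch.

```python
import re

def search_keys(keys, mode, pattern, capacity=0):
    """Filter keys by the "contain", "begin", "end", or "regex" mode."""
    matchers = {
        "contain": lambda key: pattern in key,
        "begin": lambda key: key.startswith(pattern),
        "end": lambda key: key.endswith(pattern),
        # A partial match: the pattern may match anywhere in the key.
        "regex": lambda key: re.search(pattern, key) is not None,
    }
    match = matchers[mode]
    result = []
    for key in keys:
        if match(key):
            result.append(key)
            if capacity > 0 and len(result) >= capacity:
                break  # a capacity of 0 means unlimited
    return result

keys = ["apple", "pineapple", "grape", "apricot"]
containing = search_keys(keys, "contain", "app")
beginning = search_keys(keys, "begin", "ap")
ending = search_keys(keys, "end", "pe")
by_regex = search_keys(keys, "regex", r"^.r")
limited = search_keys(keys, "contain", "ap", capacity=2)
```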

class tkrzw.File

Bases: object

Generic file implementation.

All operations except for “Open” and “Close” are thread-safe; multiple threads can access the same file concurrently. You can specify a concrete class when you call the “Open” method. Every opened file must be closed explicitly by the “Close” method to avoid data corruption.

__init__()

Initializes the file object.

__repr__()

Returns a string representation of the object.

Returns

The string representation of the object.

__str__()

Returns a string representation of the content.

Returns

The string representation of the content.

Open(path, writable, **params)

Opens a file.

Parameters
  • path – A path of the file.

  • writable – If true, the file is writable. If false, it is read-only.

  • params – Optional keyword parameters.

Returns

The result status.

The optional parameters can include an option for concurrency tuning. By default, database operations are done under the GIL (Global Interpreter Lock), which means that database operations are not done concurrently even if you use multiple threads. If the “concurrent” parameter is true, database operations are done outside the GIL, which means that database operations can be done concurrently if you use multiple threads. However, the downside is that swapping thread data is costly, so the actual throughput is often worse in the concurrent mode than in the normal mode. Therefore, the concurrent mode should be used only if the database is huge and it can cause blocking of threads in multi-thread usage.

The optional parameters can include options for the file opening operation.
  • truncate (bool): True to truncate the file.

  • no_create (bool): True to omit file creation.

  • no_wait (bool): True to fail if the file is locked by another process.

  • no_lock (bool): True to omit file locking.

  • sync_hard (bool): True to do physical synchronization when closing.

The optional parameter “file” specifies the internal file implementation class. The default file class is “MemoryMapAtomicFile”. The other supported classes are “StdFile”, “MemoryMapAtomicFile”, “PositionalParallelFile”, and “PositionalAtomicFile”.

For the “PositionalParallelFile” and “PositionalAtomicFile” file classes, these optional parameters are supported.
  • block_size (int): The block size to which all blocks should be aligned.

  • access_options (str): Values separated by colons: “direct” for direct I/O, “sync” for synchronizing I/O, “padding” for file size alignment by padding, and “pagecache” for the mini page cache in the process.
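The keyword parameters for a positional file class can be assembled as a dict and passed to Open with the ** syntax. positional_file_params is a hypothetical helper, and the block size of 512 and the path in the comment are illustrative values only.

```python
def positional_file_params(block_size, options):
    """Build keyword parameters for Open() with a positional file class."""
    return {
        "file": "PositionalParallelFile",
        "block_size": block_size,
        "access_options": ":".join(options),  # values separated by colons
    }

params = positional_file_params(512, ["direct", "padding"])
# Intended use, given a tkrzw.File object named `file` (not created here):
#     status = file.Open("casket.dat", True, truncate=True, **params)
```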

Close()

Closes the file.

Returns

The result status.

Read(off, size, status=None)

Reads data.

Parameters
  • off – The offset of a source region.

  • size – The size to be read.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The bytes value of the read data or None on failure.

ReadStr(off, size, status=None)

Reads data as a string.

Parameters
  • off – The offset of a source region.

  • size – The size to be read.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The string value of the read data or None on failure.

Write(off, data)

Writes data.

Parameters
  • off – The offset of the destination region.

  • data – The data to write.

Returns

The result status.

Append(data, status=None)

Appends data at the end of the file.

Parameters
  • data – The data to write.

  • status – A status object to which the result status is assigned. It can be omitted.

Returns

The offset at which the data has been put, or None on failure.
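The offset returned by Append can be passed straight to Read to fetch the same data again. The helper below relies only on the Append and Read signatures; BufferFile is a hypothetical in-memory stand-in, included only so that the example runs without a real file.

```python
def append_and_read_back(file, payload):
    """Append `payload`, then read it back from the offset Append() returned."""
    offset = file.Append(payload)
    if offset is None:  # Append reports failure with None
        return None
    return file.Read(offset, len(payload))

class BufferFile:
    """Hypothetical stand-in used only to exercise the helper."""

    def __init__(self):
        self._data = bytearray()

    def Append(self, data, status=None):
        offset = len(self._data)
        self._data += data
        return offset

    def Read(self, off, size, status=None):
        if off + size > len(self._data):
            return None
        return bytes(self._data[off:off + size])

buffer_file = BufferFile()
buffer_file.Append(b"header")
echoed = append_and_read_back(buffer_file, b"payload")
```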

Truncate(size)

Truncates the file.

Parameters

size – The new size of the file.

Returns

The result status.

If the file is shrunk, data after the new file end is discarded. If the file is expanded, null codes are filled after the old file end.
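The effect on the content can be modeled with a bytes value. truncate_model is a hypothetical function that shows only the documented outcome of shrinking and expanding.

```python
def truncate_model(content, size):
    """Return `content` resized to `size` bytes, padding with null bytes."""
    if size <= len(content):
        return content[:size]  # data after the new end is discarded
    return content + b"\x00" * (size - len(content))

shrunk = truncate_model(b"abcdef", 3)
grown = truncate_model(b"abc", 5)
```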

Synchronize(hard, off=0, size=0)

Synchronizes the content of the file to the file system.

Parameters
  • hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.

  • off – The offset of the region to be synchronized.

  • size – The size of the region to be synchronized. If it is zero, the region extends to the end of the file.

Returns

The result status.

The physical file size can be larger than the logical size in order to improve performance by reducing the frequency of allocation. Thus, you should call this function before accessing the file with external tools.

GetSize()

Gets the size of the file.

Returns

The size of the file or None on failure.

GetPath()

Gets the path of the file.

Returns

The path of the file or None on failure.

Search(mode, pattern, capacity=0)

Searches the file and gets lines which match a pattern.

Parameters
  • mode – The search mode. “contain” extracts lines containing the pattern. “begin” extracts lines beginning with the pattern. “end” extracts lines ending with the pattern. “regex” extracts lines that partially match the pattern of a regular expression. “edit” extracts lines whose edit distance to the UTF-8 pattern is the least. “editbin” extracts lines whose edit distance to the binary pattern is the least.

  • pattern – The pattern for matching.

  • capacity – The maximum number of records to obtain. 0 means unlimited.

Returns

A list of lines matching the condition.
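The “edit” mode can be pictured as ranking lines by their Levenshtein distance to the pattern. The sketch below is a pure-Python model under the assumption that the closest lines come first and that capacity bounds how many are kept; levenshtein and closest_lines are hypothetical names, not library functions.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def closest_lines(lines, pattern, capacity):
    """Return up to `capacity` lines, nearest to `pattern` first."""
    ranked = sorted(lines, key=lambda line: levenshtein(line, pattern))
    return ranked[:capacity]

lines = ["kitten", "sitting", "mitten", "banana"]
nearest = closest_lines(lines, "kitten", 2)
```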