tkrzw module¶
-
class
tkrzw.
Utility
¶ Bases:
object
Library utilities.
-
VERSION
= '0.0.0'¶ The package version numbers.
-
OS_NAME
= 'unknown'¶ The recognized OS name.
-
PAGE_SIZE
= 4096¶ The size of a memory page on the OS.
-
INT32MIN
= -2147483648¶ The minimum value of int32.
-
INT32MAX
= 2147483647¶ The maximum value of int32.
-
UINT32MAX
= 4294967295¶ The maximum value of uint32.
-
INT64MIN
= -9223372036854775808¶ The minimum value of int64.
-
INT64MAX
= 9223372036854775807¶ The maximum value of int64.
-
UINT64MAX
= 18446744073709551615¶ The maximum value of uint64.
-
classmethod
GetMemoryCapacity
()¶ Gets the memory capacity of the platform.
- Returns
The memory capacity of the platform in bytes, or -1 on failure.
-
classmethod
GetMemoryUsage
()¶ Gets the current memory usage of the process.
- Returns
The current memory usage of the process in bytes, or -1 on failure.
-
classmethod
PrimaryHash
(data, num_buckets=None)¶ Primary hash function for the hash database.
- Parameters
data – The data to calculate the hash value for.
num_buckets – The number of buckets of the hash table. If it is omitted, UINT64MAX is set.
- Returns
The hash value.
-
classmethod
SecondaryHash
(data, num_shards=None)¶ Secondary hash function for sharding.
- Parameters
data – The data to calculate the hash value for.
num_shards – The number of shards. If it is omitted, UINT64MAX is set.
- Returns
The hash value.
-
classmethod
EditDistanceLev
(a, b)¶ Gets the Levenshtein edit distance of two Unicode strings.
- Parameters
a – A Unicode string.
b – The other Unicode string.
- Returns
The Levenshtein edit distance of the two strings.
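As a point of reference for what this method computes, here is a minimal pure-Python sketch of the Levenshtein distance over Unicode code points. The function name `edit_distance_lev` is hypothetical, and this is not the library's implementation, only the textbook dynamic-programming form.

```python
def edit_distance_lev(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] is the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


print(edit_distance_lev("kitten", "sitting"))  # prints 3
```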
-
classmethod
SerializeInt
(num)¶ Serializes an integer into a big-endian binary sequence.
- Parameters
num – an integer.
- Returns
The result binary sequence.
-
classmethod
DeserializeInt
(data)¶ Deserializes a big-endian binary sequence into an integer.
- Parameters
data – a binary sequence.
- Returns
The result integer.
-
classmethod
SerializeFloat
(num)¶ Serializes a floating-point number into a big-endian binary sequence.
- Parameters
num – a floating-point number.
- Returns
The result binary sequence.
-
classmethod
DeserializeFloat
(data)¶ Deserializes a big-endian binary sequence into a floating-point number.
- Parameters
data – a binary sequence.
- Returns
The result floating-point number.
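The four serialization methods above can be pictured with the standard `struct` module. This is a sketch under an assumption: the text only says "big-endian", so the 8-byte widths used here (signed 64-bit integer, IEEE 754 double) are illustrative choices, and the function names are hypothetical.

```python
import struct


def serialize_int(num):
    # ">q": big-endian signed 64-bit integer (the width is an assumption).
    return struct.pack(">q", num)


def deserialize_int(data):
    return struct.unpack(">q", data)[0]


def serialize_float(num):
    # ">d": big-endian IEEE 754 double (the width is an assumption).
    return struct.pack(">d", num)


def deserialize_float(data):
    return struct.unpack(">d", data)[0]


print(serialize_int(1).hex())  # prints 0000000000000001
print(deserialize_float(serialize_float(1.5)))  # prints 1.5
```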
-
-
class
tkrzw.
Status
(code=0, message='')¶ Bases:
object
Status of operations.
-
SUCCESS
= 0¶ Success.
-
UNKNOWN_ERROR
= 1¶ Generic error whose cause is unknown.
-
SYSTEM_ERROR
= 2¶ Generic error from underlying systems.
-
NOT_IMPLEMENTED_ERROR
= 3¶ Error that the feature is not implemented.
-
PRECONDITION_ERROR
= 4¶ Error that a precondition is not met.
-
INVALID_ARGUMENT_ERROR
= 5¶ Error that a given argument is invalid.
-
CANCELED_ERROR
= 6¶ Error that the operation is canceled.
-
NOT_FOUND_ERROR
= 7¶ Error that a specific resource is not found.
-
PERMISSION_ERROR
= 8¶ Error that the operation is not permitted.
-
INFEASIBLE_ERROR
= 9¶ Error that the operation is infeasible.
-
DUPLICATION_ERROR
= 10¶ Error that a specific resource is duplicated.
-
BROKEN_DATA_ERROR
= 11¶ Error that internal data are broken.
-
NETWORK_ERROR
= 12¶ Error caused by networking failure.
-
APPLICATION_ERROR
= 13¶ Generic error caused by the application logic.
-
__init__
(code=0, message='')¶ Sets the code and the message.
- Parameters
code – The status code. This can be omitted and then SUCCESS is set.
message – An arbitrary status message. This can be omitted and then an empty string is set.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__eq__
(rhs)¶ Returns true if the given object is equivalent to this object.
- Returns
True if the given object is equivalent to this object.
This supports comparison between a status object and a status code number.
-
Set
(code=0, message='')¶ Sets the code and the message.
- Parameters
code – The status code. This can be omitted and then SUCCESS is set.
message – An arbitrary status message. This can be omitted and then an empty string is set.
-
Join
(rht)¶ Assigns the internal state from another status object only if the current state is success.
- Parameters
rht – The status object.
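The effect of Join is that the first non-success state wins when several results are folded into one. The sketch below models that rule with a hypothetical `MiniStatus` class; it is a stand-in for illustration, not the tkrzw.Status class itself.

```python
SUCCESS = 0
NOT_FOUND_ERROR = 7
APPLICATION_ERROR = 13


class MiniStatus:
    """Hypothetical stand-in used only to illustrate the Join rule."""

    def __init__(self, code=SUCCESS, message=""):
        self.code = code
        self.message = message

    def join(self, other):
        # Take over the other state only while this one is still success.
        if self.code == SUCCESS:
            self.code = other.code
            self.message = other.message


overall = MiniStatus()
steps = [
    MiniStatus(),
    MiniStatus(NOT_FOUND_ERROR, "first failure"),
    MiniStatus(APPLICATION_ERROR, "later failure"),
]
for step in steps:
    overall.join(step)

print(overall.code, overall.message)  # prints 7 first failure
```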
-
GetCode
()¶ Gets the status code.
- Returns
The status code.
-
GetMessage
()¶ Gets the status message.
- Returns
The status message.
-
IsOK
()¶ Returns true if the status is success.
- Returns
True if the status is success, or False on failure.
-
OrDie
()¶ Raises an exception if the status is not success.
- Raises
StatusException – An exception containing the status object.
-
classmethod
CodeName
(code)¶ Gets the string name of a status code.
- Parameters
code – The status code.
- Returns
The name of the status code.
-
__hash__
= None¶
-
-
class
tkrzw.
Future
¶ Bases:
object
Future containing a status object and extra data.
Future objects are made by methods of AsyncDBM. Every future object should be destroyed by the “Destruct” method or the “Get” method to free resources. This class implements the awaitable protocol, so an instance can be used in an “await” expression.
-
__init__
()¶ The constructor cannot be called directly. Use methods of AsyncDBM.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__await__
()¶ Waits for the operation to be done and returns an iterator.
- Returns
The iterator which stops immediately.
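To show what "an iterator which stops immediately" means for the awaitable protocol, here is a small sketch with a hypothetical `ReadyResult` class. It is not the tkrzw.Future class: the blocking wait for the operation is omitted, and only the protocol mechanics are shown.

```python
import asyncio


class ReadyResult:
    """Hypothetical stand-in; it only demonstrates the awaitable protocol."""

    def __init__(self, value):
        self._value = value

    def __await__(self):
        # A generator that never yields: iteration stops at once, and the
        # returned value becomes the result of the await expression.
        if False:
            yield
        return self._value


async def main():
    return await ReadyResult("done")


result = asyncio.run(main())
print(result)  # prints done
```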
-
Wait
(timeout=-1)¶ Waits for the operation to be done.
- Parameters
timeout – The waiting time in seconds. If it is negative, no timeout is set.
- Returns
True if the operation has completed, or False if the timeout occurs.
-
Get
()¶ Waits for the operation to be done and gets the result status.
- Returns
The result status and extra data, if any. The existence and the type of extra data depend on the operation which makes the future. For DBM#Get, a tuple of the status and the retrieved value is returned. For DBM#Set and DBM#Remove, the status object itself is returned.
The internal resource is released by this method. “Wait” and “Get” cannot be called after calling this method.
-
-
exception
tkrzw.
StatusException
(status)¶ Bases:
RuntimeError
Exception to convey the status of operations.
-
__init__
(status)¶ Sets the status.
- Parameters
status – The status object.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
GetStatus
()¶ Gets the status object.
- Returns
The status object.
-
-
class
tkrzw.
DBM
¶ Bases:
object
Polymorphic database manager.
All operations except for Open and Close are thread-safe; multiple threads can access the same database concurrently. You can specify a data structure when you call the Open method. Every opened database must be closed explicitly by the Close method to avoid data corruption. This class implements the iterable protocol, so an instance is usable with a “for-in” loop.
-
ANY_DATA
= b'\x00[ANY]\x00'¶ The special bytes value for no-operation or any data.
-
__init__
()¶ Does nothing in particular.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__len__
()¶ Gets the number of records, to enable the len operator.
- Returns
The number of records on success, or 0 on failure.
-
__getitem__
(key)¶ Gets the value of a record, to enable the [] operator.
- Parameters
key – The key of the record.
- Returns
The value of the matching record. An exception is raised for missing records. If the given key is a string, the returned value is also a string. Otherwise, the return value is bytes.
- Raises
StatusException – An exception containing the status object.
-
__contains__
(key)¶ Checks if a record exists or not, to enable the in operator.
- Parameters
key – The key of the record.
- Returns
True if the record exists, or False if not. No exception is raised for missing records.
-
__setitem__
(key, value)¶ Sets a record of a key and a value, to enable the []= operator.
- Parameters
key – The key of the record.
value – The value of the record.
- Raises
StatusException – An exception containing the status object.
-
__delitem__
(key)¶ Removes a record of a key, to enable the del [] operator.
- Parameters
key – The key of the record.
- Raises
StatusException – An exception containing the status object.
-
__iter__
()¶ Makes an iterator and initializes it, to comply with the iterator protocol.
- Returns
The iterator for each record.
-
Open
(path, writable, **params)¶ Opens a database file.
- Parameters
path – A path of the file.
writable – If true, the file is writable. If false, it is read-only.
params – Optional keyword parameters.
- Returns
The result status.
- The extension of the path indicates the type of the database.
.tkh : File hash database (HashDBM)
.tkt : File tree database (TreeDBM)
.tks : File skip database (SkipDBM)
.tkmt : On-memory hash database (TinyDBM)
.tkmb : On-memory tree database (BabyDBM)
.tkmc : On-memory cache database (CacheDBM)
.tksh : On-memory STL hash database (StdHashDBM)
.tkst : On-memory STL tree database (StdTreeDBM)
The optional parameters can include an option for concurrency tuning. By default, database operations are done under the GIL (Global Interpreter Lock), which means that database operations are not done concurrently even if you use multiple threads. If the “concurrent” parameter is true, database operations are done outside the GIL, which means that database operations can be done concurrently if you use multiple threads. However, the downside is that swapping thread data is costly, so the actual throughput is often worse in the concurrent mode than in the normal mode. Therefore, the concurrent mode should be used only if the database is huge and it can cause blocking of threads in multi-thread usage.
- The optional parameters can include options for the file opening operation.
truncate (bool): True to truncate the file.
no_create (bool): True to omit file creation.
no_wait (bool): True to fail if the file is locked by another process.
no_lock (bool): True to omit file locking.
sync_hard (bool): True to do physical synchronization when closing.
The optional parameter “dbm” supersedes the decision of the database type by the extension. The value is the type name: “HashDBM”, “TreeDBM”, “SkipDBM”, “TinyDBM”, “BabyDBM”, “CacheDBM”, “StdHashDBM”, “StdTreeDBM”.
The optional parameter “file” specifies the internal file implementation class. The default file class is “MemoryMapAtomicFile”. The other supported classes are “StdFile”, “MemoryMapAtomicFile”, “PositionalParallelFile”, and “PositionalAtomicFile”.
- For HashDBM, these optional parameters are supported.
update_mode (string): How to update the database file: “UPDATE_IN_PLACE” for the in-place mode or “UPDATE_APPENDING” for the appending mode.
record_crc_mode (string): How to add the CRC data to the record: “RECORD_CRC_NONE” to add no CRC to each record, “RECORD_CRC_8” to add CRC-8 to each record, “RECORD_CRC_16” to add CRC-16 to each record, or “RECORD_CRC_32” to add CRC-32 to each record.
record_comp_mode (string): How to compress the record data: “RECORD_COMP_NONE” to do no compression, “RECORD_COMP_ZLIB” to compress with ZLib, “RECORD_COMP_ZSTD” to compress with ZStd, “RECORD_COMP_LZ4” to compress with LZ4, “RECORD_COMP_LZMA” to compress with LZMA, “RECORD_COMP_RC4” to cipher with RC4, “RECORD_COMP_AES” to cipher with AES.
offset_width (int): The width to represent the offset of records.
align_pow (int): The power to align records.
num_buckets (int): The number of buckets for hashing.
restore_mode (string): How to restore the database file: “RESTORE_SYNC” to restore to the last synchronized state, “RESTORE_READ_ONLY” to make the database read-only, or “RESTORE_NOOP” to do nothing. By default, as many records as possible are restored.
fbp_capacity (int): The capacity of the free block pool.
min_read_size (int): The minimum reading size to read a record.
cache_buckets (bool): True to cache the hash buckets on memory.
cipher_key (string): The encryption key for cipher compressors.
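As a sketch of how the keyword parameters listed above are passed, the helper below opens a hash database with one file-opening option and two HashDBM tuning options. The helper name `open_hash_database` and the chosen values are hypothetical; `dbm` is assumed to be an unopened tkrzw.DBM instance, and the call follows the Open(path, writable, **params) signature documented here.

```python
def open_hash_database(dbm, path):
    """Open `path` as a writable hash database with a few tuning options.

    `dbm` is assumed to be an unopened tkrzw.DBM instance.
    """
    params = {
        "truncate": True,                    # file opening option
        "num_buckets": 100000,               # HashDBM: number of hash buckets
        "record_crc_mode": "RECORD_CRC_32",  # HashDBM: add CRC-32 to each record
    }
    # The second positional argument makes the database writable.
    return dbm.Open(path, True, **params)
```

With the library installed, this would presumably be called as `open_hash_database(tkrzw.DBM(), "casket.tkh")`, and the returned status can be checked with IsOK or OrDie.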
- For TreeDBM, all optional parameters for HashDBM are available. In addition, these optional parameters are supported.
max_page_size (int): The maximum size of a page.
max_branches (int): The maximum number of branches each inner node can have.
max_cached_pages (int): The maximum number of cached pages.
page_update_mode (string): What to do when each page is updated: “PAGE_UPDATE_NONE” to do no operation or “PAGE_UPDATE_WRITE” to write immediately.
key_comparator (string): The comparator of record keys: “LexicalKeyComparator” for the lexical order, “LexicalCaseKeyComparator” for the lexical order ignoring case, “DecimalKeyComparator” for the order of decimal integer numeric expressions, “HexadecimalKeyComparator” for the order of hexadecimal integer numeric expressions, “RealNumberKeyComparator” for the order of decimal real number expressions, “SignedBigEndianKeyComparator” for the order of binary signed integer expressions, and “FloatBigEndianKeyComparator” for the order of binary float-number expressions.
- For SkipDBM, these optional parameters are supported.
offset_width (int): The width to represent the offset of records.
step_unit (int): The step unit of the skip list.
max_level (int): The maximum level of the skip list.
restore_mode (string): How to restore the database file: “RESTORE_SYNC” to restore to the last synchronized state, “RESTORE_READ_ONLY” to make the database read-only, or “RESTORE_NOOP” to do nothing. By default, as many records as possible are restored.
sort_mem_size (int): The memory size used for sorting to build the database in the at-random mode.
insert_in_order (bool): If true, records are assumed to be inserted in ascending order of the key.
max_cached_records (int): The maximum number of cached records.
- For TinyDBM, these optional parameters are supported.
num_buckets (int): The number of buckets for hashing.
- For BabyDBM, these optional parameters are supported.
key_comparator (string): The comparator of record keys. The same ones as TreeDBM.
- For CacheDBM, these optional parameters are supported.
cap_rec_num (int): The maximum number of records.
cap_mem_size (int): The total memory size to use.
- All databases support taking update logs into files. It is enabled by setting the prefix of update log files.
ulog_prefix (str): The prefix of the update log files.
ulog_max_file_size (num): The maximum file size of each update log file. By default, it is 1GiB.
ulog_server_id (num): The server ID attached to each log. By default, it is 0.
ulog_dbm_index (num): The DBM index attached to each log. By default, it is 0.
- For the file “PositionalParallelFile” and “PositionalAtomicFile”, these optional parameters are supported.
block_size (int): The block size to which all blocks should be aligned.
access_options (str): Values separated by colon. “direct” for direct I/O, “sync” for synchronizing I/O, “padding” for file size alignment by padding, “pagecache” for the mini page cache in the process.
If the optional parameter “num_shards” is set, the database is sharded into multiple shard files. Each file has a suffix like “-00003-of-00015”. If the value is 0, the number of shards is set by patterns of the existing files, or 1 if they don’t exist.
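The shard suffix pattern can be illustrated with a short formatting sketch. It assumes zero-based shard indices and five-digit zero padding, both inferred from the example suffix “-00003-of-00015” rather than stated in the text; `shard_paths` is a hypothetical helper, not a library function.

```python
def shard_paths(path, num_shards):
    """List the shard file names for `path`, assuming zero-based indices."""
    return [
        "{}-{:05d}-of-{:05d}".format(path, index, num_shards)
        for index in range(num_shards)
    ]


print(shard_paths("casket.tkh", 15)[3])  # prints casket.tkh-00003-of-00015
```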
-
Close
()¶ Closes the database file.
- Returns
The result status.
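Because every opened database must be closed explicitly, one possible pattern is to wrap Open and Close in a context manager so that Close runs even when the body raises. The `opened` helper below is hypothetical, not part of the library; `dbm` is assumed to be a tkrzw.DBM instance, and OrDie is used to turn a failed status into an exception.

```python
from contextlib import contextmanager


@contextmanager
def opened(dbm, path, writable, **params):
    """Open `dbm`, yield it, and always close it afterwards.

    `dbm` is assumed to be a tkrzw.DBM instance.
    """
    dbm.Open(path, writable, **params).OrDie()
    try:
        yield dbm
    finally:
        # Runs on normal exit and when the body raises.
        dbm.Close().OrDie()
```

A call site would then look like `with opened(tkrzw.DBM(), "casket.tkh", True) as dbm: ...`, assuming the library is installed.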
-
Process
(key, func, writable)¶ Processes a record with an arbitrary function.
- Parameters
key – The key of the record.
func – The function to process a record. The first parameter is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.
writable – True if the processor can edit the record.
- Returns
The result status.
This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
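The three possible outcomes of the processing function (update, leave unchanged, remove) can be sketched as follows. `apply_processor` is a hypothetical dict-based stand-in that mimics the documented return-value rules; it is not how the library is implemented, and it uses strings where the real callback may receive bytes.

```python
def apply_processor(store, key, func):
    """Hypothetical stand-in: apply `func` to one record of a dict."""
    result = func(key, store.get(key))
    if result is None:
        return                # record is left as it is
    if result is False:       # identity check: only the False object removes
        store.pop(key, None)
    else:
        store[key] = result   # record value is updated


def countdown(key, value):
    """Decrement a numeric value and remove the record when it reaches zero."""
    if value is None:
        return None           # missing record: do nothing
    remaining = int(value) - 1
    return False if remaining <= 0 else str(remaining)


store = {"tickets": "2"}
apply_processor(store, "tickets", countdown)
print(store)  # prints {'tickets': '1'}
apply_processor(store, "tickets", countdown)
print(store)  # prints {}
```

With the real class, the same callback would be passed as `dbm.Process(key, countdown, True)`.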
-
Get
(key, status=None)¶ Gets the value of a record of a key.
- Parameters
key – The key of the record.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The bytes value of the matching record or None on failure.
-
GetStr
(key, status=None)¶ Gets the value of a record of a key, as a string.
- Parameters
key – The key of the record.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The string value of the matching record or None on failure.
-
GetMulti
(*keys)¶ Gets the values of multiple records of keys.
- Parameters
keys – The keys of records to retrieve.
- Returns
A map of retrieved records. Keys which don’t match existing records are ignored.
-
GetMultiStr
(*keys)¶ Gets the values of multiple records of keys, as strings.
- Parameters
keys – The keys of records to retrieve.
- Returns
A map of retrieved records. Keys which don’t match existing records are ignored.
-
Set
(key, value, overwrite=True)¶ Sets a record of a key and a value.
- Parameters
key – The key of the record.
value – The value of the record.
overwrite – Whether to overwrite the existing value. It can be omitted and then true is set.
- Returns
The result status. If overwriting is abandoned, DUPLICATION_ERROR is returned.
-
SetMulti
(overwrite=True, **records)¶ Sets multiple records of the keyword arguments.
- Parameters
overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.
records – Records to store, specified as keyword parameters.
- Returns
The result status. If there are records avoiding overwriting, DUPLICATION_ERROR is returned.
-
SetAndGet
(key, value, overwrite=True)¶ Sets a record and gets the old value.
- Parameters
key – The key of the record.
value – The value of the record.
overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.
- Returns
A pair of the result status and the old value. If no record existed before the new record was inserted, None is assigned as the value. If not None, the type of the returned old value is the same as the parameter value.
-
Remove
(key)¶ Removes a record of a key.
- Parameters
key – The key of the record.
- Returns
The result status. If there’s no matching record, NOT_FOUND_ERROR is returned.
-
RemoveMulti
(keys)¶ Removes records of keys.
- Parameters
keys – The keys of the records.
- Returns
The result status. If there are missing records, NOT_FOUND_ERROR is returned.
-
RemoveAndGet
(key)¶ Removes a record and gets the value.
- Parameters
key – The key of the record.
- Returns
A pair of the result status and the record value. If the record does not exist, None is assigned as the value. If not None, the type of the returned value is the same as the parameter key.
-
Append
(key, value, delim='')¶ Appends data at the end of a record of a key.
- Parameters
key – The key of the record.
value – The value to append.
delim – The delimiter to put after the existing record.
- Returns
The result status.
If there’s no existing record, the value is set without the delimiter.
-
AppendMulti
(delim='', **records)¶ Appends data to multiple records of the keyword arguments.
- Parameters
delim – The delimiter to put after the existing record.
records – Records to append, specified as keyword parameters.
- Returns
The result status.
If there’s no existing record, the value is set without the delimiter.
-
CompareExchange
(key, expected, desired)¶ Compares the value of a record and exchanges it if the condition is met.
- Parameters
key – The key of the record.
expected – The expected value. If it is None, no existing record is expected. If it is ANY_DATA, an existing record with any value is expected.
desired – The desired value. If it is None, the record is to be removed. If it is ANY_DATA, no update is done.
- Returns
The result status. If the condition is not met, INFEASIBLE_ERROR is returned.
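The rules for None and ANY_DATA in the expected and desired arguments can be modeled with a small dict-based sketch. `compare_exchange` and the `ANY_DATA` sentinel below are hypothetical stand-ins (the library exposes DBM.ANY_DATA); the function returns True where the library would report success and False where it would report INFEASIBLE_ERROR.

```python
ANY_DATA = object()  # stand-in sentinel for illustration


def compare_exchange(store, key, expected, desired):
    """Hypothetical dict-based model of the compare-and-exchange rules."""
    current = store.get(key)  # None means there is no record
    if expected is ANY_DATA:
        matched = current is not None   # any existing record matches
    else:
        matched = current == expected   # expected None matches a missing record
    if not matched:
        return False                    # INFEASIBLE_ERROR in the library
    if desired is None:
        store.pop(key, None)            # remove the record
    elif desired is not ANY_DATA:
        store[key] = desired            # ANY_DATA would mean no update
    return True


store = {}
print(compare_exchange(store, "lock", None, "owner-1"))  # prints True
print(compare_exchange(store, "lock", None, "owner-2"))  # prints False
print(store)  # prints {'lock': 'owner-1'}
```

The first call succeeds because no record exists yet; the second fails because the record is already present, which is the usual way to build a simple lock on top of this operation.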
-
CompareExchangeAndGet
(key, expected, desired)¶ Does compare-and-exchange and/or gets the old value of the record.
- Parameters
key – The key of the record.
expected – The expected value. If it is None, no existing record is expected. If it is ANY_DATA, an existing record with any value is expected.
desired – The desired value. If it is None, the record is to be removed. If it is ANY_DATA, no update is done.
- Returns
A pair of the result status and the old value of the record. If the condition is not met, the status is INFEASIBLE_ERROR. If there’s no existing record, the value is None. If not None, the type of the returned old value is the same as the expected or desired value.
-
Increment
(key, inc=1, init=0, status=None)¶ Increments the numeric value of a record.
- Parameters
key – The key of the record.
inc – The incremental value. If it is Utility.INT64MIN, the current value is not changed and a new record is not created.
init – The initial value.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The current value, or None on failure.
The record value is stored as an 8-byte big-endian integer. Negative values are also supported.
-
CompareExchangeMulti
(expected, desired)¶ Compares the values of records and exchanges them if the condition is met.
- Parameters
expected – A sequence of pairs of the record keys and their expected values. If the value is None, no existing record is expected. If the value is ANY_DATA, an existing record with any value is expected.
desired – A sequence of pairs of the record keys and their desired values. If the value is None, the record is to be removed.
- Returns
The result status. If the condition is not met, INFEASIBLE_ERROR is returned.
-
Rekey
(old_key, new_key, overwrite=True, copying=False)¶ Changes the key of a record.
- Parameters
old_key – The old key of the record.
new_key – The new key of the record.
overwrite – Whether to overwrite the existing record of the new key.
copying – Whether to retain the record of the old key.
- Returns
The result status. If there’s no matching record to the old key, NOT_FOUND_ERROR is returned. If the overwrite flag is false and there is an existing record of the new key, DUPLICATION_ERROR is returned.
This method is done atomically. The other threads observe that the record has either the old key or the new key. No intermediate states are observed.
-
PopFirst
(status=None)¶ Gets the first record and removes it.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the bytes key and the bytes value of the first record. On failure, None is returned.
-
PopFirstStr
(status=None)¶ Gets the first record as strings and removes it.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the string key and the string value of the first record. On failure, None is returned.
-
PushLast
(value, wtime=None)¶ Adds a record with a key of the current timestamp.
- Parameters
value – The value of the record.
wtime – The current wall time used to generate the key. If it is None, the system clock is used.
- Returns
The result status.
The key is generated as an 8-byte big-endian binary string of the timestamp. If there is an existing record matching the generated key, the key is regenerated and the attempt is repeated until it succeeds.
-
ProcessEach
(func, writable)¶ Processes each and every record in the database with an arbitrary function.
- Parameters
func – The function to process a record. The first parameter is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.
writable – True if the processor can edit the record.
- Returns
The result status.
The given function is called repeatedly for each record. It is also called once before the iteration and once after the iteration with both the key and the value being None. This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
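Because the function is also called once before and once after the iteration with both arguments set to None, a callback usually needs to tell those boundary calls apart from real records. The sketch below shows one way to do that; `process_each` is a hypothetical dict-based driver that only reproduces the documented call order and ignores return values, so it models a read-only pass.

```python
def make_counter():
    """Build a callback that counts records and boundary calls."""
    stats = {"records": 0, "boundary_calls": 0}

    def func(key, value):
        if key is None and value is None:
            stats["boundary_calls"] += 1  # call before or after the iteration
            return None
        stats["records"] += 1
        return None                       # leave every record unchanged

    return stats, func


def process_each(store, func):
    """Hypothetical driver following the documented call order."""
    func(None, None)              # once before the iteration
    for key in list(store):
        func(key, store[key])
    func(None, None)              # once after the iteration


stats, func = make_counter()
process_each({"a": "1", "b": "2", "c": "3"}, func)
print(stats)  # prints {'records': 3, 'boundary_calls': 2}
```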
-
Count
()¶ Gets the number of records.
- Returns
The number of records on success, or None on failure.
-
GetFileSize
()¶ Gets the current file size of the database.
- Returns
The current file size of the database, or None on failure.
-
GetFilePath
()¶ Gets the path of the database file.
- Returns
The file path of the database, or None on failure.
-
GetTimestamp
()¶ Gets the timestamp in seconds of the last modified time.
- Returns
The timestamp of the last modified time, or None on failure.
-
Clear
()¶ Removes all records.
- Returns
The result status.
-
Rebuild
(**params)¶ Rebuilds the entire database.
- Parameters
params – Optional keyword parameters.
- Returns
The result status.
The optional parameters are the same as the Open method. Omitted tuning parameters are kept the same or implicitly optimized.
- In addition, HashDBM, TreeDBM, and SkipDBM support the following parameters.
skip_broken_records (bool): If true, the operation continues even if there are broken records which can be skipped.
sync_hard (bool): If true, physical synchronization with the hardware is done before finishing the rebuilt file.
-
ShouldBeRebuilt
()¶ Checks whether the database should be rebuilt.
- Returns
True if the database should be rebuilt for optimization, or false if there is no need.
-
Synchronize
(hard, **params)¶ Synchronizes the content of the database to the file system.
- Parameters
hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.
params – Optional keyword parameters.
- Returns
The result status.
Only SkipDBM uses the optional parameters. The “merge” parameter specifies paths of databases to merge, separated by colon. The “reducer” parameter specifies the reducer to apply to records of the same key. “ReduceToFirst”, “ReduceToSecond”, “ReduceToLast”, etc. are supported.
-
CopyFileData
(dest_path, sync_hard=False)¶ Copies the content of the database file to another file.
- Parameters
dest_path – A path to the destination file.
sync_hard – True to do physical synchronization with the hardware.
- Returns
The result status.
-
Export
(dest_dbm)¶ Exports all records to another database.
- Parameters
dest_dbm – The destination database.
- Returns
The result status.
-
ExportToFlatRecords
(dest_file)¶ Exports all records of a database to a flat record file.
- Parameters
dest_file – The file object to write records in.
- Returns
The result status.
A flat record file contains a sequence of binary records without any high-level structure, so it is useful as an intermediate file for data migration.
-
ImportFromFlatRecords
(src_file)¶ Imports records to a database from a flat record file.
- Parameters
src_file – The file object to read records from.
- Returns
The result status.
-
ExportKeysAsLines
(dest_file)¶ Exports the keys of all records as lines to a text file.
- Parameters
dest_file – The file object to write keys in.
- Returns
The result status.
As the exported text file is smaller than the database file, scanning the text file by the search method is often faster than scanning the whole database.
-
Inspect
()¶ Inspects the database.
- Returns
A map of property names and their values.
-
IsOpen
()¶ Checks whether the database is open.
- Returns
True if the database is open, or false if not.
-
IsWritable
()¶ Checks whether the database is writable.
- Returns
True if the database is writable, or false if not.
-
IsHealthy
()¶ Checks whether the database condition is healthy.
- Returns
True if the database condition is healthy, or false if not.
-
IsOrdered
()¶ Checks whether ordered operations are supported.
- Returns
True if ordered operations are supported, or false if not.
-
Search
(mode, pattern, capacity=0)¶ Searches the database and gets keys which match a pattern.
- Parameters
mode – The search mode. “contain” extracts keys containing the pattern. “begin” extracts keys beginning with the pattern. “end” extracts keys ending with the pattern. “regex” extracts keys that partially match the pattern of a regular expression. “edit” extracts keys whose edit distance to the UTF-8 pattern is the least. “editbin” extracts keys whose edit distance to the binary pattern is the least. “containcase”, “containword”, and “containcaseword” extract keys considering case and word boundary. Ordered databases support “upper” and “lower” which extract keys whose positions are upper/lower than the pattern. “upperinc” and “lowerinc” are their inclusive versions.
pattern – The pattern for matching.
capacity – The maximum records to obtain. 0 means unlimited.
- Returns
A list of string keys matching the condition.
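Four of the simpler search modes can be illustrated over a plain list of string keys. This is a sketch of the matching rules as described above, not the library's search code; `search_keys` is a hypothetical helper, the edit-distance, case/word, and ordered modes are left out, and the result order of the real method is not implied.

```python
import re


def search_keys(keys, mode, pattern, capacity=0):
    """Hypothetical model of the "contain", "begin", "end", and "regex" modes."""
    matchers = {
        "contain": lambda key: pattern in key,
        "begin": lambda key: key.startswith(pattern),
        "end": lambda key: key.endswith(pattern),
        # re.search gives a partial match, as the mode description says.
        "regex": lambda key: re.search(pattern, key) is not None,
    }
    match = matchers[mode]
    found = [key for key in keys if match(key)]
    # A capacity of 0 means unlimited.
    return found[:capacity] if capacity > 0 else found


keys = ["apple", "pineapple", "apricot", "grape"]
print(search_keys(keys, "begin", "ap"))     # prints ['apple', 'apricot']
print(search_keys(keys, "regex", "^a.*t$"))  # prints ['apricot']
```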
-
MakeIterator
()¶ Makes an iterator for each record.
- Returns
The iterator for each record.
-
classmethod
RestoreDatabase
(old_file_path, new_file_path, class_name='', end_offset=-1, cipher_key=None)¶ Restores a broken database as a new healthy database.
- Parameters
old_file_path – The path of the broken database.
new_file_path – The path of the new database to be created.
class_name – The name of the database class. If it is None or empty, the class is guessed from the file extension.
end_offset – The exclusive end offset of records to read. Negative means unlimited. 0 means the size when the database is synched or closed properly. Using a positive value is not meaningful if the number of shards is more than one.
cipher_key – The encryption key for cipher compressors. If it is None, an empty key is used.
- Returns
The result status.
-
-
class
tkrzw.
Iterator
(dbm)¶ Bases:
object
Iterator for each record.
-
__init__
(dbm)¶ Initializes the iterator.
- Parameters
dbm – The database to scan.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__next__
()¶ Moves the iterator to the next record, to comply with the iterator protocol.
- Returns
A tuple of the key and the value of the current record.
-
First
()¶ Initializes the iterator to indicate the first record.
- Returns
The result status.
Even if there’s no record, the operation doesn’t fail.
-
Last
()¶ Initializes the iterator to indicate the last record.
- Returns
The result status.
Even if there’s no record, the operation doesn’t fail. This method is supported only by ordered databases.
-
Jump
(key)¶ Initializes the iterator to indicate a specific record.
- Parameters
key – The key of the record to look for.
- Returns
The result status.
Ordered databases can support “lower bound” jump; if there’s no record with the same key, the iterator refers to the first record whose key is greater than the given key. The operation fails with unordered databases if there’s no record with the same key.
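The “lower bound” behavior on ordered databases corresponds to a binary search for the first key that is not less than the given key. A minimal sketch over a sorted list, using the standard `bisect` module; `jump` here is a hypothetical model that returns a list index, not the iterator method itself.

```python
from bisect import bisect_left


def jump(sorted_keys, key):
    """Index of the first key >= `key`, or None if every key is smaller."""
    index = bisect_left(sorted_keys, key)
    return index if index < len(sorted_keys) else None


keys = ["a", "c", "e"]
print(jump(keys, "c"))  # prints 1 (exact match)
print(jump(keys, "b"))  # prints 1 (first greater key is "c")
print(jump(keys, "f"))  # prints None (no such record)
```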
-
JumpLower
(key, inclusive=False)¶ Initializes the iterator to indicate the last record whose key is lower than a given key.
- Parameters
key – The key to compare with.
inclusive – If true, the condition is inclusive: equal to or lower than the key.
- Returns
The result status.
Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.
-
JumpUpper
(key, inclusive=False)¶ Initializes the iterator to indicate the first record whose key is greater than a given key.
- Parameters
key – The key to compare with.
inclusive – If true, the condition is inclusive: equal to or greater than the key.
- Returns
The result status.
Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.
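The effect of the inclusive flag on JumpLower and JumpUpper can be sketched over a sorted list of keys. This models the documented positioning only and does not use tkrzw; `jump_lower`, `jump_upper`, and the `None` result for "no matching record" are hypothetical names and conventions.

```python
import bisect

def jump_lower(sorted_keys, key, inclusive=False):
    """Index of the last key lower than (or equal to, if inclusive) the key, else None."""
    find = bisect.bisect_right if inclusive else bisect.bisect_left
    i = find(sorted_keys, key) - 1
    return i if i >= 0 else None

def jump_upper(sorted_keys, key, inclusive=False):
    """Index of the first key greater than (or equal to, if inclusive) the key, else None."""
    find = bisect.bisect_left if inclusive else bisect.bisect_right
    i = find(sorted_keys, key)
    return i if i < len(sorted_keys) else None

keys = [b"a", b"c", b"e"]
lower_exclusive = jump_lower(keys, b"c")
lower_inclusive = jump_lower(keys, b"c", inclusive=True)
upper_exclusive = jump_upper(keys, b"c")
upper_inclusive = jump_upper(keys, b"c", inclusive=True)
```

With the key `b"c"` present, the exclusive forms skip it and the inclusive forms land on it.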
-
Next
()¶ Moves the iterator to the next record.
- Returns
The result status.
If the current record is missing, the operation fails. Even if there’s no next record, the operation doesn’t fail.
-
Previous
()¶ Moves the iterator to the previous record.
- Returns
The result status.
If the current record is missing, the operation fails. Even if there’s no previous record, the operation doesn’t fail. This method is supported only by ordered databases.
-
Get
(status=None)¶ Gets the key and the value of the current record of the iterator.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.
-
GetStr
(status=None)¶ Gets the key and the value of the current record of the iterator, as strings.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the string key and the string value of the current record. On failure, None is returned.
-
GetKey
(status=None)¶ Gets the key of the current record.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The bytes key of the current record or None on failure.
-
GetKeyStr
(status=None)¶ Gets the key of the current record, as a string.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The string key of the current record or None on failure.
-
GetValue
(status=None)¶ Gets the value of the current record.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The bytes value of the current record or None on failure.
-
GetValueStr
(status=None)¶ Gets the value of the current record, as a string.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The string value of the current record or None on failure.
-
Set
(value)¶ Sets the value of the current record.
- Parameters
value – The value of the record.
- Returns
The result status.
-
Remove
()¶ Removes the current record.
- Returns
The result status.
-
Step
(status=None)¶ Gets the current record and moves the iterator to the next record.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.
-
StepStr
(status=None)¶ Gets the current record and moves the iterator to the next record, as strings.
- Parameters
status – A status object to which the result status is assigned. It can be omitted.
- Returns
A tuple of the string key and the string value of the current record. On failure, None is returned.
-
-
class
tkrzw.
AsyncDBM
(dbm, num_worker_threads)¶ Bases:
object
Asynchronous database manager adapter.
This class is a wrapper of DBM for asynchronous operations. A task queue with a thread pool is used inside. Every method except for the constructor and the destructor is run by a thread in the thread pool and the result is set in the future object of the return value. The caller can ignore the future object if it is not necessary. The Destruct method waits for all tasks to be done. Therefore, the destructor should be called before the database is closed.
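The adapter pattern described here (a thread pool in front of a store, with futures as return values) can be sketched with the standard library. This is not the tkrzw implementation; `AsyncDictAdapter` and its lowercase methods are hypothetical, and a plain dict stands in for the opened database.

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncDictAdapter:
    """Hypothetical stand-in: runs each operation on a worker thread and returns a future."""

    def __init__(self, store, num_worker_threads):
        self._store = store
        self._pool = ThreadPoolExecutor(max_workers=num_worker_threads)

    def set(self, key, value):
        # The task is queued and the caller gets a future immediately.
        return self._pool.submit(self._store.__setitem__, key, value)

    def get(self, key):
        return self._pool.submit(self._store.get, key)

    def destruct(self):
        # Blocks until every queued task has finished.
        self._pool.shutdown(wait=True)

store = {}
adapter = AsyncDictAdapter(store, num_worker_threads=4)
adapter.set("a", "1").result()     # wait for the write before reading
value = adapter.get("a").result()
adapter.destruct()                 # drain pending tasks before the store goes away
```

Waiting on the future of the write before issuing the read matters in this sketch, because tasks handed to several workers may otherwise complete in any order.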
-
__init__
(dbm, num_worker_threads)¶ Sets up the task queue.
- Parameters
dbm – A database object which has been opened.
num_worker_threads – The number of threads in the internal thread pool.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
Destruct
()¶ Destructs the asynchronous database adapter.
This method waits for all tasks to be done.
-
Get
(key)¶ Gets the value of a record of a key.
- Parameters
key – The key of the record.
- Returns
The future for the result status and the bytes value of the matching record.
-
GetStr
(key)¶ Gets the value of a record of a key, as a string.
- Parameters
key – The key of the record.
- Returns
The future for the result status and the string value of the matching record.
-
GetMulti
(*keys)¶ Gets the values of multiple records of keys.
- Parameters
keys – The keys of records to retrieve.
- Returns
The future for the result status and a map of retrieved records. Keys which don’t match existing records are ignored.
-
GetMultiStr
(*keys)¶ Gets the values of multiple records of keys, as strings.
- Parameters
keys – The keys of records to retrieve.
- Returns
The future for the result status and a map of retrieved records. Keys which don’t match existing records are ignored.
-
Set
(key, value, overwrite=True)¶ Sets a record of a key and a value.
- Parameters
key – The key of the record.
value – The value of the record.
overwrite – Whether to overwrite the existing value. It can be omitted, in which case true is set.
- Returns
The future for the result status. If overwriting is abandoned, DUPLICATION_ERROR is set.
-
SetMulti
(overwrite=True, **records)¶ Sets multiple records of the keyword arguments.
- Parameters
overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.
records – Records to store, specified as keyword parameters.
- Returns
The future for the result status. If overwriting is abandoned, DUPLICATION_ERROR is set.
-
Append
(key, value, delim='')¶ Appends data at the end of a record of a key.
- Parameters
key – The key of the record.
value – The value to append.
delim – The delimiter to put after the existing record.
- Returns
The future for the result status.
If there’s no existing record, the value is set without the delimiter.
-
AppendMulti
(delim='', **records)¶ Appends data to multiple records of the keyword arguments.
- Parameters
delim – The delimiter to put after the existing record.
records – Records to append, specified as keyword parameters.
- Returns
The future for the result status.
If there’s no existing record, the value is set without the delimiter.
-
CompareExchange
(key, expected, desired)¶ Compares the value of a record and exchanges it if the condition is met.
- Parameters
key – The key of the record.
expected – The expected value. If it is None, no existing record is expected. If it is DBM.ANY_DATA, an existing record with any value is expected.
desired – The desired value. If it is None, the record is to be removed. If it is DBM.ANY_DATA, no update is done.
- Returns
The future for the result status. If the condition is not met, INFEASIBLE_ERROR is set.
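The rules for the expected and desired values can be summarized in a small model over a dict. This sketch does not call tkrzw: `ANY_DATA` here is a local sentinel standing in for DBM.ANY_DATA, and the boolean return value replaces the status object.

```python
ANY_DATA = object()  # local stand-in for DBM.ANY_DATA in this sketch

def compare_exchange(records, key, expected, desired):
    """Apply the update and return True if the condition holds, else return False."""
    current = records.get(key)
    if expected is None:
        ok = current is None          # no existing record is expected
    elif expected is ANY_DATA:
        ok = current is not None      # any existing value is acceptable
    else:
        ok = current == expected
    if not ok:
        return False                  # the documented method reports INFEASIBLE_ERROR here
    if desired is None:
        records.pop(key, None)        # remove the record
    elif desired is not ANY_DATA:
        records[key] = desired        # ANY_DATA as desired means no update
    return True

records = {}
created = compare_exchange(records, "k", None, "v1")
rejected = compare_exchange(records, "k", "other", "v2")
checked = compare_exchange(records, "k", ANY_DATA, ANY_DATA)
removed = compare_exchange(records, "k", "v1", None)
```

Passing the sentinel for both arguments acts as a pure existence check, since nothing is written.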
-
Increment
(key, inc=1, init=0)¶ Increments the numeric value of a record.
- Parameters
key – The key of the record.
inc – The incremental value. If it is Utility.INT64MIN, the current value is not changed and a new record is not created.
init – The initial value.
- Returns
The future for the result status and the current value.
The record value is stored as an 8-byte big-endian integer. Negative is also supported.
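The storage format can be illustrated with the standard `struct` module. This is a sketch under two assumptions that are not stated here: negative numbers use the signed two's-complement form (`">q"`), and a newly created record starts from `init` plus `inc`. The function names are hypothetical and a dict stands in for the database.

```python
import struct

INT64MIN = -2**63  # same value as Utility.INT64MIN

def encode_counter(num):
    """Pack an integer as 8 bytes, big-endian, signed (assumed encoding)."""
    return struct.pack(">q", num)

def decode_counter(data):
    return struct.unpack(">q", data)[0]

def increment(records, key, inc=1, init=0):
    exists = key in records
    current = decode_counter(records[key]) if exists else init
    if inc == INT64MIN:
        return current  # read-only probe: nothing is changed or created
    current += inc
    records[key] = encode_counter(current)
    return current

records = {}
first = increment(records, "n", inc=5, init=100)
second = increment(records, "n", inc=-10)
probe = increment(records, "missing", inc=INT64MIN, init=7)
```

Because the value is binary, reading it back with a string getter would not give a readable number; it has to be unpacked as in `decode_counter`.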
-
ProcessMulti
(key_func_pairs, writable)¶ Processes multiple records with arbitrary functions.
- Parameters
key_func_pairs – A list of pairs of keys and their functions. The first parameter of the function is the key of the record. The second parameter is the value of the existing record, or None if the record doesn’t exist. The return value is a string or bytes to update the record value. If the return value is None, the record is not modified. If the return value is False (not a false value but the False object), the record is removed.
writable – True if the processors can edit the record.
- Returns
The result status.
This method is not available in the concurrent mode because the function cannot be invoked outside the GIL.
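The callback contract (return a new value, None, or the False object) can be modeled over a dict. This is only a sketch of the documented rules and does not use tkrzw; `process_multi` is a hypothetical helper, and ignoring return values when `writable` is false is an assumption.

```python
def process_multi(records, key_func_pairs, writable):
    """Call each function with (key, existing value or None) and apply its result."""
    for key, func in key_func_pairs:
        result = func(key, records.get(key))
        if not writable or result is None:
            continue                   # None: leave the record as it is
        if result is False:            # the False object itself, not any falsy value
            records.pop(key, None)
        else:
            records[key] = result      # str or bytes: new record value

records = {b"a": b"1", b"b": b"2"}
process_multi(records, [
    (b"a", lambda k, v: v + b"!"),                    # update an existing record
    (b"b", lambda k, v: False),                       # remove a record
    (b"c", lambda k, v: None),                        # missing record, no change
    (b"d", lambda k, v: b"new" if v is None else None),  # create if absent
], writable=True)
```

The identity check against False is what lets an empty value such as `b""` still count as a legitimate update.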
-
CompareExchangeMulti
(expected, desired)¶ Compares the values of records and exchanges them if the condition is met.
- Parameters
expected – A sequence of pairs of the record keys and their expected values. If the value is None, no existing record is expected. If the value is DBM.ANY_DATA, an existing record with any value is expected.
desired – A sequence of pairs of the record keys and their desired values. If the value is None, the record is to be removed.
- Returns
The future for the result status. If the condition is not met, INFEASIBLE_ERROR is set.
-
Rekey
(old_key, new_key, overwrite=True, copying=False)¶ Changes the key of a record.
- Parameters
old_key – The old key of the record.
new_key – The new key of the record.
overwrite – Whether to overwrite the existing record of the new key.
copying – Whether to retain the record of the old key.
- Returns
The future for the result status. If there’s no matching record to the old key, NOT_FOUND_ERROR is set. If the overwrite flag is false and there is an existing record of the new key, DUPLICATION_ERROR is set.
This method is done atomically. The other threads observe that the record has either the old key or the new key. No intermediate states are observed.
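The interaction of the overwrite and copying flags, and the atomicity requirement, can be sketched with a dict guarded by a lock. This is a model, not the tkrzw implementation; `rekey` is a hypothetical helper and the returned strings are placeholders for status codes.

```python
import threading

_lock = threading.Lock()

def rekey(records, old_key, new_key, overwrite=True, copying=False):
    """Move (or copy) a record to a new key while holding one lock for the whole step."""
    with _lock:  # other threads see either the old state or the new state
        if old_key not in records:
            return "NOT_FOUND_ERROR"
        if not overwrite and new_key in records:
            return "DUPLICATION_ERROR"
        records[new_key] = records[old_key]
        if not copying and new_key != old_key:
            del records[old_key]
        return "SUCCESS"

records = {"a": "1", "b": "2"}
blocked = rekey(records, "a", "b", overwrite=False)
moved = rekey(records, "a", "c")
missing = rekey(records, "a", "d")
copied = rekey(records, "c", "e", copying=True)
```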
-
PopFirst
()¶ Gets the first record and removes it.
- Returns
The future for a tuple of the result status, the bytes key, and the bytes value of the first record.
-
PopFirstStr
()¶ Gets the first record as strings and removes it.
- Returns
The future for a tuple of the result status, the string key, and the string value of the first record.
-
PushLast
(value, wtime=None)¶ Adds a record with a key of the current timestamp.
- Parameters
value – The value of the record.
wtime – The current wall time used to generate the key. If it is None, the system clock is used.
- Returns
The future for the result status.
The key is generated as an 8-byte big-endian binary string of the timestamp. If there is an existing record matching the generated key, the key is regenerated and the attempt is repeated until it succeeds.
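The key generation and the retry on collision can be sketched as follows. Neither the unit of the timestamp nor the way a colliding key is regenerated is specified here, so this sketch assumes integer microseconds and bumps the number by one on each collision. `push_last` is a hypothetical helper over a dict and does not call tkrzw.

```python
import struct
import time

def push_last(records, value, wtime=None):
    """Store value under an 8-byte big-endian timestamp key, retrying on collisions."""
    seconds = time.time() if wtime is None else wtime
    stamp = int(seconds * 1_000_000)      # assumed unit: microseconds
    while True:
        key = struct.pack(">Q", stamp)    # 8 bytes, big-endian, unsigned
        if key not in records:
            records[key] = value
            return key
        stamp += 1                        # assumed regeneration strategy

records = {}
k1 = push_last(records, b"first", wtime=1.0)
k2 = push_last(records, b"second", wtime=1.0)  # same wall time forces a retry
```

Big-endian encoding makes byte-wise key order agree with numeric order, so in an ordered database the oldest entry sorts first.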
-
Clear
()¶ Removes all records.
- Returns
The future for the result status.
-
Rebuild
(**params)¶ Rebuilds the entire database.
- Parameters
params – Optional keyword parameters.
- Returns
The future for the result status.
The parameters work in the same way as with DBM::Rebuild.
-
Synchronize
(hard, **params)¶ Synchronizes the content of the database to the file system.
- Parameters
hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.
params – Optional keyword parameters.
- Returns
The future for the result status.
The parameters work in the same way as with DBM::Synchronize.
-
CopyFileData
(dest_path, sync_hard=False)¶ Copies the content of the database file to another file.
- Parameters
dest_path – A path to the destination file.
sync_hard – True to do physical synchronization with the hardware.
- Returns
The future for the result status.
-
Export
(dest_dbm)¶ Exports all records to another database.
- Parameters
dest_dbm – The destination database. The lifetime of the database object must last until the task finishes.
- Returns
The future for the result status.
-
ExportToFlatRecords
(dest_file)¶ Exports all records of a database to a flat record file.
- Parameters
dest_file – The file object to write records in. The lifetime of the file object must last until the task finishes.
- Returns
The future for the result status.
A flat record file contains a sequence of binary records without any high-level structure, so it is useful as an intermediate file for data migration.
-
ImportFromFlatRecords
(src_file)¶ Imports records to a database from a flat record file.
- Parameters
src_file – The file object to read records from. The lifetime of the file object must last until the task finishes.
- Returns
The future for the result status.
-
Search
(mode, pattern, capacity=0)¶ Searches the database and gets keys which match a pattern.
- Parameters
mode – The search mode. “contain” extracts keys containing the pattern. “begin” extracts keys beginning with the pattern. “end” extracts keys ending with the pattern. “regex” extracts keys that partially match the pattern of a regular expression. “edit” extracts keys whose edit distance to the UTF-8 pattern is the least. “editbin” extracts keys whose edit distance to the binary pattern is the least.
pattern – The pattern for matching.
capacity – The maximum records to obtain. 0 means unlimited.
- Returns
The future for the result status and a list of keys matching the condition.
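The four pattern-based modes map onto ordinary string tests. The sketch below models them over a list of keys without calling tkrzw; `search` is a hypothetical helper, the scan order is simply list order, and the two edit-distance modes are left out.

```python
import re

def search(keys, mode, pattern, capacity=0):
    """Return keys matching the pattern under the given mode, up to capacity (0 = unlimited)."""
    predicates = {
        "contain": lambda k: pattern in k,
        "begin": lambda k: k.startswith(pattern),
        "end": lambda k: k.endswith(pattern),
        "regex": lambda k: re.search(pattern, k) is not None,  # partial match
    }
    matches = predicates[mode]
    hits = []
    for k in keys:
        if matches(k):
            hits.append(k)
            if capacity > 0 and len(hits) >= capacity:
                break
    return hits

keys = ["apple", "pineapple", "grape", "apricot"]
contain_hits = search(keys, "contain", "app")
begin_hits = search(keys, "begin", "ap")
end_hits = search(keys, "end", "pe")
regex_hits = search(keys, "regex", "^a.*t$")
capped_hits = search(keys, "contain", "ap", capacity=1)
```

The regex mode uses `re.search` rather than `re.fullmatch` to reflect the "partially matches" wording; anchors such as `^` and `$` are needed to force a whole-key match.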
-
-
class
tkrzw.
File
¶ Bases:
object
Generic file implementation.
All operations except for Open and Close are thread-safe; multiple threads can access the same file concurrently. You can specify a concrete class when you call the Open method. Every opened file must be closed explicitly by the Close method to avoid data corruption.
-
__init__
()¶ Initializes the file object.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
Open
(path, writable, **params)¶ Opens a file.
- Parameters
path – A path of the file.
writable – If true, the file is writable. If false, it is read-only.
params – Optional keyword parameters.
- Returns
The result status.
The optional parameters can include an option for concurrency tuning. By default, database operations are done under the GIL (Global Interpreter Lock), which means that database operations are not done concurrently even if you use multiple threads. If the “concurrent” parameter is true, database operations are done outside the GIL, which means that database operations can be done concurrently if you use multiple threads. However, the downside is that swapping thread data is costly, so the actual throughput is often worse in the concurrent mode than in the normal mode. Therefore, the concurrent mode should be used only if the database is huge and it can cause blocking of threads in multi-thread usage.
- The optional parameters can include options for the file opening operation.
truncate (bool): True to truncate the file.
no_create (bool): True to omit file creation.
no_wait (bool): True to fail if the file is locked by another process.
no_lock (bool): True to omit file locking.
sync_hard (bool): True to do physical synchronization when closing.
The optional parameter “file” specifies the internal file implementation class. The default file class is “MemoryMapAtomicFile”. The other supported classes are “StdFile”, “PositionalParallelFile”, and “PositionalAtomicFile”.
- For the file classes “PositionalParallelFile” and “PositionalAtomicFile”, these optional parameters are supported.
block_size (int): The block size to which all blocks should be aligned.
access_options (str): Values separated by colons. “direct” for direct I/O, “sync” for synchronizing I/O, “padding” for file size alignment by padding, “pagecache” for the mini page cache in the process.
-
Close
()¶ Closes the file.
- Returns
The result status.
-
Read
(off, size, status=None)¶ Reads data.
- Parameters
off – The offset of a source region.
size – The size to be read.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The bytes value of the read data or None on failure.
-
ReadStr
(off, size, status=None)¶ Reads data as a string.
- Parameters
off – The offset of a source region.
size – The size to be read.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The string value of the read data or None on failure.
-
Write
(off, data)¶ Writes data.
- Parameters
off – The offset of the destination region.
data – The data to write.
- Returns
The result status.
-
Append
(data, status=None)¶ Appends data at the end of the file.
- Parameters
data – The data to write.
status – A status object to which the result status is assigned. It can be omitted.
- Returns
The offset at which the data has been put, or None on failure.
-
Truncate
(size)¶ Truncates the file.
- Parameters
size – The new size of the file.
- Returns
The result status.
If the file is shrunk, data after the new file end is discarded. If the file is expanded, null codes are filled after the old file end.
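The same shrink and expand behavior can be observed on an ordinary file with the standard library. This sketch uses `os.truncate` on a temporary file rather than the tkrzw File class, purely to show what "discarded" and "null codes" look like in the data; the file name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"abcdef")

    os.truncate(path, 3)           # shrink: bytes after offset 3 are dropped
    with open(path, "rb") as f:
        shrunk = f.read()

    os.truncate(path, 5)           # expand: the new tail is filled with zero bytes
    with open(path, "rb") as f:
        grown = f.read()
```

Note that expanding does not bring back the bytes that were dropped by the earlier shrink.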
-
Synchronize
(hard, off=0, size=0)¶ Synchronizes the content of the file to the file system.
- Parameters
hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.
off – The offset of the region to be synchronized.
size – The size of the region to be synchronized. If it is zero, the length to the end of file is specified.
- Returns
The result status.
The physical file size can be larger than the logical size in order to improve performance by reducing the frequency of allocation. Thus, you should call this function before accessing the file with external tools.
-
GetSize
()¶ Gets the size of the file.
- Returns
The size of the file or None on failure.
-
GetPath
()¶ Gets the path of the file.
- Returns
The path of the file or None on failure.
-
Search
(mode, pattern, capacity=0)¶ Searches the file and gets lines which match a pattern.
- Parameters
mode – The search mode. “contain” extracts lines containing the pattern. “begin” extracts lines beginning with the pattern. “end” extracts lines ending with the pattern. “regex” extracts lines that partially match the pattern of a regular expression. “edit” extracts lines whose edit distance to the UTF-8 pattern is the least. “editbin” extracts lines whose edit distance to the binary pattern is the least.
pattern – The pattern for matching.
capacity – The maximum records to obtain. 0 means unlimited.
- Returns
A list of lines matching the condition.
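The “edit” mode ranks candidates by edit distance to the pattern. A minimal sketch of that idea follows, assuming the Levenshtein distance, ascending order by distance, and input order for ties; these details are not stated here. Both function names are hypothetical and tkrzw is not called.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def search_edit(lines, pattern, capacity=0):
    """Return lines ordered by closeness to the pattern, up to capacity (0 = unlimited)."""
    ranked = sorted(lines, key=lambda line: levenshtein(line, pattern))
    return ranked[:capacity] if capacity > 0 else ranked

lines = ["apple", "apply", "maple"]
closest = search_edit(lines, "appl", capacity=2)
```

In this mode the capacity matters more than in the others, since every line has some distance to the pattern and only the limit separates near matches from far ones.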
-
-
class
tkrzw.
Index
¶ Bases:
object
Secondary index interface.
All operations except for Open and Close are thread-safe; multiple threads can access the same index concurrently. You can specify a data structure when you call the Open method. Every opened index must be closed explicitly by the Close method to avoid data corruption. This class implements the iterable protocol, so an instance is usable with a “for-in” loop.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__len__
()¶ Gets the number of records, to enable the len operator.
- Returns
The number of records on success, or 0 on failure.
-
__contains__
(record)¶ Checks if a record exists or not, to enable the in operator.
- Parameters
record – A tuple of the key and the value to check.
- Returns
True if the record exists, or False if not. No exception is raised for missing records.
-
__iter__
()¶ Makes an iterator and initializes it, to comply with the iterator protocol.
- Returns
The iterator for each record.
-
Open
(path, writable, **params)¶ Opens an index file.
- Parameters
path – A path of the file.
writable – If true, the file is writable. If false, it is read-only.
params – Optional keyword parameters.
- Returns
The result status.
If the path is empty, BabyDBM is used internally, which is equivalent to using the MemIndex class. If the path ends with “.tkt”, TreeDBM is used internally, which is equivalent to using the FileIndex class. If the key comparator of the tuning parameter is not set, PairLexicalKeyComparator is set implicitly. Other compatible key comparators are PairLexicalCaseKeyComparator, PairDecimalKeyComparator, PairHexadecimalKeyComparator, PairRealNumberKeyComparator, PairSignedBigEndianKeyComparator, and PairFloatBigEndianKeyComparator. Other options can be specified as with DBM::Open.
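The pair comparators suggest that the index orders its records as (key, value) pairs, which is what allows one key to carry several values. The sketch below models that with a sorted list of tuples under plain lexical ordering of both parts. `PairIndex` and its methods are hypothetical names and this is not how tkrzw stores the data.

```python
import bisect

class PairIndex:
    """Hypothetical in-memory model of a secondary index holding (key, value) pairs."""

    def __init__(self):
        self._pairs = []  # kept sorted by (key, value)

    def add(self, key, value):
        pair = (key, value)
        i = bisect.bisect_left(self._pairs, pair)
        if i == len(self._pairs) or self._pairs[i] != pair:
            self._pairs.insert(i, pair)

    def remove(self, key, value):
        pair = (key, value)
        i = bisect.bisect_left(self._pairs, pair)
        if i < len(self._pairs) and self._pairs[i] == pair:
            del self._pairs[i]
            return True
        return False

    def get_values(self, key, limit=0):
        # (key,) sorts before every (key, value), so this finds the first pair of the key.
        i = bisect.bisect_left(self._pairs, (key,))
        values = []
        while i < len(self._pairs) and self._pairs[i][0] == key:
            values.append(self._pairs[i][1])
            if limit > 0 and len(values) >= limit:
                break
            i += 1
        return values

    def __contains__(self, record):
        i = bisect.bisect_left(self._pairs, tuple(record))
        return i < len(self._pairs) and self._pairs[i] == tuple(record)

    def __len__(self):
        return len(self._pairs)

    def __iter__(self):
        return iter(self._pairs)

index = PairIndex()
index.add("tokyo", "user1")
index.add("paris", "user2")
index.add("tokyo", "user3")
tokyo_users = index.get_values("tokyo")
```

Here the index key is a searchable attribute and the index value is the primary key of a record kept elsewhere, matching the roles described for Add.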
-
Close
()¶ Closes the index file.
- Returns
The result status.
-
GetValues
(key, max=0)¶ Gets all values of records of a key.
- Parameters
key – The key to look for.
max – The maximum number of values to get. 0 means unlimited.
- Returns
A list of all values of the key. An empty list is returned on failure.
-
GetValuesStr
(key, max=0)¶ Gets all values of records of a key, as strings.
- Parameters
key – The key to look for.
max – The maximum number of values to get. 0 means unlimited.
- Returns
A list of all values of the key. An empty list is returned on failure.
-
Add
(key, value)¶ Adds a record.
- Parameters
key – The key of the record. This can be an arbitrary expression to search the index.
value – The value of the record. This should be a primary value of another database.
- Returns
The result status.
-
Remove
(key, value)¶ Removes a record.
- Parameters
key – The key of the record.
value – The value of the record.
- Returns
The result status.
-
Count
()¶ Gets the number of records.
- Returns
The number of records, or 0 on failure.
-
GetFilePath
()¶ Gets the path of the index file.
- Returns
The file path of the index, or an empty string on failure.
-
Clear
()¶ Removes all records.
- Returns
The result status.
-
Rebuild
()¶ Rebuilds the entire index.
- Returns
The result status.
-
Synchronize
(hard)¶ Synchronizes the content of the index to the file system.
- Parameters
hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.
- Returns
The result status.
-
IsOpen
()¶ Checks whether the index is open.
- Returns
True if the index is open, or false if not.
-
IsWritable
()¶ Checks whether the index is writable.
- Returns
True if the index is writable, or false if not.
-
MakeIterator
()¶ Makes an iterator for each record.
- Returns
The iterator for each record.
-
-
class
tkrzw.
IndexIterator
(index)¶ Bases:
object
Iterator for each record of the secondary index.
-
__init__
(index)¶ Initializes the iterator.
- Parameters
index – The index to scan.
-
__repr__
()¶ Returns a string representation of the object.
- Returns
The string representation of the object.
-
__str__
()¶ Returns a string representation of the content.
- Returns
The string representation of the content.
-
__next__
()¶ Moves the iterator to the next record, to comply with the iterator protocol.
- Returns
A tuple of the key and the value of the current record.
-
First
()¶ Initializes the iterator to indicate the first record.
-
Last
()¶ Initializes the iterator to indicate the last record.
-
Jump
(key, value='')¶ Initializes the iterator to indicate a specific range.
- Parameters
key – The key of the lower bound.
value – The value of the lower bound.
-
Next
()¶ Moves the iterator to the next record.
-
Previous
()¶ Moves the iterator to the previous record.
-
Get
()¶ Gets the key and the value of the current record of the iterator.
- Returns
A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.
-
GetStr
()¶ Gets the key and the value of the current record of the iterator, as strings.
- Returns
A tuple of the string key and the string value of the current record. On failure, None is returned.
-