clean-db


Package info
Description

A database library based on data models expressed as Clean types. Designed for large amounts of data.

Readme
# CleanDB

_CleanDB_ is an embedded database for the functional programming language [Clean](https://clean.cs.ru.nl/Clean). Concurrent accesses are realised by multiple processes accessing the same database file on the same machine. There is no support for distributing databases across multiple machines yet.

The main design principle is that data is organised in maps modelled by datatypes of the host language _Clean_. The design makes it possible to specify the structure of data and also access the data using the host language. No different language (e.g. _SQL_) is required; data accesses are specified by _Clean_ functions, which enables maximum code reusability. A drawback is that the level of abstraction is lower than the one provided by relational databases. Indexes on data, for instance, have to be explicitly specified and accessed.

This design also makes _CleanDB_ very efficient. Data does not have to be transferred between the database and application programs. Data access is specified by accumulator functions which access the data with very low overhead. This makes it possible to retrieve accumulated values (e.g. sums) on a dataset which is larger than the memory available.

The lower level of abstraction also allows programmers to implement very efficient data accesses. The burden of having to handle low-level details can become an advantage here, as relational databases provide less control over how data is retrieved. Implementing highly efficient solutions with relational databases furthermore requires understanding their internal workings anyway.

_CleanDB_ is built on top of [LMDB](https://symas.com/lmdb/), so it inherits some of its properties:

* Transactions have full _ACID_ semantics.
* Concurrent read-only transactions scale almost linearly with the number of cores, in case data fits in memory (http://www.lmdb.tech/bench/inmem/scaling.html).
* Write transactions are efficient, but do not scale. Only a single write transaction can be active at the same time.
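The accumulator-style access described above can be illustrated with a small, language-neutral sketch. It is written in Python rather than Clean and does not use the _CleanDB_ API: `scan_entries` and `fold_entries` are hypothetical names, and the generator only stands in for a database cursor. The point it tries to show is that a fold over streamed key-value pairs needs memory for the accumulator only, not for the dataset.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

A = TypeVar("A")


def scan_entries(n: int) -> Iterator[Tuple[int, float]]:
    """Hypothetical stand-in for a database cursor.

    Yields key-value pairs one at a time, so the full dataset is never
    materialised in memory.
    """
    for key in range(n):
        yield key, float(key % 10)


def fold_entries(
    entries: Iterable[Tuple[int, float]],
    step: Callable[[A, int, float], A],
    initial: A,
) -> A:
    """Fold an accumulator function over a stream of key-value pairs.

    Only the accumulator is kept between steps; each pair can be
    discarded as soon as `step` has seen it.
    """
    acc = initial
    for key, value in entries:
        acc = step(acc, key, value)
    return acc


def sum_values(acc: float, _key: int, value: float) -> float:
    """Accumulator function computing the sum of all values."""
    return acc + value


# Sum 100,000 values while holding a single float as state.
total = fold_entries(scan_entries(100_000), sum_values, 0.0)
```

In this sketch the access pattern (what to accumulate) is an ordinary function of the host language, which mirrors the design idea of specifying data access in _Clean_ itself instead of in a separate query language.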
The properties inherited from LMDB make _CleanDB_ best suited to read-intensive applications. An example is [VIIA](https://gitlab.com/top-software/viia), where data (mostly positions of vessels) is added to the database at a fixed rate. Analysing the data, in contrast, requires numerous reads of the same data and has to scale with the number of users and alerts. Write operations should still be sufficiently fast for writing the user content of tens of thousands of users. [VIIA](https://gitlab.com/top-software/viia) can, for instance, handle several thousand vessel position updates per second.

## Licence

The package is licensed under the BSD-2-Clause license; for details, see the [LICENSE](/LICENSE) file.

### LMDB

Copyright The OpenLDAP Foundation; for the license see [LICENSE](/LICENSE_LMDB).
Changelog
# Changelog

#### 15.2.2

- Chore: accept itasks 0.17.

#### 15.2.1

- Chore: update LMDB dependency.

### 15.2.0

- Feature: add `sync`, which flushes the data buffers related to a `Txn` to disk.

#### 15.1.1

- Enhancement: optimise stopping parallel operations when using continuations yielding multiple results.

### 15.1.0

- Feature: add `ConstContinuation` for parallel operations with constant parameters not being based on data from a map.

## 15.0.0

- Change: the function yielding the parameter for the parallelised transactions given to `getAccumNestedParallel`, `getAccumNestedParallelWithLocalState`, `GetContinuationResult`, `getMultiAccumNestedParallel`, `getMultiAccumNestedParallelWithLocalState` can now yield multiple parameters for a single key-value pair.
- Change: the function yielding the parameter for the parallelised transactions given to `getAccumNestedParallel`, `getAccumNestedParallelWithLocalState`, `GetContinuationResult`, `getMultiAccumNestedParallel`, `getMultiAccumNestedParallelWithLocalState` has access to a nested transaction.
- Feature: add `mapContinuation` for mapping the `parParam` domain of `GetContinuation`s.

#### 14.0.1

- Fix: retrieving large values from partitioned maps, when not using `getSingle`.

## 14.0.0

- Change: functions performing actions on the filesystem which could be affected by issues, such as permission problems, yield errors instead of aborting in case of failure: `openDatabase`, `closeDatabase`, `withDatabase`, `withDatabaseSt`, `deleteAllMaps`, `deleteDataFromAllMaps`, `foldedPartitions`, `deletePartition`.

#### 13.1.3

- Enhancement: include map name in error log in case of an invalid binary DB encoding.

#### 13.1.2

- Fix: fix overflow problem if number of entries exceeds the maximum value of a 32-bit int for CleanDB statistics.

#### 13.1.1

- Chore: add support for concurrency v4.
### 13.1.0

- Feature: add `getAccumNestedParallelWithLocalStateAndContinuationChain`, which allows reading from multiple `dbMaps` through a single function call (increasing performance over the existing functions when reading from multiple `dbMaps`).
- Feature: add `combineContinuations`, making it possible to combine continuation chains.

#### 13.0.1

- Fix: bug causing keys to not be found when using `getMultiAccumDelete` variants in combination with conditions that make `executeCond` go to the next key; in that case the cursor was sometimes not in the correct position after deleting a key-value pair.

## 13.0.0

- Change: `getMultiAccum` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMultiAccumDelete` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMultiAccumNested` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMultiAccumNestedDelete` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMultiAccumNestedParallel` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMultiAccumNestedParallelWithLocalState` now receives a key-value pair instead of a key-value-set pair in the update function.
- Change: `getMulti` now has a `==` constraint on the key argument.
- Removed: `getMultiAccumPerValue` (renamed it to `getMultiAccum`).
- Removed: `getMultiAccumPerValueNested` (renamed it to `getMultiAccumNested`).
- Removed: `getMultiAccumPerValueNestedParallel` (renamed it to `getMultiAccumNestedParallel`).
- Removed: `getMultiAccumPerValueNestedParallelWithLocalState` (renamed it to `getMultiAccumNestedParallelWithLocalState`).

### 12.1.0

- Feature: add `getMapStats` to get statistics about all DB maps.
- Feature: add `gEq`, `JSONEncode` and `JSONDecode` instances of `DBEnvironmentInfo` and `MapStats`.
- Feature: add iTasks extension with `gText` and `gEditor` instances of `DBEnvironmentInfo` and `MapStats`.

## 12.0.0

- Change: replace `abortTxnsWhichExceedMaxTime` by the more flexible `abortTxnsForWhich`.
- Feature: add `doTransactionStWithOptions` with option to determine which (partition) databases are used in a transaction.

#### 11.0.2

- Fix: encoding of large values to make them compatible with CleanDB `10`.

#### 11.0.1

- Fix: `getMultiAccumKeysOnly` and `getMultiAccumKeysOnlyNested` provide all keys for which the condition holds, instead of the first key only.
- Enhancement: optimise composing keys of dynamic length.

## 11.0.0

- Change: support combinations with keys of dynamic length containing 0-bytes. THIS IS A BREAKING CHANGE ON THE DB ENCODING! Database files generated by earlier versions might not be compatible if keys with dynamic length are used in tuple keys as a non-rightmost type.

### 10.6.0

- Feature: add `maximumKeySize` to `Database.CleanDb` and `mdb_env_get_maxkeyize` to `Database.CleanDb._Lmdb`.

### 10.5.0

- Enhancement: improve error logging for `abortTxnsWhichExceedMaxTime`.
- Feature: export `SysCallEnv` instances for `Txn` and `ReadOnlyTxn` in `Database.CleanDB`.

### 10.4.0

- Feature: add `readaheadPartitionDatabases` database option.
- Enhancement: retry to get a lock when opening a database a number of times if no locks are available.
- Enhancement: re-use cursor for put operations. This prevents creating a cursor for each put and improves performance when adding data with small distance between keys.

#### 10.3.1

- Fix: omitted keys in results for multi-maps with disjunctions of conditions moving in different directions to the same key.

### 10.3.0

- Feature: add `keyExists`, which returns whether a key is part of a `dbMap`.
- Feature: add `getMultiAccumKeysOnly` and `getMultiAccumKeysOnlyNested`, which allow iterating over keys that adhere to a condition without retrieving the values of those keys.
- Feature: add `keyValueExists`, which returns whether a key-value pair is part of a `DBMultiMap`.
- Feature: add `putKeyValueExists`, which puts a key-value pair into a `dbMap` and additionally returns whether a key-value pair is part of a `DBMultiMap`.

#### 10.2.3

- Enhancement: prevent `abortTxnsWhichExceedMaxTime` from trying to kill stale readers.

#### 10.2.2

- Chore: accept `base` `3.0`.

#### 10.2.1

- Enhancement: include map names in error messages if operations on maps fail.

### 10.2.0

- Feature: add `deleteDataFromAllMaps`, to remove all data from the database safely if other transactions may be active at the same time.
- Fix: prevent issues with `withTestCleanDB`/`withLabeledTestCleanDB` in case other transactions are still active on the test database after the main test finishes.

### 10.1.0

- Feature: add `positionIsNotInOneOf` condition, used for seeing if a position is not within any of the provided regions.

#### 10.0.3

- Fix: make `hasFixedSize` functions of tuple `key` instances work with `undef` argument.

#### 10.0.2

- Enhancement: improve logging of `abortTxnsWhichExceedMaxTime`.

#### 10.0.1

- Chore: adapt to `base-compiler 3.0` and remove `base-compiler-itasks` dependency.

## 10.0.0

- Change: add predicate on PID to `abortTxnsWhichExceedMaxTime`.

## 9.0.0

- Change: change the type of `abortTxnsWhichExceedMaxTime` and no longer run the function within a subprocess. The function now receives and updates a `:: TransactionDurationTable`.
- Feature: add `initialTxnDurationTable` function.

### 8.6.0

- Feature: add `abortTxnsWhichExceedMaxTime` function which terminates read-only transactions which take too long to complete, within a subprocess.

#### 8.5.2

- Enhancement: prevent crashes when starting read-only transactions without reader slots available, in case slots can be reclaimed from stale readers.

#### 8.5.1

- Fix: use `size_t` pointers in `dbEnvironmentInfoForPath` to avoid conversion errors.
### 8.5.0

- Feature: add `dbEnvironmentInfoForPath` which returns LMDB environment information for the provided path. This also allows retrieving the environment information for partitioned databases.

### 8.4.0

- Feature: add `dbEnvironmentInfoFor` which returns LMDB environment information for the provided CleanDB.

#### 8.3.7

- Chore: adjust to containers 2 deriving `gPrint` for `Set` and gast deriving `genShow` for `Set`.

#### 8.3.6

- Chore: add generic-binary-encoding as an optional dependency.

#### 8.3.5

- Chore: adapt clean-db for changes in base 2.0.

#### 8.3.4

- Fix: prevent non-terminating parallel operations in case a subprocess terminates unexpectedly for additional cases.

#### 8.3.3

- Fix: prevent deadlocks in parallel operations in corner cases.

#### 8.3.2

- Fix: prevent non-terminating parallel operations in case a subprocess terminates unexpectedly.

#### 8.3.1

- Enhancement: use SIGKILL to terminate child processes in case the main process terminates, to be sure all processes are terminated (even in case custom signal handlers are used).

### 8.3.0

- Feature: introduce variants of parallel operations using a local state per subprocess.

#### 8.2.1

- Fix: crashes when using `deletePartition`.

### 8.2.0

- Feature: introduce partitioned maps.

#### 8.1.3

- Fix: `incrementedStrWithFixedLength` and primitive key instance of `Char`.

#### 8.1.2

- Fix: replacing a large value with another one with a length of a multiple of the chunk size.
- Enhancement: LMDB interface requiring fewer `malloc`/`free`/`memcopy` operations.

#### 8.1.1

- Fix: improve stability by updating LMDB.

### 8.1.0

- Feature: add `deleteEmptyMaps`.

#### 8.0.1

- Chore: update graph-copy dependency.

## 8.0.0

- Change: for parallel operations, use separate functions for transforming the key/values, the nested transactions and updating the result. Communication is done by arbitrary parameters. This makes the behaviour easier to understand and enables minimising the size of information used to communicate between sub-processes. The transaction transforming the key/values can now indicate to stop the operation as well.
- Enhancement: optimised stopping parallel operations using the predicate.

#### 7.0.2

- Chore: update dependencies (LMDB, GCC version used to generate binaries).

#### 7.0.1

- Fix: create the large value map only when large values are read/written. This fixes the problem where attempting to create the large value map during a read-only transaction would cause a failure.

## 7.0.0

- Enhancement: store large values in chunks to avoid having to allocate sequences of free memory pages. After upgrading to this version it is not possible to downgrade.

#### 6.2.1

- Enhancement: prevent crash with erroneous position data present in the database caused by a past bug (fixed with `6.2.0`).

### 6.2.0

- Feature: add `getAccumDelete`, `getAccumNestedDelete`, `getMultiAccumDelete` and `getMultiAccumNestedDelete`.
- Chore: adapt to `geo` version `2`.

### 6.1.0

- Feature: add `startTransaction`, `startReadOnlyTransaction` and `endTransaction` to the `Database.CleanDB` module. These functions can be used to use a transaction in a custom scope, unlike `doTransaction` which limits the scope in which the `Txn` can be used. When using `startTransaction`, the user is responsible for closing the transaction using `endTransaction`.
- Feature: add `alterMulti` for altering values for keys of a `DBMultiMap`.

#### 6.0.1

- Enhancement: optimise `put` for non-multimaps by avoiding a memcopy operation on the value representation.

## 6.0.0

Continues from version 4, as version 5 is discontinued.

- Change: for parallel operations also provide transactions to the functions accumulating the parallel result state.

## 5.0.0

Discontinued as the idea of read/write cursors turned out to not improve the performance for real applications!
- Change: use `WriteOperation`s and `*ReadCursor`s for key and value string representations to avoid copying such representations multiple times, therefore reducing memory operations.
- Change: major restructuring of modules.

#### 4.4.5

- Fix: issues with too few reader slots being available if the same DB is used with and without parallel operations enabled from different processes.

#### 4.4.4

- Fix: segfaults occurring for large write transactions.

#### 4.4.3

- Enhancement: include LMDB version with workaround for poor write performance with fragmented freelist.

#### 4.4.2

- Enhancement: add `==` instances for `:: CleanDBOptions`, `:: SyncMode`, `:: ParallelOperationSetting`.

#### 4.4.1

- Chore: accept clean-platform with ^v0.3 and ^v0.4.

### 4.4.0

- Feature: add `getMultiAccumPerValue`/`getMultiAccumPerValueNested`.
- Enhancement: improve performance and reduce memory usage of `getMultiAccumPerValueNestedParallel`.

### 4.3.0

- Feature: add `withLabeledTestCleanDB` to allow using multiple test databases in a safe way.

### 4.2.0

- Feature: add `deleteAllMaps`.
- Enhancement: `withTestCleanDB` keeps the test database open (as unsafe side-effect) which improves performance of testing.
- Fix: prevent error messages when the parent process terminates during a parallel operation.

### 4.1.0

- Feature: add support for maybe-type keys.

#### 4.0.1

- Fix: prevent crashes when databases are opened/closed at exactly the same time.
- Enhancement: prevent superfluous checks for stale readers.

## 4.0.0

- Change: introduce `parallelOperationsSetting` database option; enables using a process pool for nested parallel transactions to improve performance.
- Change: parallel operations do not provide a consistent view on the database anymore.
- Change: added parallel operations setting parameter to `withTestCleanDB`.

#### 3.0.1

- Fix: error messages, report correct function in which the error occurred.

## 3.0.0

- Feature: make the maximum database size configurable.
### 2.1.0

- Feature: add functions for nested transactions performed in parallel.

#### 2.0.1

- Enhancement: improve performance of `isIn` condition.

## 2.0.0

- Change: introduce `GivenKeyContinuation` record as argument of `GivenKey` for more readable code.
- Chore: no longer require the deprecated itasks compiler.
- Change: make the `usePreviousKey` field optional (`?None` is equivalent to `const ?None`, but provides information for optimisations).
- Change: add a `keyMayBeSkipped` flag to `GivenKeyContinuation`, which is used to optimise combined conditions in some cases.

#### 1.1.4

- Enhancement: add uniqueness annotations to `withTestCleanDB` to allow passing unique results.

#### 1.1.3

- Enhancement: add uniqueness annotations to `withDatabase`/`withDatabaseSt` to allow passing unique results.

#### 1.1.2

- Fix: fixed a regression in LMDB.

#### 1.1.1

- Enhancement: slight performance increase putting already existing key/value pairs into multi maps.

### 1.1.0

- Feature: package the `mdb_stat` executable.

#### 1.0.1

- Chore: update base-compiler-itasks from =1.0.0 to ^1.0.0 || ^2.0.0.

## 1.0.0

- Initial version
Versions