Core Technologies


CoBox is built on top of dat, a modular peer-to-peer technology stack. The guide 'how dat works' gives a good explanation of how the stack fits together.

dat

hypercore

dat's core database is an append-only log called hypercore. A hypercore can easily be replicated between devices or peers in real time, and its integrity is protected by cryptographically signed entries and Merkle trees, which let other peers verify that nobody has tampered with the log's data. To make this possible, each hypercore holds its own cryptographic keypair: a public key and a signing secret_key. Only the holder of the secret_key, the 'writer', can append to the log. Any peer with a copy of the log can 'read' and verify it using the public key, without needing the secret_key.
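To make the integrity claim concrete, here is a minimal Python sketch of an append-only log whose entries are summarised by a Merkle root. It is a simplification, not hypercore's implementation: hypercore uses its own tree layout and Ed25519 signatures, whereas this sketch uses a plain binary SHA-256 tree and leaves signing out entirely, since Python's standard library has no Ed25519. The names `AppendOnlyLog` and `merkle_root` are hypothetical.

```python
import hashlib
from typing import List


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: List[bytes]) -> bytes:
    """Binary Merkle root over the entries; an odd node is carried up unchanged."""
    if not entries:
        return _hash(b"")
    # Prefix leaves and inner nodes differently so they can't be confused.
    level = [_hash(b"\x00" + e) for e in entries]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                parents.append(_hash(b"\x01" + level[i] + level[i + 1]))
            else:
                parents.append(level[i])
        level = parents
    return level[0]


class AppendOnlyLog:
    """Hypothetical toy log: entries can be appended but never changed or removed."""

    def __init__(self) -> None:
        self._entries: List[bytes] = []

    def append(self, data: bytes) -> int:
        self._entries.append(data)
        return len(self._entries) - 1  # sequence number of the new entry

    def get(self, seq: int) -> bytes:
        return self._entries[seq]

    def __len__(self) -> int:
        return len(self._entries)

    def root(self) -> bytes:
        # In hypercore the writer signs the tree root with its secret_key so
        # readers can check it against the public key. Signing is omitted here.
        return merkle_root(self._entries)


# A reader recomputes the root from replicated entries and compares it
# with the root the writer published (and, in hypercore, signed).
log = AppendOnlyLog()
for message in (b"hello", b"world", b"!"):
    log.append(message)
published_root = log.root()

replica = [log.get(i) for i in range(len(log))]
print(merkle_root(replica) == published_root)  # True
replica[1] = b"tampered"
print(merkle_root(replica) == published_root)  # False
```

Changing any single entry changes its leaf hash and every hash above it, so a tampered replica can no longer match the published root.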

hyperdrive

Let's move one step up the stack. On top of hypercore, the dat project built hyperdrive, a file system abstraction that uses two hypercores, one for content and one for metadata, to represent a fully functioning Unix-style file system. A hyperdrive can be mounted as a folder on your computer. Combined with replication and peer discovery, this lets the 'writer' of a hyperdrive archive dynamically update the files seen on remote 'reader' computers. Changes are replicated to all connected peers in real time, without the need for an authenticated server to guarantee the integrity of the data.
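The two-feed design can be illustrated with a toy model: a hypothetical `ToyDrive` keeps one append-only list for file data and one for metadata records that point into it. This is a sketch under simplifying assumptions: real hyperdrive encodes and indexes its metadata differently, so lookups don't need a linear scan. The JSON records, their field names and the newest-first scan are illustrative stand-ins chosen for brevity.

```python
import json
from typing import Dict, List, Optional


class ToyDrive:
    """Hypothetical sketch: a file system built from two append-only logs."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.content: List[bytes] = []   # stand-in for the content hypercore
        self.metadata: List[bytes] = []  # stand-in for the metadata hypercore

    def write_file(self, path: str, data: bytes) -> None:
        # Append the file's bytes as fixed-size blocks to the content log...
        offset = len(self.content)
        size = self.block_size
        blocks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
        self.content.extend(blocks)
        # ...then append a metadata record saying where those blocks live.
        record = {"path": path, "offset": offset, "blocks": len(blocks)}
        self.metadata.append(json.dumps(record).encode("utf-8"))

    def _lookup(self, path: str) -> Optional[Dict]:
        # Scan newest-first so a later write to a path shadows earlier ones.
        for raw in reversed(self.metadata):
            record = json.loads(raw)
            if record["path"] == path:
                return record
        return None

    def read_file(self, path: str) -> bytes:
        record = self._lookup(path)
        if record is None:
            raise FileNotFoundError(path)
        start = record["offset"]
        return b"".join(self.content[start:start + record["blocks"]])

    def list_files(self) -> List[str]:
        return sorted({json.loads(raw)["path"] for raw in self.metadata})


drive = ToyDrive()
drive.write_file("/a.txt", b"hello world")
drive.write_file("/b.txt", b"hi")
drive.write_file("/a.txt", b"updated")
print(drive.read_file("/a.txt"))  # b'updated'
print(drive.list_files())         # ['/a.txt', '/b.txt']
```

Rewriting `/a.txt` appends new blocks and a new metadata record rather than overwriting anything; the old version remains in both logs, as an append-only design requires.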

This setup has its limitations. CoBox has taken steps to solve one of them: allowing multiple devices to participate in a given file system as 'writers', not just as 'readers'.

With the default dat setup, if diverging changes are made with the same secret_key on different devices, the log's integrity is broken and it is regarded as 'forked'. To prevent data corruption, it's imperative that a secret_key is only ever used on a single device. This is a significant usability problem: it makes hyperdrive unsuitable for collaborative applications in which multiple peers or devices need write access, the fabled 'multi-writer' setup.
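A fork can be pictured as two copies of the same log, signed with the same key, that agree up to some point and then disagree. The hypothetical helper below finds that point by comparing two replicas entry by entry. It is a simplification for illustration; in practice peers would compare signed hashes rather than raw entries.

```python
from typing import List, Optional


def find_fork(a: List[bytes], b: List[bytes]) -> Optional[int]:
    """Return the first index at which two copies of one log diverge.

    Returns None when one copy is a prefix of the other: that is just a
    peer that is behind, which replication fixes. A divergence means the
    same secret_key produced two different histories, i.e. a fork.
    """
    for seq, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return seq
    return None


# Two devices share one secret_key and both append after entry 1.
laptop = [b"init", b"add notes.txt", b"edit notes.txt (laptop)"]
phone = [b"init", b"add notes.txt", b"delete notes.txt (phone)"]
behind = [b"init"]

print(find_fork(laptop, phone))   # 2
print(find_fork(laptop, behind))  # None
```

Once two histories exist at index 2, no peer can tell which one is authoritative, which is why a single-writer log must never be written from two devices.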

Another usability issue with dat is the number of secret_keys scattered across your file system: one for each hypercore, which in our case is quite a lot! We've gotten around this by using libsodium's key derivation function, so all your hypercore keys are derived deterministically from a single parent_key.
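The sketch below shows deterministic key derivation with BLAKE2b from Python's `hashlib`. As far as we know, libsodium's `crypto_kdf_derive_from_key` computes a keyed BLAKE2b hash whose salt is the little-endian subkey ID and whose personalisation is an 8-byte context string, and this function is written to follow that construction; it has not been checked byte-for-byte against libsodium, so treat it as illustrative. The context string `cobox___` and the mapping of feed names to subkey IDs are hypothetical, not taken from CoBox.

```python
import hashlib
import os
import struct

CONTEXT = b"cobox___"  # hypothetical; libsodium requires an 8-byte context
FEED_SUBKEY_IDS = {"content": 0, "metadata": 1, "log": 2}  # hypothetical mapping


def derive_subkey(parent_key: bytes, subkey_id: int,
                  context: bytes = CONTEXT, length: int = 32) -> bytes:
    """Derive a subkey from parent_key with keyed BLAKE2b.

    The subkey ID goes in the salt and the context in the personalisation
    field; hashlib zero-pads both to 16 bytes.
    """
    if len(parent_key) != 32:
        raise ValueError("parent_key must be 32 bytes")
    if len(context) != 8:
        raise ValueError("context must be exactly 8 bytes")
    if not 16 <= length <= 64:
        raise ValueError("length must be between 16 and 64 bytes")
    salt = struct.pack("<Q", subkey_id)  # 64-bit little-endian subkey ID
    return hashlib.blake2b(b"", digest_size=length, key=parent_key,
                           salt=salt, person=context).digest()


parent_key = os.urandom(32)  # the single secret that actually needs storing
seeds = {name: derive_subkey(parent_key, i) for name, i in FEED_SUBKEY_IDS.items()}

# The same parent_key always yields the same seeds, and each feed gets its own.
assert seeds["log"] == derive_subkey(parent_key, FEED_SUBKEY_IDS["log"])
assert len(set(seeds.values())) == len(seeds)
```

Each 32-byte seed could then be turned into a hypercore keypair with an Ed25519 seed-keypair function such as libsodium's `crypto_sign_seed_keypair`, so only `parent_key` has to be kept safe.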

multifeed

To resolve the multi-writer problem, CoBox makes use of innovations from the kappa-db community. multifeed is an aggregation tool for multiple hypercores: it binds a set of hypercores together under a single public key, which we call an address. This key does not correspond directly to a single hypercore; rather, it corresponds to a dynamic set of them. When peers meet at this address using their chosen networking tool, they first exchange the lists of hypercore public keys they hold, then update each log with the latest data and import any new logs that have appeared.
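The exchange described above can be modelled in a few lines. In this hypothetical sketch, each peer holds a dictionary from feed public key to a list of entries; replication first merges the two sets of keys, then brings every log up to date. Real multifeed runs its handshake over the hypercore replication protocol and verifies signatures; here feeds are plain lists, and the shorter copy of each feed is assumed to be an unforked prefix of the longer one.

```python
from typing import Dict, List


class ToyMultifeed:
    """Hypothetical sketch: a dynamic set of logs grouped under one address."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.feeds: Dict[str, List[bytes]] = {}  # public key -> log entries

    def feed(self, public_key: str) -> List[bytes]:
        return self.feeds.setdefault(public_key, [])


def replicate(a: ToyMultifeed, b: ToyMultifeed) -> None:
    if a.address != b.address:
        raise ValueError("peers are not on the same address")
    # Step 1: exchange the lists of feed public keys each peer holds.
    all_keys = set(a.feeds) | set(b.feeds)
    for key in sorted(all_keys):
        # Step 2: import any feed the peer did not know about yet.
        fa, fb = a.feed(key), b.feed(key)
        # Step 3: update both copies to the latest data. Logs are append-only,
        # so we assume the shorter copy is a prefix of the longer one.
        if len(fa) < len(fb):
            fa.extend(fb[len(fa):])
        elif len(fb) < len(fa):
            fb.extend(fa[len(fb):])


alice = ToyMultifeed("shared-address")
bob = ToyMultifeed("shared-address")
alice.feed("alice-key").append(b"hello from alice")
bob.feed("bob-key").append(b"hello from bob")
replicate(alice, bob)
print(sorted(bob.feeds))  # ['alice-key', 'bob-key']
```

Each peer still writes only to its own feed, so no secret_key is ever shared; the address simply names the growing set of feeds.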

In a simple configuration, such as the one in the IRC-style chat app Cabal, each peer maps to a single hypercore. In CoBox, each peer maps to three hypercores, and more can be added dynamically in future if needed. Two of these, the content and metadata feeds, are used by hyperdrive. The third is simply named log and is used much as in Cabal: it stores binary-encoded JSON messages that carry application-layer data.
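A peer's three feeds, and the way the log feed carries JSON messages as bytes, might be modelled as follows. The `PeerFeeds` class, the message fields (`type`, `timestamp`, `content`) and the UTF-8 encoding are assumptions for illustration; CoBox's actual message schema is not described here.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PeerFeeds:
    """Hypothetical sketch of the three hypercores one CoBox peer writes to."""
    content: List[bytes] = field(default_factory=list)   # used by hyperdrive
    metadata: List[bytes] = field(default_factory=list)  # used by hyperdrive
    log: List[bytes] = field(default_factory=list)       # application messages

    def publish(self, msg_type: str, content: Dict[str, Any]) -> int:
        """Encode a JSON message as UTF-8 bytes and append it to the log feed."""
        message = {"type": msg_type, "timestamp": int(time.time() * 1000),
                   "content": content}
        self.log.append(json.dumps(message).encode("utf-8"))
        return len(self.log) - 1

    def messages(self, msg_type: str) -> List[Dict[str, Any]]:
        """Decode the log feed and return only messages of one type."""
        decoded = (json.loads(raw.decode("utf-8")) for raw in self.log)
        return [m for m in decoded if m["type"] == msg_type]


peer = PeerFeeds()
peer.publish("about", {"name": "alice"})
peer.publish("chat", {"text": "hi"})
print([m["content"] for m in peer.messages("chat")])  # [{'text': 'hi'}]
```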

kappa-core

To serve peers with live updates from a set of continuously syncing hypercores, the kappa-db community built a dynamic indexing system called kappa-core. kappa-core can be used to build custom materialised views over the data contained in a multifeed's collection of hypercores. We've used kappa-core to collect the metadata from each peer's personal hyperdrive and bind the drives together so they act as a single multi-writer hyperdrive, called kappa-drive. To handle possible conflicts in drive state, we've implemented a conflict resolution mechanism based on 'vector clocks'.

We've also built a module called kappa-view-query, which lets us define a dynamic set of indexes for the message types in our hypercores. We can run scoped queries over these indexes and apply map/filter/reduce functions to reduce large datasets quickly and easily when serving application-layer data.
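The idea behind kappa-drive's vector-clock conflict resolution can be sketched as a kappa-core-style view: entries from every peer's metadata feed are indexed as they arrive, and for each path the view keeps only the entries that no other entry supersedes. If one such entry remains, it is the current version; if several remain, they were written concurrently and conflict. kappa-drive's actual resolution rule isn't described here, so the tiebreak below (highest writer key wins) is a hypothetical stand-in, as are `Entry`, `DriveView` and their fields.

```python
from typing import Dict, List, NamedTuple, Optional

Clock = Dict[str, int]  # writer key -> number of that writer's entries seen


def leq(a: Clock, b: Clock) -> bool:
    """True if every component of clock a is <= the same component of b."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b))


class Entry(NamedTuple):
    writer: str
    path: str
    clock: Clock
    value: bytes


class DriveView:
    """Hypothetical materialised view: the current 'heads' for each path."""

    def __init__(self) -> None:
        self.heads: Dict[str, List[Entry]] = {}

    def index(self, entry: Entry) -> None:
        heads = self.heads.setdefault(entry.path, [])
        # Ignore entries already superseded by (or equal to) a known head.
        if any(leq(entry.clock, h.clock) for h in heads):
            return
        # Drop heads this entry supersedes; the rest are concurrent with it.
        heads[:] = [h for h in heads if not leq(h.clock, entry.clock)]
        heads.append(entry)

    def has_conflict(self, path: str) -> bool:
        return len(self.heads.get(path, [])) > 1

    def resolve(self, path: str) -> Optional[bytes]:
        heads = self.heads.get(path)
        if not heads:
            return None
        # Hypothetical deterministic tiebreak so every peer picks the same value.
        return max(heads, key=lambda h: h.writer).value


e1 = Entry("alice", "/doc", {"alice": 1}, b"v1")
e2 = Entry("bob", "/doc", {"alice": 1, "bob": 1}, b"v2")    # bob had seen e1
e3 = Entry("alice", "/doc", {"alice": 2}, b"v3")            # alice had not seen e2
e4 = Entry("alice", "/doc", {"alice": 3, "bob": 1}, b"v4")  # merges both

view = DriveView()
for e in (e1, e2, e3):
    view.index(e)
print(view.has_conflict("/doc"), view.resolve("/doc"))  # True b'v2'
view.index(e4)
print(view.has_conflict("/doc"), view.resolve("/doc"))  # False b'v4'
```

Because stale entries are discarded by clock comparison rather than arrival time, the view ends up with the same heads whatever order entries from different feeds are indexed in.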

hyperswarm

CoBox uses the hyperswarm distributed hash table (DHT) to connect peers that share a multifeed address.
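Peers on a DHT rendezvous on a topic. dat's convention, as implemented in hypercore-crypto as far as we know, is to announce a 'discovery key' (a BLAKE2b-256 hash of the string `hypercore`, keyed with the public key) rather than the key itself, so DHT nodes learn a meeting point without learning the key needed to read the data. Whether CoBox derives its swarm topic from a multifeed address in exactly this way is an assumption, and `discovery_key` is a hypothetical helper; actually joining the swarm goes through hyperswarm's JavaScript API, which this sketch does not reproduce.

```python
import hashlib
import os


def discovery_key(public_key: bytes) -> bytes:
    """BLAKE2b-256 of b"hypercore", keyed with the 32-byte public key."""
    if len(public_key) != 32:
        raise ValueError("public_key must be 32 bytes")
    return hashlib.blake2b(b"hypercore", digest_size=32, key=public_key).digest()


address = os.urandom(32)        # stand-in for a multifeed address
topic = discovery_key(address)  # what peers would announce and look up

# Everyone who knows the address computes the same topic, but the topic
# alone does not reveal the address.
assert topic == discovery_key(address)
assert topic != address
print(topic.hex())
```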