This article is part of the MoonBlokz series, which focuses on creating a hyper-local blockchain using microcontrollers and radio communication. You can find the previous articles here:
“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”
Linus Torvalds
In MoonBlokz, we define data structures for various situations, including radio communication messages, variables stored in the device’s memory, and data in persistent storage. Our capabilities at each level are limited, so we aim to use the most compact structures possible. We also have specific requirements for each of these different contexts:
In this article, we will focus on the internal data structure of the blockchain, which forms the foundation for all data representations. Additionally, we will explore unique implementation strategies, such as using B-trees for flash storage and in-memory caching, in separate articles that discuss the implementation of these modules.
The fundamental data structure in a blockchain is a block utilized across all layers. Every block has two parts: a fixed-size header and a variable-size payload.
(I use Rust-style integer notation below. The definitions are available on this site: https://doc.rust-lang.org/book/ch03-02-data-types.html)
The header has ten fields:
The structure of the payload varies by type, and the total size of the block is capped by a configuration parameter (MAX_BLOCK_SIZE). We discuss the used payload types one-by-one:
It is the most complex block type with many possible different layouts. At the top level, the payload consists of a transaction count (represented in two bytes) along with a list of transactions. Theoretically, the maximum number of transactions a block can hold is 65535, but this is limited by the configured maximum block size.
Each transaction has a fixed header that contains the following fields:
The following data fields depend on the transaction type:
It is a straightforward money transfer between two nodes using a minimal set of data fields. Because transactions are priced according to their size, these are the least expensive transactions available. However, this type of transaction has some drawbacks. First, there is no privacy for the node transfers; all data is visible to everyone, and the participants in the network are easily identifiable. Second, each transaction permits only one sender and one receiver.
Node transfers have a fixed size (96 bytes) and have the following fields:
Transaction uniqueness
Our blockchain needs a mechanism to prevent processing transactions more than once. This is straightforward for UTXO (Unspent Transaction Output) transactions, as a UTXO can only be consumed once as an input. However, a different approach is necessary for balance transactions, where sending the same amount to the same recipient multiple times can be a valid scenario.
To address this, we utilize two fields. First, the `anchor_sequence` is automatically incremented when new blocks are added to the blockchain. Second, balance transactions include a comment field. Transaction initiators can freely choose its value, which could represent a bill number or a reference number.
The network will automatically reject entirely identical transactions. If an initializer wants to send multiple transactions with the same data and the same anchor_sequence, it has to choose different comment values.
This transaction registers a new node identifier and a public key on the network. Any node can add new nodes to the network by using registration transactions. In this model, when a new node wants to join the network, it locates an existing node and makes an agreement with that node to facilitate its registration on the network. Although this process incurs a cost for the existing node, the new node can compensate by offering goods or services in exchange.
Fields of the registration transaction:
Complex transactions have a rich internal structure with optional and repeatable elements, so they don’t have a fixed size. This variability is not an issue; in fact, it is taken into account when calculating transaction fees, as all fees are based on the transaction’s size in bytes.
A complex transaction can consist of zero or more inputs and one or more outputs. A transaction with zero inputs is a special case used by the snake chain logic to re-add UTXO outputs. Both the inputs and outputs can represent balances and UTXOs. The first two fields in a transaction indicate the number of inputs (u8) and the number of outputs (u8). While the maximum number of inputs and outputs is 255, this number is typically constrained by the maximum block size. All input and output entries start with a type field (u8).
UTXO inputs (input type: 0)
Balance input (input type: 1):
The fields are the same as in node transfer transactions (anchor_sequence, initializer, amount, comment, signature). The only difference is, in this case, the signature contains the balance input and all outputs.
UTXO output (output type: 0):
It is a simple structure with just two fields:
Balance output (output type: 1):
An even simpler structure:
Fee for the complex transaction
A complex transaction is valid only if the total amount of inputs is greater than or equal to the total amount of outputs. The difference between the sum of outputs and the sum of inputs is the transaction fee.
Why don’t I use global time?
As you may have noticed, we don’t use any timestamps in the blocks. Our algorithms could be significantly simplified if every node had precisely synchronized clocks. Time synchronization protocols, like NTP, already exist, so why don’t we use them? First, I want to avoid relying on any central infrastructure. MoonBlokz aims to create a decentralized network that does not depend on centralized services. Secondly, time synchronization introduces several potential security vulnerabilities, such as the risk of clock message corruption.
The structure of balances is much simpler than the transactions. There are only two header fields:
For every NodeInfo we store the following fields:
Balances are significantly smaller than typical transactions, allowing us to add more to a block.
These blocks outline a list of configuration parameters. An additional article will cover the available configuration points and explain how to use dynamic formulas as configuration values.
Multi-signature refers to a signature created by multiple nodes, accompanied by a list of supporting nodes within an approval payload. The specific details of this payload will be discussed in a later article.
We store and transfer blocks in binary format, simply placing the elements of data structures sequentially. We utilize little-endian representation for all multi-byte fields.
Even with efficient data structures, our messages often exceed the typical physical packet size of the radio layer. For instance, in a LoRa network, the maximum packet size is approximately 256 bytes. A UTXO transaction can be larger than this limit, and we also need to include block headers. As a result, we cannot transfer an entire block between nodes in a single packet.
To address this, we implement packetization logic in the radio layer, which allows us to break blocks into smaller, manageable messages. The packet size is contingent on the radio technology being utilized, and we manage this as a compile-time parameter in the radio module.
This was a very long article with many lists and complex charts. It is not the best article to read in the series, but it is inevitable to put the pieces together and find a place for everything. Next time, we will discuss the cryptographic algorithms and technologies.
Author: Péter Sallai
Source: https://medium.com/@peter.sallai/moonblokz-series-part-v-data-structures-165f9aa480a6