MoonBlokz series part V. — Data Structures

This article is part of the MoonBlokz series, which focuses on creating a hyper-local blockchain using microcontrollers and radio communication. You can find the previous articles here:

“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”

Linus Torvalds

In MoonBlokz, we define data structures for various situations, including radio communication messages, variables stored in the device’s memory, and data in persistent storage. Our capabilities at each level are limited, so we aim to use the most compact structures possible. We also have specific requirements for each of these different contexts:

  • The most critical factor in radio communication is maintaining a small message size.
  • On-device memory is fast but small, typically just a few hundred kilobytes. We use most of this memory for the stack (local variables), which restricts our ability to store many blocks or nodes. As a result, we must implement efficient caching mechanisms.
  • While persistent storage is more extensive, it is much slower and has special limitations. Microcontrollers typically utilize flash storage with a finite number of write cycles. Therefore, we need to minimize the frequency of writes to this storage.

In this article, we will focus on the internal data structure of the blockchain, which forms the foundation for all data representations. Additionally, we will explore unique implementation strategies, such as using B-trees for flash storage and in-memory caching, in separate articles that discuss the implementation of these modules.

Blocks

The fundamental data structure in a blockchain is a block utilized across all layers. Every block has two parts: a fixed-size header and a variable-size payload.

(I use Rust-style integer notation below. The definitions are available on this site: https://doc.rust-lang.org/book/ch03-02-data-types.html)

The header has ten fields:

  • version(u8): This field represents the protocol version number, which allows flexibility for future additions. Currently, its value is fixed at 1.
  • sequence(u32): The sequence number of a given block. It is starting from zero. Sequences 0 and 1 are the genesis blocks. The maximum chain length is 2³²=4 billion. While this is a large number, it is not infinite. If a new block is created every second, the sequence will overflow in about 136 years. In a more realistic scenario, where a new block is added every minute, it would take approximately 8,171 years to reach overflow. Overflow is not a significant issue here. We can restart the sequences from zero, as we only need enough sequence numbers for the active chain.
  • creator(u32): the node’s identifier who created the block
  • mined_amount(u32): Represents the reward given to miners, excluding transaction fees. This value could be calculated if the entire history of the chain is known. However, we need to store it to facilitate calculations after the `snake_chain` consumes the beginning of our chain.
  • payload type(u8): Currently, we use four payload types: 1-transaction blocks, 2-balance blocks, 3-chain configurations, and 4-evidence blocks. In the future, additional types may be introduced.
  • consumed_votes(u32): The number of votes consumed by the creator to create the block. This field can also be calculated if we know the complete chain history.
  • first_voted_node(u32): The node that has the most votes when this block is created. Normally, it is the creator node. (if not, the approval process starts)
  • consumed_votes_from_first_voted_node(u32): The number of votes consumed from the first_voted_node. This field usually is 0. It will only be different from zero if the block is created by a node that is not the one with the most votes.
  • previous_hash([u8;32]): The hash of the previous block in the chain, including the header and the payload.
  • signature([u8;64]): Block creator’s digital signature for the entire block. This field and the parent_hash field in the next block ensure immutability.

The structure of the payload varies by type, and the total size of the block is capped by a configuration parameter (MAX_BLOCK_SIZE). We discuss the used payload types one-by-one:

Transaction block

It is the most complex block type with many possible different layouts. At the top level, the payload consists of a transaction count (represented in two bytes) along with a list of transactions. Theoretically, the maximum number of transactions a block can hold is 65535, but this is limited by the configured maximum block size.

Each transaction has a fixed header that contains the following fields:

  • type(u8): Currently, we use three transaction types: 1 – node transfer (transfer of money between nodes), 2 – registration (adding a new node to the network), 3 – complex (transactions involving UTXO inputs or outputs).
  • vote(u32): Each transaction can vote for a node, which enhances that node’s chances of creating a block in the future. A node cannot vote for itself. If the transaction is a node transfer or registration, the initializer is not eligible to receive votes. Additionally, for complex transactions that include a balance input, the initializer also cannot be voted.

The following data fields depend on the transaction type:

Node transfer

It is a straightforward money transfer between two nodes using a minimal set of data fields. Because transactions are priced according to their size, these are the least expensive transactions available. However, this type of transaction has some drawbacks. First, there is no privacy for the node transfers; all data is visible to everyone, and the participants in the network are easily identifiable. Second, each transaction permits only one sender and one receiver.

Node transfers have a fixed size (96 bytes) and have the following fields:

  • anchor_sequence(u32): The sequence number of the blockchain at the time the transaction was created. It serves a function similar to a timestamp. It also replaces transaction nonces. A transaction cannot be included in any blocks that have a smaller or equal sequence number. This field is crucial for ensuring the uniqueness of transactions and aids in optimizing the detection of duplicate instances of the same transaction. When a node adds a new transaction from its mempool, it only needs to check the blocks after the anchor_sequence for the presence of the block.
  • initializer(u32): The node that started the transaction. This node also pays the transaction fee.
  • receiver(u32): The node that receives the transfer.
  • amount(u64): The amount of money to be transferred. The transaction can only be processed if the initializer has sufficient balance to cover the amount and the fee.
  • fee(u32): The cost is paid by the initializer for the processing of the transaction. The fee structure can be defined by various configurations. Both minimum and maximum limits can be set based on the transaction size (priced per byte).
  • comment(u64): This is a flexible field for the initializer. It can represent a reference number, a counter, or any other type of information.
  • signature([u8;64]): Digital signature of the whole transaction content by the initializer.

Transaction uniqueness

Our blockchain needs a mechanism to prevent processing transactions more than once. This is straightforward for UTXO (Unspent Transaction Output) transactions, as a UTXO can only be consumed once as an input. However, a different approach is necessary for balance transactions, where sending the same amount to the same recipient multiple times can be a valid scenario.

To address this, we utilize two fields. First, the `anchor_sequence` is automatically incremented when new blocks are added to the blockchain. Second, balance transactions include a comment field. Transaction initiators can freely choose its value, which could represent a bill number or a reference number.

The network will automatically reject entirely identical transactions. If an initializer wants to send multiple transactions with the same data and the same anchor_sequence, it has to choose different comment values.

Registration

This transaction registers a new node identifier and a public key on the network. Any node can add new nodes to the network by using registration transactions. In this model, when a new node wants to join the network, it locates an existing node and makes an agreement with that node to facilitate its registration on the network. Although this process incurs a cost for the existing node, the new node can compensate by offering goods or services in exchange.

Fields of the registration transaction:

  • initializer(u32): The existing node that initiates the registration.
  • new_node_id(u32): the identifier of the new node. It is calculated as the maximum node_id on the network plus one. This is a special field generated only when the transaction is added to a block. It’s important to note that it does not actually store new information; its value can be calculated from the active chain. We keep this field to associate the new public key with the corresponding node_id in one block.
  • registration_price(u64): Registering a new public key is not free; there is a cost associated with it. The registration price is a configurable parameter defined in the chain configuration and is determined by dynamic factors such as the total number of nodes in the network. While this value can be calculated, we store it to facilitate easier balance calculations later. It’s important to note that no one benefits from this fee; instead, the network absorbs the funds.
  • fee(u64): The same transaction fee we used with node transfers.
  • new_public_key([u8;32]): the public key of the newly registered node. A public key can only be registered once, and the network verifies its uniqueness. We do not need anchor_sequence and comment fields for registration because the public key alone ensures uniqueness.
  • new_key_signature([u8;64]): Due to the Schnorr cryptography algorithm, every node must prove it has the corresponding private key for the registered public key. In this case, the proof is the signature of the given public key.
  • signature([u8;64]): Digital signature of the entire transaction content by the initializer. This is important because the initializer pays the price and the fee.

Complex transaction

Complex transactions have a rich internal structure with optional and repeatable elements, so they don’t have a fixed size. This variability is not an issue; in fact, it is taken into account when calculating transaction fees, as all fees are based on the transaction’s size in bytes.

A complex transaction can consist of zero or more inputs and one or more outputs. A transaction with zero inputs is a special case used by the snake chain logic to re-add UTXO outputs. Both the inputs and outputs can represent balances and UTXOs. The first two fields in a transaction indicate the number of inputs (u8) and the number of outputs (u8). While the maximum number of inputs and outputs is 255, this number is typically constrained by the maximum block size. All input and output entries start with a type field (u8).

UTXO inputs (input type: 0)

  • tr_hash([u8;32]): This hash identifies a complex transaction that includes the UTXO output we want to spend in this transaction. We use hashes instead of sequence numbers (like Bitcoin) to make transactions more stable (they can be re-added after a chain changes direction). However, this method requires more space, adding approximately 26 bytes per input. In a future version, it may be possible to introduce a sequence-based approach as a new input type.
  • output_index(u8): It is not enough to identify a transaction. It is also necessary to find the exact UTXO output to use as input.
  • signature([u8;64]): Signature of this input & all outputs of the transaction, signed by the corresponding private key to the public key of the used UTXO.

Balance input (input type: 1):
The fields are the same as in node transfer transactions (anchor_sequence, initializer, amount, comment, signature). The only difference is, in this case, the signature contains the balance input and all outputs.

UTXO output (output type: 0):
It is a simple structure with just two fields:

  • address([u8;32]): The public key of the UTXO.
  • amount(u64): The money of the UTXO.

Balance output (output type: 1):
An even simpler structure:

  • receiver(u32): The identifier of the receiver node.
  • amount(u64): The money to send.

Fee for the complex transaction
A complex transaction is valid only if the total amount of inputs is greater than or equal to the total amount of outputs. The difference between the sum of outputs and the sum of inputs is the transaction fee.

Why don’t I use global time?

As you may have noticed, we don’t use any timestamps in the blocks. Our algorithms could be significantly simplified if every node had precisely synchronized clocks. Time synchronization protocols, like NTP, already exist, so why don’t we use them? First, I want to avoid relying on any central infrastructure. MoonBlokz aims to create a decentralized network that does not depend on centralized services. Secondly, time synchronization introduces several potential security vulnerabilities, such as the risk of clock message corruption.

Balance payload

The structure of balances is much simpler than the transactions. There are only two header fields:

  • nodeinfo_count(u16): The number of balances in the block. Representing this field in two bytes implicitly defines the upper limit of configurable MAX_BLOCK_SIZE. Specifically, 48 bytes multiplied by the maximum number of balances (65535) equals 3,145,680 bytes, approximately 3 MB. This size is more than adequate for any foreseeable use case. Furthermore, this limit can be increased with a new protocol version.
  • max_node_id (u32): This field helps nodes verify that they have the information for every node. It is particularly important when a newly joined node collects enough blocks to form an active chain. Node identifiers are assigned sequentially, meaning this field also represents the total count of nodes.

For every NodeInfo we store the following fields:

  • owner(u32): the node’s identifier
  • balance(u64): the owner’s amount of money
  • vote_count(u32): the total number of votes for the node.
  • public_key([u8;32]): the node’s public key. We use it to validate signatures made by the node.

Balances are significantly smaller than typical transactions, allowing us to add more to a block.

Chain configuration payload

These blocks outline a list of configuration parameters. An additional article will cover the available configuration points and explain how to use dynamic formulas as configuration values.

Approval payload

Multi-signature refers to a signature created by multiple nodes, accompanied by a list of supporting nodes within an approval payload. The specific details of this payload will be discussed in a later article.

Serialization

We store and transfer blocks in binary format, simply placing the elements of data structures sequentially. We utilize little-endian representation for all multi-byte fields.

Radio packetization

Even with efficient data structures, our messages often exceed the typical physical packet size of the radio layer. For instance, in a LoRa network, the maximum packet size is approximately 256 bytes. A UTXO transaction can be larger than this limit, and we also need to include block headers. As a result, we cannot transfer an entire block between nodes in a single packet.

To address this, we implement packetization logic in the radio layer, which allows us to break blocks into smaller, manageable messages. The packet size is contingent on the radio technology being utilized, and we manage this as a compile-time parameter in the radio module.

This was a very long article with many lists and complex charts. It is not the best article to read in the series, but it is inevitable to put the pieces together and find a place for everything. Next time, we will discuss the cryptographic algorithms and technologies.

Author: Péter Sallai

Source: https://medium.com/@peter.sallai/moonblokz-series-part-v-data-structures-165f9aa480a6

Have a question?

If you are interested in our solutions, contact us via the form below.

Dorsum is proud to benefit from the European Commission’s Brexit Adjusment Reserve.
Learn more about our project aimed at aligning our Wealth Management Suite to the UK market. (Hungarian content)

dorsum-logo
We love cookies. And You?

What is a cookie?
A cookie is a small file of letters and numbers which we store in your browser or the hard drive of your computer (with your agreement). Cookies contain information which is transferred to your computer’s hard drive.

Why we use cookies?
Our website uses cookies to distinguish you from other users of our website. This helps us to provide you with a superb user experience when you browse our website and allows us to improve our internet presence

Choose whether to allow cookies or related technologies - webmasters, pixel tags, and Flash objects (cookies) - for the website as follows. Read our Privacy Policy below to learn more about how this website uses cookies and related technologies.
Before sending your approval, please set your preference in the below categories.