
Publishing Text and Images in Bitcoin

Introduction

Bitcoin is said to bank the unbanked. Why? In contrast to the traditional banking system, where new clients must provide personal information to open an account, the only requirement to join the Bitcoin financial system is of a technical nature: a device that runs the software needed to communicate with the Bitcoin network. No entity decides who is allowed to participate in the network, a feature that is also useful for a censorship-free press application.

Blockchains are already used to store non-financial data for diverse purposes, e.g. to prove authorship of ideas or to prove the existence of a document. One of the largest files successfully stored in the Bitcoin blockchain is an image of Nelson Mandela, a photo of about 14 KB that a user managed to insert. In truly decentralized blockchains, a valid transaction, i.e. one that follows the standard transaction format defined by the respective blockchain, almost always makes it onto the ledger. Hence, not only is there no entity that grants permission to join the network, there is also no one who might filter out content.

The mechanism that “chains” the blocks of transactions together prevents data from being tampered with once they have entered the blockchain. And, crucially for avoiding a single point of failure, the blockchain is replicated, i.e. nodes keep a local copy of it. Therefore, blockchains are not prone to downtime or, in the words of a Chinese student, “there is no 404 on the blockchain” (source). All this makes the blockchain a promising candidate for journalists seeking censorship-free media.

Solutions in Bitcoin

Overview

Bitcoin transactions must reference coins that were previously sent to a public key address. To authorize a new transaction, the owner of the coin must prove that the coin belongs to her public key address by using her private key to generate a cryptographic signature. The information that lets the other network nodes verify the signature is written into the input scripts of Bitcoin transactions. Furthermore, transactions specify the receivers of the transfer, again identified by public key addresses, which are written into the output scripts of the transactions.

Non-financial transactions must appear valid to the Bitcoin miners so that they do not discard them. The conditions for a valid transaction are:

  • Minimum Transfer Amount: An output must currently carry at least about 546 satoshis, otherwise it is considered dust (as of June 7, 2019).
  • Minimum Fees: The fees are determined by the size of the input and output scripts. The current average fee is about 39 satoshis per byte (cf. bitcoinfees.info, as of June 7, 2019).
  • Maximum Data Size: The upper size limit of a standard Bitcoin transaction is 100 KB. Input and output scripts may carry specific data of limited size.
  • Unspent Coin: The input script must reference an unspent output script.

Transactions that deviate from these rules are considered non-standard and will not be picked up by most miners. As we will see below, transactions containing non-financial data may look different from standard transactions. Non-standard transactions may still make it into the blockchain, though some only with lower probability, and they are more vulnerable to future protocol changes.
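The four conditions above can be read as a simple checklist. The following Python sketch is only an illustration of that checklist, not the actual policy code of any Bitcoin node: `TxSummary` and `standardness_problems` are hypothetical names, the thresholds are the figures quoted in the text (as of June 7, 2019), and 100 KB is assumed to mean 100,000 bytes. Exceptions such as zero-value OP_RETURN outputs are deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List

# Figures quoted in the text (as of June 7, 2019); real values vary over time.
DUST_LIMIT_SAT = 546
FEE_RATE_SAT_PER_BYTE = 39
MAX_STANDARD_TX_BYTES = 100_000  # assumption: "100 KB" read as 100,000 bytes


@dataclass
class TxSummary:
    """Hypothetical, simplified view of a transaction (not a real Bitcoin structure)."""
    size_bytes: int
    output_values_sat: List[int]
    fee_sat: int
    inputs_unspent: bool


def standardness_problems(tx: TxSummary) -> List[str]:
    """Return the list of conditions from the text that the transaction violates."""
    problems = []
    if any(value < DUST_LIMIT_SAT for value in tx.output_values_sat):
        problems.append("dust output")
    if tx.fee_sat < tx.size_bytes * FEE_RATE_SAT_PER_BYTE:
        problems.append("fee below assumed rate")
    if tx.size_bytes > MAX_STANDARD_TX_BYTES:
        problems.append("transaction too large")
    if not tx.inputs_unspent:
        problems.append("input already spent")
    return problems
```

An empty result means that, under these simplified assumptions, the transaction would look standard to a miner.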

Data Insertion Methods

Output script:

The authors of [1] and [2] identify five template-compliant output script types that can carry data without involving the input script. Since miners cannot distinguish between legitimate public keys or public key hashes and arbitrary binary data, output scripts can easily be used to insert data that miners cannot tell apart from real receiver addresses. A disadvantage of output scripts is that users must burn bitcoins, because they replace valid receiver addresses with arbitrary data. The following output scripts can be used to insert arbitrary data:

Pay-to-Public-Key (P2PK): Data are stored in place of the public key in the output (33 bytes for a compressed key, 65 bytes for an uncompressed key), together with a non-dust amount of bitcoin that is burned.

Pay-to-Public-Key-Hash (P2PKH): Data are stored in place of the public key hash in the output, together with a non-dust amount of bitcoin that is burned. This allows 20 bytes to be stored per output.

OP_RETURN: This script type stores up to 80 bytes per transaction in a provably unspendable output that miners do not need to track in the UTXO set.

Multi-Signature: For example, in a 1-out-of-3 multi-signature script, data can be stored in place of all 3 public keys, or with 1 real and 2 fake public keys, in which case the output stays spendable (more details in [1]).

Coinbase transaction: Arbitrary data up to 100 bytes can be stored in one transaction per block, but this option is only available to miners.
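To make the output-script variants more concrete, the following Python sketch builds the raw script bytes for two of them: a P2PKH-shaped script whose 20-byte hash slot holds data, and an OP_RETURN script. The opcode values are the standard Bitcoin Script opcodes; the function names, the zero-byte padding of short chunks, and the restriction to payloads of at most 255 bytes are assumptions made for this illustration.

```python
# Standard Bitcoin Script opcode values.
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC
OP_RETURN = 0x6A
OP_PUSHDATA1 = 0x4C


def push(data: bytes) -> bytes:
    """Encode a data push; this sketch only handles payloads up to 255 bytes."""
    if len(data) <= 75:
        return bytes([len(data)]) + data
    if len(data) <= 255:
        return bytes([OP_PUSHDATA1, len(data)]) + data
    raise ValueError("payload too large for this sketch")


def p2pkh_data_script(chunk: bytes) -> bytes:
    """P2PKH-shaped output script with data in place of the public key hash."""
    if len(chunk) > 20:
        raise ValueError("a P2PKH output holds at most 20 bytes")
    padded = chunk.ljust(20, b"\x00")  # assumption: pad short chunks with zero bytes
    return bytes([OP_DUP, OP_HASH160]) + push(padded) + bytes([OP_EQUALVERIFY, OP_CHECKSIG])


def op_return_script(payload: bytes) -> bytes:
    """Provably unspendable output script carrying up to 80 bytes."""
    if len(payload) > 80:
        raise ValueError("OP_RETURN payload is limited to 80 bytes here")
    return bytes([OP_RETURN]) + push(payload)
```

For instance, `p2pkh_data_script(b"hello").hex()` starts with `76a914` and ends with `88ac`, exactly like a regular P2PKH script, which is why miners cannot tell the data apart from a real address hash.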

Input script:

This requires a more sophisticated technique. Input scripts allow larger amounts of data to be inserted, but must keep their semantics valid. To achieve this, the input script must refer to a valid output script, e.g. by using a dead branch inserted previously. These transactions are not stored in the unspent transaction output (UTXO) set. As described in [1] (see loc. cit. for more details), there are two special ways to do so:

Pay-to-Script-Hash (P2SH): These data refer to the unspent coin. Data can be stored in the redeem script (limited to 520 bytes) and/or in the part of the input script that precedes the redeem script (limited by the 1650-byte total limit of the input script). More advanced methods to store data are mentioned in [1]:

  • Data Drop Method: Data get stored in the redeem script.
  • Data Hash Transaction: This uses the script following the redeem script.

Data Reconstruction:

Data stored in P2PKH output scripts that are larger than 20 bytes must be split across several output scripts, either within one transaction or, if they exceed the maximum size (see the table below), across several transactions. The pieces need to be linked together, either on-chain or off-chain, so that datasets stored in the blockchain can be reconstructed. One may use the output script to store metadata, e.g. a reference to the ID of the transaction that holds the next chunk of data.
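The splitting and linking described above can be sketched in Python as follows. This is a toy model under stated assumptions, not a real Bitcoin client: the `ledger` dictionary stands in for the blockchain, the record IDs are plain SHA-256 hashes rather than real transaction IDs, and the names `split_chunks`, `publish`, and `reconstruct` are hypothetical. Each record stores the ID of the next one, so the records are written from the last to the first.

```python
import hashlib

CHUNK_SIZE = 20  # bytes per P2PKH output, as stated in the text


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def publish(data: bytes, chunks_per_tx: int, ledger: dict) -> str:
    """Store data as linked records in a mock ledger; return the first record's ID."""
    chunks = split_chunks(data)
    groups = [chunks[i:i + chunks_per_tx] for i in range(0, len(chunks), chunks_per_tx)]
    next_id = ""  # an empty ID marks the last record
    for group in reversed(groups):
        payload = b"".join(group)
        # Mock ID; a real transaction ID is derived from the serialized transaction.
        record_id = hashlib.sha256(next_id.encode() + payload).hexdigest()
        ledger[record_id] = {"payload": payload, "next": next_id}
        next_id = record_id
    return next_id


def reconstruct(first_id: str, ledger: dict) -> bytes:
    """Follow the 'next' references and concatenate the payloads."""
    parts = []
    current = first_id
    while current:
        record = ledger[current]
        parts.append(record["payload"])
        current = record["next"]
    return b"".join(parts)
```

A reader who knows only the ID of the first record can recover the whole dataset. Padding of the final chunk is not modeled here; a real scheme would also have to record the original length.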

Input scripts may store data using the Data Drop and/or Data Hash methods. As shown in [1], the maximum file size within a single transaction is 96,060 bytes. Files larger than this again require the split data to be indexed.

Selected Content Insertion Service:

Apertus: This service allows content to be fragmented over multiple transactions using an arbitrary number of P2PKH output scripts. Among other features, Apertus also works for Litecoin, Dogecoin, and others.

The authors of [1] found that P2PKH is the most widely used method, although it creates the most unspendable UTXO bloat, requires the largest overhead, and is the most expensive. They argue that its popularity can be explained by its simplicity of implementation.
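A rough back-of-the-envelope estimate illustrates why P2PKH is expensive. The Python sketch below uses the figures quoted earlier in this article (546 satoshis dust limit, 39 satoshis per byte, 20 data bytes per output) and assumes that each P2PKH output occupies 34 bytes (8 bytes for the value, 1 byte for the script length, 25 bytes for the script). It ignores the inputs and the fixed transaction overhead, so it is a lower bound under these assumptions rather than an exact cost; the function name is hypothetical.

```python
import math

P2PKH_OUTPUT_BYTES = 34       # 8 (value) + 1 (script length) + 25 (script)
DATA_BYTES_PER_OUTPUT = 20    # size of the public key hash slot


def p2pkh_storage_cost_sat(data_len: int, fee_rate: int = 39, dust: int = 546) -> dict:
    """Estimate fees and burned coins for storing data_len bytes in P2PKH outputs."""
    outputs = math.ceil(data_len / DATA_BYTES_PER_OUTPUT)
    fee_sat = outputs * P2PKH_OUTPUT_BYTES * fee_rate
    burned_sat = outputs * dust
    return {
        "outputs": outputs,
        "fee_sat": fee_sat,
        "burned_sat": burned_sat,
        "total_sat": fee_sat + burned_sat,
    }


# Example: a file of 14,000 bytes, roughly the size of the image mentioned in the introduction.
print(p2pkh_storage_cost_sat(14_000))
```

Under these assumptions, every 20 bytes of content cost 34 bytes of transaction space plus a burned dust amount, which matches the overhead and cost argument made in [1].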

References

[1] A. Sward, I. Vecna, and F. Stonedahl. Data Insertion in Bitcoin’s Blockchain. Ledger Journal, 2018.

[2] R. Matzutt, J. Hiller, M. Henze, J. H. Ziegeldorf, D. Müllmann, O. Hohlfeld, and K. Wehrle. A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin. In Proceedings of the 22nd International Conference on Financial Cryptography and Data Security (FC). Springer, 2018.

For our complete research on Censorship-free publishing on the blockchain click here.