Message Tagging Proposal

From Bitmessage Wiki
Jump to navigation Jump to search

Introduction

This is a draft proposal to allow Bitmessage messages to be 'tagged' so they can be received by 'lite' clients, such as mobile phones. This page describes the changes proposed and the reasoning behind them.


Summary of the proposal

  • Bitmessage messages can be 'tagged' with a hash derived from the message's destination address and a time value.
  • This tag allows 'lite' clients, such as mobile phones, to receive messages sent to them without downloading and processing large amounts of data.
  • The time component of the tag is 00:00 on the current day in Unix time.
  • Message tagging is only necessary when sending messages to 'lite' clients. Otherwise the tag field is filled with random data.
  • A lite client can signal that messages to its addresses should be tagged by setting a flag in the behavior bitfield of each of its address's pubkeys.


Reasoning behind the proposal

Currently, Bitmessage clients such as PyBitmessage act as full nodes in the Bitmessage peer-to-peer network. This involves transmitting, receiving, and processing a large amount of data. This workload may be manageable for more powerful devices, such as laptops, desktops, and servers, but it is typically too great for less powerful devices such as mobile phones and tablets to reasonably handle. This is particularly true in cases where mobile internet connections mean that bandwidth is very expensive. Because of this, a high proportion of the computing devices in use today are not suitable to act as full nodes in the Bitmessage peer-to-peer network. Therefore, in order for the users of these devices to be able to use Bitmessage, there must be some way of doing so that does not involve their device acting as a full node in the network.

One approach to solving this problem is the creation of 'lite' clients. These are Bitmessage clients which do not do all the processing and transmission work that full nodes do. Instead, the burden of this work is shifted to servers, which supply lite clients with the data they need and relay data sent by lite clients to the rest of the Bitmessage network. Thankfully the design of Bitmessage allows this to be implemented in such a way that very little security is lost. All message encryption and decryption can be done locally by the lite client, and it can retain sole control over its addresses and cryptographic keys. Data provided by servers can be cryptographically authenticated as being correct, and data sent to servers can be disseminated to several at once to ensure that it reaches the rest of the network.

However, this arrangement does require one significant compromise. In order for 'lite' clients such as mobile phones to receive messages, there needs to be some way for them to identify messages that have been sent to their addresses. Full nodes in Bitmessage do not require this because they receive and automatically attempt to decrypt all messages in the streams (segments of the network) which they are a part of. As described above however, the work required for this approach is too great for many devices, and therefore there must be some way for lite clients to identify which messages are destined for them, so that they can request those messages from a server. This identification of messages sent to lite clients can be accomplished by introducing a 'tag' to Bitmessage messages which is derived from the destination address.

Tagging messages in this way is clearly a trade-off in terms of security, as it reveals some extra information, albeit obfuscated information, about the message's destination. However, this trade-off delivers a substantial benefit in allowing mobile phones and other lite clients a viable part of Bitmessage. Adding a tag to messages when the destination is a lite client can be a user-configurable option (there is already an option for this in the PyBitmessage settings page), so it will not be forced on anyone.

Under this proposal, message tags are calculated by hashing two sets of data. The first set is data taken from the message's destination address. The second is a time value based on the day the message is sent.

Adding a time component to the data used to calculate the tag reduces the information leakage created by tagging messages. It is proposed that the time value should be based on the day on which the message is sent. Thus, if a lite client is offline for 5 days, it can calculate 5 different tag values for each of its addresses and request any messages marked with those tags from servers. The balance here is between the level of obfuscation introduced into the tag and the burden created for a lite client requesting messages with those tags. Unless an adversary discovers the address that the messages are being sent to, there should be no way to link messages sent on different days to each other.

Where a message is not being sent to a lite client's address and therefore doesn't need to be tagged, the 'tag' field of the message can be filled with random data. If the hashing function used to calculate the tag is secure, then message tags and random data of the same length should be indistinguishable, except in cases where multiple messages are sent to a particular address on the same day. Even then, this tagging scheme reveals only a little information to attackers, while massively reducing the burden of work required for devices such as mobile phones to use Bitmessage.


Proposed changes in detail

Recognizing lite client addresses

A Bitmessage client can determine whether an address is a 'lite client' address that requires messages sent to it to be tagged by examining the behavior bitfield in that address's pubkey. See https://bitmessage.org/wiki/Protocol_specification#Pubkey_bitfield_features. Bit 30, which is currently designated as 'include_destination' would be changed to 'include_tag'. The 'include_destination' flag relates to an earlier way of tagging messages (simply using the destination's address's ripe hash as the tag). This tagging scheme was never implemented, so it can be safely replaced.

Address data

The message tagging proposed is very similar to the tagging of version 4 pubkeys that is already done in Bitmessage (see https://bitmessage.org/wiki/Protocol_specification#pubkey). The 'address data' component of the data used to calculate the message tag is the same as that in an encrypted pubkey. The address version number, stream number, and ripe hash of the address are concatenated to give the 'address data'.

In order to create the input data for the hash, the address version number and stream number are encoded using variable length integer encoding (see https://bitmessage.org/wiki/Protocol_specification#Variable_length_integer) and the time value is converted to its byte value in big-endian order.

Time value

The 'time' component of the tag is proposed to be based on Unix time (see https://en.wikipedia.org/wiki/Unix_time), which is already used throughout Bitmessage. It is proposed that the time value should be the Unix timestamp for the beginning of the current day in Unix time. A pseudocode expression of the proposed formula to calculate the time value follows:

unixTime = getUnixTime()
remainderSeconds = unixTime mod 86,400
timeValue = unixTime - remainderSeconds

There are 86,400 seconds in a day, so calculating unixTime mod 86,400 gives the number of seconds since the beginning of the current day in Unix time. Subtracting that number of seconds from unixTime gives the Unix timestamp of the beginning of the current day.

For example, if the current Unix time is 1,404,786,854 then the time value would be calculated as follows:

1,404,786,854 mod 86,400 = 9,254
1,404,786,854 - 9,254 = 1,404,777,600

The time value used in the hash calculation to create the message tag would therefore be 1,404,777,600. The following day it would be 1,404,864,000,000.

Pseudocode

A pseudocode expression of the proposed procedure to calculate a message tag follows:

fullTag = sha512(sha512(addressVersionNumber + streamNumber + ripeHash + unixTimestamp))
messageTag = fullTag[0-31]

Example

For example, if the current Unix time were the same as the above (1,404,786,854), then the message tag for address BM-87ozvCK4Jkx9Pc4dP7cd6y3T33DcSdmWPaq would be as follows. Note that the values below are hexadecimal representations of the actual values, not the values themselves:

addressVersion = 04
streamNumber = 01
ripeHash = ec87a1475401c88030f0a1efd0cf85ecdfd7bbca
timeValue = 0000000053bb3480
dataToHash = 0401ec87a1475401c88030f0a1efd0cf85ecdfd7bbca0000000053bb3480
fullTag    =  8d2a1e2284791ec0fbbec41b58b1ecba341a9ccde1306d861a1702253f2a21e1f1382c3b4f4377262dabb0768bab87d56e54bb26d5c84f44b36a92c695bbb4a7
messageTag =  8d2a1e2284791ec0fbbec41b58b1ecba341a9ccde1306d861a1702253f2a21e1

Msg object

Under this proposal, a msg object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#msg.

Field Size Description Data type Comments
8 POW nonce uint64_t Random nonce used for the Proof Of Work
4 (or 8) time uint32_t The time that this message was generated and broadcast. We are transitioning to 8 byte time.
1+ streamNumber var_int The stream number of the destination address.
32 tag uchar[] The message tag, made up of bytes 0-31 of sha512(sha512(addressVersionNumber + streamNumber + ripeHash + unixTimestamp)). If the message has no tag, this field is filled with 32 bytes of random data.
? encrypted uchar[] Encrypted data. See Encrypted payload. See also Unencrypted Message Data Format

Conclusion

All feedback, ideas, and criticism about this proposal are welcome. A discussion thread can be found on the Bitmessage forum here: https://bitmessage.org/forum/index.php?topic=4075.0.