Difference between revisions of "Lite Client Message Retrieval Proposal"

From Bitmessage Wiki
Jump to navigation Jump to search
m
m (Minor formatting correction)
Line 62: Line 62:
 
! Field Size !! Description !! Data type !! Comments
 
! Field Size !! Description !! Data type !! Comments
 
|-
 
|-
| 4||prefixNonce||uint32_t||The prefix nonce.  
+
| 4||prefix_nonce||uint32_t||The prefix nonce.  
 
|-
 
|-
 
| ?||encrypted||uchar[]||Encrypted data. See [[Protocol_specification#Encrypted_payload|Encrypted payload]]. See also [[Protocol_specification#Unencrypted_Message_Data|Unencrypted Message Data Format]]
 
| ?||encrypted||uchar[]||Encrypted data. See [[Protocol_specification#Encrypted_payload|Encrypted payload]]. See also [[Protocol_specification#Unencrypted_Message_Data|Unencrypted Message Data Format]]

Revision as of 11:08, 23 December 2014

Introduction

This is a proposal to introduce a mechanism into the Bitmessage protocol that will allow 'lite' Bitmessage clients to retreive messages that have been sent to them without downloading all the messages in a stream.

Summary of the proposal

  • Lite clients can determine which messages they need to dowload using 'prefix filters'.
  • Prefix filters allow the owner of each Bitmessage address to determine the balance between anonymity and efficiency that they will have when retrieving messages.
  • Users who wish to retain the highest level of anonymity (by downloading all messages in a stream) are not affected by this change. Messages to them are 'tagged' with completely random data.
  • Users who do not care about anonymity can specify a very precise prefix for each of their addresses, allowing them to download only messages destined for them.
  • Other users can achieve a balance between anonymity and convenience.


Reasoning behind the proposal

Currently, Bitmessage clients such as PyBitmessage act as full nodes in the Bitmessage peer-to-peer network. This involves transmitting, receiving, and processing a large amount of data. This workload may be manageable for more powerful devices, such as laptops, desktops, and servers, but it is typically too great for less powerful devices such as mobile phones and tablets to reasonably handle. This is particularly true in cases where mobile internet connections mean that bandwidth is very expensive. Because of this, a high proportion of the computing devices in use today are not suitable to act as full nodes in the Bitmessage peer-to-peer network. Therefore, in order for the users of these devices to be able to use Bitmessage, there must be some way of doing so that does not involve their device acting as a full node in the network.

One approach to solving this problem is the creation of 'lite' clients. These are Bitmessage clients which do not do all the processing and transmission work that full nodes do. Instead, the burden of this work is shifted to servers, which supply lite clients with the data they need and relay data sent by lite clients to the rest of the Bitmessage network. Thankfully the design of Bitmessage allows this to be implemented in such a way that very little security is lost. All message encryption and decryption can be done locally by the lite client, and it can retain sole control over its addresses and cryptographic keys. Data provided by servers can be cryptographically authenticated as being correct, and data sent to servers can be disseminated to several at once to ensure that it reaches the rest of the network.

However, this arrangement does require one significant compromise. In order for 'lite' clients such as mobile phones to receive messages, there needs to be some way for them to identify messages that have been sent to their addresses. Full nodes in Bitmessage do not require this because they receive and automatically attempt to decrypt all messages in the streams (segments of the network) which they are a part of. As described above however, the work required for this approach is too great for many devices, and therefore there must be some way for lite clients to identify which messages are destined for them, so that they can request those messages from a server.


Proposed changes

The pubkey of every hashed address contains two new fields: a 'prefix mask' field and a 'prefix' field. The value of the prefix mask field is determined by the owner of the pubkey. The prefix field is set randomly. The value of the prefix mask field defines what balance of efficiency and anonymity the owner of that address wants. If the owner of the pubkey wants maximum anonymity, then all the bits in the prefix mask field will be set to 0, allowing the prefix to be ignored entirely. On the other extreme, if the owner of the pubkey wants maximum efficiency in retrieving data at the expense of anonymity then all bits in the prefix mask field will be set to 1. Each additional bit set to one halves the proportion of the total data set that you will have to request in order to receive data destined for you. The positions of the bits set to 1 in the prefix mask field are selected randomly.

All msgs have a 'prefix nonce' as part of their unencrypted data. This prefix nonce is always of a fixed length (32 bits) to avoid leaking information. When sending a msg to an address, the sending client examines the destination pubkey's prefix mask field and records which bit positions (if any) are set to 1. The sending client will then set the bits in the corresponding positions of the prefix nonce to match those found in the destination pubkey's prefix field. The remaining bits of the prefix nonce are set randomly.

When a lite client wishes to retrieve msgs from the network, it makes a request to a server for all msgs which have a prefix nonce where the bits set to 1 in the prefix mask of the receiving address's pubkey match the prefix field of the receiving pubkey.

Pubkey object

Under this proposal, the encrypted part of a pubkey object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#pubkey.

Field Size Description Data type Comments
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
4 prefix_mask uint32_t A bit mask that defines which bits from the prefix field should be used.
4 prefix uint32_t The prefix value. Contains 32 randomly set bits.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the object header starting with the time, then appended with the decrypted data down to the extra_bytes. This was changed in protocol v3.

Msg object

Under this proposal, a msg object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#msg.

Field Size Description Data type Comments
4 prefix_nonce uint32_t The prefix nonce.
? encrypted uchar[] Encrypted data. See Encrypted payload. See also Unencrypted Message Data Format


Examples

Example 1

If Alice does not care about anonymity, she can have her address's pubkey specify that the full 32 bits of the prefix field should be used. This is done by setting all bits of the pubkey's prefix mask field to 1. This means that all msgs sent to her will effectively be uniquely tagged for her address (there are roughly 4 billion possible prefix values). This will allow Alice to only download msgs that are destined for her, giving her maximum usability and convenience at the expense of anonymity.

Example 2

If Bob wants to achieve a balance between anonymity and convenience, he can have his address's pubkey specify that 4 bits of the prefix field should be used. This is done by setting 4 randomly selected bits of the pubkey's prefix mask field to 1. This means that all msgs sent to his address will have a prefix nonce where the 4 bits in the positions set to 1 in the prefix mask will match the corresponding bits in his pubkey's prefix field. A random distribution of prefixes in a stream should mean that roughly 6.25% of msgs in a stream will have a prefix nonce where the bits in those positions match the values of those bits in his pubkey's prefix field. Therefore Bob will be able to download roughly 6.25% of the msgs in a stream and still be guaranteed to receive all msgs destined for him.

Example 3

If Carol wants maximum anonymity then she can have her address's pubkey specify that zero bits of the prefix field should be used. This is done by setting all of the bits of the pubkey's prefix mask field to 0. This means that msgs sent to her will contain no identifying information, only a totally random prefix nonce. Carol will have to download and attempt to decrypt all msgs in a stream in order to be sure of receiving all msgs destined for her.


Notes

Other object types

Prefix filtering is not required for broadcasts, pubkeys, or getpubkeys. These object types are already tagged with an identifier.

Lite client pubkey retrieval

How can lite clients retrieve pubkeys without telling the servers who they are talking to?
In the short to medium term (until streams are implemented), it will be perfectly viable for lite clients to simply download all pubkeys in stream 1. A pubkey is 392 bytes in size. If we assume an average of 5 addresses per user, we could support a 10x increase in the number of active users (from roughly 1,000 to 10,000), and lite clients would still only have to download 19MB worth of pubkey data. This is only a rough approximation, but it seems reasonably clear that the amount of data to download and store would be acceptable.

Lite client pubkey dissemination

How can lite clients respond to getpubkey requests?
Lite clients can do this in two ways:
1. They can make periodic requests to full node servers to check for any getpubkeys for any of their addresses and then respond to any getpubkeys they receive by re-disseminating the relevant pubkeys.
2. They can periodically re-disseminate the pubkeys for each of their addresses, with the re-dissemiantion period determined by the time to live of the pubkeys they create. This will ensure that the pubkeys for all the lite client's addresses are always available in the stream(s) they are a part of.

Non-hashed addresses

If non-hashed addresses are introduced in the future, this scheme will not apply to them, as they will not have pubkeys. Users of non-hashed addresses will still need to download and attempt to decrypt all msgs in each stream in which they have an address. This makes sense though, as the whole point of introducing non-hashed addresses is to give people the ability to choose greater anonymity at the expense of convenience.

Streams

The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.