Difference between revisions of "Lite Client Message Retrieval Proposal"

From Bitmessage Wiki
Jump to navigation Jump to search
(Reverted to using prefix length instead of prefix mask)
(Added section for 'Daily changing prefix values' idea)
Line 100: Line 100:
 
===Streams===  
 
===Streams===  
 
The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.
 
The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.
 +
 +
===Possible refinement: Daily changing prefix values===
 +
XXX

Revision as of 14:35, 4 January 2015

Introduction

This is a proposal to introduce a mechanism into the Bitmessage protocol that will allow 'lite' Bitmessage clients to retreive messages that have been sent to them without downloading all the messages in a stream.

Summary of the proposal

  • Lite clients can determine which messages they need to dowload using 'prefix filters'.
  • Prefix filters allow the owner of each Bitmessage address to determine the balance between anonymity and efficiency that they will have when retrieving messages.
  • Users who wish to retain the highest level of anonymity (by downloading all messages in a stream) are not affected by this change. Messages to them are 'tagged' with completely random data.
  • Users who do not care about anonymity can specify a very precise prefix for each of their addresses, allowing them to download only messages destined for them.
  • Other users can achieve a balance between anonymity and convenience.


Reasoning behind the proposal

Currently, Bitmessage clients such as PyBitmessage act as full nodes in the Bitmessage peer-to-peer network. This involves transmitting, receiving, and processing a large amount of data. This workload may be manageable for more powerful devices, such as laptops, desktops, and servers, but it is typically too great for less powerful devices such as mobile phones and tablets to reasonably handle. This is particularly true in cases where mobile internet connections mean that bandwidth is very expensive. Because of this, a high proportion of the computing devices in use today are not suitable to act as full nodes in the Bitmessage peer-to-peer network. Therefore, in order for the users of these devices to be able to use Bitmessage, there must be some way of doing so that does not involve their device acting as a full node in the network.

One approach to solving this problem is the creation of 'lite' clients. These are Bitmessage clients which do not do all the processing and transmission work that full nodes do. Instead, the burden of this work is shifted to servers, which supply lite clients with the data they need and relay data sent by lite clients to the rest of the Bitmessage network. Thankfully the design of Bitmessage allows this to be implemented in such a way that very little security is lost. All message encryption and decryption can be done locally by the lite client, and it can retain sole control over its addresses and cryptographic keys. Data provided by servers can be cryptographically authenticated as being correct, and data sent to servers can be disseminated to several at once to ensure that it reaches the rest of the network.

However, this arrangement does require one significant compromise. In order for 'lite' clients such as mobile phones to receive messages, there needs to be some way for them to identify messages that have been sent to their addresses. Full nodes in Bitmessage do not require this because they receive and automatically attempt to decrypt all messages in the streams (segments of the network) which they are a part of. As described above however, the work required for this approach is too great for many devices, and therefore there must be some way for lite clients to identify which messages are destined for them, so that they can request those messages from a server.


Proposed changes

The pubkey of every hashed address contains two new fields: a 'prefix length' field and a 'prefix' field. The value of the prefix length field is defined by the owner of the pubkey. The prefix field is set randomly. The value of the prefix length field defines what balance of efficiency and anonymity the owner of that address wants. If the value of the prefix length field is zero, then the prefix field can be ignored entirely, giving maximum anonymity. On the other extreme, if the value of the prefix length field is 32 (the maximum), then the full 32 bits of the prefix should be used, giving maximum efficiency in retrieving data. Each additional bit used halves the proportion of the total data set that you will have to request in order to receive data destined for you.

All msgs have a 'prefix nonce' as part of their unencrypted data. This prefix nonce is always of a fixed length (32 bits) to avoid leaking information. When sending a msg to an address, the sending client examines the destination address's pubkey and sets the first n bits of the prefix nonce to match the prefix found in the pubkey. The remainder of the prefix nonce is set randomly.

When a lite client wishes to retrieve msgs from the network, it makes a request to a server for all msgs which have a prefix nonce where the first n bits matches the prefix of the lite client's address(s).

Pubkey object

Under this proposal, the encrypted part of a pubkey object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#pubkey.

Field Size Description Data type Comments
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
1 prefix_length uint8_t The number of bits from the prefix value that should be used. Must be in the range 0-32.
4 prefix uint32_t The prefix value. Contains 32 randomly set bits.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the object header starting with the time, then appended with the decrypted data down to the extra_bytes. This was changed in protocol v3.

Msg object

Under this proposal, a msg object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#msg.

Field Size Description Data type Comments
4 prefixNonce uint32_t The prefix nonce.
? encrypted uchar[] Encrypted data. See Encrypted payload. See also Unencrypted Message Data Format


Examples

Example 1

If Alice does not care about anonymity, she can have her address's pubkey specify that the full 32 bits of the prefix field should be used. This means that all msgs sent to her will effectively be uniquely tagged for her address (there are roughly 4 billion possible prefix values). This will allow Alice to only download msgs that are destined for her, giving her maximum usability and convenience at the expense of anonymity.

Example 2

If Bob wants to achieve a balance between anonymity and convenience, he can have his address's pubkey specify that 4 bits of the prefix field should be used, e.g 0110. This means that all msgs sent to his address will have a prefix nonce beginning with 0110. A random distribution of prefixes in a stream should mean that roughly 6.25% of msgs in a stream will have a prefix nonce beginning with 0110. Therefore Bob will be able to download roughly 6.25% of the msgs in a stream and still be guaranteed to receive all msgs destined for him.

Example 3

If Carol wants maximum anonymity then she can have her address's pubkey specify that zero bits of the prefix field should be used. This means that msgs sent to her will contain no identifying information, only a totally random prefix nonce. Carol will have to download and attempt to decrypt all msgs in a stream in order to be sure of receiving all msgs destined for her.


Notes

Other object types

Prefix filtering is not required for broadcasts, pubkeys, or getpubkeys. These object types are already tagged with an identifier.

Lite client pubkey retrieval

How can lite clients retrieve pubkeys without telling the servers who they are talking to?
In the short to medium term (until streams are implemented), it will be perfectly viable for lite clients to simply download all pubkeys in stream 1. A pubkey is 392 bytes in size. If we assume an average of 5 addresses per user, we could support a 10x increase in the number of active users (from roughly 1,000 to 10,000), and lite clients would still only have to download 19MB worth of pubkey data. This is only a rough approximation, but it seems reasonably clear that the amount of data to download and store would be acceptable.

Lite client pubkey dissemination

How can lite clients respond to getpubkey requests?
Lite clients can do this in two ways:
1. They can make periodic requests to full node servers to check for any getpubkeys for any of their addresses and then respond to any getpubkeys they receive by re-disseminating the relevant pubkeys.
2. They can periodically re-disseminate the pubkeys for each of their addresses, with the re-dissemiantion period determined by the time to live of the pubkeys they create. This will ensure that the pubkeys for all the lite client's addresses are always available in the stream(s) they are a part of.

Non-hashed addresses

If non-hashed addresses are introduced in the future, this scheme will not apply to them, as they will not have pubkeys. Users of non-hashed addresses will still need to download and attempt to decrypt all msgs in each stream in which they have an address. This makes sense though, as the whole point of introducing non-hashed addresses is to give people the ability to choose greater anonymity at the expense of convenience.

Streams

The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.

Possible refinement: Daily changing prefix values

XXX