Difference between revisions of "Lite Client Message Retrieval Proposal"

From Bitmessage Wiki
Jump to navigation Jump to search
(Added note about streams)
 
(6 intermediate revisions by the same user not shown)
Line 24: Line 24:
 
== Proposed changes ==
 
== Proposed changes ==
  
The pubkey of every hashed address contains two new fields: a 'prefix mask' field and a 'prefix' field. The value of the prefix mask field is determined by the owner of the pubkey. The prefix field is set randomly. The value of the prefix mask field defines what balance of efficiency and anonymity the owner of that address wants. If the owner of the pubkey wants maximum anonymity, then all the bits in the prefix mask field will be set to 0, allowing the prefix to be ignored entirely. On the other extreme, if the owner of the pubkey wants maximum efficiency in retrieving data at the expense of anonymity then all bits in the prefix mask field will be set to 1. Each additional bit set to one halves the proportion of the total data set that you will have to request in order to receive data destined for you. The positions of the bits set to 1 in the prefix mask field are selected randomly.  
+
The pubkey of every hashed address contains two new fields: a 'prefix length' field and a 'prefix' field. The value of the prefix length field is defined by the owner of the pubkey. The prefix field is set randomly. The value of the prefix length field defines what balance of efficiency and anonymity the owner of that address wants. If the value of the prefix length field is zero, then the prefix field can be ignored entirely, giving maximum anonymity. On the other extreme, if the value of the prefix length field is 32 (the maximum), then the full 32 bits of the prefix should be used, giving maximum efficiency in retrieving data. Each additional bit used halves the proportion of the total data set that you will have to request in order to receive data destined for you.  
  
All msgs have a 'prefix nonce' as part of their unencrypted data. This prefix nonce is always of a fixed length (32 bits) to avoid leaking information. When sending a msg to an address, the sending client examines the destination pubkey's prefix mask field and records which bit positions (if any) are set to 1. The sending client will then set the bits in the corresponding positions of the prefix nonce to match those found in the destination pubkey's prefix field. The remaining bits of the prefix nonce are set randomly.   
+
All msgs have a 'prefix nonce' as part of their unencrypted data. This prefix nonce is always of a fixed length (32 bits) to avoid leaking information. When sending a msg to an address, the sending client examines the destination address's pubkey and sets the first n bits of the prefix nonce to match the prefix found in the pubkey. The remainder of the prefix nonce is set randomly.   
  
When a lite client wishes to retrieve msgs from the network, it makes a request to a server for all msgs which have a prefix nonce where the bits set to 1 in the prefix mask of the receiving address's pubkey match the prefix field of the receiving pubkey.  
+
When a lite client wishes to retrieve msgs from the network, it makes a request to a server for all msgs which have a prefix nonce where the first n bits matches the prefix of the lite client's address(s).  
  
 
=== Pubkey object ===
 
=== Pubkey object ===
Line 39: Line 39:
 
| 4||[[Protocol_specification#Pubkey_bitfield_features|behavior bitfield]]||uint32_t||A bitfield of optional behaviors and features that can be expected from the node receiving the message.  
 
| 4||[[Protocol_specification#Pubkey_bitfield_features|behavior bitfield]]||uint32_t||A bitfield of optional behaviors and features that can be expected from the node receiving the message.  
 
|-
 
|-
| 4||prefix_mask||uint32_t||A bit mask that defines which bits from the prefix field should be used.  
+
| 1||prefix_length||uint8_t||The number of bits from the prefix value that should be used. Must be in the range 0-32.
 
|-
 
|-
 
| 4||prefix||uint32_t||The prefix value. Contains 32 randomly set bits.  
 
| 4||prefix||uint32_t||The prefix value. Contains 32 randomly set bits.  
Line 71: Line 71:
  
 
===Example 1===  
 
===Example 1===  
If Alice does not care about anonymity, she can have her address's pubkey specify that the full 32 bits of the prefix field should be used. This is done by setting all bits of the pubkey's prefix mask field to 1.  This means that all msgs sent to her will effectively be uniquely tagged for her address (there are roughly 4 billion possible prefix values). This will allow Alice to only download msgs that are destined for her, giving her maximum usability and convenience at the expense of anonymity.
+
If Alice does not care about anonymity, she can have her address's pubkey specify that the full 32 bits of the prefix field should be used. This means that all msgs sent to her will effectively be uniquely tagged for her address (there are roughly 4 billion possible prefix values). This will allow Alice to only download msgs that are destined for her, giving her maximum usability and convenience at the expense of anonymity.
  
 
===Example 2===  
 
===Example 2===  
If Bob wants to achieve a balance between anonymity and convenience, he can have his address's pubkey specify that 4 bits of the prefix field should be used. This is done by setting 4 randomly selected bits of the pubkey's prefix mask field to 1. This means that all msgs sent to his address will have a prefix nonce where the 4 bits in the positions set to 1 in the prefix mask will match the corresponding bits in his pubkey's prefix field. A random distribution of prefixes in a stream should mean that roughly 6.25% of msgs in a stream will have a prefix nonce where the bits in those positions match the values of those bits in his pubkey's prefix field. Therefore Bob will be able to download roughly 6.25% of the msgs in a stream and still be guaranteed to receive all msgs destined for him.  
+
If Bob wants to achieve a balance between anonymity and convenience, he can have his address's pubkey specify that 4 bits of the prefix field should be used, e.g 0110. This means that all msgs sent to his address will have a prefix nonce beginning with 0110. A random distribution of prefixes in a stream should mean that roughly 6.25% of msgs in a stream will have a prefix nonce beginning with 0110. Therefore Bob will be able to download roughly 6.25% of the msgs in a stream and still be guaranteed to receive all msgs destined for him.  
  
 
===Example 3===  
 
===Example 3===  
If Carol wants maximum anonymity then she can have her address's pubkey specify that zero bits of the prefix field should be used. This is done by setting all of the bits of the pubkey's prefix mask field to 0. This means that msgs sent to her will contain no identifying information, only a totally random prefix nonce. Carol will have to download and attempt to decrypt all msgs in a stream in order to be sure of receiving all msgs destined for her.
+
If Carol wants maximum anonymity then she can have her address's pubkey specify that zero bits of the prefix field should be used. This means that msgs sent to her will contain no identifying information, only a totally random prefix nonce. Carol will have to download and attempt to decrypt all msgs in a stream in order to be sure of receiving all msgs destined for her.
  
  
Line 99: Line 99:
  
 
===Streams===  
 
===Streams===  
This proposal is intended as a complementary system to streams (i.e. both should be implemented). It is not intended to replace them.
+
The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.
 +
 
 +
===Possible refinement: Daily changing prefix values===
 +
While the system described above gives a reasonable balance between anonymity and efficiency, it has some weaknesses. In particular, users sending messages to addresses which specify a non-zero prefix length value will tend to leak some information about who they are communicating with. By observing the prefix nonces of msgs sent by a particular client, an attacker may be able build up information about how many different addresses a given user is communicating with, who they are communicating with more often, and who they do not communicate with for certain periods of time.
 +
 
 +
One possible way to mitigate this problem would be to modify the prefix filtering system so that the prefix values used to 'tag' msgs vary according to a time value. Instead of tagging msgs with the first n bits of the prefix defined in the destination address's pubkey, msgs could be tagged with the first n bits of a hash value. The hash value would be calculated from the first n bits of the prefix concatenated with a time value. The result of this would be that the identifying bits of the prefix nonce of a msg would change every so often in a predictable way, with the frequency of change defined by the time value used.
 +
 
 +
For example, if the time value is defined by the protocol as 00:00 on the current day in Unix time, then the identifying prefix bits used to tag msgs will be different each day, in a way that can be easily and reliably calculated by both the sender and the receiver of msgs. The result of this is that rather than all msgs ever sent to a given address having the same identifying prefix bits, all msgs sent to a given address on a given day have the same identifying prefix bits.
 +
 
 +
A rough pseudocode description of the procedure to calculate the prefixNonce for a msg follows:
 +
unixTime = getUnixTime()
 +
remainderSeconds = unixTime mod 86,400
 +
timeValue = unixTime - remainderSeconds
 +
hash = sha512(sha512(prefix[0-prefixLength] + timeValue))
 +
identifyingBits = hash[0-prefixLength]
 +
prefixNonce = identifyingBits + randomBits
 +
 
 +
Advantages:
 +
* The information leaked by a client sending msgs to addresses which specify a non-zero prefix length value is greatly diminished, as the identifying bits in the prefix nonces of sent msgs change each day. Some information is still leaked, particularly when an attacker already knows the destination address(es) of sent msgs and can therefore calculate what the identifying prefix bits will be each day, but the overall amount of information leaked is substantially reduced.
 +
 
 +
Disadvantages:
 +
* When catching up with the network, lite clients must make a separate request for messages for each day in the time period that they need to check.

Latest revision as of 16:13, 4 January 2015

Introduction

This is a proposal to introduce a mechanism into the Bitmessage protocol that will allow 'lite' Bitmessage clients to retreive messages that have been sent to them without downloading all the messages in a stream.

Summary of the proposal

  • Lite clients can determine which messages they need to dowload using 'prefix filters'.
  • Prefix filters allow the owner of each Bitmessage address to determine the balance between anonymity and efficiency that they will have when retrieving messages.
  • Users who wish to retain the highest level of anonymity (by downloading all messages in a stream) are not affected by this change. Messages to them are 'tagged' with completely random data.
  • Users who do not care about anonymity can specify a very precise prefix for each of their addresses, allowing them to download only messages destined for them.
  • Other users can achieve a balance between anonymity and convenience.


Reasoning behind the proposal

Currently, Bitmessage clients such as PyBitmessage act as full nodes in the Bitmessage peer-to-peer network. This involves transmitting, receiving, and processing a large amount of data. This workload may be manageable for more powerful devices, such as laptops, desktops, and servers, but it is typically too great for less powerful devices such as mobile phones and tablets to reasonably handle. This is particularly true in cases where mobile internet connections mean that bandwidth is very expensive. Because of this, a high proportion of the computing devices in use today are not suitable to act as full nodes in the Bitmessage peer-to-peer network. Therefore, in order for the users of these devices to be able to use Bitmessage, there must be some way of doing so that does not involve their device acting as a full node in the network.

One approach to solving this problem is the creation of 'lite' clients. These are Bitmessage clients which do not do all the processing and transmission work that full nodes do. Instead, the burden of this work is shifted to servers, which supply lite clients with the data they need and relay data sent by lite clients to the rest of the Bitmessage network. Thankfully the design of Bitmessage allows this to be implemented in such a way that very little security is lost. All message encryption and decryption can be done locally by the lite client, and it can retain sole control over its addresses and cryptographic keys. Data provided by servers can be cryptographically authenticated as being correct, and data sent to servers can be disseminated to several at once to ensure that it reaches the rest of the network.

However, this arrangement does require one significant compromise. In order for 'lite' clients such as mobile phones to receive messages, there needs to be some way for them to identify messages that have been sent to their addresses. Full nodes in Bitmessage do not require this because they receive and automatically attempt to decrypt all messages in the streams (segments of the network) which they are a part of. As described above however, the work required for this approach is too great for many devices, and therefore there must be some way for lite clients to identify which messages are destined for them, so that they can request those messages from a server.


Proposed changes

The pubkey of every hashed address contains two new fields: a 'prefix length' field and a 'prefix' field. The value of the prefix length field is defined by the owner of the pubkey. The prefix field is set randomly. The value of the prefix length field defines what balance of efficiency and anonymity the owner of that address wants. If the value of the prefix length field is zero, then the prefix field can be ignored entirely, giving maximum anonymity. On the other extreme, if the value of the prefix length field is 32 (the maximum), then the full 32 bits of the prefix should be used, giving maximum efficiency in retrieving data. Each additional bit used halves the proportion of the total data set that you will have to request in order to receive data destined for you.

All msgs have a 'prefix nonce' as part of their unencrypted data. This prefix nonce is always of a fixed length (32 bits) to avoid leaking information. When sending a msg to an address, the sending client examines the destination address's pubkey and sets the first n bits of the prefix nonce to match the prefix found in the pubkey. The remainder of the prefix nonce is set randomly.

When a lite client wishes to retrieve msgs from the network, it makes a request to a server for all msgs which have a prefix nonce where the first n bits matches the prefix of the lite client's address(s).

Pubkey object

Under this proposal, the encrypted part of a pubkey object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#pubkey.

Field Size Description Data type Comments
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
1 prefix_length uint8_t The number of bits from the prefix value that should be used. Must be in the range 0-32.
4 prefix uint32_t The prefix value. Contains 32 randomly set bits.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the object header starting with the time, then appended with the decrypted data down to the extra_bytes. This was changed in protocol v3.

Msg object

Under this proposal, a msg object would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#msg.

Field Size Description Data type Comments
4 prefixNonce uint32_t The prefix nonce.
? encrypted uchar[] Encrypted data. See Encrypted payload. See also Unencrypted Message Data Format


Examples

Example 1

If Alice does not care about anonymity, she can have her address's pubkey specify that the full 32 bits of the prefix field should be used. This means that all msgs sent to her will effectively be uniquely tagged for her address (there are roughly 4 billion possible prefix values). This will allow Alice to only download msgs that are destined for her, giving her maximum usability and convenience at the expense of anonymity.

Example 2

If Bob wants to achieve a balance between anonymity and convenience, he can have his address's pubkey specify that 4 bits of the prefix field should be used, e.g 0110. This means that all msgs sent to his address will have a prefix nonce beginning with 0110. A random distribution of prefixes in a stream should mean that roughly 6.25% of msgs in a stream will have a prefix nonce beginning with 0110. Therefore Bob will be able to download roughly 6.25% of the msgs in a stream and still be guaranteed to receive all msgs destined for him.

Example 3

If Carol wants maximum anonymity then she can have her address's pubkey specify that zero bits of the prefix field should be used. This means that msgs sent to her will contain no identifying information, only a totally random prefix nonce. Carol will have to download and attempt to decrypt all msgs in a stream in order to be sure of receiving all msgs destined for her.


Notes

Other object types

Prefix filtering is not required for broadcasts, pubkeys, or getpubkeys. These object types are already tagged with an identifier.

Lite client pubkey retrieval

How can lite clients retrieve pubkeys without telling the servers who they are talking to?
In the short to medium term (until streams are implemented), it will be perfectly viable for lite clients to simply download all pubkeys in stream 1. A pubkey is 392 bytes in size. If we assume an average of 5 addresses per user, we could support a 10x increase in the number of active users (from roughly 1,000 to 10,000), and lite clients would still only have to download 19MB worth of pubkey data. This is only a rough approximation, but it seems reasonably clear that the amount of data to download and store would be acceptable.

Lite client pubkey dissemination

How can lite clients respond to getpubkey requests?
Lite clients can do this in two ways:
1. They can make periodic requests to full node servers to check for any getpubkeys for any of their addresses and then respond to any getpubkeys they receive by re-disseminating the relevant pubkeys.
2. They can periodically re-disseminate the pubkeys for each of their addresses, with the re-dissemiantion period determined by the time to live of the pubkeys they create. This will ensure that the pubkeys for all the lite client's addresses are always available in the stream(s) they are a part of.

Non-hashed addresses

If non-hashed addresses are introduced in the future, this scheme will not apply to them, as they will not have pubkeys. Users of non-hashed addresses will still need to download and attempt to decrypt all msgs in each stream in which they have an address. This makes sense though, as the whole point of introducing non-hashed addresses is to give people the ability to choose greater anonymity at the expense of convenience.

Streams

The system described in this proposal is intended to be complementary to streams (i.e. both should be implemented). It is not intended to replace them.

Possible refinement: Daily changing prefix values

While the system described above gives a reasonable balance between anonymity and efficiency, it has some weaknesses. In particular, users sending messages to addresses which specify a non-zero prefix length value will tend to leak some information about who they are communicating with. By observing the prefix nonces of msgs sent by a particular client, an attacker may be able build up information about how many different addresses a given user is communicating with, who they are communicating with more often, and who they do not communicate with for certain periods of time.

One possible way to mitigate this problem would be to modify the prefix filtering system so that the prefix values used to 'tag' msgs vary according to a time value. Instead of tagging msgs with the first n bits of the prefix defined in the destination address's pubkey, msgs could be tagged with the first n bits of a hash value. The hash value would be calculated from the first n bits of the prefix concatenated with a time value. The result of this would be that the identifying bits of the prefix nonce of a msg would change every so often in a predictable way, with the frequency of change defined by the time value used.

For example, if the time value is defined by the protocol as 00:00 on the current day in Unix time, then the identifying prefix bits used to tag msgs will be different each day, in a way that can be easily and reliably calculated by both the sender and the receiver of msgs. The result of this is that rather than all msgs ever sent to a given address having the same identifying prefix bits, all msgs sent to a given address on a given day have the same identifying prefix bits.

A rough pseudocode description of the procedure to calculate the prefixNonce for a msg follows:

unixTime = getUnixTime()
remainderSeconds = unixTime mod 86,400
timeValue = unixTime - remainderSeconds
hash = sha512(sha512(prefix[0-prefixLength] + timeValue))
identifyingBits = hash[0-prefixLength]
prefixNonce = identifyingBits + randomBits

Advantages:

  • The information leaked by a client sending msgs to addresses which specify a non-zero prefix length value is greatly diminished, as the identifying bits in the prefix nonces of sent msgs change each day. Some information is still leaked, particularly when an attacker already knows the destination address(es) of sent msgs and can therefore calculate what the identifying prefix bits will be each day, but the overall amount of information leaked is substantially reduced.

Disadvantages:

  • When catching up with the network, lite clients must make a separate request for messages for each day in the time period that they need to check.