Bitmessage Protocol Version 3

From Bitmessage Wiki
Revision as of 03:53, 12 June 2016 by Bmng-dev (talk | contribs) (Cleaned and updated Protocol v3 for posterity)

Common standards

Hashes

SHA-512 hashes are used in most places; RIPEMD-160 is also used when creating an address.

A double-round of SHA-512 is used for the Proof Of Work. Example of double-SHA-512 encoding of string "hello":

hello
9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 (first round of sha-512)
0592a10584ffabf96539f3d780d776828c67da1ab5b169e9e8aed838aaecc9ed36d49ff1423c55f019e050c66c6324f53588be88894fef4dcffdb74b98e2b200 (second round of sha-512)

For Bitmessage addresses (RIPEMD-160) this would give:

hello
9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 (first round is sha-512)
79a324faeebcbf9849f310545ed531556882487e (with ripemd-160)


Common structures

All integers are encoded in big endian

Message structure

Field Size Description Data type Comments
4 magic uint32_t Magic value indicating message origin network, and used to seek to next message when stream state is unknown
12 command char[12] ASCII string identifying the packet content, NULL padded (non-NULL padding results in packet rejected)
4 length uint32_t Length of payload in number of bytes. The maximum allowed value is 1,600,003 bytes
4 checksum uint32_t First 4 bytes of sha512(payload)
? message_payload uchar[] The actual data, a message. Not to be confused with objectPayload.

Known magic values:

Magic value Sent over wire as
0xE9BEB4D9 E9 BE B4 D9
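
As an illustration of the message structure, the sketch below frames a payload with the 24-byte header described above. The function name pack_message and the constant names are hypothetical; only the field layout, the magic value, the length limit, and the checksum rule come from this page.

```python
import hashlib
import struct

MAGIC = 0xE9BEB4D9
MAX_PAYLOAD_LENGTH = 1600003  # maximum value of the length field


def pack_message(command: bytes, payload: bytes = b"") -> bytes:
    """Prefix payload with magic, command, length and checksum."""
    if len(command) > 12:
        raise ValueError("command must fit in 12 bytes")
    if len(payload) > MAX_PAYLOAD_LENGTH:
        raise ValueError("payload too long")
    # Checksum is the first 4 bytes of a single SHA-512 of the payload.
    checksum = hashlib.sha512(payload).digest()[:4]
    # '>' selects big endian; '12s' pads the command with NUL bytes.
    header = struct.pack(">L12sL4s", MAGIC, command, len(payload), checksum)
    return header + payload


verack = pack_message(b"verack")  # verack has no payload
```

A receiver can apply the same rule in reverse: recompute the checksum over the received payload and drop the message if the four bytes do not match.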

Variable length integer

An integer can be encoded in a variable number of bytes, depending on its value, to save space. Variable length integers always precede an array/vector of a type of data that may vary in length. Varints MUST use the minimum possible number of bytes to encode a value. For example, the value 6 can be encoded with one byte, so a varint that uses three bytes to encode the value 6 is malformed and decoding must be aborted.

Value Storage length Format
< 0xfd 1 uint8_t
<= 0xffff 3 0xfd followed by the integer as uint16_t
<= 0xffffffff 5 0xfe followed by the integer as uint32_t
- 9 0xff followed by the integer as uint64_t
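
The table above can be sketched in Python as follows. The function names encode_varint and decode_varint are illustrative (they are not taken from this page), and the decoder raises ValueError for non-minimal encodings as one possible way of aborting the decoding task.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using the fewest possible bytes."""
    if value < 0:
        raise ValueError("varint cannot be negative")
    if value < 0xfd:
        return struct.pack(">B", value)
    if value <= 0xffff:
        return struct.pack(">BH", 0xfd, value)
    if value <= 0xffffffff:
        return struct.pack(">BI", 0xfe, value)
    if value <= 0xffffffffffffffff:
        return struct.pack(">BQ", 0xff, value)
    raise ValueError("varint too large")


def decode_varint(data: bytes, offset: int = 0):
    """Return (value, bytes_consumed); reject non-minimal encodings."""
    first = data[offset]
    if first < 0xfd:
        return first, 1
    # Each prefix maps to a fixed-width format and the smallest value
    # that legitimately needs that width.
    fmt, minimum = {
        0xfd: (">H", 0xfd),
        0xfe: (">I", 0x10000),
        0xff: (">Q", 0x100000000),
    }[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    if value < minimum:
        raise ValueError("non-minimal varint")
    return value, 1 + struct.calcsize(fmt)
```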

Variable length string

A variable length string can be stored using a variable length integer to encode the length followed by the string itself.

Field Size Description Data type Comments
1+ length var_int Length of the string
length string char[] The string itself (can be empty)

Variable length list of integers

n integers can be stored using n+1 variable length integers where the first var_int equals n.

Field Size Description Data type Comments
1+ count var_int Number of var_ints below
1+ var_int The first value stored
1+ var_int The second value stored...
1+ var_int etc...
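
Both the variable length string and the variable length list of integers are built on var_int. The sketch below uses hypothetical helper names and assumes that the string length counts bytes (after UTF-8 encoding) rather than characters.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Minimal-length var_int encoder (see the var_int table)."""
    if value < 0xfd:
        return struct.pack(">B", value)
    if value <= 0xffff:
        return struct.pack(">BH", 0xfd, value)
    if value <= 0xffffffff:
        return struct.pack(">BI", 0xfe, value)
    return struct.pack(">BQ", 0xff, value)


def encode_var_str(text: str) -> bytes:
    """var_int byte length followed by the string bytes."""
    raw = text.encode("utf-8")  # assumption: length is counted in bytes
    return encode_varint(len(raw)) + raw


def encode_var_int_list(values) -> bytes:
    """var_int count followed by each value as a var_int."""
    values = list(values)
    return encode_varint(len(values)) + b"".join(
        encode_varint(v) for v in values
    )
```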

Network address

When a network address is needed somewhere, this structure is used. Network addresses are not prefixed with a timestamp or stream in the version message.

Field Size Description Data type Comments
8 time uint64 the Time.
4 stream uint32 Stream number for this node
8 services uint64_t same service(s) listed in version
16 IPv6/4 char[16] IPv6 address. IPv4 addresses are written into the message as a 16 byte IPv4-mapped IPv6 address (12 bytes 00 00 00 00 00 00 00 00 00 00 FF FF, followed by the 4 bytes of the IPv4 address). Hidden Service addresses can be represented as an IPv6 address with a 48-bit routing prefix of fd87:d87e:eb43::/48 under the Unique Local Address block (fc00::/7) with the remaining 10 bytes being the Base256 encoding of the Hidden Service address.
2 port uint16_t port number
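
A possible way to serialize this structure is sketched below. The names pack_ip and pack_net_addr and the in_version flag are illustrative; the flag simply omits the time and stream fields, giving the 26-byte form used inside the version message.

```python
import ipaddress
import struct
import time


def pack_ip(host: str) -> bytes:
    """Return 16 bytes: IPv6 as is, IPv4 as an IPv4-mapped IPv6 address."""
    address = ipaddress.ip_address(host)
    if address.version == 4:
        return b"\x00" * 10 + b"\xff\xff" + address.packed
    return address.packed


def pack_net_addr(host, port, services=1, stream=1,
                  timestamp=None, in_version=False) -> bytes:
    """Serialize a net_addr (38 bytes, or 26 bytes inside version)."""
    body = struct.pack(">Q16sH", services, pack_ip(host), port)
    if in_version:
        return body  # no time or stream prefix in the version message
    if timestamp is None:
        timestamp = int(time.time())
    return struct.pack(">QI", timestamp, stream) + body
```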

Inventory Vectors

Inventory vectors are used for notifying other nodes about objects they have or data which is being requested. Two rounds of SHA-512 are used, resulting in a 64 byte hash. Only the first 32 bytes are used; the remaining 32 bytes are ignored.

Inventory vectors consist of the following data format:

Field Size Description Data type Comments
32 hash char[32] Hash of the object
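
The truncated double hash can be computed as in the sketch below. The function name inventory_hash is illustrative, and the exact bytes that are hashed are an assumption here: this example takes the complete serialized object (starting at the nonce) as input.

```python
import hashlib


def inventory_hash(object_data: bytes) -> bytes:
    """First 32 bytes of the double SHA-512 of the given data."""
    # Assumption: object_data is the whole serialized object.
    digest = hashlib.sha512(hashlib.sha512(object_data).digest()).digest()
    return digest[:32]
```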

Envelope

Bitmessage uses ECIES to encrypt its messages. For more information see Encryption

Plain Envelope

Field Size Description Data type Comments
16 IV uchar[16] Initialization Vector used for AES-256-CBC
2 elliptic curve uint16_t Elliptic Curve secp256k1. This is the NID (numerical identifier) 714 (0x02CA) assigned by OpenSSL to represent secp256k1
2 X length uint16_t Length of X component of public key R
X length X uchar[X length] X component of public key R
2 Y length uint16_t Length of Y component of public key R
Y length Y uchar[Y length] Y component of public key R
? encrypted uchar[] Cipher text
32 mac uchar[32] HMACSHA256 Message Authentication Code
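
To make the layout concrete, the following sketch splits a plain envelope into its fields. It performs no decryption or MAC verification; the PlainEnvelope type and parse_plain_envelope function are hypothetical names used only for illustration.

```python
import struct
from typing import NamedTuple


class PlainEnvelope(NamedTuple):
    iv: bytes
    curve: int
    x: bytes
    y: bytes
    ciphertext: bytes
    mac: bytes


def parse_plain_envelope(data: bytes) -> PlainEnvelope:
    """Split an envelope into IV, curve, public key R, ciphertext and MAC."""
    iv = data[:16]
    curve, x_len = struct.unpack_from(">HH", data, 16)
    offset = 20
    x = data[offset:offset + x_len]
    offset += x_len
    (y_len,) = struct.unpack_from(">H", data, offset)
    offset += 2
    y = data[offset:offset + y_len]
    offset += y_len
    # The last 32 bytes are the MAC; everything in between is ciphertext.
    if len(data) < offset + 32:
        raise ValueError("envelope too short")
    return PlainEnvelope(iv, curve, x, y, data[offset:-32], data[-32:])
```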

Tagged Envelope

A tagged envelope is identical to a plain envelope but prepended with a tag. Tagged envelopes are only used by v4 pubkeys and v5 broadcasts.

Field Size Description Data type Comments
32 tag uchar[32] The recipient's tag
1+ envelope plain_envelope


Enumerations and Flags

Message Encodings

Value Name Description
0 IGNORE Any data with this number may be ignored. The sending node might simply be sharing its public key with you.
1 TRIVIAL UTF-8. No 'Subject' or 'Body' sections. Useful for simple strings of data, like URIs or magnet links.
2 SIMPLE UTF-8. Uses 'Subject' and 'Body' sections. No MIME is used.

messageToTransmit = 'Subject:' + subject + '\n' + 'Body:' + message

3 EXTENDED A data structure in bencode, then compressed with zlib. Null data type is encoded as an empty string, and booleans as an integer 0 (false) or 1 (true). Text fields are encoded using UTF-8. v5 and newer address versions MUST support this. Proposal, exact structure pending standardisation.

Further values for the message encodings can be decided upon by the community. Any MIME or MIME-like encoding format, should they be used, should make use of Bitmessage's 8-bit bytes.
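
For encoding 2 (SIMPLE), the messageToTransmit format shown above can be produced and split again roughly as follows. The function names are illustrative, and the fallback for text without both sections is an assumption rather than specified behaviour.

```python
def encode_simple(subject: str, body: str) -> bytes:
    """Build a SIMPLE (encoding 2) message as UTF-8 bytes."""
    return ("Subject:" + subject + "\n" + "Body:" + body).encode("utf-8")


def decode_simple(data: bytes):
    """Return (subject, body) from a SIMPLE message."""
    text = data.decode("utf-8", errors="replace")
    header, separator, body = text.partition("\nBody:")
    if not separator or not header.startswith("Subject:"):
        # Assumption: treat unstructured text as a body with no subject.
        return "", text
    return header[len("Subject:"):], body
```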

Identity bitfield features

Bit Name Description
0 undefined The most significant bit at the beginning of the structure. Undefined
1 undefined The next most significant bit. Undefined
... ... ...
30 include_destination Receiving node expects that the RIPE hash encoded in their address precedes the encrypted message data of msg messages bound for them.
31 does_ack If true, the receiving node does send acknowledgements (rather than dropping them).
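
Because bit 0 is the most significant bit, bit 31 is the least significant bit of a 32-bit bitfield. The sketch below derives the integer masks from the bit numbers in the table; the helper names are hypothetical.

```python
def feature_mask(bit: int, width: int = 32) -> int:
    """Integer mask for a bit number where bit 0 is the most significant."""
    return 1 << (width - 1 - bit)


INCLUDE_DESTINATION = feature_mask(30)  # 0x00000002
DOES_ACK = feature_mask(31)             # 0x00000001


def has_feature(bitfield: int, mask: int) -> bool:
    """True if the given feature bit is set in the bitfield."""
    return bool(bitfield & mask)
```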

Node services

The following services are currently assigned:

Value Name Description
1 NODE_NETWORK This is a normal network node.
2 NODE_SSL This node supports SSL/TLS in the current connection.

Object Types

Value Name Description
0 getpubkey
1 pubkey
2 msg A msg or msg ack
3 broadcast

Error Levels

Value Name Description
0 WARNING
1 ERROR
2 FATAL A fatal or fatal-like error has occurred. The connection is usually terminated following this error.


Message types

Undefined messages received on the wire must be ignored.

error

The only error PyBitmessage sends is a FATAL error when it receives a version message whose timestamp differs from its own clock by more than an hour.

Field Size Description Data type Comments
1+ error level var_int The error level of this error
1+ ban time var_int The length of time the emitting node will refuse connections from the receiving node
1+ inv_vector_length var_int The length of the inventory vector (max: 100). Inventory vectors are fixed length at 32 bytes. Why does the size need to be specified? Perhaps this should be a boolean value to indicate the presence of an inventory vector instead?
inv_vector_length inv_vector inv_vect The inventory vector of the offending object this error relates to
1+ errorText var_str The error text (max length: 1000)

version

When a node creates an outgoing connection, it will immediately advertise its version. The remote node will respond with its version. No further communication is possible until both peers have exchanged their version. A PyBitmessage server responds with verack, then version. A bmd server responds with version, then verack.

Field Size Description Data type Comments
4 version int32_t Identifies protocol version being used by the node. The current protocol version is 3. Nodes should disconnect if the remote node's version is lower but continue with the connection if it is higher. What is the intent here?
8 services uint64_t bitfield of features to be enabled for this connection
8 timestamp int64_t standard UNIX timestamp in seconds
26 addr_recv net_addr The network address of the node receiving this message (not including the time or stream number)
26 addr_from net_addr The network address of the node emitting this message (not including the time or stream number and the ip itself is ignored by the receiver)
8 nonce uint64_t Random nonce used to detect connections to self.
1+ user_agent var_str User Agent generally in the form of /Application:Version/ (max length: 5000)
1+ stream_numbers var_int_list The stream numbers that the emitting node is interested in. Sending nodes must not include more than 160,000 stream numbers.
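
A version payload following the table above might be assembled as in this sketch. The function names, the example user agent /ExampleClient:0.1/, the use of 127.0.0.1 for addr_from, and the reuse of the local services value in addr_recv are all assumptions made for illustration.

```python
import ipaddress
import os
import struct
import time


def encode_varint(value: int) -> bytes:
    """Minimal-length var_int encoder."""
    if value < 0xfd:
        return struct.pack(">B", value)
    if value <= 0xffff:
        return struct.pack(">BH", 0xfd, value)
    if value <= 0xffffffff:
        return struct.pack(">BI", 0xfe, value)
    return struct.pack(">BQ", 0xff, value)


def short_net_addr(services: int, ipv4: str, port: int) -> bytes:
    """26-byte net_addr without time and stream, IPv4 only."""
    ip = b"\x00" * 10 + b"\xff\xff" + ipaddress.IPv4Address(ipv4).packed
    return struct.pack(">Q16sH", services, ip, port)


def build_version_payload(remote_ip, remote_port, local_port,
                          user_agent=b"/ExampleClient:0.1/",
                          streams=(1,), services=1) -> bytes:
    """Assemble the payload of a protocol version 3 'version' message."""
    payload = struct.pack(">iQq", 3, services, int(time.time()))
    payload += short_net_addr(services, remote_ip, remote_port)  # addr_recv
    payload += short_net_addr(services, "127.0.0.1", local_port)  # addr_from
    payload += os.urandom(8)  # random nonce to detect connections to self
    payload += encode_varint(len(user_agent)) + user_agent
    payload += encode_varint(len(streams))
    payload += b"".join(encode_varint(s) for s in streams)
    return payload
```

A node would remember the nonce it sent so that an incoming version carrying the same nonce can be recognised as a connection to itself.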

A "verack" packet shall be sent if the version packet was accepted. Once you have sent and received verack messages with the remote node, send an addr message advertising up to 1,000 peers of which you are aware, and one or more inv messages advertising all of the valid objects of which you are aware.

verack

This message is sent in reply to version and has no payload. The TCP timeout starts out at 20 seconds; after verack messages are exchanged, the timeout is raised to 10 minutes.

If both sides announce that they support SSL, they MUST perform an SSL handshake immediately after they both send and receive verack. During this SSL handshake, the TCP client acts as an SSL client, and the TCP server acts as an SSL server. PyBitmessage v0.5.4 or later requires the AECDH-AES256-SHA cipher over TLSv1, and prefers the secp256k1 curve (but other curves may be accepted, depending on the version of python and OpenSSL used).

addr

Provides information on known nodes of the network. Only nodes that have been known to be on the network in the last 3 hours should be advertised. This command is easily abused and any entries should be treated as unreliable.

Field Size Description Data type Comments
1+ count var_int Number of address entries (max: 1,000)
38 x count list of net_addr net_addr[] Address of other nodes on the network.

inv

Allows a node to advertise its knowledge of one or more objects.

Field Size Description Data type Comments
1+ count var_int Number of inventory entries (max: 50,000)
32 x count list of inv_vect inv_vect[] Inventory vectors

getdata

getdata is used in response to an inv message to retrieve the content of a specific object after filtering known elements.

Field Size Description Data type Comments
1+ count var_int Number of inventory entries (max: 50,000)
32 x count list of inv_vect inv_vect[] Inventory vectors

Current usage reveals getdata to only ever contain 1 entry

object

An object is a message which is shared throughout a stream. It is the only message which propagates; all others are only between two nodes. Objects have a type, like 'msg', or 'broadcast'. To be a valid object, the Proof Of Work must be done. The maximum allowable length of an object (not to be confused with the objectPayload) is 2^18 bytes (256 KiB).

Field Size Description Data type Comments
8 nonce uint64_t A nonce that satisfies the Proof Of Work
8 expiresTime uint64_t The "end of life" time of this object. An object shall be shared with peers until its end-of-life time has been reached. The node should store the inventory vector of that object for some extra period of time to avoid reloading it from another node with a small time delay. The time may be no further than 28 days + 3 hours in the future.
4 objectType uint32_t The object type. Nodes should relay objects even if they use an undefined object type.
1+ version var_int The object's version.
1+ stream number var_int The stream number in which this object may propagate
? objectPayload uchar[] This field varies depending on the object type; see below.
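
The object header can be read as in the sketch below, which also applies the length limit and the expiresTime limit stated above. It does not check the Proof Of Work. The function and constant names are illustrative, and returning a dict is just one convenient choice.

```python
import struct
import time

MAX_OBJECT_LENGTH = 2 ** 18                  # 256 KiB
MAX_TTL = 28 * 24 * 3600 + 3 * 3600          # 28 days + 3 hours, in seconds


def decode_varint(data: bytes, offset: int = 0):
    """Return (value, bytes_consumed); reject non-minimal encodings."""
    first = data[offset]
    if first < 0xfd:
        return first, 1
    fmt, minimum = {0xfd: (">H", 0xfd), 0xfe: (">I", 0x10000),
                    0xff: (">Q", 0x100000000)}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    if value < minimum:
        raise ValueError("non-minimal varint")
    return value, 1 + struct.calcsize(fmt)


def parse_object_header(data: bytes, now=None) -> dict:
    """Split an object into header fields and objectPayload."""
    if len(data) > MAX_OBJECT_LENGTH:
        raise ValueError("object too large")
    nonce, expires, object_type = struct.unpack_from(">QQI", data, 0)
    offset = 20
    version, used = decode_varint(data, offset)
    offset += used
    stream, used = decode_varint(data, offset)
    offset += used
    now = time.time() if now is None else now
    if expires > now + MAX_TTL:
        raise ValueError("expiresTime too far in the future")
    return {"nonce": nonce, "expiresTime": expires,
            "objectType": object_type, "version": version,
            "stream": stream, "objectPayload": data[offset:]}
```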


Object types

Here are the payloads for various object types.

getpubkey

When a node has the hash of a public key (from an address) but not the public key itself, it must send out a request for the public key.

v2 and v3 getpubkey

Field Size Description Data type Comments
20 ripe uchar[20] The ripemd hash of the public key

v4 getpubkey

Field Size Description Data type Comments
32 tag uchar[32] The tag derived from the address version, stream number, and ripe


pubkey

v2 pubkey

This is still in use and supported by current clients but new v2 addresses are not generated by clients.

Field Size Description Data type Comments
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
64 public signing key uchar[64] The ECC public key used for signing in uncompressed format without the point compression prefix
64 public encryption key uchar[64] The ECC public key used for encryption in uncompressed format without the point compression prefix

v3 pubkey

Field Size Description Data type Comments
132 public keys v2 pubkey This is the same three fields as a v2 pubkey
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ signature var_str The ECDSA signature covering this structure prepended with the object header (excluding the nonce). The signature is actually two signed integers r and s encoded in ASN.1 according to DER

v4 pubkey

Field Size Description Data type Comments
? envelope tagged_envelope Encrypted pubkey data

A decrypted v4 pubkey is identical to a v3 pubkey (except the version is 4).

When version 4 pubkeys are created, most of the data in the pubkey is encrypted. This is done in such a way that only someone who has the Bitmessage address which corresponds to a pubkey can decrypt and use that pubkey. This prevents people from gathering pubkeys sent around the network and using the data from them to create messages to be used in spam or in flooding attacks.

In order to encrypt the pubkey data, a double SHA-512 hash is calculated from the address version number, stream number, and ripe hash of the Bitmessage address that the pubkey corresponds to. The first 32 bytes of this hash are used to create a public and private key pair with which to encrypt and decrypt the pubkey data, using the same algorithm as message encryption (see Encryption). The remaining 32 bytes of this hash are added to the unencrypted part of the pubkey and used as a tag, as above. This allows nodes to determine which pubkey to decrypt when they wish to send a message.

In PyBitmessage, the double hash of the address data is calculated using the python code below:

doubleHashOfAddressData = hashlib.sha512(hashlib.sha512(encodeVarint(addressVersionNumber) + encodeVarint(streamNumber) + hash).digest()).digest()
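
A self-contained variant of the line above, which also splits the result into the key material and the tag, could look like this. The function name derive_key_and_tag is illustrative and the local encode_varint stands in for PyBitmessage's encodeVarint; turning the first 32 bytes into an actual key pair is covered under Encryption and is not shown.

```python
import hashlib
import struct


def encode_varint(value: int) -> bytes:
    """Minimal-length var_int encoder."""
    if value < 0xfd:
        return struct.pack(">B", value)
    if value <= 0xffff:
        return struct.pack(">BH", 0xfd, value)
    if value <= 0xffffffff:
        return struct.pack(">BI", 0xfe, value)
    return struct.pack(">BQ", 0xff, value)


def derive_key_and_tag(address_version: int, stream: int, ripe: bytes):
    """Return (private_key_bytes, tag) from the address data."""
    data = encode_varint(address_version) + encode_varint(stream) + ripe
    double_hash = hashlib.sha512(hashlib.sha512(data).digest()).digest()
    # First 32 bytes: private encryption key; last 32 bytes: tag.
    return double_hash[:32], double_hash[32:]
```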


msg

Used for person-to-person messages.

v1 msg

Field Size Description Data type Comments
? envelope plain_envelope Encrypted msg data

Decrypted msg

Field Size Description Data type Comments
1+ sender pubkey Sender's pubkey. A v4 pubkey should not be encrypted
20 destination ripe uchar[20] The ripe hash of the public key of the receiver of the message
1+ encoding var_int Message encoding
1+ message var_str The message encoded as per encoding
1+ ack_data var_str The acknowledgement data to be transmitted. This is a fully qualified object with Proof of Work completed
1+ signature var_str The ECDSA signature covering this structure prepended with the object header (excluding the nonce). The signature is actually two signed integers r and s encoded in ASN.1 according to DER

msg ack

A special form of msg used as an acknowledgement receipt. The objectType and version fields in the object header are set exactly the same as for a v1 msg

Field Size Description Data type Comments
32 ack data uchar[32] A random sequence of bytes that the sender waits for as an indication that the recipient has received their msg


broadcast

Users who are subscribed to the sending address will see the message appear in their inbox.

Pubkey objects and v5 broadcast objects are encrypted the same way: The data encoded in the sender's Bitmessage address is hashed twice. The first 32 bytes of the resulting hash constitute the "private" encryption key and the last 32 bytes constitute a tag so that anyone listening can easily decide if this particular message is interesting. The sender calculates the public key from the private key and then encrypts the object with this public key. Thus anyone who knows the Bitmessage address of the sender of a broadcast or pubkey object can decrypt it.

Having a broadcast version of 5 indicates that a tag is used which, in turn, is used when the sender's address version is >=4.

v4 broadcast

Field Size Description Data type Comments
? envelope plain_envelope

v5 broadcast

Field Size Description Data type Comments
? envelope tagged_envelope

Decrypted broadcast

A decrypted broadcast is nearly identical to a decrypted msg. The decrypted broadcast has neither a destination ripe field nor an acknowledgement field.

Field Size Description Data type Comments
1+ sender pubkey Sender's pubkey. A v4 pubkey should not be encrypted
1+ encoding var_int Message encoding
1+ message var_str The message encoded as per encoding
1+ signature var_str The ECDSA signature covering this structure prepended with the object header (excluding the nonce). The signature is actually two signed integers r and s encoded in ASN.1 according to DER