Argothas Architecture Proposal
Introduction
After a lot of reading and thinking, I have an idea that might help resolve some of the issues that we are encountering. I am calling the collection of these changes the new "system", although it is really a change of both architecture and protocol. The proposed system is still very much a forming idea, so please bear with me.
This post has used ideas from:
- BitBoards
- Message tagging proposal
- The non-stream proposal for streams
- Onion routing
One of the main issues we are encountering is making the BM protocol smartphone friendly whilst keeping the recipient hidden. I believe this is flawed, as it requires a user to trust the server they are using as their BitBoard. The operator of a BitBoard has visibility of all messages sent to any of its addresses, and although these messages are encrypted, visibility of this data will reveal some information about the user.
The Proposed System
Overview
The system offers the following advantages:
- Scalability
- Spam protection
- Distributed trust of server (i.e. not trusting a single server)
- The ability for businesses (and individuals) to operate their own servers
The system makes the following trade-offs:
- Complete anonymity of the recipient vs. the ability to "check" for messages
The system has the following disadvantages:
- See trade-offs
- ???
Custom Notation
KEY1, KEY2 -> DATA1, DATA2 = Table of keys mapped to data
[TYPE | FIELD1, FIELD2] = Message with type and fields
E:OBJECT = Encrypted object
S:OBJECT = Signed object (i.e. tamper-proof)
IP(MACHINE) = IP address of a machine
TO>> FROM MESSAGE = message transmission
HASH(OBJECT) = Hash of an object
BMADDR = BitMessage Address
BMMESSAGE = BitMessage style message
NMESSAGE = Notification message
MTAG = Message tag
C = Client
S = Sender
R = Receiver
TTL = Time to live
TTD = Time at which point to die
NONCE = Random data
PERIOD = Period of time in seconds
POWVAL = Proof of work value (as described in the POW description)
System Description
The Building Blocks
The new system is designed as much more of a client/server architecture (with similarities to hidden services in TOR), and is done so to enable light-weight clients (lite clients). Whilst I have separated out clients and servers, they could likely be merged, although I have not thought through the ramifications of merging them. I have also separated out server roles to help define roles and processes; again, they may be able to be merged.
Data Server (DS) - These hold the actual messages
Introduction Server (IS) - These hold notification messages
Directory Server (DIR) - These hold information about registered servers for the use of clients to connect to their desired service.
New Address Process
- The client creates a new BMADDR as per usual
- The client connects to a DIR to request a list of known ISs
- The client selects a number of ISs that it wishes to use
- The client connects to each IS and requests to use it
- The IS accepts or rejects the request
- On denial the client should add another IS to its pool
- On acceptance the client prepares to store the IS, TTD, NONCE, PERIOD and POWVAL with its BMADDR
- The client publishes its BMADDR and IS data (to a DIR)
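The steps above can be sketched roughly as follows. This is only an illustration of the "try another IS on denial" loop and of the data kept per accepted IS; the names ISRecord, build_pool and the request_use callback are hypothetical, as are the default lifetime and period, since the proposal does not define a wire format.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ISRecord:
    """Data published alongside a BMADDR for one IS (hypothetical layout)."""
    is_ip: str     # IP(IS)
    ttd: int       # Unix time after which this IS should no longer be used
    nonce: bytes   # per-IS random data fed into the MTAG calculation
    period: int    # PERIOD in seconds
    powval: int    # minimum POWVAL reported by the IS


def build_pool(
    candidates: Iterable[str],
    wanted: int,
    request_use: Callable[[str], Optional[int]],
    now: int,
    lifetime: int = 7 * 24 * 3600,
    period: int = 3600,
) -> List[ISRecord]:
    """Walk the candidate list from a DIR until `wanted` ISs have accepted.

    `request_use(ip)` stands in for the network round trip: it returns the
    IS's POWVAL on acceptance, or None on denial.
    """
    pool: List[ISRecord] = []
    for ip in candidates:
        if len(pool) == wanted:
            break
        powval = request_use(ip)
        if powval is None:
            continue  # denied: move on to the next candidate
        pool.append(
            ISRecord(ip, now + lifetime, secrets.token_bytes(16), period, powval)
        )
    return pool
```

Each accepted IS gets its own NONCE, which is what later lets the same NMESSAGE carry a different MTAG on each IS.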
Why do we use multiple ISs? There are two reasons. The first is redundancy: if a sender cannot connect to one IS it can choose another (or many others), and if an IS is down when a client attempts to receive, messages sent to multiple ISs can still be received from the others. The second reason is to stop a single IS from seeing all messages that are sent, by distributing the "load" among many. A business could operate its own IS that only accepts requests from employees; employees would then only need to use that one IS for their BMADDR.
What is the data being stored with the BMADDR? For a sender to send messages to the receiver, the sender must know which ISs the receiver is using, so the IP of each IS is stored. The TTD indicates the point in time at which this IS should stop being used. The NONCE is used in the calculation of the MTAG for this IS, so that if the same NMESSAGE is sent to multiple ISs it will have a different MTAG on each, reducing the amount of information available to any one operator. PERIOD is also used in the MTAG calculation: the current time is taken modulo PERIOD and the result is subtracted from the current time, which rounds the time down to the start of the current period. This has been described in the MTAG proposal. Storing PERIOD per IS allows a client to specify different PERIODs for different ISs.
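As a minimal sketch of how NONCE and PERIOD could feed into an MTAG: the time rounding follows the description above, but the choice of SHA-256 and the order in which the fields are hashed are assumptions made for illustration. The actual construction is whatever the MTAG proposal specifies.

```python
import hashlib


def period_start(now: int, period: int) -> int:
    """Round `now` down to the start of its PERIOD: now - (now mod PERIOD)."""
    return now - (now % period)


def mtag(bmaddr: str, nonce: bytes, now: int, period: int) -> bytes:
    """Hypothetical MTAG = HASH(BMADDR, NONCE, start of the current PERIOD)."""
    bucket = period_start(now, period)
    h = hashlib.sha256()  # assumed hash function, not taken from the proposal
    h.update(bmaddr.encode("utf-8"))
    h.update(nonce)
    h.update(bucket.to_bytes(8, "big"))
    return h.digest()
```

With this shape, two ISs holding different NONCEs see unrelated MTAGs for the same BMADDR, and the MTAG on any one IS changes every PERIOD seconds.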
Possible Attack: An attacker that wishes to bring attention to a particular MTAG (and hence potentially the BMADDR) could send messages to all of its listed ISs. The operators of each IS would then be able to see a large volume of messages destined for one MTAG.
How do the IS and the client negotiate use? The client connects to the IS and provides it with a passphrase (of a preset number of bytes). Based on this passphrase a server can accept or deny the request.
Possible Attack: A malicious IS operator may only accept one request. As a result the number of NMESSAGEs sent to this IS will be approximately equal to the number of NMESSAGEs sent to the single client (approximately, as junk messages can be sent and will be accepted). This is why it is important for a client to use multiple ISs.
Possible Attack: A malicious IS operator may only accept one request. The operator can then monitor the IP address of any connections. This is why onion routing is encouraged (to be discussed later).
Possible Attacks: Other possible attacks using knowledge obtained from connections and MTAGs have been discussed in the MTAG proposal thread. Most of these can be mitigated through the use of onion routing and VPNs.
Possible Attack: A malicious IS operator may accept requests but then delete any NMESSAGEs sent to the IS. This is why it is important for a client to use multiple ISs.
Why isn't the server notified of the TTD? In order to protect the clients, the server does not hold any information it can use to identify them (except possibly client-specific passwords). By ensuring this, the IS cannot make associations between MTAGs over a period of time (except in the case that a static IP is used). As a result, the server cannot determine which MTAGs are for a particular client and has no way to selectively remove them.
If the server can't stop messages for a particular client, how can it prevent itself from being DoSed (potentially through normal use)? As shown later, the server will indicate the minimum proof of work value that it will accept. Thus a server under high load may continue to increase its POW until it can sustain its use. Clients should be aware of this and add more ISs to their pool if they wish to keep a low POW for some of their senders.
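One way a server might adjust its minimum POWVAL under load, and a client might react, is sketched below. The thresholds, the multiplicative step and both function names are invented for illustration, and the sketch assumes that a larger POWVAL means more work for the sender.

```python
from typing import Iterable


def next_min_pow(current: int, load: float, target: float = 0.8,
                 floor: int = 1_000, step: float = 1.5) -> int:
    """Server side: raise the minimum POWVAL while load exceeds `target`,
    and relax it back towards `floor` otherwise.

    `load` is a utilisation figure between 0 and 1 (for example, the
    fraction of NMESSAGE storage in use).
    """
    if load > target:
        return int(current * step)
    return max(floor, int(current / step))


def needs_more_is(powvals: Iterable[int], acceptable: int, minimum: int = 2) -> bool:
    """Client side: True if fewer than `minimum` ISs in the pool still ask
    for a POWVAL at or below what the client considers acceptable."""
    cheap = [p for p in powvals if p <= acceptable]
    return len(cheap) < minimum
```

Because the IS cannot tell clients apart, the adjustment is necessarily global: every sender to that IS pays the higher POW, which is why the client-side check matters.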
New Message Process
- The sender creates the BMMESSAGE
- The sender connects to a DIR to get a list of DSs
- The sender selects a DS (*) and requests to store a BMMESSAGE of the given size on it
- The DS accepts or denies the request
- On denial, the sender must select a new DS
- On acceptance the DS replies with its minimum POWVAL
- The sender calculates POW and uploads the encrypted BMMESSAGE to the DS
- The sender prepares the NMESSAGE
- The sender connects to a DIR to collect R's ISs
- The sender selects one or more (but preferably not all) of the ISs
- For each IS, the sender connects and requests the IS's POWVAL
- The sender calculates the POW and uploads the NMESSAGE
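Two of the sender-side steps above lend themselves to a short sketch: choosing "one or more but preferably not all" of the receiver's ISs, and computing a proof of work. The POW shown here is a generic hashcash-style search in which POWVAL is read as a number of required leading zero bits; that reading is an assumption, and the real scheme is whatever the POW description defines. The function names are hypothetical.

```python
import hashlib
import random
from itertools import count
from typing import List, Sequence


def pick_is(pool: Sequence[str], k: int, rng=random) -> List[str]:
    """Pick at least one IS but, where possible, not all of them.

    Assumes `pool` is non-empty.
    """
    upper = max(1, len(pool) - 1)
    return rng.sample(list(pool), min(max(1, k), upper))


def pow_ok(payload: bytes, counter: int, powval: int) -> bool:
    """True if SHA-256(counter || payload) starts with `powval` zero bits."""
    digest = hashlib.sha256(counter.to_bytes(8, "big") + payload).digest()
    return int.from_bytes(digest, "big") >> (256 - powval) == 0


def solve_pow(payload: bytes, powval: int) -> int:
    """Brute-force the smallest counter that satisfies `pow_ok`."""
    for counter in count():
        if pow_ok(payload, counter, powval):
            return counter
```

The server only needs `pow_ok` to verify an upload, so checking stays cheap even when the required work is raised.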
Why separate DS and IS? This has been done as an IS has an obligation to store every NMESSAGE it receives. If it were to also store the BMMESSAGEs it could rapidly run out of room for NMESSAGEs. It also prevents the operator of a node from guessing the size of a BMMESSAGE being sent to a particular address.
* Variation: Instead of using one DS a sender could use multiple; this creates redundancy. This would require the content of the NMESSAGE to map multiple DSs to potentially different hashes of the same message.
What is an NMESSAGE? This is similar to a BMMESSAGE in terms of security, however its content is in a strict format. The content of the message will tell the receiver which HASH to request from each DS. It will also reveal the MTAG.
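To make the "strict format" idea concrete, here is one possible plaintext layout for an NMESSAGE: an MTAG plus a table of IP(DS) -> HASH(E:BMMESSAGE). The JSON encoding, the SHA-256 hash and the class name NMessage are all assumptions for illustration. In the proposed system the table would be encrypted to the receiver, while the MTAG has to stay visible so that the IS can index by it.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NMessage:
    """Hypothetical NMESSAGE body before encryption."""
    mtag: bytes
    parts: Dict[str, str] = field(default_factory=dict)  # IP(DS) -> hex hash

    def add(self, ds_ip: str, encrypted_bmmessage: bytes) -> None:
        """Record which hash to request from the given DS."""
        self.parts[ds_ip] = hashlib.sha256(encrypted_bmmessage).hexdigest()

    def to_bytes(self) -> bytes:
        body = {"mtag": self.mtag.hex(), "parts": self.parts}
        return json.dumps(body, sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "NMessage":
        body = json.loads(raw.decode("utf-8"))
        return cls(bytes.fromhex(body["mtag"]), dict(body["parts"]))
```

Keying the table by DS also covers the redundancy variation, since each DS entry can carry its own hash of the same message.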
Variation: The format of the NMESSAGE could allow the sender to bundle multiple BMMESSAGEs into one NMESSAGE. This would disguise any multipart messages. It could also be used by a client/server that only sends messages periodically and bundles where it can. This would allow a sender to obscure that they are sending multipart messages (see below).
Possible attack: By examining the size of the NMESSAGE, an attacker could infer information about the contents. For example, a larger NMESSAGE would indicate the possibility of a multipart message. This could be partially mitigated by including padding up to the average size. (Of course, padding messages up to the average size will slowly push the average towards the maximum size.)
Possible attack: If the size of NMESSAGEs is not fixed (in order to allow multipart messages, and redundancy of parts), then an attacker could take advantage of this to use up an IS's resources. As such NMESSAGEs must be a fixed size or must have an upper limit on their size.
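A minimal sketch of the fixed-size option: the NMESSAGE body is prefixed with its length and padded with random bytes before encryption, so every NMESSAGE an IS stores has the same size. The 512-byte figure and the two-byte length prefix are arbitrary choices for the example, not values from the proposal.

```python
import os
import struct

NMESSAGE_SIZE = 512  # hypothetical fixed size in bytes


def pad(body: bytes, size: int = NMESSAGE_SIZE) -> bytes:
    """Return `body` as exactly `size` bytes: 2-byte length, body, random filler."""
    if len(body) > size - 2:
        raise ValueError("NMESSAGE body exceeds the fixed size")
    filler = os.urandom(size - 2 - len(body))
    return struct.pack(">H", len(body)) + body + filler


def unpad(padded: bytes) -> bytes:
    """Recover the original body from a padded NMESSAGE."""
    (length,) = struct.unpack(">H", padded[:2])
    return padded[2:2 + length]
```

An IS can then reject anything that is not exactly NMESSAGE_SIZE bytes without looking inside it, which addresses both the size-inference and the resource-exhaustion attacks above.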
NMESSAGE vs BMMESSAGE: Because a low maximum size for NMESSAGEs allows ISs to store a high volume of them, DSs could allow a much larger maximum size for BMMESSAGEs. Unlike the current network, a 10 MB message would not severely damage the overall network health (indeed it may not even affect the node that it is being uploaded to).
Receiving Messages
- Periodically the receiver connects to its ISs (*) and requests any NMESSAGEs for its MTAGs
- The receiver decrypts the NMESSAGE
- The receiver connects to the listed DSs to get the listed BMMESSAGEs
* Variation: An IS could require that a valid server passphrase is supplied in order to retrieve NMESSAGEs. This comes at a cost to a user's anonymity: if different passphrases are supplied to different clients, an operator could then determine which MTAGs are destined for the same client. It would then be preferable for anyone wishing to operate a private server (e.g. businesses) to use BitMessage software that supports ?SOCKS?.
Possible attack: If a client does not use a changing IP address, or reuses the same connection in its requests for MTAGs, the IS (or DS in the case of hashes) would be able to determine that the objects are linked to the one address (and potentially IP).
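The receive steps above can be sketched with in-memory stand-ins for the servers. Everything here is illustrative: the dictionaries replace real network calls, decrypt_nmessage is a caller-supplied placeholder, and the sketch assumes each NMESSAGE refers to a single BMMESSAGE that may be stored redundantly on several DSs. A real client would also want a fresh connection (or onion circuit) per request, for the reason given in the attack above.

```python
from typing import Callable, Dict, Iterable, List


def receive(
    is_inbox: Dict[bytes, List[bytes]],     # stand-in for one IS: MTAG -> NMESSAGEs
    ds_store: Dict[str, Dict[str, bytes]],  # stand-in for DSs: IP(DS) -> {hash: blob}
    mtags: Iterable[bytes],
    decrypt_nmessage: Callable[[bytes], Dict[str, str]],
) -> List[bytes]:
    """Fetch the encrypted BMMESSAGEs announced under the given MTAGs."""
    fetched: List[bytes] = []
    for tag in mtags:
        for raw in is_inbox.get(tag, []):
            listing = decrypt_nmessage(raw)  # IP(DS) -> hash to request
            for ds_ip, digest in listing.items():
                blob = ds_store.get(ds_ip, {}).get(digest)
                if blob is not None:
                    fetched.append(blob)
                    break  # one good copy is enough when DSs are redundant
    return fetched
```

If one listed DS is down or has dropped the message, the loop simply falls through to the next DS in the NMESSAGE.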
Where does onion routing come into all of this?
With the above architecture, an attacker can observe connections to the various servers to determine who is using them (and, by examining the size of data exchanges, what they are using them for). (Note: it is highly suggested that servers self-publish their public key to a DIR so that clients can establish secure communications.) In BitMessage, this was mitigated by all nodes transferring all data, irrespective of whether it was destined for them or not. This however does not scale well, and makes smartphone clients impractical. This new system has a heavy need for onion routing, so that the IP address of a connection cannot be used to identify a user, or to group different MTAGs or BMMESSAGE hashes together.
This then raises an important question: should clients be expected to use TOR (in which case it would be preferable for the maintainers of client software to build TOR in), or should the BM network build its own onion routing network?
Conclusion
Whilst I have done my best to nut out most of the questions, possibilities and issues, I doubt I have covered them all. If I have missed anything, feel free to point it out. Also, although this may seem like a proposal, it is a rather drastic change to the concept of BitMessage and should be up for lots of discussion.