Implement Dandelion message routing #1049

Closed
opened 2017-09-13 23:27:52 +02:00 by kewde · 9 comments
kewde commented 2017-09-13 23:27:52 +02:00 (Migrated from github.com)

When a BitMessage node receives a message, it will not instantly relay that message to its peers. Instead it will add a randomized delay. I'm not sure of the exact specifics (is this a delay for all peers? Or is the randomized delay unique for each peer?). It's a security measure to prevent statistical/timing attacks against adversaries.

However, there is a new approach on the block called Dandelion. Which was initially created for Bitcoin, but the same properties apply to the BitMessage network.

When a BitMessage node receives a message, it will not instantly relay that message to its peers. Instead it will add a randomized delay. I'm not sure of the exact specifics (is this a delay for all peers? Or is the randomized delay unique for each peer?). It's a security measure to prevent statistical/timing attacks against adversaries. However, there is a new approach on the block called [Dandelion]. Which was initially created for Bitcoin, but the same properties apply to the BitMessage network. [Dandelion]: https://arxiv.org/pdf/1701.04439.pdf
PeterSurda commented 2017-09-14 10:14:10 +02:00 (Migrated from github.com)

There's already a bitfield reserved for dandelion in the protocol specification.

There's already a bitfield reserved for dandelion in the protocol specification.
PeterSurda commented 2017-09-24 14:09:41 +02:00 (Migrated from github.com)
  • Dandelion bitfield definition and handling in version command
  • Dandelion probability config option as percentage int, 0 to disable
  • Route selector (per-epoch per-source static routes, don't send to the same one it came from)
  • New protocol command dinv, similar to inv. Probably the route selector would be triggered from this command
  • Track objects which are in stem phase. Using a singleton DandelionStems
  • Add stem phase handling to antiIntersectionDelay in getdata
  • Add handling of stem objects in data command
  • Add dinv sending to InvThread
  • Add downloading of stem objects to DownloadThread
  • Force redownload of objects in stem phase even if we already have them
  • Exclude stem objects from sendBigInv
  • Fluff trigger on probability, in dinv handler
  • Fluff trigger on timeout. Not sure where to put this one. In the worst case into the singleCleaner thread.
  • Fluff trigger on cycle detection. In inv/dinv, if we already have the object
  • Don't forget to delete the object from state.dandelion on each fluff trigger
  • Send dinv command for stem objects
  • route also for locally generated objects
- [ ] Dandelion bitfield definition and handling in version command - [ ] Dandelion probability config option as percentage int, 0 to disable - [ ] Route selector (per-epoch per-source static routes, don't send to the same one it came from) - [ ] New protocol command dinv, similar to inv. Probably the route selector would be triggered from this command - [ ] Track objects which are in stem phase. Using a singleton DandelionStems - [ ] Add stem phase handling to antiIntersectionDelay in getdata - [ ] Add handling of stem objects in data command - [ ] Add dinv sending to InvThread - [ ] Add downloading of stem objects to DownloadThread - [ ] Force redownload of objects in stem phase even if we already have them - [ ] Exclude stem objects from sendBigInv - [ ] Fluff trigger on probability, in dinv handler - [ ] Fluff trigger on timeout. Not sure where to put this one. In the worst case into the singleCleaner thread. - [ ] Fluff trigger on cycle detection. In inv/dinv, if we already have the object - [ ] Don't forget to delete the object from state.dandelion on each fluff trigger - [ ] Send dinv command for stem objects - [ ] route also for locally generated objects
amiller commented 2017-09-25 20:46:15 +02:00 (Migrated from github.com)

Awesome checklist! Really eager to review this, but it may need to wait until after my imminent deadlines :)

Some things I'm interested in.....

  • is there any notion of "related" or "dependent" transactions in BitMessage, that could lean to orphan transactions? Or are all transactions essentially separate.
  • It looks like your approach is similar to the "mempool embargo" approach, in that you suppress stem objects from ordinary propagation like sendBigInv. Are there any other ways that a BitMessage node could use a side channel to figure out if a peer has a stem object?
Awesome checklist! Really eager to review this, but it may need to wait until after my imminent deadlines :) Some things I'm interested in..... - is there any notion of "related" or "dependent" transactions in BitMessage, that could lean to orphan transactions? Or are all transactions essentially separate. - It looks like your approach is similar to the "mempool embargo" approach, in that you suppress stem objects from ordinary propagation like sendBigInv. Are there any other ways that a BitMessage node could use a side channel to figure out if a peer has a stem object?
PeterSurda commented 2017-09-25 22:29:11 +02:00 (Migrated from github.com)

@amiller Thank you for the reply.

With the 3 commits, I believe all the points are implemented, however I have done no testing and there probably are bugs, so I haven't made any ticks yet.

Regarding your questions:

  • there is no equivalent to consensus mechanism in Bitmessage and the objects are completely independent. After you decrypt the objects, there may be some relationships, but this is separate from the network subsystem, and at least in PyBitmessage it's handled asynchronously to the network subsystem.

  • Bitmessage has had much less security review than Bitcoin. There was some criticism in the past, much of it is similar to issues with Bitcoin, and some of it has been addressed since it was reported. Since there are no interdependecies between objects (indeed, almost all objects are encrypted so you can only verify if it hasn't expired, if it has a sufficient PoW, a correct checksum and stream), this makes certain attacks impossible which would have worked on Bitcoin.

When the intersection attack was brought to my attention last year in February, we discussed it internally (@mirrorwish, @Atheros1 and me) and I wrote code for mitigating it by stalling the attacker here: https://github.com/Bitmessage/PyBitmessage/blob/v0.6/src/network/tcp.py#L79 . In summary, if a node receives a getdata request and it doesn't have the object (and now also if it's a stem object that isn't on its route), it will skip processing getdata requests from that node until it estimates that a small object would propagate across the whole network on average. This skip is also triggered upon connection to prevent an attacker from quickly connecting, sending a getdata and disconnecting.

The older implementation in the network subsystem (until 0.6.2) didn't allow the mitigation to work cleanly, as the processing of each connection was synchronous, however with switching the network subsystem to asyncore, the processing is more asynchronous (it can be improved even more in the future) and it can ignore getdata requests while processing others.

I also made a minor change in one of the patches above, that a response to getdata is processed in randomised order, and the first missing/stem object will trigger the skipping, so if the attacker requests multiple objects, he'll only receive a random subset of the valid ones, so he can't be sure which you don't have or if there were network delays.

I am not sure if it's better to skip or delay the getdata requests after the trigger. I decided to skip, since if I was delaying them, my intuition tells me that there probably would be security issues depending on how it's implemented that I can't foresee at the moment, and I may end up making the attack worse. I'll reevaluate it in the future or if someone does a proper analysis.

@amiller Thank you for the reply. With the 3 commits, I believe all the points are implemented, however I have done no testing and there probably are bugs, so I haven't made any ticks yet. Regarding your questions: - there is no equivalent to consensus mechanism in Bitmessage and the objects are completely independent. After you decrypt the objects, there may be some relationships, but this is separate from the network subsystem, and at least in PyBitmessage it's handled asynchronously to the network subsystem. - Bitmessage has had much less security review than Bitcoin. There was some criticism in the past, much of it is similar to issues with Bitcoin, and some of it has been addressed since it was reported. Since there are no interdependecies between objects (indeed, almost all objects are encrypted so you can only verify if it hasn't expired, if it has a sufficient PoW, a correct checksum and stream), this makes certain attacks impossible which would have worked on Bitcoin. When the intersection attack was brought to my attention last year in February, we discussed it internally (@mirrorwish, @Atheros1 and me) and I wrote code for mitigating it by stalling the attacker here: https://github.com/Bitmessage/PyBitmessage/blob/v0.6/src/network/tcp.py#L79 . In summary, if a node receives a getdata request and it doesn't have the object (and now also if it's a stem object that isn't on its route), it will skip processing getdata requests from that node until it estimates that a small object would propagate across the whole network on average. This skip is also triggered upon connection to prevent an attacker from quickly connecting, sending a getdata and disconnecting. The older implementation in the network subsystem (until 0.6.2) didn't allow the mitigation to work cleanly, as the processing of each connection was synchronous, however with switching the network subsystem to asyncore, the processing is more asynchronous (it can be improved even more in the future) and it can ignore getdata requests while processing others. I also made a minor change in one of the patches above, that a response to getdata is processed in randomised order, and the first missing/stem object will trigger the skipping, so if the attacker requests multiple objects, he'll only receive a random subset of the valid ones, so he can't be sure which you don't have or if there were network delays. I am not sure if it's better to skip or delay the getdata requests after the trigger. I decided to skip, since if I was delaying them, my intuition tells me that there probably would be security issues depending on how it's implemented that I can't foresee at the moment, and I may end up making the attack worse. I'll reevaluate it in the future or if someone does a proper analysis.
PeterSurda commented 2017-09-28 14:45:12 +02:00 (Migrated from github.com)

I'll post here as I'm having troubles with sending to @illinois.edu email addresses.

I did find the time to review the additional information (Greg's posts, Giulia's updated announcement), and have a couple of questions:

  • it looks like the sample code uses two child stems. I thought I read an explanation for that, but I can't find it. Should I implement two or one child stems?

  • the sample code appears to ensure that after reshuffling the new path, it doesn't include any of the current ones, am I correct?

  • it looks like it only uses outbound (and whitelisted, which is a concept that currently doesn't exist in PyBitmessage) connections? I presume this is a defense. What to do with nodes that don't make outbound connections, or don't currently have one? Like during an early phase of startup. Turn dandelion off? Delay (if it's a temporary issue)?

  • regarding shifting the fluff mode to a non-dandelion node, how is that handled probability wise? Is there another variable, or a compile time (C) value?

I'll post here as I'm having troubles with sending to @illinois.edu email addresses. I did find the time to review the additional information (Greg's posts, Giulia's updated announcement), and have a couple of questions: - it looks like the sample code uses two child stems. I thought I read an explanation for that, but I can't find it. Should I implement two or one child stems? - the sample code appears to ensure that after reshuffling the new path, it doesn't include any of the current ones, am I correct? - it looks like it only uses outbound (and whitelisted, which is a concept that currently doesn't exist in PyBitmessage) connections? I presume this is a defense. What to do with nodes that don't make outbound connections, or don't currently have one? Like during an early phase of startup. Turn dandelion off? Delay (if it's a temporary issue)? - regarding shifting the fluff mode to a non-dandelion node, how is that handled probability wise? Is there another variable, or a compile time (C) value?
amiller commented 2017-09-28 16:08:45 +02:00 (Migrated from github.com)

Hey Peter,

  • Yes there should be two child stems. We call this the "4 regular graph" approach, since there's typically two incoming and two outgoing stem edges for each node... And our analysis says it helps privacy when the attacker can learn some of the edges. For each transaction, only one is chosen though.

  • The current shuffling does do this, but I don't think it's necessary

  • outbound only is important because Core nodes accept incoming connections from anywhere, a super node could easily make 100 connections to each node and get an advantage in being chosen as a stem edge. It is harder to get nodes to make outgoing connections to you. I would say, delay until having more outgoing edges. Or at least until having 2 and then let those be the stem edges.

  • I'm not sure I follow the last question, but we have a new command like flag and global parameter for the fluff probability

Hey Peter, - Yes there should be two child stems. We call this the "4 regular graph" approach, since there's typically two incoming and two outgoing stem edges for each node... And our analysis says it helps privacy when the attacker can learn some of the edges. For each transaction, only one is chosen though. - The current shuffling does do this, but I don't think it's necessary - outbound only is important because Core nodes accept incoming connections from anywhere, a super node could easily make 100 connections to each node and get an advantage in being chosen as a stem edge. It is harder to get nodes to make outgoing connections to you. I would say, delay until having more outgoing edges. Or at least until having 2 and then let those be the stem edges. - I'm not sure I follow the last question, but we have a new command like flag and global parameter for the fluff probability
PeterSurda commented 2017-09-29 19:20:09 +02:00 (Migrated from github.com)

For each transaction, only one is chosen though.

Ok, this is the point I missed, thanks. Have to fix it.

I'm not sure I follow the last question, but we have a new command like flag and global parameter for the fluff probability.

In his bitcoin-dev post, gmaxwell said this:

An alternative construction would be that when a stem transaction goes out there is a random chance that the stem flag is not set (with suitable adjustment to keep the same expected path length)

To which Giulia replied:

Agreed, this is actually what we have implemented.

So to reformulate it, there are two types of stem randomly turning to fluff:

  1. I fluff with a probability x.

  2. With a probability x, I announce the transaction as a non-dandelion one to the edge, and assume he fluffs (which normally he would)

How does the code distinguish which one to take? Or did I misunderstand it, and the first type doesn't exist? That actually makes sense in retrospect, as that would automatically get rid of some of the bugs that exist in my implementation at the moment, and why I can't find how your code handles it. It would also explain some of Greg's comments, regarding something being unclear in the specification, and how attackers may be able to block propagation.

> For each transaction, only one is chosen though. Ok, this is the point I missed, thanks. Have to fix it. > I'm not sure I follow the last question, but we have a new command like flag and global parameter for the fluff probability. In his bitcoin-dev post, gmaxwell said this: > An alternative construction would be that when a stem transaction goes out there is a random chance that the stem flag is not set (with suitable adjustment to keep the same expected path length) To which Giulia replied: > Agreed, this is actually what we have implemented. So to reformulate it, there are two types of stem randomly turning to fluff: 1. I fluff with a probability x. 2. With a probability x, I announce the transaction as a non-dandelion one to the edge, and assume he fluffs (which normally he would) How does the code distinguish which one to take? Or did I misunderstand it, and the first type doesn't exist? That actually makes sense in retrospect, as that would automatically get rid of some of the bugs that exist in my implementation at the moment, and why I can't find how your code handles it. It would also explain some of Greg's comments, regarding something being unclear in the specification, and how attackers may be able to block propagation.
PeterSurda commented 2017-09-30 13:46:37 +02:00 (Migrated from github.com)

Partial fix in b1442ecb0a

  • it now only selects one stem for each object
  • rng fluff mode is triggered by sending an inv

Todo:

  • only select outbound connections for stem
  • delay if fewer than 2 outbound connections
  • turn off dandelion if outbound connections are disabled or in bootstrap provider mode
Partial fix in b1442ecb0a1e8c383040a071c648624c38fef5c5 - [x] it now only selects one stem for each object - [x] rng fluff mode is triggered by sending an inv Todo: - [ ] only select outbound connections for stem - [ ] delay if fewer than 2 outbound connections - [ ] turn off dandelion if outbound connections are disabled or in bootstrap provider mode
PeterSurda commented 2018-02-03 11:48:41 +01:00 (Migrated from github.com)

Ready in fd1a6c1fa1.

Ready in fd1a6c1fa14ab719f43d97ad52f464826fb32b4c.
This repo is archived. You cannot comment on issues.
No Milestone
No project
No Assignees
1 Participants
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: Bitmessage/PyBitmessage-2024-12-23#1049
No description provided.