Strange error: Thread crashes on startup #530

Closed
opened 2013-10-16 02:16:23 +02:00 by AyrA · 10 comments
AyrA commented 2013-10-16 02:16:23 +02:00 (Migrated from github.com)

I just noticed this error showing up during startup.

As a general note when working with threads: please check periodically whether they are still running, and if they are not, move the failed task to the end of the queue so it does not block the others, then restart the thread.
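The supervision pattern suggested above could look something like this (a minimal sketch with illustrative names, not PyBitmessage code; the crash is simulated, and a real supervisor would poll `is_alive()` instead of joining):

```python
import queue
import threading

tasks = queue.Queue()
for name in ("msg-a", "msg-b"):
    tasks.put(name)

completed = []

def worker(task, attempt):
    # Simulate a crash on the first attempt at "msg-a".
    if task == "msg-a" and attempt == 0:
        raise RuntimeError("simulated worker crash")
    completed.append(task)

attempts = {}
while not tasks.empty():
    task = tasks.get()
    n = attempts.get(task, 0)
    attempts[task] = n + 1
    t = threading.Thread(target=worker, args=(task, n))
    t.start()
    t.join()  # stand-in for a periodic liveness check
    if task not in completed:
        tasks.put(task)  # requeue the failed task at the end, then retry
```

The crashed task goes to the back of the queue, so the healthy task still completes before the retry.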

For some reason, PoW is no longer done for any messages.

```
making request for v4 pubkey with tag: 50136c5a38f928e991b3061c3c3e3492d76cf7202f393133c44b356bd96a584d
Done in 1.15400004387: 1186106 with value 3562510383726 < 4102923503938
Found proof of work 3562510383726 Nonce: 1186106
Exception in thread Thread-2:
Traceback (most recent call last):
  File "Q:\Python27\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "Q:\PyBitmessage\src\class_singleWorker.py", line 60, in run
    self.sendMsg()
  File "C:\AdminRoot\TBM\PyBitmessage\src\class_singleWorker.py", line 503, in sendMsg
    self.requestPubKey(toaddress)
  File "Q:\PyBitmessage\src\class_singleWorker.py", line 897, in requestPubKey
    shared.inventorySets[streamNumber].add(inventoryHash)
KeyError: 1
```
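The KeyError comes from `shared.inventorySets` being indexed with a stream number (1) that has no entry yet. A defensive sketch of the failing line (hypothetical helper and names, not PyBitmessage's actual code) that would avoid the crash:

```python
# Sketch: guard the per-stream inventory lookup instead of crashing.
# shared.inventorySets maps a stream number to a set of inventory hashes;
# the names below are illustrative only.
def add_to_inventory(inventory_sets, stream_number, inventory_hash):
    # setdefault creates the per-stream set on first use, so a worker
    # thread that runs before the set is initialised no longer raises KeyError.
    inventory_sets.setdefault(stream_number, set()).add(inventory_hash)

inventory_sets = {}  # simulates the state before stream 1 is initialised
add_to_inventory(inventory_sets, 1, "example-inventory-hash")
```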
AyrA commented 2013-10-16 02:30:01 +02:00 (Migrated from github.com)

I found a dirty workaround: just comment out line 897 (and line 808, where it appears too) and it works. The message is properly delivered and the ack is received.
What are the consequences of removing those lines?

Atheros1 commented 2013-10-16 05:48:57 +02:00 (Migrated from github.com)

Are you running custom code? I don't recognize where this line came from:

```
Done in 1.15400004387: 1186106 with value 3562510383726 < 4102923503938
```

I don't see why shared.inventorySets wouldn't have a key of "1" already set. This data structure is used very frequently: whenever you receive an inv message.

To answer your question: the consequence is that the part of the client that processes incoming inv messages won't be aware that you already have this inventory object. This will cause several unnecessary inventory lookups, but it is harmless.

AyrA commented 2013-10-16 06:45:43 +02:00 (Migrated from github.com)

The only custom parts I am running are the GPU-based PoW function and the FastCPU-based function. Since they are much faster than the built-in PoW, I wonder if the PoW finishes too early, before key 1 is created.

Atheros1 commented 2013-10-16 17:25:15 +02:00 (Migrated from github.com)

Key 1 is created at startup.

PeterSurda commented 2015-11-12 00:16:51 +01:00 (Migrated from github.com)

@AyrA does this still happen with the latest code?

AyrA commented 2015-11-12 09:00:01 +01:00 (Migrated from github.com)

I don't know. I have stopped using the bitmessage client completely after I implemented the gateway.

If I remember correctly, you can test this by saving an empty knownnodes.dat (just pickle.dump an empty list) and then reopening the client.

It boils down to the fact that if there is a network problem, Bitmessage slowly deletes each entry as it can no longer connect, eventually leaving the list empty.
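A sketch of the reproduction step above, assuming knownnodes.dat really is just a pickled empty container (whether the client expects a list or some other type may differ between versions; an empty list follows the suggestion above):

```python
import pickle

# Sketch: write an empty knownnodes.dat so the client starts with no
# known peers, simulating the state after all entries were deleted.
with open("knownnodes.dat", "wb") as f:
    pickle.dump([], f)
```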

PeterSurda commented 2015-11-12 17:43:07 +01:00 (Migrated from github.com)

@AyrA Thanks, I think I understand your explanation.

PeterSurda commented 2015-11-27 14:50:42 +01:00 (Migrated from github.com)

I'm not fully sure, but I think that this happens when you launch PyBitmessage and your worker thread starts creating proof of work for pending messages or pubkey requests before the outbound network connections initialise. This is in theory possible, because the main thread loads your encryption keys between launching the worker thread and calling connectToStream (where inventorySet is properly initialised). If that's the cause, a fix is super easy, just initialise the inventorySet earlier to a state that does not have a missing key for stream 1.
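The fix described above can be sketched as pre-seeding the set for stream 1 before the worker thread is launched (illustrative names only; the real initialisation happens in connectToStream):

```python
import threading

# Sketch: initialise the per-stream inventory set before starting the
# worker thread, so even an instantly-finishing PoW cannot hit a missing key.
inventorySets = {1: set()}  # pre-seeded instead of waiting for connectToStream

def worker():
    # May run before any network initialisation; key 1 already exists.
    inventorySets[1].add("example-inventory-hash")

t = threading.Thread(target=worker)
t.start()
t.join()
```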

You are pretty much the only one who will realistically trigger this, because you have many addresses in keys.dat and GPU PoW at the same time: your worker thread is fast, and the time between launching it and initialising the inventorySet is long. I doubt that I will be able to reproduce this without recreating such an environment, so I made the probable fix without trying to reproduce it.

If anyone still has this problem, let me know, including how to reproduce it.

AyrA commented 2015-11-27 15:42:32 +01:00 (Migrated from github.com)

I solved it myself by fiddling with the thread start order. It has worked so far, but it is certainly not a clean solution. At the moment there are close to 15,000 active addresses in the system, so if you want to simulate it yourself, I assume you could generate many dummy addresses in your client and try it.
For obvious reasons I can't give you my address list.

PeterSurda commented 2015-11-27 16:35:22 +01:00 (Migrated from github.com)

I don't want to allocate more time on this for the time being unless I know someone is affected.

Reference: Bitmessage/PyBitmessage-2025-02-27#530