list index out of range traceback when no connects found or file corrupt. #1335

Closed
opened 2018-08-09 13:51:49 +02:00 by surbhicis · 10 comments
surbhicis commented 2018-08-09 13:51:49 +02:00 (Migrated from github.com)

random.choice needs a non-empty list to pick a random element from, but in the connection pool it finds self.stream empty and throws a "list index out of range" traceback:

Exception in thread Asyncore:
Traceback (most recent call last):
PyBitmessage/src/helper_random.py", line 65, in randomchoice
return random.choice(population) # nosec
File "/usr/lib/python2.7/random.py", line 275, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range

2018-08-09 13:28:22,901 - WARNING - No notification.sound plugin found
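
The traceback above is what `random.choice` does with any empty sequence. A minimal sketch of a guarded wrapper follows; `safe_choice` and its `default` parameter are hypothetical names for illustration, not part of `helper_random.py`:

```python
import random


def safe_choice(population, default=None):
    """Return a random element, or `default` when the population is empty.

    Hypothetical helper: random.choice() raises IndexError on an empty
    sequence, so the emptiness check has to happen before the call.
    """
    if not population:
        return default
    return random.choice(population)
```

A caller would then have to handle the `default` value (for example by skipping the connection attempt) instead of catching `IndexError`.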

PeterSurda commented 2018-08-14 11:30:19 +02:00 (Migrated from github.com)

Looks like when knownnodes is empty, an exception is thrown.

g1itch commented 2018-08-15 17:19:01 +02:00 (Migrated from github.com)

I suspect that it is caused by the wrong import order in bitmessagemain. Can you please provide steps to reproduce?

milahu commented 2018-10-01 20:09:47 +02:00 (Migrated from github.com)

> knownnodes is empty

yes.
this happens when knownnodes.dat is empty or corrupted*,
but parsing doesn't raise an exception
to trigger createDefaultKnownNodes.

patch:
in knownnodes.py
after createDefaultKnownNodes()
insert:

    node_count = 0
    for stream in knownNodes.keys():
        for node in knownNodes[stream].keys():
            node_count += 1
    if node_count == 0:
        createDefaultKnownNodes()
    else:
        print("restored %i nodes" % node_count)
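
The same check can be sketched without the explicit counter. This is only an illustration under assumptions: `count_nodes` and `ensure_bootstrapped` are hypothetical names, and `create_defaults` stands in for `createDefaultKnownNodes`:

```python
def count_nodes(known_nodes):
    """Total number of nodes across all streams of a {stream: {node: params}} dict."""
    return sum(len(nodes) for nodes in known_nodes.values())


def ensure_bootstrapped(known_nodes, create_defaults):
    """Call `create_defaults` when no node was loaded; return the final count.

    `create_defaults` is a stand-in for createDefaultKnownNodes() and is
    assumed to fill `known_nodes` in place.
    """
    if count_nodes(known_nodes) == 0:
        create_defaults()
    return count_nodes(known_nodes)
```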

*corrupted?
blame the file, or blame the parser?
see knownnodes.py

def pickle_deserialize_old_knownnodes(source):
    knownNodes = pickle.load(source)
    for stream in knownNodes.keys():
        for node, params in knownNodes[stream].items():
            if isinstance(params, (float, int)):
                addKnownNode(stream, node, params)

the if condition is never satisfied,
so old nodes are not added.

somehow, old pickle and new json format got mixed up.

in my pickle-coded knownnodes.dat
params is a dict, and looks like this
{'rating': 0, 'self': False, 'lastseen': 1538113792}
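
To make the two formats concrete, here is a small sketch that maps either one to the dict form. `normalize_params` is a hypothetical helper, and treating a bare number as a last-seen timestamp is an assumption about the old format, not something confirmed in this thread:

```python
def normalize_params(params):
    """Return node params as a dict, or None if the value is not recognised."""
    if isinstance(params, dict):
        # Already in the dict form shown above; return a copy.
        return dict(params)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(params, (int, float)) and not isinstance(params, bool):
        # ASSUMPTION: a bare number is the node's last-seen timestamp.
        return {"lastseen": int(params), "rating": 0, "self": False}
    return None
```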

i suggest:

  1. check for duplicate nodes
  2. add elif condition
def pickle_deserialize_old_knownnodes(source):
    knownNodes = pickle.load(source)
    for stream in knownNodes.keys():
        for node, params in knownNodes[stream].items():
            # find duplicate nodes
            found_node = False
            for old_stream in knownNodes.keys():
                for old_node in knownNodes[old_stream].keys():
                    if old_node == node:
                        found_node = True
                        break
                if found_node:
                    break
            if found_node:
                continue

            if isinstance(params, (float, int)):
                # add old format
                addKnownNode(stream, node, params)
            elif isinstance(params, dict):
                # add new format
                knownNodes[stream][node] = params

ps, have a look at El Paquete Semanal,
for 'more' decentral file sharing.
our main problem is,
we still depend on 'their' infrastructure.
see also Freifunk.
would love to see 'bridges' like XMPP transports.
kinda like a high-latency broadcast protocol.

hope to help ^^

ps2, still waiting for bm to connect ....
it did fall back to the nine hard-coded DEFAULT_NODES,
but none shows up in network status.
ping-ing the nodes does work.

g1itch commented 2018-10-01 21:05:00 +02:00 (Migrated from github.com)

@milahu Too many questions and unrelated stuff (:

Are you indeed updating from BM < 6, so that it has the old pickle knownnodes format? Could you please provide such a file?

What is the problem with importing zero nodes from an empty list? Yes, the default nodes should be added in this case. Though if knownnodes.dat is a really empty file, it crashes with EOFError.

No need to deduplicate a dict, because its keys are unique. What is old_stream, did you implement streams?
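
The `EOFError` on a truly empty file can be reproduced and guarded against as in the sketch below. `load_known_nodes` is a hypothetical name, and returning an empty dict on failure is one possible policy, not necessarily what `knownnodes.py` does:

```python
import io
import pickle


def load_known_nodes(source):
    """Unpickle known nodes from a binary file object; return {} on failure.

    pickle.load() raises EOFError on an empty stream and
    pickle.UnpicklingError on some kinds of corrupt data.
    """
    try:
        data = pickle.load(source)
    except (EOFError, pickle.UnpicklingError):
        return {}
    # Anything that is not a dict is treated as unusable.
    return data if isinstance(data, dict) else {}
```

For example, `load_known_nodes(io.BytesIO(b""))` gives an empty dict, which the caller can treat as the signal to add the default nodes.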

milahu commented 2018-10-01 23:37:30 +02:00 (Migrated from github.com)

> BM < 6

no .... i guess it was 0.6.3.2

in the logfile, i could not find the bm version.
in debug.py line 130 insert before 'if msg:'

    import version
    logger.log(logging.WARNING, "starting %s-%s" % (
        version.softwareName, version.softwareVersion))

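my knownnodes.dat.bak is around 700kB with around 6k nodes.
sounds like a lot to me ....

here is a sample file, with randomized ip addresses:
[knownnodes.dat.bak.2.gz](https://github.com/Bitmessage/PyBitmessage/files/2435630/knownnodes.dat.bak.2.gz)
created with
[my_pickle.py.txt](https://github.com/Bitmessage/PyBitmessage/files/2435640/my_pickle.py.txt)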

but my last patch doesn't work.
all nodes seem to be duplicates.
that is because pickle_deserialize_old_knownnodes
overwrites knownNodes with the loaded dict,
so the duplicate check compares the loaded dict with itself.

knownnodes.py should look like this

def pickle_deserialize_old_knownnodes(source):
    new_knownNodes = pickle.load(source)
    for stream in new_knownNodes.keys():
        for node, params in new_knownNodes[stream].items():
            # find duplicate nodes
            found_node = False
            for old_stream in knownNodes.keys():
                for old_node in knownNodes[old_stream].keys():
                    if old_node == node:
                        found_node = True
                        print("unpickle: skip %s" % repr(node))
                        break
                if found_node:
                    break
            if found_node:
                continue

            if isinstance(params, (float, int)):
                # add old format
                addKnownNode(stream, node, params)
            elif isinstance(params, dict):
                # add new format
                knownNodes[stream][node] = params

why do i still look for duplicates?
because knownNodes has 2 keys, stream and node,
so one node can be in multiple streams.

old_stream and old_node are indices for the 'old' knownNodes,
where stream and node are indices for the 'new' new_knownNodes.
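
The cross-stream duplicate check described here can also be sketched with a set instead of the nested loops. `merge_known_nodes` is a hypothetical name and both dicts are passed in explicitly rather than using the module-level `knownNodes`:

```python
def merge_known_nodes(existing, loaded):
    """Copy nodes from `loaded` into `existing` and return how many were added.

    Both arguments are {stream: {node: params}} dicts. A node that is
    already present under ANY stream of `existing` is skipped.
    """
    seen = set()
    for nodes in existing.values():
        seen.update(nodes)  # iterating a dict yields its keys (the nodes)
    added = 0
    for stream, nodes in loaded.items():
        for node, params in nodes.items():
            if node in seen:
                continue
            existing.setdefault(stream, {})[node] = params
            seen.add(node)
            added += 1
    return added
```

The set lookup keeps the work proportional to the number of nodes, which matters a little with around 6k entries.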

and ta-daa, its working : D
restored all 6k nodes + is connecting
im happy.

PeterSurda commented 2018-10-02 08:56:37 +02:00 (Migrated from github.com)

#1336 is an attempt to fix it but it's wrong. See my comments there. I haven't tried reproducing it myself, but what could happen is that all known nodes expire when your internet is down.

If the list is empty, it should re-bootstrap, in my opinion.

g1itch commented 2018-10-02 10:45:50 +02:00 (Migrated from github.com)

Oh, you are trying to use a knownnodes.dat populated with random IPs. What do you expect?
I see no point in repopulating the dict with its own values, because for the latest pickle format pickle.load() should be sufficient.

milahu commented 2018-10-02 11:00:20 +02:00 (Migrated from github.com)

> If the list is empty, it should re-bootstrap in my opinion.

no. if my knownnodes.dat has 6k nodes, but zero are loaded,
there must be something wrong with the parser.

> knownnodes.dat populated with random IPs

that was a sample file, to test parsing.
i have the original file here, applied the patch above, and it works.
all nodes are loaded, bm connects, and stores nodes in the new json format.

but i still have no idea why it stored them in pickle format.

> repopulate dict with its values

i was only letting you know how i did the ip randomization.

as i said, im happy. do what you want.

PeterSurda commented 2018-10-02 12:06:46 +02:00 (Migrated from github.com)

There are actually two problems. The first one is in parsing. The second one is that if knownnodes for whatever reason becomes empty, the Asyncore loop will start consuming a lot of CPU.
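
For the second problem, one common mitigation for a loop that spins when it has nothing to connect to is to back off while the candidate list is empty. The sketch below is an assumption about how that could look, not the actual Asyncore code; `next_idle_delay` and its parameters are hypothetical:

```python
def next_idle_delay(candidates, current_delay, base=0.1, cap=5.0):
    """Return how long the loop should sleep before its next pass.

    With candidates available there is no extra delay. With none, the
    delay doubles on each pass, starting at `base` and limited to `cap`.
    """
    if candidates:
        return 0.0
    return min(cap, max(base, current_delay * 2))
```

The loop would call `time.sleep()` with the returned value and pass it back in as `current_delay` on the next iteration.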

g1itch commented 2018-10-10 13:30:10 +02:00 (Migrated from github.com)

It was probably introduced in 342e2a2

Reference: Bitmessage/PyBitmessage-2024-11-28#1335