implement netcat operating mode #1222
This PR enables a special headless operating mode (which I named "netcat" mode due to its similarity to the Unix utility) in which all object processing is disabled. Instead, raw objects received from the network are written unprocessed to STDOUT, and any valid raw objects received on STDIN are broadcast to the network.
This is a re-implementation of PR #1149 , using the switches defined in PR #1214 . The discussions from the linked PRs should provide useful background.
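For readers unfamiliar with the earlier PRs, here is a hedged sketch of what a downstream consumer of netcat mode might look like; the line layout follows the format documented in the module docstring quoted further down (hex timestamp, tab, hex-encoded object), and none of this is code from the PR itself:

```python
#!/usr/bin/env python
"""Toy consumer of netcat-mode output (illustration only, not part of the PR).

Assumes each STDOUT line follows the documented layout:
16-digit hex timestamp, a tab, then the hex-encoded raw object.
"""
import sys
from binascii import unhexlify

for line in sys.stdin:               # e.g. pybitmessage in netcat mode piped in
    line = line.strip()
    if not line:
        continue
    hex_ts, hex_obj = line.split('\t', 1)
    timestamp = int(hex_ts, 16)      # seconds since the epoch
    raw_object = unhexlify(hex_obj)  # raw Bitmessage object bytes
    sys.stdout.write('%d\t%d byte object\n' % (timestamp, len(raw_object)))
```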
A line-too-long Codacy issue was fixed.
How does it stop? Maybe handle SIGINT? I cannot stop it with either SIGINT or SIGTERM.
PEP8 validation is missing here.
Why can't we use "if not line" or "if not line.strip()"?
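For context, a standalone illustration (not the PR's code) of how the two checks differ when reading STDIN with readline():

```python
# readline() returns the empty string only at EOF, while a blank input line
# still arrives as '\n'. So "not line" is specifically an EOF test, whereas
# "not line.strip()" would also fire on blank lines.
import sys

line = sys.stdin.readline()
if not line:
    print('EOF (e.g. Ctrl-D)')
elif not line.strip():
    print('blank line, nothing to parse')
else:
    print('got data: %r' % line)
```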
@ -0,0 +11,4 @@
outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
"""
import threading
Maintain an order: all the plain imports should come first, and from statements should be written after import statements.
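For example, the grouping being asked for would look roughly like this, using names that appear elsewhere in this diff:

```python
# Plain imports first...
import threading
import time

# ...then the from-imports.
from binascii import hexlify, unhexlify
from helper_sql import sqlExecute, sqlQuery
```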
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Please keep methods to only 15 or 20 lines.
@ -0,0 +128,4 @@
"""
The objectStdOut thread receives network objects from the receiveDataThreads.
"""
def __init__(self):
Follow the PEP8 styling format.
@ -0,0 +138,4 @@
objectType, data = row
queues.objectProcessorQueue.put((objectType, data))
sqlExecute('''DELETE FROM objectprocessorqueue''')
logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))
PEP8 styling is missing.
@g1itch it stops cleanly on Ctrl-D (EOF) on STDIN, or uncleanly with SIGKILL. A clean exit by signal is not available because of the blocking read on STDIN.
Do you think it's needed? If yes, I can use a non-blocking read instead.
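If a signal-friendly variant were ever wanted, one possible shape is sketched below; it is Unix-specific (select() on sys.stdin does not work on Windows consoles) and is not part of this PR:

```python
# Sketch of an interruptible STDIN reader: a signal handler sets a flag, and
# select() with a timeout lets the loop notice it between reads.
import select
import signal
import sys

stop = []

def _request_stop(signum, frame):
    stop.append(signum)

signal.signal(signal.SIGINT, _request_stop)
signal.signal(signal.SIGTERM, _request_stop)

while not stop:
    readable, _, _ = select.select([sys.stdin], [], [], 1.0)
    if not readable:
        continue                  # timed out; loop around and re-check the flag
    line = sys.stdin.readline()
    if not line:                  # EOF (Ctrl-D): same clean exit as the current code
        break
    # ... hand the line to the existing object-parsing code ...
```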
@omkar1117 human or robot? If human, we need to talk.
Of course, Ctrl+D. Yes, it works. It's enough for me.
done
@ -0,0 +11,4 @@
outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
"""
import threading
done
"if not line" - can't strip an EOF :)
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
No. Go big or go home :)
@ -0,0 +128,4 @@
"""
The objectStdOut thread receives network objects from the receiveDataThreads.
"""
def __init__(self):
done
@ -0,0 +138,4 @@
objectType, data = row
queues.objectProcessorQueue.put((objectType, data))
sqlExecute('''DELETE FROM objectprocessorqueue''')
logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))
done
All PEP8 issues are now fixed (except line length, which is endemic) - thanks @omkar1117
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Over the longer term, I would also prefer to have shorter methods; some parts of the old code, like class_objectProcessor, are too long. But for this PR it's fine the way it is.
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Noted with thanks. I think the ad-hoc object parsing accounts for a lot of avoidable LLOC wastage; however, as discussed in PR #1149, there was no readily usable parser function to use instead.
It could be useful to have an option to dump the messages which caused an exception, with the help of this std_io module.
I'm sorry, I'm not sure what you mean, in a number of ways :) Asking because these are all things that I've considered. :)
I mean dumping the objects that fail processing in bmproto (with "ERROR - Error processing" in debug.log) to the std_io module for analysis. It may be another application for this new code when it will be merged.
Yes, this makes sense. As a matter of fact, the initial commit from PR #1149 was tapping into bmproto.py upstream of the sanity checks and was capturing broken objects along with the good ones. It was only after updating the code to tap into the ObjectProcessorQueue instead (downstream of bmproto) that we stopped capturing them. Anyway, this should be easy to add in the future if needed, with a couple of conditionals inside bmproto.py.
Hi guys, not trying to rush things, just double-checking if there are any open issues requiring attention from my end.
Haven't had time yet to check it out
I haven't tested yet, will do after the sql refactor. Any reason to think it wouldn't work cross-platform? Note I'm new to the codebase so apologies for any dumb points I raise.
The option '-n' often has connotations of 'dry run' or 'numeric only'. While we don't have anything using 'n' right now I'm thinking of avoiding future confusion if we do. Is there another letter that you would say makes as much sense as 'n'?
The other options are modes too. Perhaps you could drop the "mode-"?
According to the docs, unhexlify might throw TypeError. Can we be more specific here? Otherwise Codacy will complain about a bare except.
Can we just catch struct.error here? Oh, plus TypeError for unhexlify.
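Something along these lines is presumably what is being asked for; the decoding details are illustrative rather than lifted from the PR:

```python
# Sketch of the narrower except clause. In Python 2, unhexlify raises
# TypeError on bad hex input, and struct.unpack raises struct.error on a
# short buffer; ValueError covers a missing field separator.
import struct
from binascii import unhexlify

def decode_object(hex_blob):
    try:
        data = unhexlify(hex_blob)
        # objectType is the uint32 at bytes 16..20 of a raw Bitmessage object
        # (after the 8-byte nonce and 8-byte expiresTime).
        objectType, = struct.unpack('>I', data[16:20])
    except (TypeError, ValueError, struct.error):
        return None
    return objectType, data

# decode_object('00' * 16 + '00000001') -> (1, <20-byte blob>)
```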
Not sure how many of these one might expect during 'normal' operation. Is 'info' the right log level? Maybe this and the next log statement could be debug level if it would otherwise lead to info being spammed. Maybe we expect people running in this mode to not care about other info level statements but be very interested in knowing that messages were or were not added, I don't know.
@ -0,0 +70,4 @@
logger.info("STDIN: Invalid object size")
continue
if not state.enableNetwork and state.enableGUI:
You set state.enableGUI = False above. Will this conditional ever be reached?
@ -0,0 +134,4 @@
# REFACTOR THIS with objectProcessor into objectProcessorQueue
queryreturn = sqlQuery(
'''SELECT objecttype, data FROM objectprocessorqueue''')
for row in queryreturn:
By waiting until we have synchronously pulled all objects from the database and put them all on the queue, we miss out on some of the benefits of parallelism, and we'll hit a memory limit with large data sets. Maybe this is unavoidable or related to your suggested refactoring.
Also, keeping the number of places where raw SQL is used to a minimum is a good idea. Again, I assume this would be part of your suggested refactoring.
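One possible way to bound memory, sketched here only to make the concern concrete (it assumes the helper_sql functions accept query parameters the way they are used elsewhere in the codebase, and the batch size is arbitrary):

```python
# Drain the table in fixed-size batches rather than loading every row at once.
from helper_sql import sqlExecute, sqlQuery
import queues

BATCH = 100

while True:
    rows = sqlQuery(
        '''SELECT objecttype, data FROM objectprocessorqueue
           ORDER BY rowid LIMIT ?''', BATCH)
    if not rows:
        break
    for objectType, data in rows:
        queues.objectProcessorQueue.put((objectType, data))
    # Delete exactly the rows that were just queued.
    sqlExecute(
        '''DELETE FROM objectprocessorqueue WHERE rowid IN
           (SELECT rowid FROM objectprocessorqueue ORDER BY rowid LIMIT ?)''',
        BATCH)
```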
@ -0,0 +142,4 @@
# /REFACTOR THIS
def run(self):
while True:
When the queue is empty, would this not cause unnecessary 100% CPU usage? Perhaps a small sleep is needed inside the while loop?
Edit: no it wouldn't, Queue.get(block=True) blocks when the queue is empty. I was thinking of the AMQP library I was most recently using.
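A quick standalone check of that edit, for anyone else who wonders:

```python
# Queue.get() with block=True just waits instead of spinning: the consumer
# thread is idle until an item shows up.
import Queue      # "queue" on Python 3
import threading
import time

q = Queue.Queue()

def consumer():
    item = q.get()                 # blocks here; no busy loop, no CPU burned
    print('got %r after the pause' % item)

threading.Thread(target=consumer).start()
time.sleep(2)                      # consumer stays blocked during this pause
q.put('hello')
```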
@ -0,0 +146,4 @@
objectType, data = queues.objectProcessorQueue.get()
# Output in documented format
print "%016x" % int(time.time()) + '\t' + hexlify(data)
Should we use sys.stdout.write() instead of print here?
@ -0,0 +161,4 @@
numberOfObjectsInObjProcQueue += 1
logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %
str(numberOfObjectsInObjProcQueue))
# /REFACTOR THIS
Yes please!
@coffeedogs a general comment before I address some specific points: the answer to most of your queries is along the lines of "backward compat", "minimizing the diff" and "one change at a time".
This PR is ported from a private fork that has several special operating modes:
- --mode-netcat / -n, described here,
- --mode-airgap / -a (no network, STDIO to ObjectProcessor), which will be ported next,
- --mode-router / -r and --mode-seeder / -s, which may never be ported.
As much as possible, I tried to preserve compatibility with both PyBitmessage and the original fork.
I thought of that and decided that none of the common uses of "-n" (dry-run, no-resolve, no-output, numeric etc) are applicable in the Bitmessage context.
A second best option would be "-c" (netCat) which is already taken.
See general comment, there's a naming theme.
@ -0,0 +70,4 @@
logger.info("STDIN: Invalid object size")
continue
if not state.enableNetwork and state.enableGUI:
state.enableGUI is only False in netcat mode; std_io may have other uses outside of netcat mode, which are not currently included. So yes, the conditional makes sense from a logical perspective.
@ -0,0 +134,4 @@
# REFACTOR THIS with objectProcessor into objectProcessorQueue
queryreturn = sqlQuery(
'''SELECT objecttype, data FROM objectprocessorqueue''')
for row in queryreturn:
The block marked by REFACTOR ... /REFACTOR is duplicated verbatim from class objectProcessor; the comment indicates my intention to refactor it into class objectProcessorQueue, a task which is outside the scope of this PR.
@ -0,0 +146,4 @@
objectType, data = queues.objectProcessorQueue.get()
# Output in documented format
print "%016x" % int(time.time()) + '\t' + hexlify(data)
It would indeed be the logical choice; I'm just not sure how cross-platform it would be. Print is 100% portable and not wrong.
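For reference, the sys.stdout.write() form being discussed would look roughly like this (Python 2, matching the codebase); the explicit flush matters mostly when STDOUT is a pipe rather than a tty:

```python
import sys
import time
from binascii import hexlify

def emit_object(data):
    """data: raw object bytes taken off objectProcessorQueue, as in the diff."""
    sys.stdout.write('%016x\t%s\n' % (int(time.time()), hexlify(data)))
    sys.stdout.flush()   # don't let a block-buffered pipe hold the object back
```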
You're probably right. It's only useful when running in manual mode and pasting objects in console, useless in most other scenarios. I'll change it to debug.
I guess what I really need here is not logger.info pollution, but the ability to switch logging levels as needed via getopt; I might actually open an issue about that.
EDIT - Done: #1246
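For what it's worth, the kind of switch being described could look roughly like the sketch below; the option name and semantics are assumptions here, not what #1246 ended up specifying:

```python
# Rough sketch of a command-line log-level switch.
import getopt
import logging
import sys

def apply_loglevel(argv):
    opts, _ = getopt.getopt(argv, 'L:', ['loglevel='])
    for opt, value in opts:
        if opt in ('-L', '--loglevel'):
            # Accept names like DEBUG, INFO, WARNING; fall back to INFO.
            logging.getLogger().setLevel(getattr(logging, value.upper(), logging.INFO))

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    apply_loglevel(sys.argv[1:])                 # e.g. python prog.py -L debug
    logging.debug('visible only with -L debug')
    logging.info('visible at the default level')
```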