implement netcat operating mode #1222
This PR enables a special headless operating mode (which I named "netcat" mode due to its similarities with the Unix utility) in which all object processing is disabled. Instead, raw objects received from the network are written to STDOUT unprocessed, and any valid raw objects received on STDIN are broadcast to the network.
This is a re-implementation of PR #1149, using the switches defined in PR #1214. The discussions from the linked PRs should provide useful background.
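For context, a minimal sketch of a downstream consumer of this mode, assuming the line format documented in the module docstring quoted later (hex timestamp, a tab, then the hex-encoded object); the script itself is illustrative and not part of this PR:

import sys
from binascii import unhexlify

# Each STDOUT line from netcat mode is a "%016x" timestamp, a tab,
# then the hex-encoded raw network object.
for line in sys.stdin:
    line = line.rstrip('\n')
    if not line:
        continue
    timestamp_hex, object_hex = line.split('\t', 1)
    received_at = int(timestamp_hex, 16)
    raw_object = unhexlify(object_hex)
    # Hand raw_object to whatever tool this is piped into; here we
    # just report its size on STDERR.
    sys.stderr.write('%d bytes received at %d\n' % (len(raw_object), received_at))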
A line-too-long codacy issue was fixed.
How does it stop? Maybe handle SIGINT? I cannot stop it with either SIGINT or SIGTERM.
PEP8 validation missing here
Why can't we use "if not line" or "if not line.strip()"?
@ -0,0 +11,4 @@
outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
"""
import threading
Maintain an order: all the imports should come first, and "from" statements should be written after plain "import" statements.
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Please keep methods to only 15 or 20 lines.
@ -0,0 +128,4 @@
"""
The objectStdOut thread receives network objects from the receiveDataThreads.
"""
def __init__(self):
Follow the PEP8 styling format.
@ -0,0 +138,4 @@
objectType, data = row
queues.objectProcessorQueue.put((objectType, data))
sqlExecute('''DELETE FROM objectprocessorqueue''')
logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))
PEP8 styling missing here.
@g1itch it stops cleanly on Ctrl-D (EOF) on STDIN, or dirty with SIGKILL. A clean exit by signal is not available because of the blocking read on STDIN.
Do you think it's needed? If yes, I can use a non-blocking read instead.
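For illustration only, one way the blocking read could be avoided on Unix so a signal handler gets a chance to run between reads; this is a sketch using select, not code from the PR:

import select
import signal
import sys

# Poll STDIN with a one-second timeout so SIGINT/SIGTERM can interrupt
# the loop between reads (select on sys.stdin is Unix-only).
shutdown = []
signal.signal(signal.SIGINT, lambda *_: shutdown.append(True))
signal.signal(signal.SIGTERM, lambda *_: shutdown.append(True))

while not shutdown:
    readable, _, _ = select.select([sys.stdin], [], [], 1.0)
    if not readable:
        continue                 # timed out, re-check the shutdown flag
    line = sys.stdin.readline()
    if not line:
        break                    # EOF (Ctrl-D): exit cleanly as before
    # hand the line to the existing STDIN processing here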
@omkar1117 human or robot? If human, we need to talk.
Of course, Ctrl+D. Yes, it works. It's enough for me.
done
@ -0,0 +11,4 @@
outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
"""
import threading
done
"if not line" - can't strip an EOF :)
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
No. Go big or go home :)
@ -0,0 +128,4 @@
"""
The objectStdOut thread receives network objects from the receiveDataThreads.
"""
def __init__(self):
done
@ -0,0 +138,4 @@
objectType, data = row
queues.objectProcessorQueue.put((objectType, data))
sqlExecute('''DELETE FROM objectprocessorqueue''')
logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))
done
All PEP8 issues are now fixed (except line length, which is endemic) - thanks @omkar1117
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Over the longer term, I would also prefer to have shorter methods; some parts of the old code, like the class_objectProcessor, are too long, but for this PR it's fine the way it is.
@ -0,0 +40,4 @@
self.inputSrc = inputSrc
logger.info('stdInput thread started.')
def run(self):
Noted with thanks. I think the ad-hoc object parsing accounts for a lot of avoidable LLOC wastage; however, as discussed in PR #1149, there was no readily usable parser function to use instead.
It could be useful to have an option to dump the messages which caused an exception, with the help of this std_io module.
I'm sorry, I'm not sure what you mean, in a number of ways :) Asking because these are all things that I've considered. :)
I mean the messages which cause an exception in bmproto (with "ERROR - Error processing" in debug.log) could be dumped through the std_io module for analysis. It may be another application for this new code when it will be merged.
Yes, this makes sense. As a matter of fact, the initial commit from PR #1149 was tapping into bmproto.py upstream of the sanity checks, and was capturing broken objects along with the good ones. It was only after updating the code to tap into the ObjectProcessorQueue instead (downstream of bmproto) that we stopped capturing them. Anyway, this should be easy to add in the future if needed, with a couple of conditionals inside bmproto.py
Hi guys, not trying to rush things, just double-checking if there are any open issues requiring attention from my end.
Haven't had time yet to check it out
I haven't tested yet, will do after the sql refactor. Any reason to think it wouldn't work cross-platform? Note I'm new to the codebase so apologies for any dumb points I raise.
The option '-n' often has connotations of 'dry run' or 'numeric only'. While we don't have anything using 'n' right now I'm thinking of avoiding future confusion if we do. Is there another letter that you would say makes as much sense as 'n'?
The other options are modes too. Perhaps you could drop the "mode-"?
According to the docs, unhexlify might throw TypeError. Can we be more specific here? Otherwise codacy will complain about a bare except.
Can we just catch struct.error here? Oh, plus TypeError for unhexlify.
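A sketch of what the narrower exception handling could look like; the field layout unpacked here is purely illustrative, and binascii.Error is included as a hedge since some Python versions raise it instead of TypeError from unhexlify:

import struct
from binascii import unhexlify, Error as BinasciiError

def parse_hex_object(object_hex):
    """Return the raw object bytes, or None if the input line is malformed."""
    try:
        data = unhexlify(object_hex)
    except (TypeError, BinasciiError):     # non-hex digits or odd length
        return None
    try:
        # Illustrative only: read one big-endian 64-bit field from the head.
        nonce, = struct.unpack('>Q', data[:8])
    except struct.error:                   # object too short to unpack
        return None
    return data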
Not sure how many of these one might expect during 'normal' operation. Is 'info' the right log level? Maybe this and the next log statement could be debug level if they would otherwise lead to info being spammed. Maybe we expect people running in this mode not to care about other info-level statements but to be very interested in knowing that messages were or were not added; I don't know.
@ -0,0 +70,4 @@
logger.info("STDIN: Invalid object size")
continue
if not state.enableNetwork and state.enableGUI:
You set state.enableGUI = False above. Will this conditional ever be reached?
@ -0,0 +134,4 @@
# REFACTOR THIS with objectProcessor into objectProcessorQueue
queryreturn = sqlQuery(
'''SELECT objecttype, data FROM objectprocessorqueue''')
for row in queryreturn:
By waiting until we have synchronously pulled all objects from the database and put all objects on the queue we are missing out on some benefits of parallelism and we'll hit a memory limit with large data sets. Maybe this is unavoidable or related to your suggested refactoring.
Also, keeping the number of places where raw SQL is used to a minimum is a good idea. Again, I assume this would be part of your suggested refactoring.
@ -0,0 +142,4 @@
# /REFACTOR THIS
def run(self):
while True:
When the queue is empty would this not cause unnecessary 100% CPU? Perhaps a small sleep is needed inside the while loop?
Edit: no it wouldn't, Queue.get(block=True) blocks when the queue is empty. I was thinking of the AMQP library I was most recently using.
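For completeness, the pattern being discussed; Queue.get() with its default block=True parks the consumer thread until an item arrives, so no sleep is required:

import Queue  # named 'queue' on Python 3

objectQueue = Queue.Queue()

def consume():
    while True:
        # get() blocks while the queue is empty, so the thread sleeps
        # inside the call instead of spinning at 100% CPU.
        objectType, data = objectQueue.get()
        process(objectType, data)   # 'process' is a placeholder function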
@ -0,0 +146,4 @@
objectType, data = queues.objectProcessorQueue.get()
# Output in documented format
print "%016x" % int(time.time()) + '\t' + hexlify(data)
Should we use sys.stdout.write() instead of print here?
@ -0,0 +161,4 @@
numberOfObjectsInObjProcQueue += 1
logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %
str(numberOfObjectsInObjProcQueue))
# /REFACTOR THIS
Yes please!
@coffeedogs a general comment before I address some specific points; the answer to most of your queries is along the lines of "backward compat", "minimizing the diff" and "one change at a time".
This PR is ported from a private fork that has several special operating modes:
- --mode-netcat / -n, described here
- --mode-airgap / -a (no network, STDIO to ObjectProcessor), which will be ported next
- --mode-router / -r and --mode-seeder / -s, which may never be ported
As much as possible, I tried to preserve compatibility with both PyBitmessage and the original fork.
I thought of that and decided that none of the common uses of "-n" (dry-run, no-resolve, no-output, numeric, etc.) are applicable in the Bitmessage context.
A second-best option would be "-c" (netCat), which is already taken.
See general comment, there's a naming theme.
@ -0,0 +70,4 @@
logger.info("STDIN: Invalid object size")
continue
if not state.enableNetwork and state.enableGUI:
state.enableGUI is only false in netcat mode; std_io may have other uses outside of netcat mode, which are not currently included. Yes, the conditional makes sense from a logical perspective.
@ -0,0 +134,4 @@
# REFACTOR THIS with objectProcessor into objectProcessorQueue
queryreturn = sqlQuery(
'''SELECT objecttype, data FROM objectprocessorqueue''')
for row in queryreturn:
The block marked by "REFACTOR THIS ... /REFACTOR THIS" is duplicated verbatim from class objectProcessor; the comment indicates my intention to refactor it into class objectProcessorQueue, a task which is outside the scope of this PR.
@ -0,0 +146,4 @@
objectType, data = queues.objectProcessorQueue.get()
# Output in documented format
print "%016x" % int(time.time()) + '\t' + hexlify(data)
It would indeed be the logical choice; I'm just not sure how cross-platform it would be. Print is 100% portable and not wrong.
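If it is ever changed, a sketch of the write-and-flush variant, which also makes the output show up immediately when STDOUT is a pipe rather than a terminal (an assumption about the motivation, not a requirement stated in the review):

import sys
import time
from binascii import hexlify

def emit(data):
    # Same documented format as the print statement above, plus an
    # explicit flush so a downstream pipe sees each object right away.
    sys.stdout.write("%016x\t%s\n" % (int(time.time()), hexlify(data)))
    sys.stdout.flush()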
You're probably right. It's only useful when running in manual mode and pasting objects in console, useless in most other scenarios. I'll change it to debug.
I guess what I really need here is not logger.info pollution, but the ability to switch logging levels as needed via getopt; I might actually open an issue about that.
EDIT - Done: #1246