Network I/O buffer operations slow #1379

Open
opened 2018-10-28 07:03:43 +01:00 by PeterSurda · 0 comments
PeterSurda commented 2018-10-28 07:03:43 +01:00 (Migrated from github.com)

Operations like slice_write_buf, slice_read_buf, append_write_buf and read_buf.extend are slow because they copy data around and reallocate memory. Several things can be done to improve this without major refactoring:

  • preallocate the bytearray: buffer = bytearray(max_size). This has some tradeoffs: it needs more memory, and removing data from the buffer then takes more time (so we need to avoid removing data from it).
  • use recv_into/recvfrom_into instead of recv/recvfrom. These put data directly into the buffer rather than allocating new strings.
  • use a memoryview instead of an array slice when parsing data.
  • instead of slicing the front of the buffer, use some other method, e.g. a memoryview again, see here: https://stackoverflow.com/questions/15962119/using-bytearray-with-socket-recv-into#15964489
  • if we can somehow use a separate buffer for each command, we can avoid locking, allowing more CPU to be used on multi-core systems.
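
To make the first four points concrete, here is a minimal Python 3 sketch of a preallocated buffer that is filled with recv_into and parsed through a memoryview. The class name PreallocatedReadBuffer, the 64 kB capacity, and the 4-byte length-prefixed frame are assumptions for illustration only; they are not PyBitmessage code or the Bitmessage wire format.

```python
import socket
import struct

MAX_SIZE = 64 * 1024  # assumed capacity; the issue only says max_size


class PreallocatedReadBuffer:
    """Hypothetical fixed-size receive buffer that is never resized or front-sliced."""

    def __init__(self, max_size=MAX_SIZE):
        self._buf = bytearray(max_size)     # allocated once, up front
        self._view = memoryview(self._buf)  # zero-copy window onto _buf
        self._start = 0                     # index of the first unparsed byte
        self._end = 0                       # one past the last received byte

    def fill_from(self, sock):
        """Receive directly into the free tail of the buffer."""
        if self._start == self._end:
            # Everything was consumed: rewind the offsets instead of deleting data.
            self._start = self._end = 0
        # A real implementation would also have to handle a full buffer here.
        received = sock.recv_into(self._view[self._end:])
        self._end += received
        return received

    def peek(self, length):
        """Return a zero-copy view of the next `length` bytes, or None if too few."""
        if self._end - self._start < length:
            return None
        return self._view[self._start:self._start + length]

    def consume(self, length):
        """Mark bytes as processed by moving an offset; nothing is copied."""
        self._start += length


def demo():
    """Send one made-up length-prefixed frame over a socket pair and parse it."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(struct.pack(">I", 5) + b"hello")
        rbuf = PreallocatedReadBuffer()
        while rbuf.peek(9) is None:
            if rbuf.fill_from(receiver) == 0:
                raise ConnectionError("peer closed before a full frame arrived")
        frame = rbuf.peek(9)
        (length,) = struct.unpack_from(">I", frame)  # parses the view in place
        payload = bytes(frame[4:4 + length])         # copy only what must outlive the buffer
        rbuf.consume(9)
        return payload
    finally:
        sender.close()
        receiver.close()
```

The only copy in this sketch is the final bytes(...) call, which is needed only if the payload has to stay valid after the buffer is reused.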

I ran some benchmarks: slicing from and appending to the buffer reaches about 1 MB/s, even when using bytearrays. Preallocated buffers can do about 20 GB/s (20k times better), and a slice of a memoryview about 6 GB/s (6k times better). Obviously this depends on other criteria; I was using 1 kB chunks of data within the buffer.
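
The benchmark script itself is not part of this issue. The following is only a rough Python 3 sketch of how such a comparison could be set up, assuming 1 kB chunks as above and an arbitrary 1 MB total; the function names are made up, and the numbers it prints will differ from the ones quoted here depending on interpreter version and hardware.

```python
import timeit

CHUNK = 1024          # 1 kB chunks, as in the issue
TOTAL = 1024 * 1024   # assumed total size for the benchmark


def consume_by_front_slice(data):
    """Consume the buffer by repeatedly slicing off the front (copies the remainder)."""
    buf = bytearray(data)
    consumed = 0
    while buf:
        chunk = buf[:CHUNK]
        buf = buf[CHUNK:]  # allocates a new bytearray holding everything that is left
        consumed += len(chunk)
    return consumed


def consume_by_memoryview(data):
    """Consume the buffer by moving an offset over a memoryview (no copying)."""
    view = memoryview(bytearray(data))
    consumed = 0
    offset = 0
    while offset < len(view):
        chunk = view[offset:offset + CHUNK]  # zero-copy slice
        offset += CHUNK
        consumed += len(chunk)
    return consumed


def throughput_mb_per_s(func, data, repeat=3):
    """Best-of-`repeat` throughput of func(data) in MB/s."""
    best = min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))
    return len(data) / best / 1e6


if __name__ == "__main__":
    payload = bytes(TOTAL)
    for func in (consume_by_front_slice, consume_by_memoryview):
        print(func.__name__, round(throughput_mb_per_s(func, payload), 1), "MB/s")
```

The front-slice variant gets slower as the buffer grows, because every step copies everything that is still unread, while the memoryview variant only moves an index.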

Some operations don't work on buffers, e.g. you can't use a buffer slice as a dict key, but I think most of these have already been addressed earlier.
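
As an illustration of the dict key limitation in Python 3 terms (the issue was written against the older buffer type, so this is an analogy rather than the original case): a memoryview slice over a bytearray cannot be hashed, and one workaround is to copy just the key bytes into an immutable bytes object. The helper names below are hypothetical.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except (TypeError, ValueError):
        # bytearray raises TypeError; a writable memoryview raises ValueError.
        return False
    return True


def as_dict_key(chunk):
    """Copy a buffer slice into immutable bytes so it can be hashed."""
    return bytes(chunk)


def demo_dict_key():
    buf = bytearray(b"hash0001 rest of the message")
    key_view = memoryview(buf)[:8]       # zero-copy slice, but not hashable
    seen = {}
    if not is_hashable(key_view):
        seen[as_dict_key(key_view)] = True  # small copy of the key only
    return seen
```

Only the few bytes used as the key are copied; the rest of the message can still be handled through the view.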

Edit: Appending to bytearrays doesn't seem to cause performance problems, only slicing from the beginning.

Reference: Bitmessage/PyBitmessage-2024-12-25#1379