Network I/O buffer operations slow #1379


2018-10-28T07:03:43+01:00

PeterSurda commented


(Migrated from github.com)

Operations like `slice_write_buf`, `slice_read_buf`, ~~`append_write_buf`~~, ~~`read_buf.extend`~~ are slow because they copy data around and reallocate memory. Several changes could improve this without major refactoring:

- Preallocate the `bytearray`: `buffer = bytearray(max_size)`. This has tradeoffs: it requires more memory, and removing data from the buffer then takes more time (so we need to avoid removing data from it).
- Use `recv_into`/`recvfrom_into` instead of `recv`/`recvfrom`. These write data directly into an existing buffer rather than allocating new strings.
- Use `memoryview` instead of array slices when parsing data.
- Instead of slicing off the front of the buffer, use another method, e.g. again a `memoryview`; see https://stackoverflow.com/questions/15962119/using-bytearray-with-socket-recv-into#15964489
- If we can use a separate buffer for each command, we can avoid locking, allowing more CPU to be used on multi-core systems.
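
A minimal sketch of the first four points, assuming a single reader per buffer. The class and method names (`RecvBuffer`, `fill`, `peek`, `consume`, `compact`) are hypothetical and not the project's existing API; the offset-based consumption follows the idea in the linked StackOverflow answer.

```python
import socket


class RecvBuffer:
    """Hypothetical fixed-size receive buffer (not an existing project class).

    Bytes are received straight into a preallocated bytearray via recv_into,
    exposed for parsing as zero-copy memoryview slices, and consumed by
    advancing an offset instead of deleting them from the front.
    """

    def __init__(self, max_size):
        self.buf = bytearray(max_size)    # allocated once, never resized
        self.view = memoryview(self.buf)  # zero-copy window onto buf
        self.start = 0                    # first unconsumed byte
        self.end = 0                      # one past the last received byte

    def fill(self, sock):
        """Receive into the free tail of the buffer; return bytes read (0 = EOF)."""
        if self.end == len(self.buf):
            self.compact()
            if self.end == len(self.buf):
                raise BufferError("receive buffer is full")
        n = sock.recv_into(self.view[self.end:])
        self.end += n
        return n

    def peek(self, n):
        """Return the next n unconsumed bytes as a memoryview, or None.

        No data is copied; the view is only valid until the next fill(),
        because compact() may move the underlying bytes.
        """
        if self.end - self.start < n:
            return None
        return self.view[self.start:self.start + n]

    def consume(self, n):
        """Mark n bytes as processed by moving the start offset (no copying)."""
        self.start = min(self.start + n, self.end)
        if self.start == self.end:
            self.start = self.end = 0     # buffer drained: rewind for free

    def compact(self):
        """Move unconsumed bytes to the front; only runs when the tail is full."""
        remaining = self.end - self.start
        # The right-hand slice is a temporary copy, so overlapping source and
        # destination ranges are safe; the bytearray length does not change.
        self.buf[:remaining] = self.buf[self.start:self.end]
        self.start, self.end = 0, remaining
```

A caller would loop `fill()` on a readable socket, `peek()` a header and parse it (e.g. with `struct.unpack_from`, which accepts a `memoryview`), then `consume()` the message once it is complete. The only copy happens in `compact()`, and only when the tail of the buffer is exhausted.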

I ran some benchmarks: slicing and appending a `bytearray` gives a throughput of about 1 MB/s. Preallocating buffers achieves about 20 GB/s (20k times better), and slicing a `memoryview` about 6 GB/s (6k times better). The results obviously depend on other factors; I was using 1 kB chunks of data within the buffer.
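
The benchmark script itself isn't included here. The following is a hypothetical sketch of how re-slicing the front of a `bytearray` could be compared with advancing an offset into a `memoryview`, using 1 kB chunks as above; the function names are illustrative, and absolute numbers will vary by machine and buffer size.

```python
import time

CHUNK = 1024  # 1 kB chunks, matching the benchmark described above


def drain_by_slicing(data):
    """Consume a buffer chunk by chunk by re-slicing its front (copies each time)."""
    buf = bytearray(data)
    total = 0
    while buf:
        chunk = buf[:CHUNK]   # copy of the chunk
        buf = buf[CHUNK:]     # copy of everything that is left
        total += len(chunk)
    return total


def drain_by_memoryview(data):
    """Consume the same buffer by advancing an offset into a memoryview."""
    view = memoryview(bytearray(data))
    offset = 0
    total = 0
    while offset < len(view):
        chunk = view[offset:offset + CHUNK]  # zero-copy slice
        offset += len(chunk)
        total += len(chunk)
    return total


def throughput_mb_s(drain, data):
    """Time one drain() pass over data and return its throughput in MB/s."""
    start = time.perf_counter()
    drain(data)
    elapsed = time.perf_counter() - start
    return len(data) / max(elapsed, 1e-9) / 1e6


if __name__ == "__main__":
    payload = bytes(1024 * 1024)  # 1 MiB of zeros
    for fn in (drain_by_slicing, drain_by_memoryview):
        print(fn.__name__, round(throughput_mb_s(fn, payload)), "MB/s")
```

The slicing variant copies the whole remainder on every chunk, so its cost grows quadratically with buffer size, while the `memoryview` variant does constant work per chunk.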

Some operations don't work on buffers; for example, you can't use a buffer slice as a dict key. I think most of these have already been addressed earlier.
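
A small illustration of the dict-key limitation, assuming command names are read from a fixed 12-byte NUL-padded field (an assumption for the example): a `memoryview` slice over a writable `bytearray` can't be hashed, so the slice has to be copied into `bytes` first. The helper names and the handler table are hypothetical.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except (TypeError, ValueError):  # bytearray: TypeError; writable memoryview: ValueError
        return False
    return True


def command_key(view):
    """Copy a buffer slice into immutable bytes so it can be a dict key.

    Stripping trailing NULs assumes a fixed-width, NUL-padded command
    field (an assumption for this example).
    """
    return bytes(view).rstrip(b"\x00")


buf = bytearray(b"getdata\x00\x00\x00\x00\x00payload")
view = memoryview(buf)
name_view = view[:12]  # zero-copy slice of a writable buffer

handlers = {b"getdata": "handle_getdata"}  # hypothetical dispatch table
handler = handlers[command_key(name_view)]
```

The copy is only as large as the key itself, so the payload can still be parsed through zero-copy slices.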

**Edit:** Appending to `bytearray`s doesn't seem to cause performance problems; only slicing from the beginning does.
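
Since appending is cheap and only front-slicing is costly, one possible pattern (a sketch, not the project's code) is to keep appending with `extend()` and track a read offset, deleting consumed bytes only once they exceed half the buffer so the copy cost is amortized over many reads. `ReadBuffer` and the half-buffer threshold are hypothetical choices.

```python
class ReadBuffer:
    """Hypothetical growable buffer: append freely, compact rarely."""

    def __init__(self):
        self.buf = bytearray()
        self.pos = 0  # offset of the first unread byte

    def append(self, data):
        self.buf.extend(data)  # appending was not the bottleneck

    def __len__(self):
        return len(self.buf) - self.pos

    def read(self, n):
        """Return up to n unread bytes and advance the read offset.

        Only the requested bytes are copied, never the whole remainder.
        """
        end = min(self.pos + n, len(self.buf))
        chunk = bytes(self.buf[self.pos:end])
        self.pos = end
        self._maybe_compact()
        return chunk

    def _maybe_compact(self):
        # Drop consumed bytes only when they make up more than half of the
        # buffer, so the front-removal copy happens rarely.
        if self.pos > len(self.buf) // 2:
            del self.buf[:self.pos]
            self.pos = 0
```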

