Bitmessage hangs when 'disk full' condition occurs #572

Closed
opened 2013-11-27 03:15:12 +01:00 by yurivict · 13 comments
yurivict commented 2013-11-27 03:15:12 +01:00 (Migrated from github.com)

I accidentally hit a 'disk full' condition on the disk where Bitmessage stores its data. After I deleted large files and cleared the condition, Bitmessage remained frozen with 0% CPU use. The GUI was repainting but was not clickable.

All other running programs continued fine, except Bitmessage.

AyrA commented 2013-11-29 13:20:43 +01:00 (Migrated from github.com)

The SQL writer thread probably crashes, and then SQL commands "stack up" and are no longer processed. I think this has already been reported once, but I do not know if a solution was found.

Atheros1 commented 2013-12-01 00:15:57 +01:00 (Migrated from github.com)

yurivict, did you restart Bitmessage?

yurivict commented 2013-12-01 01:00:57 +01:00 (Migrated from github.com)

I did.
I am just worried that it is not robust enough to survive a disk full condition. I also doubt this is an SQLite problem, since SQLite is among the best-tested packages around, with an extremely extensive QA suite.

Some SQL command must have failed in SQLite and Bitmessage ignored the failure, etc. Or maybe this is a Python binding issue.

Some strategy for handling the disk full condition should be developed. The easiest way is to go offline, notify the user with a prominently displayed message (preferably in some other color), keep checking for the disk full condition, and come back when it is cleared. On many systems the GUI fails to report a disk full condition, and the user only finds out when something silently malfunctions.

Atheros1 commented 2013-12-02 00:57:56 +01:00 (Migrated from github.com)

If Bitmessage is just sitting there doing something like syncing to the network and fills the disk, it is programmed to display an alert, wait for the user to hit "ok" and then immediately exit. If the instant the disk fills up happens to be when you are doing something in the UI which adds information to the database, like adding an address book entry, then the UI thread will be blocked while waiting for data from the SQL thread, but it will never get it because the SQL thread will have already exited. The UI will thus appear to freeze.
What is the best way to solve this? Is it really a common problem? Should the UI thread be programmed to check a status variable to see if the query was successful in each place that we make one?
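
One possible shape for the status check asked about above, as a minimal sketch and not PyBitmessage's actual code: the SQL thread always sends a reply, even when the query fails, and the UI side waits with a timeout so it cannot block forever if the SQL thread is gone. The names `submit_queue`, `sql_worker`, and `sql_query` are hypothetical, and the sketch assumes the Python 3 standard library.

```python
import queue
import sqlite3
import threading

# Hypothetical names: submit_queue, sql_worker, and sql_query are illustrative,
# not PyBitmessage's actual identifiers.
submit_queue = queue.Queue()


def sql_worker(db_path):
    """Run queries from submit_queue; always reply, even on failure."""
    conn = sqlite3.connect(db_path)
    while True:
        item = submit_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        statement, params, reply_queue = item
        try:
            rows = conn.execute(statement, params).fetchall()
            conn.commit()
            reply_queue.put(("ok", rows))
        except sqlite3.Error as exc:
            # A full disk typically surfaces here as sqlite3.OperationalError.
            conn.rollback()
            reply_queue.put(("error", exc))
    conn.close()


def sql_query(statement, params=(), timeout=5.0):
    """Called from the UI thread; never blocks longer than `timeout` seconds."""
    reply_queue = queue.Queue(maxsize=1)
    submit_queue.put((statement, params, reply_queue))
    try:
        status, payload = reply_queue.get(timeout=timeout)
    except queue.Empty:
        # The SQL thread has exited or is stuck; report it instead of hanging.
        return ("error", RuntimeError("SQL thread did not answer"))
    return (status, payload)
```

With this shape the caller inspects the returned status in one place (`sql_query`) instead of adding a separate status variable at every call site.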

yurivict commented 2013-12-02 01:17:12 +01:00 (Migrated from github.com)

While working with large files, I see the disk full problem at least twice a month. It is practically unavoidable.
80+% of the software around was never tested for this condition.

Solution: do any data entry in a transaction-like way. You either succeed in adding the new address or you fail. In case of failure, the GUI goes back to the "dirty-GUI" state where the user has just typed the entry but it hasn't been saved yet.
Also, when you get a packet with inventory records, process it in a transaction; if it fails, reject the whole packet and go offline with a disk-full status. And SQLite, by the way, supports transactions.
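
A minimal sketch of the transaction-like data entry described above, using Python's standard `sqlite3` module. The function name `add_address` and the `addressbook (label, address)` schema are assumptions made for illustration, not taken from the PyBitmessage source.

```python
import sqlite3


def add_address(conn, label, address):
    """Insert one address book row atomically; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises.
        with conn:
            conn.execute(
                "INSERT INTO addressbook (label, address) VALUES (?, ?)",
                (label, address),
            )
        return True
    except sqlite3.Error:
        # e.g. "database or disk is full": nothing was saved, so the caller
        # can keep the user's typed entry on screen and report the failure.
        return False
```

The same pattern would apply to a packet of inventory records: run all the inserts inside one `with conn:` block, so a failure part-way through leaves none of the packet stored.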

yurivict commented 2014-02-16 02:04:05 +01:00 (Migrated from github.com)

Attaching a screenshot of what the disk full failure looks like now.

![bm-disk-full-failure](https://f.cloud.github.com/assets/271906/2179350/2df765ee-96a6-11e3-952a-73f151529b20.png)
PeterSurda commented 2015-10-18 12:09:03 +02:00 (Migrated from github.com)

I'll see if I can reproduce and fix this.

yurivict commented 2016-06-29 00:35:35 +02:00 (Migrated from github.com)

Now I got a disk full condition with 0.6.0.
Problems:

  • It stops with not one but two popup messages. There is no need to have more than one.
  • I can't close the popup; pressing the button doesn't do anything.

Also, it doesn't need to stop permanently, because a disk full condition can later go away. The correct behavior is to check the available disk space every minute and resume operation if there is significant space available.
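
The once-a-minute check could look roughly like the sketch below. It assumes Python 3.3 or later for `shutil.disk_usage`; the 100 MiB threshold and the names `has_free_space` and `wait_for_free_space` are made up for illustration.

```python
import shutil
import threading

MIN_FREE_BYTES = 100 * 1024 * 1024  # assumed threshold, not from PyBitmessage


def has_free_space(path, min_free=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free


def wait_for_free_space(path, stop_event, interval=60.0, min_free=MIN_FREE_BYTES):
    """Poll until space is available or stop_event is set.

    Returns True if space came back, False if a shutdown was requested first.
    """
    while not stop_event.is_set():
        if has_free_space(path, min_free):
            return True
        stop_event.wait(interval)  # sleeps, but wakes early on shutdown
    return False
```

A thread that hit the disk full error would call `wait_for_free_space(data_dir, shutdown_event)` and resume its work only when it returns True.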

bmng-dev commented 2016-06-29 05:17:34 +02:00 (Migrated from github.com)

A disk full condition is commonly encountered when VACUUMing the database because, according to the [VACUUM documentation](https://sqlite.org/lang_vacuum.html):

> as much as twice the size of the original database file is required in free disk space

I think going forward VACUUM should be performed sparingly and only when the user allows it. This would require a non-modal notification/prompt. In place of the current usage of VACUUM, [auto_vacuum](https://sqlite.org/pragma.html#pragma_auto_vacuum) could be set to incremental and [incremental_vacuum](https://sqlite.org/pragma.html#pragma_incremental_vacuum) performed (after ensuring [freelist_count](https://sqlite.org/pragma.html#pragma_freelist_count) is greater than zero) whenever the user invokes 'Delete all trashed messages' and whenever the cleaner thread purges the inventory of expired items. Unfortunately, enabling auto_vacuum for existing users will require the use of VACUUM.

None of this resolves the issue of how PyBitmessage behaves when the disk is full, but it should reduce how often it occurs in the short term so that a proper solution can be implemented.
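
A rough sketch of the incremental approach, using the three SQLite pragmas named above through Python's `sqlite3` module. The function names are hypothetical, and the sketch assumes the connection has no open transaction when `VACUUM` runs, since SQLite refuses to VACUUM inside one.

```python
import sqlite3


def enable_incremental_vacuum(conn):
    """Switch to auto_vacuum = INCREMENTAL (mode 2).

    An existing database needs one full VACUUM for the change to take effect.
    Assumes no transaction is open on `conn`.
    """
    mode = conn.execute("PRAGMA auto_vacuum").fetchone()[0]
    if mode != 2:
        conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
        conn.execute("VACUUM")  # one-time cost to rebuild the file


def reclaim_free_pages(conn, max_pages=0):
    """Release free pages to the filesystem; return how many were free beforehand.

    max_pages < 1 asks SQLite to clear the entire freelist.
    """
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    if free_pages > 0:
        # fetchall() steps the pragma to completion; it frees pages as it is stepped.
        conn.execute("PRAGMA incremental_vacuum(%d)" % int(max_pages)).fetchall()
    return free_pages
```

`reclaim_free_pages` would be called after 'Delete all trashed messages' and after the cleaner thread purges expired inventory items. Unlike a full VACUUM, it does not build a second copy of the database.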
PeterSurda commented 2016-06-29 17:24:45 +02:00 (Migrated from github.com)

I'll look at this, but other than fixing the popup I probably won't do anything else (e.g. fiddling around with VACUUM). Have your computer notify you when you run low on disk space.

yurivict commented 2016-06-29 20:45:32 +02:00 (Migrated from github.com)

The user doesn't look at notifications all the time. I come back in a few hours, plenty of disk space is available, but BM displays the messages that can't be closed and needs to be killed.

For example, some runaway process will use all memory and get killed by the system, but BM will stop as a result.

PeterSurda commented 2016-06-29 22:04:40 +02:00 (Migrated from github.com)

Looking at the source, PyBitmessage is supposed to shut down if the disk is full, but I think it freezes instead because only the SQL thread ends while the others keep running. A clean shutdown cannot happen because it needs to write data.

If the SQL thread waited instead, the problem of being unable to shut down cleanly would remain. A full disk can also cause problems with updating keys.dat and known addresses when these updates are triggered by something other than the shutdown. It also cannot log anything, or even communicate correctly with other nodes, because it cannot update its own inventory.

So to sum it up, I think that it should immediately quit and not even try to display the message, just like it does when it runs without the GUI.

PeterSurda commented 2016-06-29 22:12:00 +02:00 (Migrated from github.com)

@yurivict use other sysadmin tools to prevent the disk from filling up; if notifications don't work, you can use quotas.

Reference: Bitmessage/PyBitmessage-2024-11-28#572