You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
2024-11-26 20:12:41.236 (MainThread) ERROR [chip.storage] Expecting ',' delimiter: line 29 column 111 (char 4111)
2024-11-26 20:12:41.236 (MainThread) CRITICAL [chip.storage] Could not load configuration from /data/chip.json - resetting configuration...
It looks like the configuration file was present, but had invalid data in it. The user was able to restore a backup correct the issue, but when I looked at the code to persist the storage: (edit: this code doesn't seem to be responsible for the user's issue, which happened in the Matter SDK code - but this code is incorrect.)
I noticed that the logic here is not power loss safe, and might result in a partially written or missing file for unlucky users.
The current logic does:
Rename chip.json to chip.json.backup. Following this, if power is lost, there may be no chip.json file present at all.
Write a new chip.json file. If the write is interrupted, or power is lost before the system flushes the write cache, a partially written or corrupt configuration file may be present on reboot, resulting in errors like this user saw.
I'd recommend the following logic instead:
(optional) Copy chip.json to chip.json.backup, leaving the existing file in place. (Ideally, run fsync on the newly written file to ensure the backup is updated before the original file gets replaced.)
Write the new data to a temporary file, like chip.json.new.
Run fsync (or fdatasync) on the newly written file to ensure the file contents are persisted on the disk. (see the notes on https://docs.python.org/3/library/os.html#os.fsync regarding flushing the python buffers). This is needed to ensure ordering - making sure that the file contents will be written before the file gets renamed.
Rename the temporary file to the final filename, atomically replacing the existing file.
(optional) Run fsync on the renamed file to ensure the rename is committed to storage. (This is only needed if data stored elsewhere - e.g. in a sqlite database - might depend on this data having been persisted)
The text was updated successfully, but these errors were encountered:
kepstin
changed the title
Persistent data storage writes are not power-loss safe
CHIP Persistent data storage writes are not power-loss safe
Nov 27, 2024
… Actually I think I'm looking at the wrong code here - the error messages don't match up. Is the code in matter-server/server/storage.py used? is it for something else?
Unfortunately, this code has the same issue, where a power loss during write (prior to the fsync) can result in an incompletely written file which will be corrupt on restart. It's arguably worse in a different way, since it doesn't remove/truncate the file before doing a write, so on a power loss you can get a mix of the old and new files present.
In this discord thread: https://discord.com/channels/330944238910963714/1311124568081174628 a user reported an issue where following a power loss, all of their matter devices disappeared. From the Matter server logs:
It looks like the configuration file was present, but had invalid data in it. The user was able to restore a backup correct the issue, but when I looked at the code to persist the storage: (edit: this code doesn't seem to be responsible for the user's issue, which happened in the Matter SDK code - but this code is incorrect.)
python-matter-server/matter_server/server/storage.py
Lines 149 to 163 in a55bb8b
I noticed that the logic here is not power loss safe, and might result in a partially written or missing file for unlucky users.
The current logic does:
chip.json
tochip.json.backup
. Following this, if power is lost, there may be nochip.json
file present at all.chip.json
file. If the write is interrupted, or power is lost before the system flushes the write cache, a partially written or corrupt configuration file may be present on reboot, resulting in errors like this user saw.I'd recommend the following logic instead:
chip.json
tochip.json.backup
, leaving the existing file in place. (Ideally, run fsync on the newly written file to ensure the backup is updated before the original file gets replaced.)chip.json.new
.The text was updated successfully, but these errors were encountered: