Disk storage and backup #112
Perhaps it's a bit early for me to jump in on this before the people more involved in the project, but have you had a look at the backup script that takes a db snapshot (~/IOTstack/scripts/backup_influxdb.sh) mentioned here?
@petinox - you are spot on. @stevenveenma - this is a response to your second dot point. I'm still thinking about your first dot point.
It is rarely safe to back up databases at the file system level while the DB engine is active. That's because a single transaction will often involve multiple writes to many files (journals, actual inserts/deletes/modifies, index updates, triggered actions). It's really only the DB engine that fully understands the "state" at any given moment. I used to think it would always be safe if I stopped the DB engine before copying the file system, but a long-time DBA once told me that you could still be caught out at restore time if the DB engine was taking shortcuts with inodes. Whether that's actually true is an open question.
All the database packages I've ever used offer their own internal support for backup and restore.
In the case of InfluxDB and in the particular context of IOTstack, the journey begins at the "influxdb" section of "docker-compose.yml", where you will find this "volumes" definition:
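It looks something like this (quoted from memory - the exact entries may differ a little between IOTstack versions):

```yaml
    volumes:
      - ./volumes/influxdb/data:/var/lib/influxdb
      - ./backups/influxdb/db:/var/lib/influxdb/backup
```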
In words: the absolute path /var/lib/influxdb/backup inside the container is mapped to the relative path ./backups/influxdb/db outside the container, where the leading "." implies "the directory containing docker-compose.yml" and, accordingly, means the (almost) absolute external path:
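(assuming IOTstack lives in the pi user's home directory, as it usually does):

```
~/IOTstack/backups/influxdb/db
```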
That directory is where the output from InfluxDB backups turns up, and where you need to place any backups you want to restore. To take a manual backup of your InfluxDB databases, do this:
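Something along these lines (a sketch only - it assumes the container is named "influxdb" and that IOTstack lives at ~/IOTstack):

```bash
cd ~/IOTstack
# clear out any previous backup files first (see below for why)
sudo rm backups/influxdb/db/*
# ask the daemon inside the container to dump all databases in portable format
docker exec influxdb influxd backup -portable /var/lib/influxdb/backup
```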
The reason for the (somewhat dangerous) "sudo rm" is because the backup process does not manage the backup directory for you. Every backup produces a complete set of files with a unique timestamp prefix. You get a mess if the backup directory isn't empty before you start.
Once the backup directory has been populated, the simplest way to proceed is to "tar" its contents. Something like:
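For example (the archive name and destination are just illustrations):

```bash
cd ~/IOTstack
tar -czf ~/influxdb-backup-$(date +"%Y-%m-%d_%H%M").tar.gz -C backups/influxdb/db .
```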
Restoring your InfluxDB databases involves:
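In outline, something like this (again only a sketch: the archive name is a placeholder, the container is assumed to be called "influxdb", and restoring over databases that already exist needs extra care):

```bash
cd ~/IOTstack
# start with an empty backup directory
sudo rm backups/influxdb/db/*
# unpack the backup you want to restore into that directory
sudo tar -xzf ~/influxdb-backup-XXXX.tar.gz -C backups/influxdb/db
# tell the daemon inside the container to reload from it
docker exec influxdb influxd restore -portable /var/lib/influxdb/backup
```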
I posted my own backup and restore scripts in response to a Discord question. They are at Paraphraser/IOTstackBackup. They are specific to my needs and are not intended as a general purpose solution, just a source of ideas. I run my backup once a day as a cron job. I routinely restore "the most-recent backup" several times a week to a "test" RPi so I know it works. In thinking about your NAS arrangement, I might try something like:
A few things would worry me:
Anyway, I hope this helps you get a bit further down the track.
Here are some musings on your first dot point. I'm running from a 450GB SSD so I don't have quite the same space constraints but, even so, I keep a careful watch on where space is disappearing. As you've narrowed it down to Docker, this is what I would try next. Start with:
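```bash
# list all images, including obsolete ones tagged <none>
docker images
```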
You'll get something like this:
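The output below is only an illustration - your repository names, image IDs, dates and sizes will be different:

```
REPOSITORY            TAG       IMAGE ID       CREATED        SIZE
iotstack_nodered      latest    9d1ab287b45c   2 days ago     476MB
grafana/grafana       latest    c5ab0e0c0aab   3 days ago     149MB
grafana/grafana       <none>    7a40f5a9f4e6   5 weeks ago    148MB
iotstack_mosquitto    latest    4af162e6b98f   4 days ago     8.9MB
iotstack_mosquitto    <none>    f7e3ba0c1f61   6 weeks ago    8.8MB
influxdb              latest    1361b14af4c9   2 weeks ago    264MB
portainer/portainer   latest    980323c8eb3f   3 weeks ago    196MB
```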
Whenever you do a:
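(that is, the usual pull-and-recreate update, something like:)

```bash
cd ~/IOTstack
docker-compose pull
docker-compose up -d
```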
or, if you're updating Node-Red:
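(a rebuild, since Node-Red is built from a local Dockerfile - the service name "nodered" is assumed here:)

```bash
cd ~/IOTstack
docker-compose up -d --build nodered
```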
and a new image comes down (or a new iotstack_nodered is built) you're going to see double-entries such as in the above for Mosquitto and Grafana. You will want to delete the obsolete versions (marked "<none>") by using the IMAGE ID, as in:
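```bash
# the IMAGE ID is just an example - substitute the <none> image's ID from your own listing
docker rmi 7a40f5a9f4e6
```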
Sometimes, when you do that, you'll get an error like this:
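(roughly - the exact wording depends on your Docker version:)

```
Error response from daemon: conflict: unable to delete 7a40f5a9f4e6 (must be forced) - image is being used by stopped container 2f4e8c913d07
```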
and you solve that by:
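removing the stopped container that is still hanging onto the image (again, the container ID is just an example):

```bash
docker rm 2f4e8c913d07
```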
then retry the "docker rmi". Any time you have been cleaning up like this, it is also worth checking for leftover volumes:
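```bash
# list any named or anonymous volumes Docker is still tracking
docker volume ls
```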
The desired response from that command is an empty list, as in:
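(just the header line with no entries, along the lines of:)

```
DRIVER    VOLUME NAME
```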
If you get anything in the list, first try:
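(one way - a sketch only - is to ask Docker to remove everything it listed; anything still in use will refuse and be left behind:)

```bash
docker volume rm $(docker volume ls -q)
```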
which will remove whatever it can remove. Retry the "docker volume ls".
That may chuck up a dependency (typically a volume that is still in use by a container), which you deal with by removing whatever is holding onto the volume. Each time you are successful in removing something, retry the listing:
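```bash
docker volume ls
```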
Eventually, you'll achieve an empty list. If what you're experiencing with space going walkabout is due to any/all of the above then you'll probably find, like I did, that you suddenly have a whole lot of extra space. In theory there are "prune" commands that automate all of this but my experience with those has been, shall we say, less than stellar, so I'm sticking with the primitive commands as per the above.
Thank you very much for your very elaborate answers. It will take me some time to interpret all the information and test things. I will first focus on the disk storage problem so that the RPi keeps running.

I did the docker images command and found 7 images of 715MB each with the <none> tag, which I deleted. I rebooted but the filesystem was still completely full. Then I did docker volume ls. Two volumes showed up, but unfortunately I couldn't remove them using your instructions because the volumes were in use. I searched for instructions and found docker container stop $(docker container ls -aq). Then I ran the prune command on volumes and the listing was empty. After rebooting, df showed that the filesystem had 50% of its space back!

But..... now Docker doesn't start any more. I tried to connect to Portainer but got nothing. I found out that an existing docker.pid file might obstruct the restart of the Docker daemon, so I renamed it and rebooted. Now Grafana shows up, but with many errors. Portainer and the other applications don't show up. I've messed things up. Enough for today.
I mentioned earlier that I had tried the "prune" commands without much joy. Since then I've given "docker system prune" a proper go.
When you run it, it responds:
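(the wording varies a little between Docker versions, but roughly:)

```
WARNING! This will remove:
  - all stopped containers
  - all networks not used by at least one container
  - all dangling images
  - all dangling build cache

Are you sure you want to continue? [y/N]
```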
I've run it on all of my Pis. Two (RPi4s) were all in a "clean" state as per my earlier reply so it found nothing to do but on an RPi3 (running from SD) that I knew had some dangling images, it promptly reclaimed 150MB. No muss, no fuss. It remains to be seen whether it really goes all-the-way and cleans up dangling volumes - are those implied in "build cache"? If it does, it will be a very useful command. Also take a look at:
I was disappointed that your instructions on this didn't lead to the right result (freed space). So when I found the prune instructions at https://linuxize.com/post/how-to-remove-docker-images-containers-volumes-and-networks/ I tried those. A rough road for a task that seems so easy. OK, my Docker seems to be broken now. Any chance of repairing this, or should I just reinstall the whole thing? Reflashing the image and reinstalling Docker should be easy enough, but restoring all the scripts and settings needs some attention. I will try to make a backup in advance and then see whether it can be restored without issues.
I'm not sure I understand. I just worked through every "prune" in that linuxize link you provided (thanks) but my three RPis each responded with "Total reclaimed space: 0B". Are you saying that your Docker installation is now well and truly broken?

I think the answer to your second question depends on how you've been taking backups. Graham Garner's backup script(s) produce a single tar.gz which contains the current docker-compose.yml, everything in services, everything in volumes with the exception of influxdb and nextcloud, plus the result of telling influxdb to dump its databases. If you have been running those as well then you should be able to extract the contents and move them into place after you've done a clean install. You'll probably want to take a look at my restore script (link earlier in this issue) to get some ideas of how to proceed (mainly the approach to preserving permissions, and the how-to of reloading influxdb).

I've been tinkering with those scripts and have realised something else. When docker-compose does its thing, it gets upset if anything referred to in the "services" area isn't present. Conversely, it auto-creates anything referred to in the "volumes" area and, in the case of influx, that includes the path ~/IOTstack/backups/influxdb/db. The problem is that it gets the permissions wrong (backups needs pi:pi, while influxdb & db need root:root). I think that's something to be aware of if you're trying a bare-metal restore. If you do nothing else, you'll probably want to:
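(something like this, assuming the usual "pi" user and the standard ~/IOTstack path:)

```bash
sudo chown pi:pi ~/IOTstack/backups
sudo chown -R root:root ~/IOTstack/backups/influxdb
```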
I'm not sure whether I said this before but a backup omits volumes/influxdb so there's nothing there when volumes is moved into place on a restore. Bringing the influxdb container up creates volumes/influxdb/data and then the daemon running inside the container initialises some empty structures. Then it's ready to be told to restore from backups/influxdb/db. Once or twice when I've been testing things I've done this:
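(roughly this - the clone URL is an assumption, so substitute whichever IOTstack repository you normally track:)

```bash
cd ~
# park the existing folder out of the way
mv IOTstack IOTstack.off
# start again from a clean checkout (URL assumed)
git clone https://github.com/SensorsIot/IOTstack.git IOTstack
# then configure and bring the stack up as usual
```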
and I have never had any trouble with either that instantiation, or when I blow it away and put the IOTstack.off back into place. I think it's pretty robust, all things considered.

If you don't have what I will call a "classic" backup and everything is in your NAS then, if it were me, I'd probably do a clean checkout, configure things how I wanted, then start pulling stuff back from the NAS, resolving ownership and permission mismatches in favour of what I saw in the clean install. You could try recovering InfluxDB from the NAS (in the sense of copying stuff into volumes/influxdb/data) but I'll be extremely surprised if it actually works. If it does, I'll be pleased for you. But still surprised.
Sorry for my late reply, it has been a few rough weeks. I just installed Docker again using your instructions and at least I have Pi-hole working now. When I have some more time I will dive into InfluxDB and MQTT again. Thanks so far.
I have had similar problems running out of disk space with the original IOTstack. One thing to note is that if you run out of disk space, some Docker containers will not be running and can then be removed by a prune command. I was able to use the scripts to refresh the Docker containers and, with a backup, get running again.
Stale issue. Can be closed.
First of all, great that you forked this initiative so that it is still alive!
I've been using IOTstack since the fall of 2019 and have set up various MQTT readings, Python scripts that write to InfluxDB, and Grafana visualisations. Recently I ran into some issues.