Sunwolf wrote: How about backups taken in a timely manner, or taken correctly? At first it was "server crashed, so we're going to do a backup of a few seconds", then it was "ten minutes", and now it's "30 minutes." Fixing whatever the hell this problem is would go a long way in keeping user trust.

We are working to fix it, and we have already done work to fix it. We back up using replication, which keeps the backup current to the second. This time, however, the replication failed two days ago, and no one noticed. We now have a check in place that text messages either myself or Alex when the replication breaks. (There isn't an easy way to check by hand, and it's not something you remember to do when you're checking things, because a failure doesn't show up; it LOOKS like it's working.)
We also have a kernel-level backup that takes the backup and sends it to an off-server location, provided by this service. The problem was that this service crashed when we tried to retrieve the backup. We called our technician, who had to call them, and that's how we got it up. We had a backup, and that's what the site came back up to. The backup doesn't just go 'boom' and pop up, though; we're talking about 20+GB of data that needs to be loaded into a database. It takes MySQL a lot longer than five minutes to load all of that data.
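
For anyone wondering what a replication check that sends a text can look like, here is a rough sketch. It assumes MySQL's built-in replication and the fields reported by `SHOW SLAVE STATUS` (`Slave_IO_Running`, `Slave_SQL_Running`, `Seconds_Behind_Master`). The function names, the 60-second lag threshold, and the `send_text` callback are made up for illustration; this is not the actual script running on the server.

```python
from typing import Callable, Mapping, Optional

# Assumed threshold for illustration; tune to taste.
MAX_LAG_SECONDS = 60


def replication_problem(
    status: Optional[Mapping[str, object]],
    max_lag: int = MAX_LAG_SECONDS,
) -> Optional[str]:
    """Return a description of the problem, or None if replication looks healthy.

    `status` is one row of SHOW SLAVE STATUS as a dict (hypothetical input;
    fetching it from the database is left out of this sketch).
    """
    if not status:
        return "no replication status returned"
    if status.get("Slave_IO_Running") != "Yes":
        return "IO thread is not running"
    if status.get("Slave_SQL_Running") != "Yes":
        return "SQL thread is not running"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when the lag cannot be determined.
        return "replication lag is unknown"
    if int(lag) > max_lag:
        return f"replica is {int(lag)} seconds behind"
    return None


def check_and_alert(
    status: Optional[Mapping[str, object]],
    send_text: Callable[[str], None],
) -> bool:
    """Call send_text (a hypothetical SMS hook) on failure; return True if healthy."""
    problem = replication_problem(status)
    if problem is not None:
        send_text(f"Replication alert: {problem}")
        return False
    return True


if __name__ == "__main__":
    # Example: the SQL thread has stopped, which is easy to miss by eye.
    broken = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "No",
        "Seconds_Behind_Master": None,
    }
    check_and_alert(broken, print)
```

Run from cron every minute or so, something like this catches the "it LOOKS like it's working" case, because it checks the replication threads directly instead of relying on someone noticing.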