2013-09-11

jones_supa writes
"The sudden death of a solid-state drive in Linus Torvalds' main workstation has temporarily suspended work on the 3.12 Linux kernel. Torvalds has not been able to recover anything from the drive. Subsystem maintainers with outstanding pull requests may need to re-submit them in the coming days. If the SSD isn't recoverable, he will finish out the Linux 3.12 merge window from a laptop."

Re:RAID

By Trogre



2013-Sep-11 18:09

• Score: 4, Informative
• Thread

You guys should really look at the --backup and --backup-dir options in rsync.

I use them in conjunction with --delete to always keep a "current" copy of the data, along with any old files (i.e. files that have since been updated or deleted) in a separate backup folder named after the current day of the month.

That way you get a directory structure as follows:
01
02
03
04
...
31
Current

You can restore the up-to-date set from Current at any time, and if you want to retrieve a file you deleted or overwrote five days ago, go look in folder 06.

RAID != Operating System

By dutchwhizzman



2013-Sep-11 18:17

• Score: 5, Interesting
• Thread

What you're describing is a backup feature in a server OS that supports certain client OSes. RAID may also be a software feature, but even "software RAID" is often BIOS-bootable and keeps working with one of the drives missing. That means it operates OS-agnostically, at a lower level than "I have a backup system that works". Linux can have a backup system too, one that restores from a LiveCD/USB stick and stores to a remote server, taking roughly the same time to back up and restore: differential, incremental, full backups, the works.

The solution you're offering is simply not comparable to RAID. It's fundamentally different: it works on a totally different layer, it doesn't prevent downtime, and it's not OS-agnostic. RAID should prevent downtime; working backups should prevent data loss. Maybe WHS is the shizniz, and you rock for making actual backups, but beyond that your post is totally offtopic in this context and doesn't even begin to solve the problem Linus was facing with his desktop.

I'm not modding you down, even though I have mod points; instead I'm telling you exactly why I think you shouldn't have posted this. I hope you learned something from it, and that in the future you'll implement both backups and RAID when unscheduled downtime matters. Better yet, implement a system that works for all the relevant OSes in your environment without relying on a single vendor's closed-source product. Depending on one vendor means supporting their product, licensing, and other requirements until the data is no longer relevant, even after you've migrated to a competing product.

Will never work with modern drives

By dutchwhizzman



2013-Sep-11 18:20

• Score: 5, Informative
• Thread

Modern drives, for at least the last five years, store calibration factors for the platter/head pack in an EEPROM on the controller board. If you swap boards, the new board most likely won't be able to read the data on the disk, since it isn't calibrated to that head/platter kit.

Re:RAID

By Miamicanes



2013-Sep-11 18:50

• Score: 5, Interesting
• Thread

The thing that really sucks about SSDs (at least Sandforce-based drives) is that 99% of their failures are due to firmware bugs, which can be triggered simultaneously across an entire array (especially the sleep-related bugs). It's a failure mode the creators of RAID 1, 5, and 10 never anticipated.

IMHO, the worst thing about SSDs (at least those with Sandforce controllers) is their mandatory full-drive encryption: it can't be disabled, it uses a key you aren't allowed to set or recover, and it gets blown away whenever you reflash the firmware. This means, among other things, that if the drive's controller gets itself confused:

* You can't reflash data-recovery firmware onto the drive. The act of flashing it would blow away the encryption key and render the data gone forever.

* If the drive decides you're trying "too hard" to systematically extract data while it's in a confused state, it goes into "panic mode" and blows away the encryption key. If that happens, your data is gone forever AND you have to send the drive back to OCZ or whomever you got it from to get it unlocked. For your protection, of course. And Hollywood's. Among other things, dd_rescue/ddrescue can trigger panic mode.

* You can't even do the equivalent of pulling the platters from a conventional drive in a clean room and mounting them in another drive for reading, because the data on the flash chips is all encrypted, and the key is unrecoverable.

This is BULLSHIT, and it's why I refuse to buy any more SSDs. As an end user, I should be able to download a utility, reflash the drive with firmware that includes an offline recovery mode that simply dumps the flash chip contents from start to finish, and either disable the encryption or set it to a key *I* control, so that the 99.99999% of the data that's still good when the embedded firmware freaks out can be dumped and recovered offline.

If there's a God, Linus will go NUCLEAR over this, get a few seconds on CNN & other networks to rant about the unreliability of SSDs, and scare enough consumers to hit the industry HARD where it'll hurt the most... their bank accounts.

It might not be possible to make SSDs reliable, but DAMMIT, they should at least be RECOVERABLE. Goddamn hard drives with recoverable data were pulled out of laptops left in safes in the Vistamark hotel when a tower sheared it in half and buried it under flaming rubble, yet an SSD that dies if you so much as look at it the wrong way, thanks to firmware bugs, ends up fundamentally unrecoverable for no hard technical reason.

And yes, I'm bitter about having my hard drive commit suicide for no reason besides Sandforce Business Policy. As long as they keep making controllers that cause drives to self-destruct at the drop of a hat, I'll keep doing my best to talk people out of buying drives tainted by their controller chips. Sandforce sucks.

Re:RAID

By Solandri



2013-Sep-11 19:39

• Score: 4, Informative
• Thread

> I stopped using RAID in any of my systems after I started using WHS v1. WHS 2011 has the same feature: live system backups. If a drive fails, I pop in a new one (of any type/size), boot a CD that came with WHS (essentially a WinPE environment with recovery software baked in), select my backup (I keep 7-10 days' worth -- I forget the exact setting), and in about an hour my system is back to the state of the last backup.

"In about an hour" -- there's the operative phrase. RAID is for systems where you can't have, or don't want, an hour of downtime while restoring from a backup. The R in RAID stands for redundant, as in: you can have a failure and keep going.

Note that this is the converse of "RAID is not a backup!" Just as RAID is not a replacement for a backup, a backup is not a replacement for RAID. They do different things (and if you're smart, you'll back up your RAID too). From your own description, you wanted a backup; RAID was never the right solution for your needs.
