Synology Crash Recovery

So, you own a Synology Disk Station and a disk broke. You messed up by trying to replace the wrong disk. Reinserting the right disk didn’t help. Then you messed up even further by trying to change the volume – or something. You can’t even remember exactly what you did, because of the shock afterwards, when you found out the system had simply stopped working.
Well, here’s how to proceed. Sort of. It’s not so much a step-by-step guide, but it could help you get your data back. This is not magic, and you should not rely on it if you don’t know Linux. It stems from helping a friend with his Synology, after he had tried very hard to get rid of his data. He even had some backups – and I bet you don’t. By the way, if you do know Linux, you should still not rely on my information: rely on your own knowledge, and keep an eye on the information below strictly as a guideline. If something goes wrong, YOU are the one to blame – for taking the following silly advice from the internet. Also, this is a work in progress; the story currently ends without real help, but I hope to be able to write more in the coming months – let’s say this should be ready by March 2015. I will tell you how to recover from a wiped-out raid configuration and a changed volume group, but it takes time. If you need more information NOW, please contact my company, openoffice.nl, and I’ll see what I can do (I can obviously spend more time writing about rescuing raid arrays when it’s paid work).

Step 0. Get some tools. You are going to need: a Linux computer and a whole lot of Linux knowledge (a Disk Station could be described as a nicely polished Linux box with a web frontend). Please be aware that this is a sort of Synology Disk Station recovery guide, but unfortunately not a Linux For Starters handbook. Get enough SATA cables (data and power) to attach all the disks from your Synology to the several SATA controller ports you are also going to need, and a paint marker and/or a camera of some sort for administrative use. You will also need enough storage to hold your salvaged files. In the example below, I will rely rather heavily on an LVM volume group named “vg0” for temporary storage. Oh, get a piece of paper and a pen too. Make sure your Linux computer itself is OK. If it has, for example, a broken power supply, a flaky SATA controller, a nasty power grid or anything else that may mess up recovery – please use your ejection seat now and get someone with the proper tools in your pilot seat. The last thing you want is a recovery that destroys yet another disk.

Step 1a. Stop doing things – to be able to start thinking. Well, there could be one more thing you should do, which is turn off the power of your Synology, because it will prevent you from doing more silly things. I can’t think of any reason your Synology should keep running, but beware: you are doing this on your own account – i.e. if something goes wrong, don’t blame me.

Step 1b. Write down as much as you can remember about what happened. Mark the disks in your Synology (1, 2, 3, …), mark the cables (if there are any, which is the case in older disk stations, newer ones have a backplane). Take photos. Make sure you document as much as you can, before you forget what you were doing. Maybe even print this web site, because I won’t guarantee its uptime and I don’t take phone calls at night.

Step 2. Install software. Make sure your Linux computer has all the packages you need. You are probably going to need mdadm, lvm and friends (lvm2 package), smartctl (smartmontools package); dmsetup, kpartx and if you really, really messed up, you might want to install “foremost”. Also, there is a chance you are going to have to compile the mdadm package by hand – but more on that later.
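
Before you start, you can check in one go which of these tools are already on your system. A minimal sketch – the apt-get line assumes a Debian/Ubuntu-style distribution, so adjust the package names for yours:

```shell
# Check which of the recovery tools are missing from this machine.
missing=""
for tool in mdadm lvs smartctl dmsetup kpartx foremost; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
    echo "Missing tools:$missing"
    echo "Try: apt-get install mdadm lvm2 smartmontools dmsetup kpartx foremost"
else
    echo "All tools present."
fi
```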

Step 3. Collect disk information. You have all your packages ready. Turn off your Linux computer. Connect Synology Disk 1. Turn on your Linux computer and let it start. Don’t be alarmed by messages that there’s an invalid RAID array. If the new disk just sits there, starts clicking and doesn’t show up in the kernel log, this disk is probably toast. Anyway, let’s suppose there is a working disk. Find out the name of the disk you just connected (for example, if your root FS is on /dev/sda, your Synology disk might be sdb – but, depending on controller and port, it could also be sdq or sdh, or anything else). Let’s say it’s disk sdb. We’re going to use $DISK for this, so type ‘export DISK=/dev/sdb’ and ‘export RAIDPARTITION=/dev/sdb5’, then proceed, with sudo or as root:

#First, check its health:
smartctl -a $DISK
# see if there is a partition table
fdisk -l $DISK
# and see if there is an array on partition 5
mdadm --examine $RAIDPARTITION

If all is well, the output from smartctl will show “SMART overall-health self-assessment test result: PASSED” at the top of the SMART data section, fdisk will probably show partitions 1, 2 and 5, and mdadm will show output that starts with “Magic” and ends with an “Array State”. If smartctl doesn’t show anything, the disk is probably still toast (clicking or not), but you might want to check, and re-check, your cables and controller.

Output from smartctl is explained better elsewhere on the internet, but you can keep an eye on the raw values of “Reported_Uncorrect” (the number of uncorrectable sectors reported back to the Synology) and “Current_Pending_Sector” (also uncorrectable sectors, but the ones your disk has stubbornly decided to keep trying for a while). If “Reported_Uncorrect” sectors are dead, you can think of “Current_Pending_Sector” ones as “resting” – in the Blue Parrot sense, that is.
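
To pull just those two raw values out of the smartctl wall of text, a one-line awk will do. The sample output below is fabricated for illustration; on a real disk you would pipe `smartctl -a $DISK` into the awk instead:

```shell
# Extract the raw values (last column) of the two interesting attributes.
# $sample is a made-up fragment of `smartctl -a` attribute output.
sample='187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3'
echo "$sample" | awk '/Reported_Uncorrect|Current_Pending_Sector/ { print $2 ": " $NF }'
# → Reported_Uncorrect: 12
# → Current_Pending_Sector: 3
```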

If mdadm shows output (and I hope it does), note down the values of “Raid Level” (raid1, raid5, raid6), “Device Role”, “Array State” and “Events”. On a valid array, all “Events” numbers are equal. On an invalid array, some disks have a higher “Events” number than others. As far as I know, higher means newer here. Also, you can see from the “Array State” whether Disk 1 thinks the array is OK. A RAID5 array can have 1 missing disk, a RAID6 array can have 2; RAID1 is mirroring and RAID0 (you don’t have that, do you?) doesn’t have any redundancy. So “.AA.” is OK for RAID6, but not for RAID5.

If the array shows “.AA.” and you have raid 5, it’s not the end of the world; there is still a chance that you can get your data back, IF there are enough disks that can be read from.

Step 4. Repeat step 3 for every disk you have.

Step 5. Administration. You should have a pretty good idea about how things are now. If you have a raid 5 array with just one disk missing, nothing is wrong. If two disks are marked missing but all disks are readable, check the “Events” counts: if they all match, or match on three out of four disks, your data can probably be recovered. The same goes for raid 6 with two missing disks.
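
The bookkeeping of this step can be sketched as a tiny shell function. This helper is mine, not part of any Synology or mdadm tooling: given the raid level, the total number of disks and the number of readable disks whose “Events” counts agree, it tells you whether recovery is plausible:

```shell
# recoverable LEVEL TOTAL GOOD
# Returns success if GOOD disks (readable, matching Events) are enough
# to start a LEVEL array of TOTAL disks.
recoverable() {
    level=$1 total=$2 good=$3
    case "$level" in
        raid5) min=$((total - 1)) ;;   # raid5 survives 1 missing disk
        raid6) min=$((total - 2)) ;;   # raid6 survives 2 missing disks
        raid1) min=1 ;;                # a mirror needs only one good copy
        *)     min=$total ;;           # raid0 / linear: every disk is needed
    esac
    [ "$good" -ge "$min" ]
}
recoverable raid5 4 3 && echo "raid5, 3 of 4 good: worth trying"
recoverable raid6 4 2 && echo "raid6, 2 of 4 good: worth trying"
recoverable raid5 4 2 || echo "raid5, 2 of 4 good: game over"
```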

Step 6. Attaching all disks. Turn off your Linux machine again, attach all the disks you are going to use and turn the Linux machine back on. If initrd asks whether it should start broken arrays, answer NO. ANSWER. NO. Thank you. After Linux has started, you must first make sure that nothing will write to the disks. For the sake of brevity, I’ll semi-script this step below. The script supposes that you have sdb, sdc, sdd and sde for Synology Disk 1, Disk 2, Disk 3 and Disk 4.

# let's use variables
disk1=/dev/sdb
disk2=/dev/sdc
disk3=/dev/sdd
disk4=/dev/sde
# first, forbid everyone from writing to our disks
blockdev --setro $disk1
blockdev --setro $disk2
# repeat for $disk3 and $disk4
#
# now make a few devices where we can write to
lvcreate --size 100G --name 'disk1_writedev' vg0
lvcreate --size 100G --name 'disk2_writedev' vg0
# repeat for $disk3 and $disk4
#
# here comes the magic trick: we use dmsetup to "clone" our disks
# first, get the disk size
disk1size=$(blockdev --getsz $disk1)
echo "0 $disk1size snapshot $disk1 /dev/vg0/disk1_writedev n 4"|dmsetup create disk1
disk2size=$(blockdev --getsz $disk2)
echo "0 $disk2size snapshot $disk2 /dev/vg0/disk2_writedev n 4"|dmsetup create disk2
# repeat for disks 3 and 4

Step 7. Activate the partitions on the disks (if any). Use kpartx to add the partitions to the file system: kpartx -av /dev/mapper/disk1. Repeat for disk2, disk3 and disk4. This will give you /dev/mapper/disk1p1 (the Synology root partition), disk1p2 (extended partition) and disk1p5 (raid partition).
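
If you like loops better than repeating yourself, the four kpartx calls can be generated like this; the `echo` lets you eyeball the commands first, and dropping it would actually run them:

```shell
# Print the kpartx command for each cloned disk; remove `echo` to execute.
for n in 1 2 3 4; do
    echo kpartx -av /dev/mapper/disk$n
done
```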

Step 8. A working raid set? Suppose you found out that you actually have a working RAID set with enough disks. You can now try to assemble the array with mdadm. Let’s say that disk1 and disk2 have an “Events” count of 764441; disk3 has a count of 764439 and disk4 is totally inaccessible. We’ll assemble the array by adding disks 1, 2 and 3:
mdadm --assemble --run --force /dev/md99 /dev/mapper/disk1 /dev/mapper/disk2 /dev/mapper/disk3
If all goes well, you’ll see something like:
mdadm: forcing event count in /dev/mapper/disk3(3) from 764439 upto 764441
mdadm: clearing FAULTY flag for device 3 in /dev/md99 for /dev/mapper/disk3
mdadm: Marking array /dev/md99 as 'clean'
mdadm: /dev/md99 has been started with 3 drives (out of 4).

So it seems we’re back in business – as far as the raid is concerned, that is.

Warning. The raid device we have started is only partially writable: in step 6, we created 400G in total of writable storage. That is more than enough to test things, repair the FS and so on, but it won’t be enough for a raid resync! So please, please, do not try to assemble an array with more than the minimum number of disks – i.e. if you have raid6 with 4 disks, only add 2 to the array here; if you have raid5 with 4 disks, only add 3. DO NOT TRY TO RESYNC, as the data simply won’t fit our writable devices.
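
A quick sanity check after assembling is to look at /proc/mdstat: the array should be up but degraded, and – crucially – no resync should be running. The mdstat fragment below is made up for illustration; on your machine, read the real /proc/mdstat instead:

```shell
# Check a (fabricated) mdstat fragment: degraded is fine, resync is not.
sample='md99 : active raid5 dm-3[2] dm-2[1] dm-1[0]
      11706562560 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]'
echo "$sample" | grep -q 'resync' && echo "WARNING: resync running!" || echo "no resync in progress"
echo "$sample" | grep -o '\[[0-9]*/[0-9]*\]'   # disks configured/active, here [4/3]
```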

So far, so good. If your array is up and running, you may want to check with pvs, vgs and lvs whether you see a volume group and volumes. If not, wait here for instructions. We’ll cover both “I can’t find any raid information on the disks” and “my volume group has disappeared completely”. Oh, in the mean time, you may want to check out https://raid.wiki.kernel.org/index.php/RAID_Recovery.

To be continued.
