#1
Problems with Linux software RAID after OS upgrade (long)
Having some trouble with Linux software RAID after an OS update, and would be grateful for any insights.

Machine is an AMD 64-bit PC running 32-bit Linux. The machine was previously running Fedora Core 4 with no problems. Two 500GB hard drives were added to the onboard Promise controller and the Promise section of the machine's BIOS configured for JBOD. On boot, as expected, two new SCSI disk devices could be seen - sda and sdb. These were partitioned using fdisk, a single partition occupying the entire disk created, and the partition type set to 0xfd (Linux RAID autodetect). mdadm was used to create a RAID1 (mirror) using /dev/sda and /dev/sdb. I can't remember for certain if I used the raw devices (/dev/sda) or the partitions (/dev/sda1) to create the array, and my notes aren't clear (my best reconstruction of the commands is below). The resulting RAID device, /dev/md0, had an ext3 filesystem created on it and was mounted on a mount point. /etc/fstab was edited to mount /dev/md0 on boot.

This arrangement worked well until recently, when the root partition on the (separate) boot drive was trashed and Fedora Core 6 installed by someone else, so I have only their version of events to go by. The array did not reappear after FC6 was installed. The /etc/raidtab and/or /etc/mdadm.conf files were not preserved, so I am working blind to reassemble and remount the array.

Now things are confused. The way Linux software RAID works seems to have changed in FC6. On boot, dmraid is run by rc.sysinit; it discovers the two members of the array OK and makes the array available as /dev/mapper/pdc_eejidjjag, where pdc_eejidjjag is the array's name:

[root@linuxbox root]# dmraid -r
/dev/sda: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0
/dev/sdb: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0

[root@linuxbox root]# dmraid -ay -v
INFO: Activating mirror RAID set "pdc_eejidjjag"
ERROR: dos: partition address past end of RAID device

[root@linuxbox root]# ls -l /dev/mapper/
total 0
crw-------  1 root root  10, 63 Jul  5 16:59 control
brw-rw----  1 root disk 253,  0 Jul  6 03:11 pdc_eejidjjag

[root@linuxbox root]# fdisk -l /dev/mapper/pdc_eejidjjag

Disk /dev/mapper/pdc_eejidjjag: 500.0 GB, 500000000000 bytes
255 heads, 63 sectors/track, 60788 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device                      Boot  Start      End      Blocks      Id  System
/dev/mapper/pdc_eejidjjag1        1          60801    488384001   fd  Linux raid autodetect

I cannot mount /dev/mapper/pdc_eejidjjag1:

[root@linuxbox root]# mount -v -t auto /dev/mapper/pdc_eejidjjag1 /mnt/test
mount: you didn't specify a filesystem type for /dev/mapper/pdc_eejidjjag1
       I will try all types mentioned in /etc/filesystems or /proc/filesystems
Trying hfsplus
mount: special device /dev/mapper/pdc_eejidjjag1 does not exist

'fdisk -l /dev/mapper/pdc_eejidjjag' shows one partition of type 0xfd (Linux raid autodetect) filling the disk. Surely this should be type 0x83, since the device is the RAIDed disk as presented to the user? And why does mount say the device /dev/mapper/pdc_eejidjjag1 does not exist?

This may be due to my unfamiliarity with dmraid; I can find little about it on the internet, and I'm uncertain whether it is meant to be used in conjunction with mdadm, or whether it's either/or. In the past, Linux software RAID has Just Worked for me using mdadm.
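For reference, the original FC4 setup went roughly like this. This is reconstructed from memory rather than my notes, so treat the member devices as an assumption (as I say, it may have been the whole disks rather than the partitions), and the mount point name is just a stand-in:

# one full-size partition per disk, type set to 0xfd (Linux RAID autodetect)
fdisk /dev/sda
fdisk /dev/sdb
# create the mirror - possibly on /dev/sda /dev/sdb rather than the partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# filesystem and mount, plus a matching /etc/fstab entry
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt/array    # mount point name is a stand-in; I don't recall the real one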
If I disregard dmraid (disabling the array with 'dmraid -an /dev/md0') and use the more familiar mdadm instead, I first check with fdisk that the disks have the correct RAID autodetect partitions:

[root@linuxbox root]# fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start       End      Blocks   Id  System
/dev/sda1                1     60801   488384001   fd  Linux raid autodetect

[root@linuxbox root]# fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start       End      Blocks   Id  System
/dev/sda1                1     60801   488384001   fd  Linux raid autodetect

Trying to assemble the RAID with those partitions fails:

[root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda1 /dev/sdb1
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sda1: No such device or address
mdadm: /dev/sda1 has no superblock - assembly aborted

Perhaps I should be using the raw devices? (See also the direct check sketched below.)

[root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda /dev/sdb
mdadm: looking for devices for /dev/md0
mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
mdadm: added /dev/sdb to /dev/md0 as 1
mdadm: added /dev/sda to /dev/md0 as 0
mdadm: /dev/md0 has been started with 2 drives.

[root@linuxbox root]# mdadm -E /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
  Creation Time : Thu Mar 22 15:26:52 2007
     Raid Level : raid1
    Device Size : 488386496 (465.76 GiB 500.11 GB)
     Array Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Thu Jul  5 16:58:02 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 864ad759 - correct
         Events : 0.4

      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb

[root@linuxbox root]# mdadm -E /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
  Creation Time : Thu Mar 22 15:26:52 2007
     Raid Level : raid1
    Device Size : 488386496 (465.76 GiB 500.11 GB)
     Array Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Update Time : Thu Jul  5 16:58:02 2007
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 864ad76b - correct
         Events : 0.4

      Number   Major   Minor   RaidDevice State
this     1       8       16        1      active sync   /dev/sdb

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb

so that looks OK. Let's see what /dev/md0 looks like:

[root@linuxbox root]# fdisk -l /dev/md0

Disk /dev/md0: 500.1 GB, 500107771904 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

     Device Boot      Start       End      Blocks   Id  System
/dev/md0p1                 1     60801   488384001   fd  Linux raid autodetect

That doesn't look right; I would have expected to see a partition of type 0x83, since /dev/md0p1 is the RAID as presented to the user according to fdisk.
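In hindsight, the raw-device-versus-partition question can be settled directly by asking mdadm where it finds a superblock; this is just standard --examine usage, consistent with the errors above:

mdadm -E /dev/sda1   # fails: no device node / no superblock there
mdadm -E /dev/sda    # superblock found, so the array was built on the whole disks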
Trying to mount it anyway:

[root@linuxbox root]# mount -v -t auto /dev/md0 /mnt/test
mount: you didn't specify a filesystem type for /dev/md0
       I will try all types mentioned in /etc/filesystems or /proc/filesystems
Trying hfsplus
mount: you must specify the filesystem type

[root@linuxbox root]# mount -v -t auto /dev/md0p1 /mnt/test
mount: you didn't specify a filesystem type for /dev/md0p1
       I will try all types mentioned in /etc/filesystems or /proc/filesystems
Trying hfsplus
mount: special device /dev/md0p1 does not exist

mdadm --examine /dev/sd* shows both members of the array as correct, with the same UUID, and /proc/mdstat shows the array as complete and OK with two members, as expected:

[root@linuxbox root]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
      488386496 blocks [2/2] [UU]

unused devices: <none>

I'm confused. I can't find much information on dmraid; the man page seems to imply that it's for use with hardware RAID controllers, and I don't know if I should be using that or mdadm, or both. Previously I just used mdadm and everything Just Worked. I don't know why assembling and starting the array doesn't present the contents of the md device as expected, or why fdisk shows special devices in /dev which the mount command says don't exist.

The user of the machine is getting worried as there's a lot of data on this array and, of course, he has no backup. I'm at the point of taking the disks out and trying them in a machine running FC4. Any ideas or suggestions please before I do that?
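Before I do, I suppose these checks are worth running; none of them is destructive, they just read what the kernel and the on-disk superblock say:

cat /proc/partitions    # does the kernel actually know a device md0p1?
file -s /dev/md0        # should report ext3 filesystem data if the fs is intact
dumpe2fs -h /dev/md0    # prints the ext2/ext3 superblock header if one is present

--
(\__/)  Bunny says NO to Windows Vista!
(='.'=) http://www.cs.auckland.ac.nz/~pgut00...ista_cost.html
(")_(")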
#2
Problems with Linux software RAID after OS upgrade (long)
In comp.sys.ibm.pc.hardware.storage Mike Tomlinson wrote:
> Having some trouble with Linux software RAID after an OS update, and
> would be grateful for any insights. Machine is an AMD 64-bit PC running
> 32-bit Linux. The machine was previously running Fedora Core 4 with no
> problems. Two 500GB hard drives were added to the onboard Promise
> controller and the Promise section of the machine's BIOS configured for
> JBOD.

I assume that is individual disks, instead of the JBOD "RAID" mode?

> On boot, as expected, two new SCSI disk devices could be seen - sda and
> sdb. These were partitioned using fdisk, a single partition occupying
> the entire disk created, and the partition type set to 0xfd (Linux RAID
> autodetect).

Ok.

> mdadm was used to create a RAID1 (mirror) using /dev/sda and /dev/sdb.
> I can't remember for certain if I used the raw devices (/dev/sda) or
> the partitions (/dev/sda1) to create the array, and my notes aren't
> clear.

That is important. With partitions the RAID would start automatically because of type 0xfd. With whole drives it would not, and would require some start script. Also, the partitioning left on the disks if you used the whole disk will confuse RAID auto-detectors.
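If it does turn out that the array is on the whole drives and you keep it that way, the usual fix is an explicit entry in /etc/mdadm.conf so the init scripts can assemble it at boot. A sketch; the UUID is the one your mdadm -E output further down shows:

DEVICE /dev/sda /dev/sdb
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=c4344083:a8d8cf32:3f00e0db:8765b21b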
> The resulting RAID device, /dev/md0, had an ext3 filesystem created on
> it and was mounted on a mount point. /etc/fstab was edited to mount
> /dev/md0 on boot.

ok.

> This arrangement worked well until recently, when the root partition on
> the (separate) boot drive was trashed and Fedora Core 6 installed by
> someone else, so I have only their version of events to go by. The
> array did not reappear after FC6 was installed. The /etc/raidtab and/or
> /etc/mdadm.conf files were not preserved, so I am working blind to
> reassemble and remount the array.

Should not be a problem. If you try to reassemble, any part not having a valid RAID signature will be rejected.

> Now things are confused. The way Linux software RAID works seems to
> have changed in FC6. On boot, dmraid is run by rc.sysinit; it discovers
> the two members of the array OK and makes the array available as
> /dev/mapper/pdc_eejidjjag, where pdc_eejidjjag is the array's name:

Hmmm. From what I can see dmraid is not intended for normal software RAID, but rather for fakeRAID controllers (software RAID done by BIOS code). It may also be able to handle normal software RAID, but I have never used it.

> [root@linuxbox root]# dmraid -r
> /dev/sda: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0
> /dev/sdb: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0
> [...]
>
> 'fdisk -l /dev/mapper/pdc_eejidjjag' shows one partition of type 0xfd
> (Linux raid autodetect) filling the disk. Surely this should be type
> 0x83, since the device is the RAIDed disk as presented to the user? And
> why does mount say the device /dev/mapper/pdc_eejidjjag1 does not
> exist?

Because this works differently. The problem is that the check for partitions is done by the kernel, and it seems that it is done before assembly of the RAID array, so no partition discovery is done for it.

> This may be due to my unfamiliarity with dmraid; I can find little
> about it on the internet, and I'm uncertain whether it is meant to be
> used in conjunction with mdadm, or whether it's either/or. In the past,
> Linux software RAID has Just Worked for me using mdadm.

By all means go back to mdadm. dmraid has no business being run automatically. The people that configured it that way screwed up IMO.
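To get the dmraid mapping out of the way for the moment, the following should do. Both are stock dmraid/dmsetup invocations, though I have not checked which FC6 init script you would need to edit to stop it coming back at boot:

dmraid -an                      # deactivate all dmraid-activated RAID sets
dmsetup remove pdc_eejidjjag    # or just remove the leftover mapping by name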
> If I disregard dmraid (disabling the array with 'dmraid -an /dev/md0')
> and use the more familiar mdadm instead, I first check with fdisk that
> the disks have the correct RAID autodetect partitions:
> [...]
>
> Trying to assemble the RAID with those partitions fails:
>
> [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda1 /dev/sdb1
> mdadm: looking for devices for /dev/md0
> mdadm: cannot open device /dev/sda1: No such device or address
> mdadm: /dev/sda1 has no superblock - assembly aborted
>
> Perhaps I should be using the raw devices?
>
> [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda /dev/sdb
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
> mdadm: added /dev/sdb to /dev/md0 as 1
> mdadm: added /dev/sda to /dev/md0 as 0
> mdadm: /dev/md0 has been started with 2 drives.

So you definitely used the whole devices (a mistake with software RAID IMO, but you can do it), and the partition tables are only left because they have not yet been overwritten. They do confuse the autodetection scripts, though.

> [root@linuxbox root]# mdadm -E /dev/sda
> [...]
>
> so that looks OK. Let's see what /dev/md0 looks like:
>
> [root@linuxbox root]# fdisk -l /dev/md0
> Disk /dev/md0: 500.1 GB, 500107771904 bytes
> [...]
> /dev/md0p1                 1     60801   488384001   fd  Linux raid autodetect

You do not have that partition! Unless you did partition /dev/md0? If not, this is leftover junk from your first partitioning that you then did not use. It confuses dmraid and should be removed; see below.

> That doesn't look right; I would have expected to see a partition of
> type 0x83, since /dev/md0p1 is the RAID as presented to the user
> according to fdisk. Trying to mount it anyway:
> [...]
>
> I'm confused. I can't find much information on dmraid; the man page
> seems to imply that it's for use with hardware RAID controllers, and I
> don't know if I should be using that or mdadm, or both. Previously I
> just used mdadm and everything Just Worked. I don't know why assembling
> and starting the array doesn't present the contents of the md device as
> expected,

But it does! You said that you created an ext3 on it, so why not just mount /dev/md0 directly? I think you have indeed gotten a bit confused (understandably, and maybe a bit panicked too...), and may have forgotten what you said at the top of this posting ;-)

> or why fdisk shows special devices in /dev which the mount command says
> don't exist.
The mount command does say they exist. However, it cannot identify the filesystem on them. No wonder, since there isn't one there.

> The user of the machine is getting worried as there's a lot of data on
> this array and, of course, he has no backup.

Well, always the same story. There is no excuse for not having a backup...

> I'm at the point of taking the disks out and trying them in a machine
> running FC4. Any ideas or suggestions please before I do that?

Mount /dev/md0 directly. It should have your ext3. However, it is important that you remove the bogus partition table. The easiest way to do that is as follows:

0. (Optionally) disable the unhelpful dmraid boot script.
1. Get the thing to work again, then make a full backup.
2. Degrade the array by setting sdb as faulty.
3. Remove sdb from the array.
4. Partition sdb with one large partition of type 0xfd. Reboot if fdisk could not get the kernel to reload the partition table.
5. Make a degraded RAID1 on /dev/sdb1 as md1 (specify the second disk as "missing" to mdadm).
6. Make a filesystem on /dev/md1 and copy all data over from /dev/md0.
7. Stop /dev/md0, and create a partition on sda similar to the one on sdb. Reboot if fdisk told you it could not reload the partition table.
8. Add /dev/sda1 to /dev/md1.
9. Adjust /etc/fstab as needed.

You should now have a partition on sda and one on sdb, both set to be auto-started as /dev/md1 by the kernel.

BTW, you can do this whole operation with a Knoppix CD or memory stick; you just need to load the RAID kernel modules manually.
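In command form, the above would look roughly like this. It is an untested sketch from memory, the mount points /mnt/old and /mnt/new are placeholders, so check each step against your own device names before running anything, and do not skip the backup in step 1:

# 1. assemble the old array and mount it directly - your ext3 lives here
mdadm --assemble /dev/md0 /dev/sda /dev/sdb
mount -t ext3 /dev/md0 /mnt/old
# ... take a full backup now ...

# 2./3. degrade the array: mark sdb faulty and remove it
mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
mdadm --zero-superblock /dev/sdb    # optional: wipe the old whole-disk superblock

# 4. one full-size partition on sdb, type 0xfd
#    (reboot if the kernel cannot reread the partition table)
fdisk /dev/sdb

# 5./6. degraded mirror on the new partition, new filesystem, copy the data
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing
mkfs.ext3 /dev/md1
mount /dev/md1 /mnt/new
cp -a /mnt/old/. /mnt/new/

# 7./8. retire md0, partition sda the same way, add it and let the mirror resync
umount /mnt/old
mdadm --stop /dev/md0
fdisk /dev/sda
mdadm /dev/md1 --add /dev/sda1

# 9. point /etc/fstab at /dev/md1

Arno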