#1
Question: Reliability of physical snapshots on SANs
Hello!
As far as I know, some SAN solutions can snapshot a 'partition' at the hardware level. After performing the snapshot directly on the SAN (which takes a second or so), I see partition P and the snapshot P'. Changes to P are not visible in the snapshot P' and vice versa.

Now my problem is: because the SAN doesn't know what kind of file system is stored on the partition, it cannot know whether the file system on partition P is consistent when the snapshot is taken. So why does everybody tell me that the file system on P' will ALWAYS be OK? If I don't make sure (on the computer using partition P) that no (sensitive) 'control data' is sent while the snapshot starts on the SAN, I might get a corrupted file system!

For example, I imagine the following scenario:

time
1   File system on P is fine
2   Control data of P is updated (e.g. sent in two SCSI blocks): blocks O1 and O2 need to be replaced with N1 and N2
3a  First block N1 of control data is sent to the SAN
4a  SAN starts the snapshot of partition P
3b  Second block N2 of control data is sent to the SAN
4b  SAN sets up P'
4c  SAN provides P and snapshot P'

Because the snapshot 'freeze' was done between 3a and 3b, the partition P' will see N1 and O2: the file system is broken! I don't see how the SAN can detect such a situation to make sure this NEVER happens!

Comments:
* I know this might look like an academic question, but if it can fail in theory it can fail in real life (e.g. while a running program creates thousands of new directories).
* It might work 100% failsafe if the underlying protocol used transactions, but SCSI doesn't. Or is there some other SCSI trick I don't know about (e.g. some 'drive will be removed, flush your (control) data' SCSI message)?
* Modern (journaling) file systems are not corrupted that easily, but even if only some journaling data is inconsistent on P', it might cause trouble if P' is a read-only file system, because then the journaling data cannot be repaired.
* I think the only reliable way is to make sure no control data is written to the partition while the snapshot is taken. That is, you have to unmount the file system first!
* What happens if you are copying a 1 GB file and only 512 MB have been copied when the snapshot is taken? I guess you get a truncated file on P'. But what if the (badly designed?) file system control data already says it should be 1 GB? That would be a 'broken' file system.
* Some people say it is enough to synchronize the partition, using the computer it is mounted on, a few seconds before starting the snapshot. That way, changed control data is only in the cache. NO! What if someone else starts a second sync? Or what if caching is disabled, or the partition is accessed directly (databases)?
* Some people got the idea that the SAN waits until it gets no write requests for that partition for 'some time', so no 'half-done' transactions are seen by the SAN. Very unreliable! And a snapshot would not be possible while a big file is being written.
* I understand that it works if you can 'block' control data on your computer for some time and trigger the snapshot on the SAN yourself, but I have been told that the computer doesn't have to know and that you don't have to do anything on your computer!
* All the people I talked to agree that copying a big file during the snapshot procedure will result in a truncated file, but they deny that this happens to 'control data'. I don't see the difference: if a file that is one million blocks in size is truncated by the snapshot, a two-block file will be too. Because the SAN doesn't know the difference between a 'harmless' two-block file and two blocks of important control data that belong together, the problem is still there.
* To me, a SAN hardware snapshot is equal to splitting a SCSI mirror (RAID-1) by pulling one disk out of the box while the write LED is off. When mounting the disk on another computer it might work, but there is no guarantee it does!

Thank You,
Erik
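The timeline in this post (steps 1 to 4c) can be reproduced with a toy model. This is only a sketch under loud assumptions: the `Volume` class and its methods are invented for illustration, the "blocks" are plain strings, and a real array's snapshot mechanism is far more involved. It shows nothing more than the ordering problem described above.

```python
class Volume:
    """Toy block device: a list of block contents indexed by block number."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    def write(self, index, data):
        self.blocks[index] = data

    def snapshot(self):
        # A hardware snapshot freezes whatever blocks are on the array at
        # this instant; it has no idea which writes belong together.
        return Volume(self.blocks)


def torn_snapshot_demo():
    p = Volume(["O1", "O2"])   # time 1: file system on P is consistent
    p.write(0, "N1")           # step 3a: first control block reaches the SAN
    p_snap = p.snapshot()      # steps 4a/4b: snapshot taken between the writes
    p.write(1, "N2")           # step 3b: second control block reaches the SAN
    return p.blocks, p_snap.blocks


live, snap = torn_snapshot_demo()
print("P  =", live)   # ['N1', 'N2']
print("P' =", snap)   # ['N1', 'O2']
```

P ends up with the new pair, while P' holds the mixed pair N1/O2, which is exactly the state the post is worried about.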
#2
Erik H. wrote:
> Hello! As far as I know, some SAN solutions can snapshot a 'partition' at the hardware level. After performing the snapshot directly on the SAN (which takes a second or so), I see partition P and the snapshot P'. Changes to P are not visible in the snapshot P' and vice versa. Now my problem is: because the SAN doesn't know what kind of file system is stored on the partition, it cannot know whether the file system on partition P is consistent when the snapshot is taken.

Indeed, either your snapshot solution knows about the filesystem and is integrated with it, or you need to unmount the filesystem (or at least stop all apps and cause a host cache flush before starting the snap).

Sure, you could rely on filesystem journals to recover from whatever inconsistencies you caused by snapshotting while the filesystem was still running, but that doesn't count as a point-in-time snapshot then.

Arne Joris
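The "stop writes, flush, snapshot, resume" ordering suggested in this reply can be sketched as a small wrapper. Everything here is an assumption for illustration: `quiesced`, `snapshot_with_quiesce`, and the three callables are hypothetical names, and the stand-ins below only record the order of calls rather than touching a real host or array.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(freeze, thaw):
    """Hold the file system quiet for the duration of the with-block.

    freeze and thaw are caller-supplied callables (hypothetical here); on a
    real host they might flush the cache and block new writes, then release.
    """
    freeze()
    try:
        yield
    finally:
        # Always thaw, even if the snapshot call raised, so the host
        # is never left with writes blocked.
        thaw()


def snapshot_with_quiesce(freeze, take_snapshot, thaw):
    with quiesced(freeze, thaw):
        return take_snapshot()


# Stand-ins that only record the order of operations.
events = []
result = snapshot_with_quiesce(
    freeze=lambda: events.append("freeze"),
    take_snapshot=lambda: events.append("snapshot") or "P'",
    thaw=lambda: events.append("thaw"),
)
print(events, result)   # ['freeze', 'snapshot', 'thaw'] P'
```

On current Linux systems, `fsfreeze --freeze <mountpoint>` and `fsfreeze --unfreeze <mountpoint>` from util-linux are one candidate for the freeze and thaw steps, assuming the file system supports freezing; how the array-side snapshot is triggered depends entirely on the vendor.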
#3
Arne Joris wrote in message news:Wmbgd.51049$Pl.44100@pd7tw1no...
> Erik H. wrote:
>> Hello! As far as I know, some SAN solutions can snapshot a 'partition' at the hardware level. After performing the snapshot directly on the SAN (which takes a second or so), I see partition P and the snapshot P'. Changes to P are not visible in the snapshot P' and vice versa. Now my problem is: because the SAN doesn't know what kind of file system is stored on the partition, it cannot know whether the file system on partition P is consistent when the snapshot is taken.
>
> Indeed, either your snapshot solution knows about the filesystem and is integrated with it, or you need to unmount the filesystem (or at least stop all apps and cause a host cache flush before starting the snap).

The problem is that some people say the 'flush' solution works, but I say it doesn't (in a multi-user environment). Reason: if user A (root) does a host cache flush and starts the snapshot a few seconds later, some other user or application might have changed the data again in the meantime (e.g. by creating many directories). The changes might still be in the cache, but I don't think I can rely on that. For example, another user or application might have started another host cache flush, and user A starts the snapshot while that second flush is still in progress, which might result in a broken file system again.

> Sure, you could rely on filesystem journals to recover from whatever inconsistencies you caused by snapshotting while the filesystem was still running, but that doesn't count as a point-in-time snapshot then.
>
> Arne Joris

Thank You,
Erik
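The race described in this post (flush, then a second writer, then the snapshot) can be made concrete with a toy write-back cache. This is a sketch, not a model of any real operating system: the `Host` class, its methods, and the block names `dir` and `bitmap` are all made up for illustration.

```python
class Host:
    """Toy host with a write-back cache in front of a SAN volume."""

    def __init__(self, san_blocks):
        self.san = dict(san_blocks)   # blocks as the SAN sees them
        self.dirty = {}               # blocks changed but not yet written back

    def write(self, block, data):
        self.dirty[block] = data

    def write_back(self, block):
        # The OS may push any single dirty block to the SAN at any time.
        self.san[block] = self.dirty.pop(block)

    def flush(self):
        for block in list(self.dirty):
            self.write_back(block)

    def san_snapshot(self):
        return dict(self.san)


host = Host({"dir": "O1", "bitmap": "O2"})
host.flush()                   # user A: cache is clean at this instant
host.write("dir", "N1")        # another application updates metadata...
host.write("bitmap", "N2")
host.write_back("dir")         # ...and a second flush gets half way
snap = host.san_snapshot()     # user A: snapshot "a few seconds later"
host.flush()                   # the second flush completes afterwards
print(snap)       # {'dir': 'N1', 'bitmap': 'O2'}
print(host.san)   # {'dir': 'N1', 'bitmap': 'N2'}
```

User A's flush succeeded, yet the snapshot still holds a mixed pair, because nothing prevented new writes between the flush and the snapshot.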
#4
Erik H. wrote:
>> Indeed, either your snapshot solution knows about the filesystem and is integrated with it, or you need to unmount the filesystem (or at least stop all apps and cause a host cache flush before starting the snap).
>
> The problem is that some people say the 'flush' solution works, but I say it doesn't (in a multi-user environment). Reason: if user A (root) does a host cache flush and starts the snapshot a few seconds later, some other user or application might have changed the data again in the meantime (e.g. by creating many directories). The changes might still be in the cache, but I don't think I can rely on that. For example, another user or application might have started another host cache flush, and user A starts the snapshot while that second flush is still in progress, which might result in a broken file system again.

Yes, you need to ensure the filesystem is not doing any metadata or data operations while the snapshot is being taken. Most filesystems give no guarantee that changes stay in the host cache for any length of time, so you can't rely on that.

It all depends on the purpose of your snapshot. If all you want is a "workable" filesystem in your snapshot, you might be able to live with the proposed solution: metadata journaling should log a couple hundred metadata operations, and it could be unlikely (depending on what your filesystem is being used for) that a user causes that many metadata changes (i.e. create a file, delete a file, append to a file, ...) while the snapshot is being taken. So even though the filesystem on your snapshot will be slightly incoherent, the journal allows it to become coherent again.

If, on the other hand, you require data consistency (for example, you rely on data in lock files and the files being locked to be in sync, or you rely on a file's header being used for locking purposes), this isn't good enough.

Arne Joris
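How a metadata journal can make a snapshot coherent again is sketched below with a toy redo log. The assumptions are significant and should be read as such: the `replay` function and the record format are invented for illustration, the snapshot is assumed to capture a single point in time with write ordering preserved, and the snapshot is assumed to be writable so the replay can run at all. It covers metadata only and says nothing about file data, which is the limitation noted at the end of this reply.

```python
def replay(metadata, journal):
    """Apply only transactions whose commit record reached the disk."""
    recovered = dict(metadata)
    pending = {}
    for record in journal:
        kind = record[0]
        if kind == "begin":
            pending = {}
        elif kind == "set":
            _, key, value = record
            pending[key] = value
        elif kind == "commit":
            recovered.update(pending)
            pending = {}
    # Anything still pending had no commit record: discard it.
    return recovered


full_journal = [
    ("begin",),
    ("set", "dir", "N1"),
    ("set", "bitmap", "N2"),
    ("commit",),
]

# Case 1: snapshot taken after the commit record was written, while the
# in-place update was half done (N1 with O2). Replay finishes the job.
case1 = replay({"dir": "N1", "bitmap": "O2"}, full_journal)

# Case 2: snapshot taken before the commit record was written. With
# write-ahead logging the in-place blocks are still untouched, and the
# uncommitted transaction is dropped.
case2 = replay({"dir": "O1", "bitmap": "O2"}, full_journal[:3])

print(case1)   # {'dir': 'N1', 'bitmap': 'N2'}
print(case2)   # {'dir': 'O1', 'bitmap': 'O2'}
```

In both cases the metadata ends up as either the complete old pair or the complete new pair, never the mixed N1/O2 state.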