#1
Write delays on writes to EMC Symmetrix
Hi
We have an app that writes to a 700 MB memory-mapped file on an EMC Sym from a Sun V480 with JNI HBAs. The Sym has 16 GB of cache. On occasion the app's users complain that the app freezes.

I wrote a Perl script to simulate the app and measure write performance. My script re-writes a 500 MB file, 500 bytes at a time. The script runs flat out, writing 500 MB in approx 70 secs - no problem. However, I have noticed that individual writes are sometimes very slow. I measure the time for each write and store the times in an array; after the writes complete, I print out the write times. Typical output:

0.000037 0.000038 0.000037 0.000037 1.234568 0.000967 0.000532 0.000214 0.000065 0.000037

There seems to be some correlation between SAN activity and these delayed writes: they are very noticeable during the nightly backups, but were totally absent during the weekend. Typically I get one or two slow writes taking 0.5 secs in every million writes. There is no regularity to the delays. We do see fsflush-induced delays, but those are typically around 0.1 secs and are quite regular.

We were originally running ufs with logging and changed to vxfs; this improved things from an average of 20 slow writes to an average of two. We also installed EMC PowerPath - this had no effect. We thought originally that this might be extent-allocation related; that's why I re-write an existing file rather than creating a new one.

The phenomenon seems to affect a large number of hosts attached to different Syms. It is also present on hosts with Emulex HBAs, and is also observed from Egenera Linux blades. All the Sun hosts have SRDF, so that was another suspect, but the Linux blades have no SRDF - there goes another theory. The huge Sym cache would seem to rule out disk thermal recals, so I am now puzzled.

Any ideas?

Pedro
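For anyone who wants to reproduce the test, here is a minimal Perl sketch of the kind of timing loop pedro describes, using Time::HiRes for microsecond resolution. The file path and the 0.5 s outlier threshold are illustrative assumptions; the test file must already exist, and syswrite is used here so each timing brackets a single write(2) call (whether pedro's own script used buffered or unbuffered writes is not stated).

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $file  = '/san/testfile';              # hypothetical path on the Sym volume
my $bs    = 500;                          # bytes per write, as in the test
my $count = 500 * 1024 * 1024 / $bs;      # ~1,048,576 writes to cover 500 MB
my $buf   = 'x' x $bs;
my @times;

# '+<' re-writes an existing file in place, matching pedro's method of
# avoiding extent allocation on each run.
open my $fh, '+<', $file or die "open $file: $!";
for (1 .. $count) {
    my $t0 = [gettimeofday];
    syswrite $fh, $buf or die "write: $!";
    push @times, tv_interval($t0);
}
close $fh;

# Print only the outliers rather than a million samples.
printf "%d: %.6f\n", $_, $times[$_] for grep { $times[$_] > 0.5 } 0 .. $#times;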
#2
In article , "pedro d" writes:
> We have an app that writes to a 700 MB memory mapped file on an EMC Sym
> from a Sun V480 with JNI HBAs. The Sym has 16 GB of cache. [...]
> Typically I get one or two slow writes taking 0.5 secs in every million
> writes. [...] The huge Sym cache would seem to rule out disk temp
> recals, so I am now puzzled. Any ideas?

I've no PermaCache experience, but the PermaCache file has to be tied to physical storage. Regardless of what it is tied to, the writes are destaged to physical disk. Find out from your Storage Administrator what it is mapped to. Go into WLA, metrics, Disks and look at that disk:

write commands per sec
KBytes written per sec
seeks per sec
average hypers per seek
average KBytes per write

You are on the right track. The hyper your small 700 MByte file is mapped to may be red-hot due to write traffic; but also, since that hyper is just one slice of a physical disk, reads and writes to the other slices of that disk may indeed be impeding the writes flushed from PermaCache to your hyper. What will clue us in is how much seeking the disk is doing, especially average hypers per seek correlated against your slow times. It is a matter of correlation.

The second thing that may be at issue: you don't mention whether the writes are sequential. What type of write traffic is it, and how well is the Sym able to combine the writes? Writing 500 MBytes in 70 seconds at 500 bytes per write is somewhere around 14,000 writes per second. The Sym surely has a threshold at which it de-stages; it could just be that it can't write any faster when it goes to write. That is why KBytes per sec, write commands per sec and KBytes per write are listed above: you can see whether it is a matter of saturation during the "hangs" (a very good possibility). That will jump out at you when you view the graphs.

Now the question is: can you map that PermaCache file to a large Meta? That way you would be de-staging your writes to many disks instead of just one hyper. Yes, the built-in assumption here is that you are mapped to a hyper.

Rob
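For what it's worth, Rob's throughput arithmetic can be sanity-checked in a few lines. The figures are taken from pedro's description (500 MB in roughly 70 seconds, 500 bytes per write); the 64K combined-write size is Rob's assumption.

#!/usr/bin/perl
use strict;
use warnings;

my $bytes  = 500 * 1024 * 1024;   # 500 MB test file
my $secs   = 70;                  # observed wall-clock time
my $iosize = 500;                 # bytes per host write

printf "host writes/sec : %.0f\n", $bytes / $iosize / $secs;   # ~15,000
printf "throughput MB/s : %.1f\n", $bytes / $secs / 1e6;       # ~7.5
printf "64K writes/sec  : %.0f\n", $bytes / $secs / 65536;     # ~114 if fully combined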
#3
Thanks Rob
The disk partition is in fact a slice on a 4-disk Meta, and I am doing sequential writes to the file.

The application is in fact a trading app that writes to several files: a memory-mapped database file sitting behind a Sybase Open Server instance, a log file, and a transaction-number mmap file. There are some less important files too. The database file is created daily and is approx 700 MBytes. The tx-number file is just a few bytes. The log file grows to about 700 MB in one day's trading. Each trade is entered in the database - and hence the mmap - and written to the log. Ideally I would like to move the logs to a separate Meta, but the app currently does not allow this.

I am pretty sure that the Meta itself is not write-bound, and stats from the WLA do not show any hot spots. The problem is that WLA is not very granular, and does not seem to be capable of showing stats for very short time periods (I may be wrong here).

I noticed that if I increase my individual writes from 500 bytes to 1 KB, the test time increases only slightly, which suggests to me that I am far from saturating the cache; after all, I am still only writing something like 10 MB per sec, and my data is being striped across 4 disks.

Any idea how de-staging works? Can it cause IO blocking during a de-stage? I would have expected the cache to be configured as a circular pair of FIFOs, so that writes go to one FIFO whilst the second is de-staged. I know that we have 16 GB of cache, but is this cache subdivided in any way, so that a particular Meta is only served with a small amount of cache?

Pedro
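A sketch of the block-size variation pedro mentions, for anyone repeating it: time a full rewrite at several write sizes. If the total time barely changes as the size doubles, per-write overhead rather than cache or spindle saturation dominates. The path and the list of sizes are assumed values, not from the thread.

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $file = '/san/testfile';        # hypothetical pre-existing test file
my $size = 500 * 1024 * 1024;      # 500 MB rewrite

for my $bs (500, 1024, 4096) {
    my $buf = 'x' x $bs;
    open my $fh, '+<', $file or die "open: $!";
    my $t0 = [gettimeofday];
    for (1 .. int($size / $bs)) {
        syswrite $fh, $buf or die "write: $!";
    }
    close $fh;
    printf "bs=%-5d total=%.1fs\n", $bs, tv_interval($t0);
}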
#4
In article , "pedro d" writes:
> Thanks Rob. The disk partition is in fact a slice on a 4-disk Meta.

You mean either "4-way" (if so, 8 physical disks are involved) or "2-way" (if so, 4 physical disks are involved)?

> I am doing sequential writes to the file.

Okay. So in an ideal situation it is taking those 14,000-15,000 500-byte writes and making about 100 64K writes out of them. Or is it? Don't know.

> I am pretty sure that the Meta itself is not write-bound, and stats from
> the WLA do not show any hot spots. The problem is that WLA is not very
> granular [...]

It isn't just about hot spots, but whether you are IO-bound or have pending IO at -any- point (a cause of delays). That can be tricky to determine.

> I noticed that if I increase my individual writes from 500 bytes to 1 KB,
> the test time increases only slightly [...]

But go back and look at the Disks that are part of that Meta. If, for example, you are doing 100+ seeks/sec that cross hyper boundaries at the same time you are putting out heavy write traffic, you have a candidate. Another red flag: look at the hosts using storage on those 8 physical disks the 4-way Meta sits on. Do those hosts report long read queue lengths at the times your writes run longer? The problem is that the Sym doesn't report read queue depth; you have to reverse-engineer it.

The five things to look at from the last post:

write commands per sec
KBytes written per sec
seeks per sec
average hypers per seek
average KBytes per write

will help show whether there is an underlying issue. Write commands per sec to the Meta partitions will tell you how well it combined those 500-byte writes. Analysis and correlation will tell you just what is occurring; correlation is what I used to track down a bottleneck and write up a summary for internal consumption. Why are your writes running longer? The Sym hasn't ACKed them - busy destaging, maybe? Either way, you should be able to confirm whether or not the disks are the bottleneck as you track down the cause.

> Any idea how de-staging works? Can it cause IO blocking during a
> de-stage?

No idea. Good luck finding that out; let me know if you stumble upon technical details.

> I know that we have 16 GB of cache, but is this cache subdivided in any
> way, so that a particular Meta is only served with a small amount of
> cache?

This I have been "told" - take it for what it is worth. Each volume has a certain amount of cache associated with it as a "start". If the volume is busy (or whatever internal criteria they key on), the Sym will expand the cache associated with it, and can expand it again - three expansions or so, I've been informed. I was told that by an EMC rep; there may be something written about it somewhere, but I can't find it. I don't have figures, and what I poorly describe sounds like urban legend, but it is all I've got. If you ever stumble upon technical details, drop me a line.

Rob
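One practical way to do the correlation Rob describes is to timestamp each slow write as it happens, so the events can be lined up against WLA disk metrics for the same interval. Below is a sketch in the spirit of pedro's script; the log path, test file path, and 0.2 s threshold are illustrative assumptions, not details from the thread.

#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);
use Time::HiRes qw(gettimeofday tv_interval);

open my $log, '>>', '/var/tmp/slow_writes.log' or die "log: $!";
$log->autoflush(1);                        # so entries land as they occur
open my $fh, '+<', '/san/testfile' or die "open: $!";

my $buf = 'x' x 500;
for (1 .. 1_048_576) {
    my $t0 = [gettimeofday];
    syswrite $fh, $buf or die "write: $!";
    my $dt = tv_interval($t0);
    # Log wall-clock time of each outlier for later matching against
    # WLA's per-interval disk stats.
    printf {$log} "%s %.6f\n", strftime('%H:%M:%S', localtime), $dt
        if $dt > 0.2;
}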