A computer components & hardware forum. HardwareBanter


Very High Rate Continuous Transfer



 
 
  #11  
Old July 19th 05, 07:01 PM
_firstname_@lr_dot_los-gatos_dot_ca.us

In article, Stephen Maudsley wrote:

[the original poster] wrote:
I am looking into what it will take to support continuous (not burst)
160 to 200 MBytes/sec transfer to disk. What types of drives and how
would they be configured as an array (or multiple arrays)?

What type of processor and bus architecture would be appropriate? Data
will be from a capture memory that is shared between the processor and
the capture electronics. So memory bandwidth will be at least 320 to
400 MBytes/sec.


Might be worth looking for articles and papers from CERN - they've been
doing this sort of stuff for years and used to publish papers on their
computing architectures.


Being a retired high-energy physicist (and former CERN collaborator)
myself ...

Yes, it would be a good idea to start there, and read their stuff. A
good starting point is to look for the web presence of the "CERN
OpenLab", and read what is posted there.

But the original poster's situation and CERN are in different leagues.
I've recently seen a ~1 GByte/sec test running at CERN, sustained for
a whole week. But it required O(100) computers, massive networking
gear, and many hundreds of disk drives, with some of the finer software
and hardware products from industry thrown into the mix. It also
took, all told, probably a dozen people (both from CERN and from
industry) a year to set up, and the hardware cost should be
measured in units of M$.

The other thing to remember is that to CERN, the data storage problem
(even though it is massive) is a small part of their overall mission.
Anyone who spends ~10 billion $ on building an accelerator, about the
same on the physics experiments, and a few billion $ a year on
operation and support, has a strong incentive to build a reliable and
fast data storage system, because loss of data would have huge
economic costs.

I very much doubt that the original poster's system will reach this
scale; still, stealing some good ideas there is a good plan.

Another thing to remember from the CERN experience: just because the
system can do a certain speed (say 400 MB/sec) once doesn't mean at
all that it can do so sustained. Things go wrong all the time
(guaranteed to happen in a large system, which typically also involves
a few humans, who are about as unreliable as disk drives, and nobody
has invented RAID for sysadmins yet). The real test is not to do 400
MB/sec for 10 seconds, but to do so sustained, 24x7, for a month. That
is much, much harder.

--
The address in the header is invalid for obvious reasons. Please
reconstruct the address from the information below (look for _).
Ralph Becker-Szendy
  #12  
Old July 23rd 05, 07:26 PM

Because of the nature of the data I am saving, I think I can simplify
quite a bit. I'll still need to figure out what hardware I need to
support this, but here is the direction I am heading:

Most of the time the data will be in blocks that are about 1 ms in
sample-time duration. At 160 MB/s, that's only 160 KB per block. With a
set of 7 or 8 separate physical volumes, I will write the data blocks
into separate files, sending each successive block to the next physical
drive in turn. I can do this under software control.
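
Roughly, the write loop I have in mind would look something like the C
sketch below (the mount points, file layout, and block count are
placeholders, not a tested design):

/* Sketch only: round-robin ~160 KB blocks across N separately mounted drives.
 * The /mnt/dN paths, block count, and single file per drive are illustrative
 * assumptions; error handling is minimal for brevity. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_DRIVES 8
#define BLOCK_SIZE (160 * 1024)   /* roughly 1 ms of data at 160 MB/s */
#define NUM_BLOCKS 1000           /* blocks to write in this demo */

int main(void)
{
    int fds[NUM_DRIVES];
    char path[64];
    char *block = malloc(BLOCK_SIZE);
    if (!block) return 1;

    /* One capture file per physical drive. */
    for (int i = 0; i < NUM_DRIVES; i++) {
        snprintf(path, sizeof path, "/mnt/d%d/capture.bin", i);  /* hypothetical mount points */
        fds[i] = open(path, O_WRONLY | O_CREAT, 0644);
        if (fds[i] < 0) { perror(path); return 1; }
    }

    /* Write each successive block to the next drive in turn. */
    for (long blk = 0; blk < NUM_BLOCKS; blk++) {
        memset(block, (int)(blk & 0xff), BLOCK_SIZE);   /* stand-in for data from the capture memory */
        int d = blk % NUM_DRIVES;
        if (write(fds[d], block, BLOCK_SIZE) != (ssize_t)BLOCK_SIZE) {
            perror("write");
            return 1;
        }
    }

    for (int i = 0; i < NUM_DRIVES; i++)
        close(fds[i]);
    free(block);
    return 0;
}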

(I am not worried about redundancy at the moment. This data will be
stored for only a few hours before it is transferred to a server, where
data security can be addressed.)

I read a report at Tom's Hardware showing that the worst-case write
bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically drops
to about 27 MB/s as the drive becomes full. (Can someone verify my
understanding of this? It's here:
http://www.tomshardware.com/storage/...transfer_graph
)

Overhead
--------
I am not sure what the processor overhead will be to open and close
files while doing this. The alternative is to stream the blocks into
larger files, which only changes the data read process.

With this process, I can probably rely on disk cache to absorb most
remaining delays (like seek time).

Drive reliability
-----------------
I am wondering if I can stream data into these blocks continuously
without buffering it in main memory. As I said in an earlier post, I
want to continuously capture into memory and decide when to offload the
last 10 GB of recorded data. But I am now thinking I can stream this
data directly to the disks (with a significant saving in main memory)
and overwrite whatever I don't want to keep. How hard is this on the
drives if I do it continuously for 12 hours straight, or 24/7?

Drive Controller
----------------
With the solution outlined above, I will need a controller, preferably
in a 3U format (cPCI). Like I said, it will need to support 7 to 8
independent physical drives, at a minimum. Does anyone have a
suggestion?


Regards,
Jim

  #13  
Old July 23rd 05, 09:51 PM
Bill Todd

Jim wrote:
Because of the nature of the data I am saving, I think I can simplify
quite a bit. I'll still need to figure out what hardware I need to
support this, but here is the direction I am heading:

Most of the time the data will be in blocks that are about 1 ms in
sample-time duration. At 160 MB/s, that's only 160 KB per block. With a
set of 7 or 8 separate physical volumes, I will write the data blocks
into separate files, sending each successive block to the next physical
drive in turn. I can do this under software control.


It would be even easier to use a single file spread across the disks
under software RAID-0 (you're effectively describing a re-creation of
RAID-0 inside your application above).
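
For what it's worth, the mapping RAID-0 does under the covers is the
same round-robin arithmetic you'd otherwise code by hand; a toy
illustration (the chunk size and disk count are arbitrary values for
the example):

/* Toy illustration: how RAID-0 maps a logical byte offset to (member disk,
 * offset on that disk). It's the same round-robin arithmetic as writing
 * block N to drive N % 8. Chunk size and disk count are arbitrary here. */
#include <stdio.h>

#define NUM_DISKS  8
#define CHUNK_SIZE (160ULL * 1024)   /* stripe unit, chosen to match one capture block */

int main(void)
{
    unsigned long long logical = 5ULL * 1024 * 1024 * 1024;   /* example: 5 GB into the array */

    unsigned long long chunk   = logical / CHUNK_SIZE;        /* which stripe unit */
    unsigned long long within  = logical % CHUNK_SIZE;        /* offset inside that unit */
    unsigned int       disk    = (unsigned int)(chunk % NUM_DISKS);
    unsigned long long on_disk = (chunk / NUM_DISKS) * CHUNK_SIZE + within;

    printf("logical offset %llu -> disk %u, offset %llu\n", logical, disk, on_disk);
    return 0;
}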


(I am not worried about redundancy at the moment. This data will be
stored for only a few hours before it is transferred to a server, where
data security can be addressed.)


Hmmm. 3 hrs. x 3600 sec/hr. x 160 MB/sec = 1.728 TB - considerably more
space than you'll have using 7 or 8 100 GB drives even if you manage
them optimally (at 160 MB/sec, 700-800 GB fills in roughly 73 to 83
minutes).


I read a report at Tom's Hardware showing that the worst-case write
bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically drops
to about 27 MB/s as the drive becomes full. (Can someone verify my
understanding of this? It's here:
http://www.tomshardware.com/storage/...transfer_graph
)


The number sounds reasonable - eight drives at ~27 MB/sec is only about
216 MB/sec, not a huge margin over 160 MB/sec - so you should still
leave a bit of headroom just in case (especially with a non-RAID-3
array, where the disks won't be synchronized with each other, though
your application may tend to be self-synchronizing). Of course, you
should check the manufacturer's spec sheet too.


Overhead
--------
I am not sure what the processor overhead will be to open and close
files while doing this.


You almost certainly don't want to be opening and closing files at all
frequently: that could start to screw up your data rate to disk (even
if the relevant file data is usually cached, it often gets updated on
close). For that matter, you'll want to suppress any frequent on-disk
updates to things like the file's last-accessed and last-modified times,
reuse existing file space rather than allocating new space (to avoid
on-disk allocation update activity), suppress end-of-file-mark updates,
etc.
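
One illustrative way to set that up on Linux might look like the sketch
below (the path, size, and flags are assumptions for the example;
O_NOATIME and posix_fallocate are Linux/glibc-specific and not
available everywhere):

/* Sketch, assuming Linux/glibc: preallocate the capture file once, then
 * overwrite it in place so the filesystem never has to allocate new blocks
 * or move the end-of-file mark during capture. */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FILE_BYTES (10ULL * 1024 * 1024 * 1024)   /* illustrative: 10 GB preallocated per drive */

int main(void)
{
    /* O_NOATIME (Linux-specific) suppresses last-accessed-time updates. */
    int fd = open("/mnt/d0/capture.bin", O_WRONLY | O_CREAT | O_NOATIME, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Allocate all the file space once, before capture starts, so the
     * filesystem never has to extend the file or allocate blocks mid-run. */
    int err = posix_fallocate(fd, 0, FILE_BYTES);
    if (err != 0) { fprintf(stderr, "posix_fallocate: %s\n", strerror(err)); return 1; }

    /* During capture: seek back to the start and overwrite in place rather
     * than creating new files or growing this one. */
    lseek(fd, 0, SEEK_SET);
    /* ... write() loop goes here ... */

    close(fd);
    return 0;
}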

The alternative is to stream the blocks into
larger files, which only changes the data read process.

With this process, I can probably rely on disk cache to absorb most
remaining delays (like seek time).


Quite possibly not at the data rates you're talking about.


Drive reliability
-----------------
I am wondering if I can stream data into these blocks continuously
without buffering it in main memory.


Probably not - see the previous comment. Besides, if you don't go
through main memory you'd be completely bypassing the file system and
writing driver code. But using asynchronous multi-buffering you can
stay within the realm of normal application behavior without needing
much memory.
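
For illustration only, a minimal multi-buffering sketch along those
lines, using a small ring of buffers and a separate writer thread (the
buffer count, block size, output path, and the pthreads approach are
arbitrary choices for the example, not a recommendation):

/* Sketch of asynchronous multi-buffering: the capture side fills a small ring
 * of buffers while a writer thread drains them to disk.
 * Build with something like: cc -std=c99 -pthread multibuf.c */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NUM_BUFS   4
#define BUF_SIZE   (160 * 1024)   /* one ~1 ms capture block */
#define NUM_BLOCKS 1000           /* blocks to move in this demo */

static char bufs[NUM_BUFS][BUF_SIZE];
static int  filled[NUM_BUFS];                       /* 1 = ready for the writer */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *writer(void *arg)
{
    int fd = *(int *)arg;
    for (int blk = 0, i = 0; blk < NUM_BLOCKS; blk++, i = (i + 1) % NUM_BUFS) {
        pthread_mutex_lock(&lock);
        while (!filled[i])                          /* wait until the capture side fills it */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);

        if (write(fd, bufs[i], BUF_SIZE) != (ssize_t)BUF_SIZE)
            perror("write");

        pthread_mutex_lock(&lock);
        filled[i] = 0;                              /* hand the buffer back */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    int fd = open("/mnt/d0/capture.bin", O_WRONLY | O_CREAT, 0644);  /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    pthread_t tid;
    pthread_create(&tid, NULL, writer, &fd);

    for (int blk = 0, i = 0; blk < NUM_BLOCKS; blk++, i = (i + 1) % NUM_BUFS) {
        pthread_mutex_lock(&lock);
        while (filled[i])                           /* wait until the writer has drained it */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);

        memset(bufs[i], blk & 0xff, BUF_SIZE);      /* stand-in for the capture DMA */

        pthread_mutex_lock(&lock);
        filled[i] = 1;                              /* publish the buffer to the writer */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    pthread_join(tid, NULL);
    close(fd);
    return 0;
}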

As I said in an earlier post, I want to continuously capture into
memory and decide when to offload the last 10 GB of recorded data. But
I am now thinking I can stream this data directly to the disks (with a
significant saving in main memory) and overwrite whatever I don't want
to keep. How hard is this on the drives if I do it continuously for 12
hours straight, or 24/7?

Drive Controller
----------------
With the solution outlined above, I will need a controller, preferably
in a 3U format (cPCI). Like I said, it will need to support 7 to 8
independent physical drives, at a minimum. Does anyone have a
suggestion?


If you're as cost-conscious as you appear to be, consider 3.5" SATA
drives - which will give you the temporary storage space you need and
comparable or better bandwidth in numbers that should fit in a 3U
enclosure. 3Ware makes controllers which may handle the bandwidth when
used as a simple JBOD (I've heard varying reports of their capabilities
at the higher RAID levels).

- bill
 



