#11
Jeff Peterson wrote:
"Nik Simpson" wrote in message ...
Jeff Peterson wrote:

1. Just capturing the data, performing some operation on it, storing the results and throwing away the sample

we accumulate averages (of cross products of Fourier transforms)

So the basic problem is getting 400MB/s of data into memory and processing it, but are you reading 400MB every second, or sampling say once every ten seconds? If it's every second, then you've got a bigger problem because I'd be surprised if you can process it fast enough to get the job done before the next sample comes along.

we will take about 64K samples, then can pause while processing... however all the time we are pausing we are losing data. so we do want to keep the duty cycle up. 50% duty cycle is not a problem. 5% would be.

From my limited understanding of FFTs, the actual processing should be something that could easily be multi-threaded and would see pretty close to linear scalability with additional CPUs, so at the very least an SMP system with 2-4 CPUs would help, and assuming it's 64-bit floating point then a 64-bit CPU like Opteron or Itanium might come in handy.

If the idea of multiple data streams is possible (and the synchronization problem can be overcome), and if the workload does scale well with CPUs, then a cluster of low-cost single-CPU systems each processing part of the data stream would be worth looking at, as this could be easily scaled, i.e. five systems each handling an 80MB/s stream might be cheaper and faster than one big system trying to crunch 400MB/s. Additionally, if designed this way, you could add systems in order to increase the duty cycle, i.e. 10 systems handling 40MB/s each could be relatively cheap and would have roughly 2x the duty cycle of the original 5 systems.

2. You might be actually planning to capture to disk 400MB/s for a sustained period, which has some pretty hairy implications for storage capacity.

we won't store the raw data, just a very much reduced set.

So disk output bandwidth is not going to be a problem; what you are looking for is a way of getting 400MB/s of data into memory for post-processing, correct?

Is it possible to break up the input stream, so for example instead of reading a single stream of 400MB/s, you've five devices reading 80MB/s in parallel? Is the design of the device capturing the data set in stone or can it be "parallelized"? If so, it would make the problem much simpler and any solution more scalable and less expensive.

this could work. for example we have considered using 2 x scsi 320 interfaces. might work but it's a bit of a kludge, and if we got the two interfaces out of sync we would have a real mess.

Is there any way to insert sync markers in the data stream so that the problem of data streams getting out of sync can be handled?

--
Nik Simpson
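The sync-marker question above can be pictured with a small sketch. It assumes a hypothetical framing in which the capture hardware cuts the stream into fixed-size blocks and tags each block with a sequence number before dealing it out to several interfaces; the function names and the framing are illustrative assumptions, not anything described in the thread.

```python
def split_stream(samples, n_lanes, block_len):
    """Deal fixed-size blocks round-robin onto n_lanes.

    Each block is tagged with a sequence number (the hypothetical "sync marker").
    """
    lanes = [[] for _ in range(n_lanes)]
    for seq, start in enumerate(range(0, len(samples), block_len)):
        block = samples[start:start + block_len]
        lanes[seq % n_lanes].append((seq, block))
    return lanes


def merge_lanes(lanes):
    """Reassemble the stream by sequence number.

    Raises ValueError on a gap in the numbering. A block lost from the very
    end of the stream leaves no gap, so this check alone will not catch it.
    """
    tagged = sorted(
        (item for lane in lanes for item in lane),
        key=lambda item: item[0],
    )
    out = []
    for expected, (seq, block) in enumerate(tagged):
        if seq != expected:
            raise ValueError(f"lost sync: expected block {expected}, got {seq}")
        out.extend(block)
    return out
```

With markers in place, the receiving side no longer depends on the interfaces staying in lockstep: blocks may arrive in any order, and a dropped block in the middle shows up as an explicit error instead of silently shifting the data.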
#12
yes, repacking might allow a 64/66 PCI to accept the data. i worry that we will spend lots of time and money, but the margin will be insufficient for it to actually work. i have heard that some PCI cores are not too efficient.

Spend money and time on what? With regards to PCI, I am pretty sure it will work. You can ask the PCI crowd on the PCI mailing list (http://www.pcisig.com/developers/tec...port/pci_forum), they will tell you for sure. And it doesn't have to be a core; you could use industry-proven silicon, e.g. from PLX.

I would be more worried about processing all this data in your PC. I don't think any PC can do FFTs while keeping up with such a data flow. Let's say you want to do a 1024-point FFT. At 400 MSPS it will take only 2.56 us to accumulate a new block of data. The latest and greatest ADI ADSP-TS201S can do a 1024-point complex FFT in 16.8 microseconds. I doubt any of the Intel chips can do it faster. AFAIK, TI DSPs aren't faster either. So, in my opinion you will either need an array of fast DSPs or some sort of FPGA-based processing. Trying to do this kind of processing in the host doesn't sound feasible to me.

/Mikhail
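The timing argument above can be worked through in a few lines. This is only a back-of-the-envelope sketch using the two figures quoted in the post (400 MSPS, 16.8 us per 1024-point FFT); the function names are made up for illustration, and it assumes capture overlaps with processing (double buffering) and ignores I/O and windowing overhead.

```python
import math


def block_time_s(fft_points, sample_rate_hz):
    """Time needed to accumulate one block of fft_points samples."""
    return fft_points / sample_rate_hz


def single_processor_duty_cycle(fft_points, sample_rate_hz, fft_time_s):
    """Fraction of incoming blocks one processor can keep up with (capped at 1)."""
    return min(1.0, block_time_s(fft_points, sample_rate_hz) / fft_time_s)


def processors_needed(fft_points, sample_rate_hz, fft_time_s):
    """Processors working in rotation needed to transform every block."""
    return math.ceil(fft_time_s / block_time_s(fft_points, sample_rate_hz))


if __name__ == "__main__":
    n, rate, t_fft = 1024, 400e6, 16.8e-6
    print(f"block time: {block_time_s(n, rate) * 1e6:.2f} us")
    print(f"one-DSP duty cycle: {single_processor_duty_cycle(n, rate, t_fft):.1%}")
    print(f"DSPs for full coverage: {processors_needed(n, rate, t_fft)}")
```

Under those assumptions a single such DSP covers roughly 15% of the data, which sits between the 5% and 50% duty cycles mentioned earlier in the thread, and about seven of them in rotation would be needed to transform every block.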
#13
Jeff Peterson wrote:
"Nik Simpson" wrote in message ...

we accumulate averages (of cross products of Fourier transforms)

So the basic problem is getting 400MB/s of data into memory and processing it, but are you reading 400MB every second, or sampling say once every ten seconds? If it's every second, then you've got a bigger problem because I'd be surprised if you can process it fast enough to get the job done before the next sample comes along.

we will take about 64K samples, then can pause while processing... however all the time we are pausing we are losing data. so we do want to keep the duty cycle up. 50% duty cycle is not a problem. 5% would be.

As stated elsethread, if you give up trying to get this throughput on a conventional PC platform, you probably can do this on a "big enough" FPGA. From your memory needs alone (64K x 6 x some overhead in which to do your FFT) you're probably looking north of an XC2V2000, and the single-chip price is measured in the thousands of US$.

For the c.a.f group to estimate with any precision the smallest practical part, you need to do things like compute the number of bits of precision you need for your butterflies. The 96 18x18 multipliers on an XC2V3000 would come in real handy, especially if they didn't need to be cascaded for more precision. If you can make your design work at 200 MS/s (DDR), even 32 multipliers would let you run the FFT as fast as data points stream in -- although that would also require 16 x 64K x 18 bits of storage, out of reach for the current Xilinx offerings at least.

I know who I'd ask first for help (ahem-ray-cough).

- Larry
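The "averages of cross products of Fourier transforms" quoted above can be sketched as a running cross-spectrum accumulator. The thread does not spell out the exact estimator, so this assumes the cross product for two inputs a and b is F_a(k) times the complex conjugate of F_b(k), summed per frequency bin; the class and function names are hypothetical, and a plain recursive radix-2 FFT stands in for whatever DSP, FPGA or library routine would be used in practice.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


class CrossSpectrumAccumulator:
    """Keeps only the per-bin sums; raw sample blocks are not stored."""

    def __init__(self, n):
        self.n = n
        self.total = [0j] * n
        self.count = 0

    def add(self, a, b):
        """Transform one block from each input and add the cross products."""
        fa, fb = fft(a), fft(b)
        for k in range(self.n):
            self.total[k] += fa[k] * fb[k].conjugate()
        self.count += 1

    def average(self):
        """Mean cross spectrum over all blocks added so far."""
        return [s / self.count for s in self.total]
```

The stored state is one complex number per frequency bin regardless of how many blocks have been processed, which is consistent with the remark that only a very much reduced data set needs to be kept.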
#15
The fastest slots on a PC mainboard are the memory expansion slots. It's an easy-to-design hardware interface, and if you use a server mainboard with multiple memory channels you get a hell of a lot of bandwidth.

...and forget Windows support. Only a specially hacked Linux will be your friend.

and SUN used to place graphics boards in memory slots.

Sorry? Sun used S-Bus for them, which is not a memory slot.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
http://www.storagecraft.com
#16
The fastest slots on a PC mainboard are the memory expansion slots. It's an easy-to-design hardware interface, and if you use a server mainboard with multiple memory channels you get a hell of a lot of bandwidth.

...and forget Windows support. Only a specially hacked Linux will be your friend.

???? They need to write their own driver anyway. I do not know much about Windows driver programming, but it should be possible for a driver developer to map arbitrary physical address ranges to user space. You need chipset-specific code to enable access to the DIMM after boot, because it must start disabled to prevent Windows from using the memory. But as they use the board only in a single setup, this is no problem at all.

Anyway, an experiment of that type is likely to use a real-time OS, neither Windows nor plain vanilla Linux. Maybe OS9 or VxWorks.

Sorry? Sun used S-Bus for them, which is not a memory slot.

They did, but they also had UMA architectures based on DIMMs.

Kolja Sulimma
#17
"Jeff Peterson" wrote in message om...

We are building a new radio telescope called PAST (http://astrophysics.phys.cmu.edu/~jbp/past6.pdf) which we will install at the South Pole or in Western China. To make this work, we will need to sample (6 to 8 bit precision) dozens of analog voltages at 400 Msample/sec and feed these data streams into PCs. One PC per sampler. The flash ADCs we need are available (Maxim), but we are finding it difficult to get the data into the PC.

One simple way would be to use SCSI ultra640, but so far I have not found any 640 adapters on the market. Is any 640 adapter available? anything coming soon?

or we could go right into a PCI-X bus. has anyone out there done this at 400 Mb/s? is this hard to do? FPGA core license for this seems expensive ($9K), with no guarantee of 400 mByte rates.

is there a better way?

thanks
-Jeff Peterson

Why don't you get an AGP graphics processor, and try to connect your ADCs to the GPU memory bus? Run a PCI card for graphics on the PC. The GPUs are programmable, so you might even be able to do some processing inside...

Since you only need 400 MSamples/s, you could live with the Maxims. If you want to get some real speed, then maybe something like the Atmel TS8308500 (500 Mspl/s), TS8388B (1 Gspl/s) or TS83102G0B (Gspl/s) could be of interest. Going up to gigasamples per second would make your problem worse though :-)

http://www.atmel.com/dyn/products/da...?family_id=611

--
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they may, or may not be shared by my employer, Atmel Sweden.
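The original question quoted above gives 6 to 8 bits per sample at 400 Msample/s, and repacking is mentioned elsewhere in the thread as a way to fit a 64-bit/66 MHz PCI bus. A minimal sketch of what repacking buys, assuming 6-bit samples are packed four to every three bytes (the packing layout and function names are illustrative assumptions, not a description of any actual board):

```python
def pack_6bit(samples):
    """Pack 6-bit samples (0..63) four at a time into three bytes, MSB first."""
    if len(samples) % 4:
        raise ValueError("need a multiple of 4 samples")
    out = bytearray()
    for i in range(0, len(samples), 4):
        word = 0
        for s in samples[i:i + 4]:
            if not 0 <= s < 64:
                raise ValueError("sample out of 6-bit range")
            word = (word << 6) | s
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack_6bit(data):
    """Inverse of pack_6bit: three bytes back into four 6-bit samples."""
    samples = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        samples.extend((word >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return samples


def byte_rate(sample_rate_hz, bits_per_sample):
    """Bytes per second for a tightly packed sample stream."""
    return sample_rate_hz * bits_per_sample / 8
```

Unpacked, one byte per sample means 400 MB/s; packed 6-bit samples need 300 MB/s. Against the theoretical peak of a 64-bit/66 MHz PCI bus (8 bytes x 66 MHz, about 528 MB/s), packing widens the margin, though sustained PCI throughput is normally well below the peak figure.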
#18
You need chipset-specific code to enable access to the DIMM after boot, because it must start disabled to prevent Windows from using the memory.

Easier! Just add /MAXMEM to Windows's BOOT.INI, and it will skip some of the BIOS-reported memory. So, on second look, the thing looks easier.

Anyway, an experiment of that type is likely to use a real-time OS, neither Windows nor plain vanilla Linux. Maybe OS9 or VxWorks.

Surely.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
http://www.storagecraft.com
#19
Maxim S. Shatskih wrote:

You need chipset-specific code to enable access to the DIMM after boot, because it must start disabled to prevent Windows from using the memory.

Easier! Just add /MAXMEM to Windows's BOOT.INI, and it will skip some of the BIOS-reported memory. So, on second look, the thing looks easier.

The trick is knowing which physical memory slots are affected by the BOOT.INI statement. An alternative is simply to grab physical memory address space for a device driver during the boot sequence and lock Windows out of it; DataCore uses that approach for its cache in SANsymphony.

--
Nik Simpson
#20
On a sunny day (Thu, 20 Nov 2003 00:06:40 -0500) it happened "MM" wrote in :

yes, repacking might allow a 64/66 PCI to accept the data. i worry that we will spend lots of time and money, but the margin will be insufficient for it to actually work. i have heard that some PCI cores are not too efficient.

Spend money and time on what? With regards to PCI, I am pretty sure it will work. You can ask the PCI crowd on the PCI mailing list (http://www.pcisig.com/developers/tec...port/pci_forum), they will tell you for sure. And it doesn't have to be a core; you could use industry-proven silicon, e.g. from PLX.

I would be more worried about processing all this data in your PC. I don't think any PC can do FFTs while keeping up with such a data flow. Let's say you want to do a 1024-point FFT. At 400 MSPS it will take only 2.56 us to accumulate a new block of data. The latest and greatest ADI ADSP-TS201S can do a 1024-point complex FFT in 16.8 microseconds. I doubt any of the Intel chips can do it faster. AFAIK, TI DSPs aren't faster either. So, in my opinion you will either need an array of fast DSPs or some sort of FPGA-based processing. Trying to do this kind of processing in the host doesn't sound feasible to me.

/Mikhail

A little while ago in sci.crypt there was some talk about the first optical processor. Basically this is an LED array with multipliers that can do 125 million complex 128-point FFTs or 500000 DFTs of 16 K size per second.

http://www.lenslet.com/newsItem.asp?...ve=&newsId=184
www.lenslet.com

The thing itself is a normal DSP with the optical array (you can buy that separately too). Normal logic; if you interfaced an FPGA you could go faster perhaps, those gallium arsenide LEDs switch at 20 GHz... No idea what it costs, perhaps less than you think. Download the datasheet .pdf, maybe it is of use...

JP