400 Mb/s ADC

#11 November 20th 03, 01:09 AM

Jeff Peterson wrote:
"Nik Simpson" wrote in message
...
Jeff Peterson wrote:
1. Just capturing the data performing some operation on it, storing
the results and throwing away the sample

we accumulate averages (of cross products of Fourier transforms)
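
For readers unfamiliar with the operation described here, a minimal Python sketch of accumulating averaged cross products of Fourier transforms for two inputs follows. The function names, the tiny 8-sample block, and the naive DFT are illustrative assumptions, not the PAST team's code; a real system would presumably run an optimized FFT over 64K-sample blocks.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a tiny illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def averaged_cross_spectrum(blocks_a, blocks_b):
    """Average A(f) * conj(B(f)) over paired blocks from two inputs."""
    n = len(blocks_a[0])
    acc = [0j] * n
    for a, b in zip(blocks_a, blocks_b):
        fa, fb = dft(a), dft(b)
        for k in range(n):
            acc[k] += fa[k] * fb[k].conjugate()
    # Only the accumulated average is kept; the raw blocks can be discarded.
    return [v / len(blocks_a) for v in acc]

# Hypothetical test data: two identical blocks holding one cycle of a cosine.
block = [1.0, 0.7071, 0.0, -0.7071, -1.0, -0.7071, 0.0, 0.7071]
spectrum = averaged_cross_spectrum([block, block], [block, block])
print([round(abs(v), 2) for v in spectrum])
```

The point of the sketch is that storage per input pair stays at one accumulator per frequency bin, however many blocks are averaged.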

So the basic problem is getting 400MB/s of data into memory and
processing it, but are you reading 400MB every second, or sampling
say once every ten seconds. If it's every second, then you've got a
bigger problem because I'd be surprised if you can process it fast
enough to get the job done before the next sample comes along.
we will take about 64K samples, then can pause while processing...
however all the time we are pausing we are losing data. so we do want to
keep the duty cycle up. 50% duty cycle is not a problem. 5% would be.

From my limited understanding of FFTs the actual processing should be
something that could easily be multi-threaded and would see pretty close to
linear scalability with additional CPUs, so at the very least an SMP system
with at least 2-4 CPUs would help, and assuming it's 64bit floating point
then a 64bit CPU like Opteron or Itanium might come in handy. If the idea of
multiple data streams is possible (and the synchronization problem can be
overcome) then if workload does scale well with CPUs, a cluster of low-cost
single CPU systems each processing part of the data stream would be worth
looking at as this could be easily scaled, i.e. five systems each handling
an 80MB/s stream might be cheaper and faster than one big system trying to
crunch 400MB/s. Additionally, if designed this way, then you could add
additional systems in order to increase the duty cycle, i.e. 10 systems
handling 40MB/s could be relatively cheap and would have roughly 2x the duty
cycle of the original 5 systems.
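
The scaling argument above can be sketched in a few lines of Python. This is only a toy model: it assumes each node processes at a fixed rate (the 20 MB/s figure is made up for illustration), that capture and processing overlap, and that the workload splits linearly across nodes.

```python
def per_node_rate(total_mb_s, nodes):
    """Input rate each node sees when the stream is split evenly."""
    return total_mb_s / nodes

def duty_cycle(total_mb_s, nodes, node_capacity_mb_s):
    """Fraction of incoming data the cluster can keep, capped at 100%."""
    return min(1.0, node_capacity_mb_s / per_node_rate(total_mb_s, nodes))

# node_capacity_mb_s=20 is a hypothetical per-node processing rate.
for nodes in (1, 5, 10):
    print(nodes, per_node_rate(400, nodes), duty_cycle(400, nodes, 20))
```

Under these assumptions, doubling the node count from 5 to 10 halves the per-node stream from 80 MB/s to 40 MB/s and doubles the duty cycle, which is the effect described above.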

2. You might be actually planning to capture to disk 400MB/s for a
sustained period which has some pretty hairy implications for
storage capacity.

we wont store the raw data, just a very much reduced set.

So disk output bandwidth is not going to be a problem, what you are
looking for is a way of getting 400MB/s of data into memory for
post-processing, correct? Is it possible to break up the input
stream, so for example instead of reading a single stream of
400MB/s, you've five devices reading 80MB/s in parallel? Is the
design of the device capturing the data set in stone or can it be
"parallelized"? If so, it would make the problem much simpler and any
solution more scalable and less expensive.
this could work. for example we have considered using 2 x scsi 320
interfaces. might work but it's a bit of a kludge, and if we got the
two interfaces out of sync we would have a real mess.

Is there any way to insert synch markers in the data stream so that the
problem of data streams getting out of sync can be handled?
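
One way to picture the sync-marker idea is to tag every block with a sequence number on the sampler side and pair blocks by that number on the host. The Python sketch below is an illustration only; `tag_blocks` and `realign` are hypothetical helpers, and in real hardware the marker would be inserted by the readout logic, not in software.

```python
def tag_blocks(samples, block_size):
    """Split a sample list into (sequence_number, block) pairs."""
    return [(seq, samples[i:i + block_size])
            for seq, i in enumerate(range(0, len(samples), block_size))]

def realign(stream_a, stream_b):
    """Pair blocks with matching sequence numbers, dropping unmatched ones."""
    by_seq = {seq: block for seq, block in stream_b}
    return [(seq, block, by_seq[seq]) for seq, block in stream_a if seq in by_seq]

# Two hypothetical interfaces carrying three 4-sample blocks each.
a = tag_blocks(list(range(0, 12)), 4)
b = tag_blocks(list(range(100, 112)), 4)
b_lossy = [blk for blk in b if blk[0] != 1]  # pretend interface B dropped block 1
pairs = realign(a, b_lossy)
print([seq for seq, _, _ in pairs])
```

A dropped block then costs one block of data instead of silently misaligning everything that follows.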

--
Nik Simpson

#12 November 20th 03, 05:06 AM

yes, repacking might allow a 64/66 PCI to accept the data. i worry
that we will spend lots of time and money, but the margin will be
insufficient for it to actually work. i have heard that some PCI
cores are not too efficient.
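
The margin concern can be put into rough numbers. The Python sketch below assumes a nominal 66 MHz clock on a 64-bit bus and assumes that "repacking" means packing 6-bit samples densely into bytes, which the thread does not spell out; the function names are made up for illustration.

```python
def pci_peak_mb_s(bus_bits, clock_mhz):
    """Theoretical peak: one bus-width transfer per clock, no overhead."""
    return bus_bits / 8 * clock_mhz

def packed_rate_mb_s(msamples_per_s, bits_per_sample):
    """Payload rate if samples are packed with no padding bits."""
    return msamples_per_s * bits_per_sample / 8

def required_efficiency(payload_mb_s, peak_mb_s):
    """Fraction of the theoretical peak the bus must actually deliver."""
    return payload_mb_s / peak_mb_s

peak = pci_peak_mb_s(64, 66)
for bits in (8, 6):
    rate = packed_rate_mb_s(400, bits)
    print(bits, rate, round(required_efficiency(rate, peak), 3))
```

On these assumptions a byte-per-sample stream needs roughly three quarters of the theoretical peak, sustained, while densely packed 6-bit samples need a bit more than half.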

Spend money and time on what? With regards to PCI, I am pretty sure it will
work. You can ask the PCI crowd on the PCI mailing list
(http://www.pcisig.com/developers/tec...port/pci_forum), they will
tell you for sure. And it doesn't have to be a core, you could use
industry-proven silicon, e.g. from PLX. I would be more worried about
processing all this data in your PC. I don't think any PC can do FFTs while
keeping up with such a data flow. Let's say you want to do a 1024-point FFT.
At 400 MSPS it will take only 2.56 us to accumulate a new block of data.
The latest and greatest ADI ADSP-TS201S can do a 1024-point complex FFT
in 16.8 microseconds. I doubt any of the Intel chips can do it faster.
AFAIK, TI DSPs aren't faster either. So, in my opinion you will either need
an array of fast DSPs or some sort of FPGA-based processing. Trying to do
this kind of processing in the host doesn't sound feasible to me.
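
The arithmetic behind that estimate, as a short Python sketch. It takes the 16.8 us figure quoted above at face value and assumes one 1024-point complex FFT per 1024 incoming samples, ignoring I/O overhead and the real-input shortcut; the function names are illustrative.

```python
import math

def block_fill_time_us(points, sample_rate_msps):
    """Time to collect one block: samples / (Msamples/s) = microseconds."""
    return points / sample_rate_msps

def processors_needed(fft_time_us, fill_time_us):
    """Processors required, working in rotation, to keep up with the stream."""
    return math.ceil(fft_time_us / fill_time_us)

fill = block_fill_time_us(1024, 400)
print(fill, processors_needed(16.8, fill))
```

So even under these generous assumptions a single such DSP falls well short, and something on the order of seven working in rotation would be needed just for the FFT step.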

/Mikhail

#13 November 20th 03, 05:29 AM

Jeff Peterson wrote:
"Nik Simpson" wrote in message ...
we accumulate averages (of cross products of Fourier transforms)

So the basic problem is getting 400MB/s of data into memory and processing
it, but are you reading 400MB every second, or sampling say once every ten
seconds. If it's every second, then you've got a bigger problem because I'd
be surprised if you can process it fast enough to get the job done before
the next sample comes along.
we will take about 64K samples, then can pause while processing...
however all the time we are pausing we are losing data. so we do want to
keep the duty cycle up. 50% duty cycle is not a problem. 5% would be.

As stated elsethread, if you give up trying to get this throughput
on a conventional PC platform, you probably can do this on a "big enough"
FPGA. From your memory needs alone (64K x 6 x some overhead in which
to do your FFT) you're probably looking north of an XC2V2000, and the
single-chip price is measured in the thousands of US$. For the c.a.f
group to estimate with any precision the smallest practical part, you
need to do things like compute the number of bits of precision you need
for your butterflies. The 96 18x18 multipliers on an XC2V3000 would
come in real handy, especially if they didn't need to be cascaded for
more precision. If you can make your design work at 200 MS/s (DDR),
even 32 multipliers would let you run the FFT as fast as data points
stream in -- although that would also require 16 x 64K x 18 bits of
storage, out of reach for the current Xilinx offerings at least.
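
To put the two storage figures side by side, here is a small Python calculation. It only multiplies out the numbers quoted above (taking 64K as 65536) and makes no claim about the block RAM capacity of any particular part; `storage_bits` is an illustrative helper.

```python
K = 1024

def storage_bits(banks, words, bits_per_word):
    """Total bits for a memory organised as banks x words x width."""
    return banks * words * bits_per_word

capture = storage_bits(1, 64 * K, 6)      # one raw 64K-sample block at 6 bits
streaming = storage_bits(16, 64 * K, 18)  # the 16 x 64K x 18 streaming case
print(capture / K, "Kibit")
print(streaming / (K * K), "Mibit")
```

The streaming case needs 48 times the raw capture buffer, before any overhead.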

I know who I'd ask first for help (ahem-ray-cough).

- Larry

#14 November 20th 03, 08:57 AM

Jeff Peterson wrote in message ...
We are building a new radio telescope called PAST
(http://astrophysics.phys.cmu.edu/~jbp/past6.pdf)
which we will install at the South Pole or in Western China.

To make this work, will need to sample (6 to 8 bit precision) dozens
of analog voltages at 400 Msample/sec and feed these data streams into
PCs. One PC per sampler.

The flash ADCs we need are available (Maxim), but we are finding it
difficult to get the data into the PC.

You should definitely talk to high energy physics people, like the
STAR experiment at BNL or ALICE at CERN. Talk to the data acquisition
and Level 3 Trigger people there. You probably can just buy boards
with fast links and DSPs from them.

If you want to design it yourself, here are some comments:
1)
If you use a busmaster device and you want to read data with a 50%
duty cycle, you can buffer the events in your readout board and reduce
the data rate to 200 MByte/s. You add one event of latency.

2)
The fastest slots on a PC mainboard are the memory expansion slots.
It's an easy-to-design hardware interface, and if you use a server
mainboard with multiple memory channels you get a hell of a lot of
bandwidth. I remember seeing a crypto accelerator on a DIMM somewhere,
and Sun used to place graphics boards in memory slots.

3)
If your political environment is similar to high energy physics, then
if you can reduce the duty cycle it does not really matter how
expensive the readout boards are. With a large FPGA on a PCI board you
can try to perform all computations on the board and achieve a 100%
duty cycle.
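
The buffering argument in point 1 comes down to simple arithmetic, sketched here in Python. It assumes one byte per sample and a 64K-sample event (the block size mentioned earlier in the thread); the function names are illustrative.

```python
def sustained_rate_mb_s(burst_rate_mb_s, duty_cycle):
    """Average rate the bus must carry if the board buffers each event."""
    return burst_rate_mb_s * duty_cycle

def event_latency_us(event_bytes, burst_rate_mb_s):
    """Time to capture one event: bytes / (MB/s) = microseconds."""
    return event_bytes / burst_rate_mb_s

print(sustained_rate_mb_s(400, 0.5))
print(event_latency_us(64 * 1024, 400))
```

With an on-board buffer the bus only has to sustain the average rate, at the cost of holding each event for about one capture period before the host sees it.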

Kolja Sulimma

#15 November 20th 03, 09:07 AM

The fastest slots on a PC mainboard are the memory expansion slots.
It's an easy-to-design hardware interface, and if you use a server
mainboard with multiple memory channels you get a hell of a lot of
bandwidth.

...and forget Windows support. Only a specially hacked Linux will be your
friend.

and Sun used to place graphics boards in memory slots.

Sorry? Sun used S-Bus for them, which is not a memory slot.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation

http://www.storagecraft.com

#16 November 20th 03, 03:32 PM

The fastest slots on a PC mainboard are the memory expansion slots.
It's an easy-to-design hardware interface, and if you use a server
mainboard with multiple memory channels you get a hell of a lot of
bandwidth.

...and forget Windows support. Only a specially hacked Linux will be your
friend.
????
They need to write their own driver anyway.

I do not know much about Windows driver programming, but it should be
possible for a driver developer to map arbitrary physical address
ranges to user space.
You need chipset-specific code to enable access to the DIMM after
boot, because it must start disabled to prevent Windows from using the
memory. But as they use the board only in a single setup, this is no
problem at all.
Anyway, an experiment of that type is likely to use a real-time OS,
neither Windows nor plain vanilla Linux. Maybe OS9 or VxWorks.

Sorry? Sun used S-Bus for them, which is not a memory slot.
They did, but they also had UMA architectures based on DIMMs.

Kolja Sulimma

#17 November 20th 03, 04:59 PM

"Jeff Peterson" wrote in message
...
We are building a new radio telescope called PAST
(http://astrophysics.phys.cmu.edu/~jbp/past6.pdf)
which we will install at the South Pole or in Western China.

To make this work, will need to sample (6 to 8 bit precision) dozens
of analog voltages at 400 Msample/sec and feed these data streams into
PCs. One PC per sampler.

The flash ADCs we need are available (Maxim), but we are finding it
difficult to get the data into the PC.

One simple way would be to use SCSI ultra640, but so far I have not
found any 640 adapters on the market. Is any 640 adapter available?
anything coming soon?

or we could go right into a PCI-X bus. has anyone out there
done this at 400 MB/s? is this hard to do? FPGA core license
for this seems expensive ($9K), with no guarantee of 400 MByte/s rates.

is there a better way?

thanks

-Jeff Peterson

Why don't you get an AGP graphics processor, and try to connect your ADCs to
the GPU memory bus?
Run a PCI card for graphics on the PC.

The GPUs are programmable, so you might even be able to do some processing
inside...

Since you only need 400 MSamples/s, you could live with the Maxims.

If you want to get some real speed, then maybe something like the Atmel
TS8308500 (500 Mspl/s), TS8388B (1 Gspl/s) or TS83102G0B (Gspl/s) could be
of interest.
Going up to gigasamples per second would make your problem worse though
:-)

http://www.atmel.com/dyn/products/da...?family_id=611

--
Best Regards
Ulf at atmel dot com
These comments are intended to be my own opinion and they
may, or may not be shared by my employer, Atmel Sweden.

#18 November 20th 03, 07:36 PM

You need chipset-specific code to enable access to the DIMM after
boot, because it must start disabled to prevent Windows from using the
memory.

Easier! Just add /MAXMEM to Windows' BOOT.INI, and it will skip some of the
BIOS-reported memory.
So, on second thought, the thing looks easier.

Anyway, an experiment of that type is likely to use a real-time OS,
neither Windows nor plain vanilla Linux. Maybe OS9 or VxWorks.

Surely.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation

http://www.storagecraft.com

#19 November 20th 03, 08:28 PM

Maxim S. Shatskih wrote:
You need chipset-specific code to enable access to the DIMM after
boot, because it must start disabled to prevent Windows from using
the memory.

Easier! Just add /MAXMEM to Windows' BOOT.INI, and it will skip some
of the BIOS-reported memory.
So, on second thought, the thing looks easier.

The trick is knowing which physical memory slots are affected by the
BOOT.INI statement. An alternative is simply to grab physical memory address
space for a device driver during the boot sequence and lock Windows out of
it; DataCore uses that approach for its cache in SANsymphony.

--
Nik Simpson

#20 November 20th 03, 08:34 PM

On a sunny day (Thu, 20 Nov 2003 00:06:40 -0500) it happened "MM"
wrote:

yes, repacking might allow a 64/66 PCI to accept the data. i worry
that we will spend lots of time and money, but the margin will be
insufficient for it to actually work. i have heard that some PCI
cores are not too efficient.

Spend money and time on what? With regards to PCI, I am pretty sure it will
work. You can ask the PCI crowd on the PCI mailing list
(http://www.pcisig.com/developers/tec...port/pci_forum), they will
tell you for sure. And it doesn't have to be a core, you could use
industry-proven silicon, e.g. from PLX. I would be more worried about
processing all this data in your PC. I don't think any PC can do FFTs while
keeping up with such a data flow. Let's say you want to do a 1024-point FFT.
At 400 MSPS it will take only 2.56 us to accumulate a new block of data.
The latest and greatest ADI ADSP-TS201S can do a 1024-point complex FFT
in 16.8 microseconds. I doubt any of the Intel chips can do it faster.
AFAIK, TI DSPs aren't faster either. So, in my opinion you will either need
an array of fast DSPs or some sort of FPGA-based processing. Trying to do
this kind of processing in the host doesn't sound feasible to me.

/Mikhail

A little while ago in sci.crypt there was some talk about the first optical processor.
Basically this is an LED array with multipliers that can do 125 million complex 128-point
FFTs or 500,000 16K-point DFTs per second.
http://www.lenslet.com/newsItem.asp?...ve=&newsId=184
www.lenslet.com
The thing itself is a normal DSP with the optical array (you can buy that separately too).
Normal logic; if you interfaced an FPGA you could go faster perhaps, those gallium
arsenide LEDs switch at 20 GHz...
No idea what it costs, perhaps less than you think.
Download the datasheet .pdf, maybe it is of use...
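
For scale, the quoted transform rates can be converted into points per second and compared with one 400 Msample/s input. This Python sketch only multiplies out the figures as stated in this post, taking "16 K" as 16384; it says nothing about precision, I/O, or whether the device can be fed at that rate.

```python
def points_per_second(transforms_per_second, points_per_transform):
    """Raw sample throughput implied by a transform rate."""
    return transforms_per_second * points_per_transform

ADC_RATE = 400e6  # samples per second from one ADC

small = points_per_second(125e6, 128)
large = points_per_second(500_000, 16 * 1024)
print(small / ADC_RATE, large / ADC_RATE)
```

Taken at face value, either figure is more than an order of magnitude above what a single sampler produces.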
JP
