DMA IP cores and DMA Linux drivers

Arnold
Posts: 54
Joined: Wed Mar 11, 2015 3:07 pm

DMA IP cores and DMA Linux drivers

Post by Arnold » Tue Jun 09, 2015 11:53 am

Hello,

I wonder if there is a faster way to transfer the ADC samples to the ARM core besides mmapping the FPGA memory into Linux...

How much improvement would you get with DMA transfer? Can you share your knowledge?

I have found the following projects. @Pavel, are you saying they are too complicated to use?
https://github.com/bmartini/zynq-xdma
https://github.com/bmartini/zynq-axis

Thanks,
A

pavel
Posts: 799
Joined: Sat May 23, 2015 5:22 pm

Re: DMA IP cores and DMA Linux drivers

Post by pavel » Tue Jun 09, 2015 1:31 pm

Hi Arnold,
I wonder if there is a faster way to transfer the ADC samples to the ARM core besides mmapping the FPGA memory into Linux...
Just to be sure that I understand this part correctly.

First of all, mmap does not affect the performance in any way.

The default Red Pitaya FPGA configuration uses a block RAM (BRAM) buffer connected to the GP bus:

ADC -> BRAM buffer -> GP bus -> CPU

Alternatively, the ADC can be connected to the HP bus and then it can write data to the on-board DDR3 RAM (this approach is often called DMA):

ADC -> HP bus -> DDR3 RAM -> CPU

The HP bus is faster than the GP bus, and CPU access to the DDR3 RAM is faster than access to the BRAM via the GP bus.

You can find all the details about the GP and HP buses in Chapter 5 of the following document:

http://www.xilinx.com/support/documenta ... 00-TRM.pdf
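
As an illustration of the default path, here is a minimal sketch of mapping the BRAM buffer via /dev/mem (the base address and size are assumptions; take the real values from the address map of your FPGA project):

Code:

/* Minimal sketch of the BRAM + GP bus path: map the FPGA buffer into
 * the process via /dev/mem. BRAM_BASE and BRAM_SIZE are illustrative
 * values, not the actual Red Pitaya address map. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BRAM_BASE 0x40100000UL /* assumed GP0 offset of the buffer */
#define BRAM_SIZE 0x00010000UL /* assumed 64 kB buffer */

int main(void)
{
  int fd = open("/dev/mem", O_RDWR);
  if(fd < 0) { perror("open"); return 1; }

  volatile uint32_t *buf = mmap(NULL, BRAM_SIZE, PROT_READ|PROT_WRITE,
                                MAP_SHARED, fd, BRAM_BASE);
  if(buf == MAP_FAILED) { perror("mmap"); return 1; }

  /* how samples are packed into these words depends on the design */
  printf("first word: 0x%08x\n", buf[0]);

  munmap((void *)buf, BRAM_SIZE);
  close(fd);
  return 0;
}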
How much improvement would you get with DMA transfer?
It's very application dependent.

The BRAM+GP bus approach is fast enough for the following applications:
  • record relatively short series of ADC samples (less than 120k samples) at full speed (125 MSPS)
  • transfer the ADC samples to the CPU after some pre-processing (for example, decimation) done on the FPGA at relatively low speeds (less than 10 MSPS)
The HP bus+DDR3 RAM approach can be used for the following applications:
  • record long (1-100M) series of the ADC samples at high speed (10-125 MSPS)
  • continuously transfer the ADC samples to the CPU at high speed (10-125 MSPS)
Then there are several possibilities for controlling the data transfers from the ADC to the BRAM or DDR3 RAM buffers and for accessing the data stored in these buffers.

Possibilities to control the data transfers from FPGA:
  • Xilinx DMA IP cores
  • custom DMA IP cores
  • custom Verilog/SystemVerilog/VHDL modules
Let's look at the case where the IP cores and Verilog/SystemVerilog/VHDL modules should be controlled from a Linux application running on the CPU.

Possibilities to control the IP cores and the Verilog/SystemVerilog/VHDL modules from a Linux application running on the CPU:
  • /dev/mem
  • UIO driver
  • custom Linux driver
Possibilities to access the data stored in the BRAM or DDR3 RAM buffers from a Linux application running on the CPU:
  • /dev/mem
  • UIO driver
  • custom Linux driver
If done properly, any of these 3 x 3 x 3 = 27 combinations would deliver the same performance.
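
For reference, the UIO route looks like this (a sketch; it assumes a UIO node for the FPGA region has been declared in the device tree, and the /dev/uio0 name is illustrative):

Code:

/* Minimal UIO sketch: map the first memory region of the device and
 * wait for an interrupt. Assumes a UIO node for the FPGA region in
 * the device tree; the device name is hypothetical. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
  int fd = open("/dev/uio0", O_RDWR);
  if(fd < 0) { perror("open"); return 1; }

  /* map 0 of a UIO device is selected by passing offset 0 */
  volatile uint32_t *regs = mmap(NULL, sysconf(_SC_PAGESIZE),
                                 PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  if(regs == MAP_FAILED) { perror("mmap"); return 1; }

  uint32_t count;
  /* a blocking read returns the total interrupt count */
  if(read(fd, &count, 4) == 4)
    printf("interrupts so far: %u\n", count);

  close(fd);
  return 0;
}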
Can you share your knowledge?
For my ZYNQ based projects, I reduce the amount of DDR3 RAM accessible to Linux and use a memory block at the end of the DDR3 RAM address range as a buffer for DMA.

I found this idea in the following article:

http://blog.fakultaet-technik.de/develo ... oot-files/

Then I use a custom DMA IP core together with /dev/mem to control the IP core and to access the data stored in the DDR3 RAM buffer.
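
For illustration, the Linux side then boils down to the following (the numbers are assumptions: a 512 MB board booted with mem=480M on the kernel command line, leaving the top 32 MB starting at 0x1E000000 to the FPGA):

Code:

/* Sketch: reach the DMA buffer carved out of the top of the DDR3 RAM
 * via /dev/mem. Assumes the kernel was booted with mem=480M on a
 * 512 MB board, so the top 32 MB at 0x1E000000 belong to the FPGA. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_BASE 0x1E000000UL /* 480 MB: start of the reserved region */
#define BUF_SIZE 0x02000000UL /* 32 MB */

int main(void)
{
  int fd = open("/dev/mem", O_RDWR);
  if(fd < 0) { perror("open"); return 1; }

  volatile int16_t *ram = mmap(NULL, BUF_SIZE, PROT_READ|PROT_WRITE,
                               MAP_SHARED, fd, BUF_BASE);
  if(ram == MAP_FAILED) { perror("mmap"); return 1; }

  /* the DMA IP core is configured (via its cfg register, not shown
   * here) to write the ADC stream starting at BUF_BASE */
  printf("first sample: %d\n", ram[0]);

  munmap((void *)ram, BUF_SIZE);
  close(fd);
  return 0;
}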

Here are the pluses of this approach:
  • very simple DMA
  • low FPGA resource usage
  • no need for scatter-gather
  • no need for a Linux driver (I use /dev/mem)
  • Linux CMA uses the same approach (so I'm not inventing anything new)
I can't think of any minuses.
Here is the IP core that I use to write the ADC data to the DDR3 RAM buffer:

https://github.com/pavel-demin/red-pita ... riter_v1_0

and here are two simple projects that use this IP core:

https://github.com/pavel-demin/red-pita ... s/adc_test

https://github.com/pavel-demin/red-pita ... tizer_test

Cheers,

Pavel

fbalakirev
Posts: 101
Joined: Thu Sep 03, 2015 6:56 pm

Re: DMA IP cores and DMA Linux drivers

Post by fbalakirev » Thu Sep 10, 2015 1:39 am

Hi Pavel,

I have a question about your adc-test-server.c example of dealing with DMA. As far as I understand, you have to map a secondary buffer "buf" within the OS memory space to copy the mapped ADC data into before sending it to the TCP/IP client. Can one just send chunks of the "ram"-mapped space directly, without buffering and forking? Also, the "buf" size is set to be the same as "ram" - is that just a coincidence or a requirement?

pavel
Posts: 799
Joined: Sat May 23, 2015 5:22 pm

Re: DMA IP cores and DMA Linux drivers

Post by pavel » Thu Sep 10, 2015 7:27 am

As far as I understand, you have to map a secondary buffer "buf" within the OS memory space to copy the mapped ADC data into before sending it to the TCP/IP client. Can one just send chunks of the "ram"-mapped space directly, without buffering and forking?
Yes, it's possible to send chunks of "ram"-mapped space directly. The transfer rate would be limited to about 20 MB/s.

With buffering but without forking, the transfer rate is about 50 MB/s. I copied this approach from the following code by Nils Roos:
https://github.com/bkinman/rp_remote_ac ... e54e650e6b

With buffering and forking, both CPUs are used and the transfer rate is about 65 MB/s.
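
The core of the "copy into a buffer, then send" idea looks roughly like this (a sketch, not the actual adc-test-server.c; all names are illustrative):

Code:

/* Sketch of the buffered send loop: copy chunks of the mmapped DMA
 * region into ordinary memory before handing them to the TCP socket.
 * `ram` is the mmapped DDR3 buffer, `sock` a connected TCP socket and
 * `limit` the number of bytes written so far by the FPGA (derived
 * from the sts_data counter); all names are illustrative. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define CHUNK (1 << 20) /* 1 MB chunks, an arbitrary choice */

static void send_range(int sock, volatile uint8_t *ram, size_t limit)
{
  static uint8_t buf[CHUNK];
  size_t pos = 0;

  while(pos < limit)
  {
    size_t n = limit - pos < CHUNK ? limit - pos : CHUNK;
    /* copying out of the uncached FPGA buffer first is what raises
     * the rate compared with sending the mapped region directly */
    memcpy(buf, (const void *)(ram + pos), n);
    if(send(sock, buf, n, 0) < 0) break;
    pos += n;
  }
}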
Also, the "buf" size is set to be the same as "ram" - is that just a coincidence or a requirement?
It's a coincidence.

fbalakirev
Posts: 101
Joined: Thu Sep 03, 2015 6:56 pm

Re: DMA IP cores and DMA Linux drivers

Post by fbalakirev » Fri Sep 11, 2015 4:14 am

Thanks for the explanation!

How is the sts_data output of your AXI4-Stream writer IP scaled in relation to the actual write address?

pavel
Posts: 799
Joined: Sat May 23, 2015 5:22 pm

Re: DMA IP cores and DMA Linux drivers

Post by pavel » Fri Sep 11, 2015 8:31 am

fbalakirev wrote: How is the sts_data output of your AXI4-Stream writer IP scaled in relation to the actual write address?
The sts_data port is directly connected to an internal counter (int_addr_reg). The width of the HP bus is 64 bits. The memory is byte-addressable. The internal counter counts the number of 64-bit words sent to the HP bus. So, we have a factor of 8 here.

The cfg_data port defines the base address. The AXI4-Stream RAM Writer writes data in bursts of 16*8 = 128 bytes, and the AXI specification does not permit bursts to cross 4 kB address boundaries. So, cfg_data should be divisible by 128, which keeps every 128-byte burst inside a single 4 kB page.

The actual write address can be calculated as:

Code:

cfg_data + 8*sts_data

You can find a similar expression on line 137 in axis_ram_writer.v:
https://github.com/pavel-demin/red-pita ... ter.v#L137
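
On the CPU side, the same arithmetic can be written as follows (a sketch; `sts` stands for the mmapped sts_data register and the name is illustrative):

Code:

/* Sketch: derive the number of bytes written by the AXI4-Stream RAM
 * Writer from its status counter. `sts` points at the mmapped
 * sts_data register; the name is illustrative. */
#include <stdint.h>

static inline uint32_t bytes_written(volatile uint32_t *sts)
{
  /* sts_data counts 64-bit words sent to the HP bus, hence the *8 */
  return *sts * 8;
}

/* the absolute write address is then cfg_data + bytes_written(sts) */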

fbalakirev
Posts: 101
Joined: Thu Sep 03, 2015 6:56 pm

Re: DMA IP cores and DMA Linux drivers

Post by fbalakirev » Fri Sep 11, 2015 4:12 pm

Thanks for info!

I guess if I want to record over 256 MB, I need to increase the ADDR_WIDTH parameter of the AXI writer to at least 26 then? (With ADDR_WIDTH = 25, the counter only covers 2^25 64-bit words x 8 bytes = 256 MB.)

fbalakirev
Posts: 101
Joined: Thu Sep 03, 2015 6:56 pm

Re: DMA IP cores and DMA Linux drivers

Post by fbalakirev » Sat Sep 12, 2015 7:39 pm

Hi Pavel,

So far I have been able to record up to 384 MB of continuous waveforms in one go at 125 MSPS with a slightly modified version of your packetizer_test example. Thanks for all the help so far!
