DMA IP cores and DMA Linux drivers
Hello,
I wonder if there is a faster way to transfer the ADC samples to the ARM core besides mmap'ing the FPGA memory to Linux...
How much improvement would you get with DMA transfer? Can you share your knowledge?
I have found the following projects. @Pavel, you were saying they are too complicated to use?
https://github.com/bmartini/zynq-xdma
https://github.com/bmartini/zynq-axis
Thanks,
A
Re: DMA IP cores and DMA Linux drivers
Hi Arnold,
Arnold wrote: I wonder if there is a faster way to transfer the ADC samples to the ARM core besides mmap'ing the FPGA memory to Linux...
Just to be sure that I understand this part correctly. First of all, mmap does not affect the performance in any way.
The default Red Pitaya FPGA configuration uses a block RAM (BRAM) buffer connected to the GP bus:
ADC -> BRAM buffer -> GP bus -> CPU
Alternatively, the ADC can be connected to the HP bus and then it can write data to the on-board DDR3 RAM (this approach is often called DMA):
ADC -> HP bus -> DDR3 RAM -> CPU
The HP bus is faster than the GP bus and CPU access to DDR3 RAM is faster than to BRAM via the GP bus.
You can find all the details about the GP and HP buses in chapter 5 of the following document:
http://www.xilinx.com/support/documenta ... 00-TRM.pdf
Arnold wrote: How much improvement would you get with DMA transfer?
It's very application dependent.
The BRAM+GP bus approach is fast enough for the following applications:
- record relatively short (less than 120k) series of the ADC samples at full speed (125 MSPS)
- transfer the ADC samples to the CPU after some pre-processing (for example, decimation) done on the FPGA at relatively low speeds (less than 10 MSPS)
The DMA approach (HP bus + DDR3 RAM) is needed for the following applications:
- record long (1-100M) series of the ADC samples at high speed (10-125 MSPS)
- continuously transfer the ADC samples to the CPU at high speed (10-125 MSPS)
Possibilities to control the data transfers from FPGA:
- Xilinx DMA IP cores
- custom DMA IP cores
- custom Verilog/SystemVerilog/VHDL modules
Possibilities to control the IP cores and the Verilog/SystemVerilog/VHDL modules from a Linux application running on the CPU:
- /dev/mem
- UIO driver
- custom Linux driver
Arnold wrote: Can you share your knowledge?
For my ZYNQ based projects, I reduce the amount of DDR3 RAM accessible to Linux and use a memory block at the end of the DDR3 RAM address range as a buffer for DMA.
I found this idea in the following article:
http://blog.fakultaet-technik.de/develo ... oot-files/
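As a sketch of that idea (the sizes and file names here are assumptions for illustration, not taken from the thread): on a board with 512 MB of DDR3, the kernel can be told to manage only the first 480 MB via the `mem=` boot argument, leaving the last 32 MB untouched for the DMA buffer.

```shell
# uEnv.txt / kernel bootargs sketch (hypothetical sizes and root device):
# Linux manages only the first 480 MB of the 512 MB DDR3 ...
bootargs=console=ttyPS0,115200 root=/dev/mmcblk0p2 rw mem=480M
# ... and the FPGA writer is pointed at the reserved tail of RAM:
#   physical base = 480 MB = 0x1E000000, size = 32 MB
```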
Then I use a custom DMA IP core together with /dev/mem to control the IP core and to access the data stored in the DDR3 RAM buffer.
Here are the pluses of this approach:
- very simple DMA
- low FPGA resource usage
- no need for scatter-gather
- no need for Linux driver (I use /dev/mem)
- Linux CMA uses the same approach (so, I'm not inventing anything new)
- can't think of any minuses
Here is the AXI4-Stream RAM Writer IP core:
https://github.com/pavel-demin/red-pita ... riter_v1_0
and here are two simple projects that use this IP core:
https://github.com/pavel-demin/red-pita ... s/adc_test
https://github.com/pavel-demin/red-pita ... tizer_test
Cheers,
Pavel
Re: DMA IP cores and DMA Linux drivers
Hi Pavel,
I have a question about your adc-test-server.c example of dealing with DMA. As far as I understand, you have to map a secondary buffer "buf" within OS memory space(?) to copy mapped ADC data before sending it to TCP/IP client. Can one just send chunks of "ram"-mapped space directly, without buffering and forking? Also, the "buf" size is set to be the same as "ram" - is that just a coincidence or a requirement?
Re: DMA IP cores and DMA Linux drivers
fbalakirev wrote: As far as I understand, you have to map a secondary buffer "buf" within OS memory space(?) to copy mapped ADC data before sending it to TCP/IP client. Can one just send chunks of "ram"-mapped space directly, without buffering and forking?
Yes, it's possible to send chunks of "ram"-mapped space directly. The transfer rate would be limited to about 20 MB/s.
With buffering but without forking, the transfer rate is about 50 MB/s. I copied that approach from the following code by Nils Roos:
https://github.com/bkinman/rp_remote_ac ... e54e650e6b
With buffering and forking, both CPUs are used and the transfer rate is about 65 MB/s.
fbalakirev wrote: Also, the "buf" size is set to be the same as "ram" - is that just a coincidence or a requirement?
It's a coincidence.
Re: DMA IP cores and DMA Linux drivers
Thanks for the explanation!
How is the sts_data output of your AXI4-Stream writer IP scaled in relation to the actual write address?
Re: DMA IP cores and DMA Linux drivers
fbalakirev wrote: How is the sts_data output of your AXI4-Stream writer IP scaled in relation to the actual write address?
The sts_data port is directly connected to an internal counter (int_addr_reg). The width of the HP bus is 64 bits. The memory is byte-addressable. The internal counter counts the number of 64-bit words sent to the HP bus. So, we have a factor of 8 here.
The cfg_data port defines the base address. AXI4-Stream RAM Writer writes data in bursts of 16*8=128 bytes. The AXI specification does not permit AXI bursts to cross 4 kB address boundaries. So, cfg_data should be divisible by 128.
The actual write address can be calculated as:
Code: Select all
cfg_data + 8*sts_data
https://github.com/pavel-demin/red-pita ... ter.v#L137
Re: DMA IP cores and DMA Linux drivers
Thanks for info!
I guess if I want to record over 256 MB, I need to increase the ADDR_WIDTH parameter of the AXI writer to at least 26 then?
Re: DMA IP cores and DMA Linux drivers
Hi Pavel,
So far I have been able to record up to 384 MB of continuous waveforms in one go at 125 MSPS with a slightly modified version of your packetizer_test example. Thanks for all the help so far!