Discussions about active development projects
#1034 by PiccoLas
Sat Oct 25, 2014 10:16 am

I have created a simple MatLab/Octave script to import the raw binary data. The recalculation of the two's complement was a little tricky, but I found an example which I modified a bit.

I have taken an ADC snapshot with Nils' kernel driver with
Code:
cat /red/rpad0 > test_ddrdump.txt
and copied this file to my desktop machine. To plot the first points I have written this small script:

Code:
%% plot signal

%% first clean up
close all; clear all; clc;

%% Initial variables
file_dir = pwd;
file_name = fullfile(file_dir, 'test_ddrdump.txt'); % fullfile picks the right path separator
% some variables to scale the later plot
n_plot_points = 200 ;
dec_factor = 1;
sample_rate = 125e6 / dec_factor;

%% read data from file, interpret as unsigned int16
fid = fopen(file_name, 'rb');
% the dump is raw binary, so read it with fread instead of fscanf
adc = fread(fid, 2*1024*1024, 'uint16');
fclose (fid);
% reformat to matrix (1MSample, 2 channels)
adc = reshape(adc,1024*1024,2);

% transfer from "misinterpreted" two's complement to double  and scale to voltage
adc (:,1) = tc2dec(adc(:,1),14) /2^12;
adc (:,2) = tc2dec(adc(:,2),14) /2^12;

%% plot first n_plot_points data points
% tic
% time scaling
n_points = size (adc,1); % number of rows (size of a column would return a vector)
time = (0:n_points-1) / sample_rate;
% ploting
plot (time(1:n_plot_points),adc(1:n_plot_points,1),'b');
hold on;
plot (time(1:n_plot_points),adc(1:n_plot_points,2),'r');
hold off;
xlabel ('time [s]');
ylabel ('Voltage [V]');
% toc

and for converting the misinterpreted unsigned int16 values to signed doubles I use this function:

Code:
%% modified from http://www.mathworks.nl/matlabcentral/fileexchange/5485-two-s-complement-for-matlab
%% modified example from Hassan Naseri

% input as unsigned int values, but wrongly bit-aligned; output as double
function value = tc2dec(val,N)
% val = bin2dec(bin); % removed from the original example, since we already have decimal values
  % modified for elementwise matrix calculation
  y = sign(2.^(N-1)-val).*(2.^(N-1)-abs(2.^(N-1)-val));
  % handle the edge case val == 2^(N-1) elementwise instead of with a scalar if
  value = y;
  mask = (y == 0) & (val ~= 0);
  value(mask) = -val(mask);
end

% reverse function, to export the data back in two's complement
function value = dec2tc(dec, N)
  value = dec2bin(mod((dec),2^N),N);
end
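For anyone without Octave at hand, the same conversion can be sketched in Python (a hypothetical NumPy port, not part of the original script; `N` is the sample bit width):

```python
import numpy as np

def tc2dec(val, N):
    """Reinterpret unsigned N-bit sample words as signed two's-complement."""
    val = np.asarray(val, dtype=np.int64)
    # words with the sign bit set are negative: subtract 2^N
    return np.where(val >= 2 ** (N - 1), val - 2 ** N, val)

def dec2tc(dec, N):
    """Reverse mapping: signed value back to its unsigned N-bit pattern."""
    return np.mod(np.asarray(dec, dtype=np.int64), 2 ** N)

# for N = 14: 0x3FFF maps to -1, 0x2000 to -8192, small values stay as-is
samples = tc2dec([0, 1, 0x3FFF, 0x2000], 14)
volts = samples / 2 ** 12  # same scaling as in the Octave script
```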

Perhaps someone finds this interesting/useful.


#1036 by Nils Roos
Sat Oct 25, 2014 5:28 pm
Thank you very much for the contribution. Since I don't have any experience with Matlab & co, I would invite anyone who does to try it out and see if it is useful for you.

In other news, a first test of sending sample data over the network topped out at ~34MS/s. Not bad at all for starters.
#1037 by bexizuo
Sat Oct 25, 2014 8:49 pm
wow, that's good news :o ... if the UDP protocol is used, packet loss can happen,
so a UDP stream could carry a sequence counter, for example in the first 2 bytes, to check whether the packets arrive in the order they were sent.

could you please test TCP, too? ... that would make checking for packet loss simpler.
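The 2-byte sequence counter suggested above could be sketched like this (Python; the packet format is a made-up example, not the actual stream format):

```python
import struct

SEQ_MOD = 2 ** 16  # a 2-byte counter wraps at 65536

def add_seq(seq, payload):
    # prepend a big-endian 16-bit sequence number to the sample payload
    return struct.pack('>H', seq % SEQ_MOD) + payload

def check_seq(expected, packet):
    # returns (number of packets lost before this one, payload)
    seq, = struct.unpack('>H', packet[:2])
    lost = (seq - expected) % SEQ_MOD
    return lost, packet[2:]

# example: packets 0 and 2 arrive, packet 1 is lost
p0 = add_seq(0, b'samples0')
p2 = add_seq(2, b'samples2')
lost, data = check_seq(1, p2)  # expected seq 1, got 2 -> 1 packet lost
```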

good luck ... :)
#1131 by SampsaRanta
Sat Nov 15, 2014 10:13 am
Nils, your work is inspiring to see.

I read the FPGA code you have there. Do you think it would be a good idea to implement some kind of ddr_next_addr register that could be loaded before the previous buffer finishes? Using large enough buffers might give the CPU enough time to refresh the next buffer in the queue and allow more continuous operation. Once this buffer goes into use, the CPU driver side can see ddr_next_addr in ddr_curr_addr and load the next one.

After that, I'm thinking about a solution that keeps the ring buffer of addresses in the BRAM of the DDR Dump firmware and triggers the loading of ddr_next_addr from there. This would be a good feature to have for network streaming, for example: one could use packet-sized buffers (see zero copy below), and it would not require as much real-time work to track the status of ddr_curr_addr. Who would even want scope functionality if continuous capture can be achieved? Once continuous capture and streaming work, the scope functionality can be left in the other FPGA branch; there is no reason to waste bits on it in the continuous-capture DDR Dump code if both need the same resources.

Typical ways network stacks address performance limits are:
- ring buffer (multiple buffers in ring) and interrupt service routine to feed the ring buffer with empty buffers
- zero copy (no need to move data while processing it forward, like in routing, just pass the buffer to next device for processing with as little modifications needed as possible)
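The first item on that list could be sketched as follows (Python, purely illustrative; real drivers do this with DMA descriptors, not Python objects):

```python
from collections import deque

class BufferRing:
    """Fixed set of buffers cycled between producer (capture) and consumer."""
    def __init__(self, count, size):
        self.free = deque(bytearray(size) for _ in range(count))
        self.ready = deque()  # filled buffers waiting to be consumed

    def acquire(self):
        # producer takes an empty buffer; None signals an overrun
        return self.free.popleft() if self.free else None

    def submit(self, buf):
        self.ready.append(buf)  # producer hands a filled buffer onward

    def consume(self):
        # consumer processes a filled buffer and recycles it; since the
        # same bytearray objects circulate, no sample data is ever copied
        buf = self.ready.popleft()
        self.free.append(buf)
        return buf
```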

For network drivers, if you have a problem with small packet loss, your driver might allow increasing the number of buffers. Intel server-type network chips can typically increase the buffer count from the default 256 to 4096 or so. However, Java might not be the best way to get performance.

As an example of a highly optimized solution I would point you to netmap and pkt-gen, which contain optimized code to load data directly into the network card's ring buffer. They skip the OS IP stack entirely. The data is placed in memory so that the header (at least the source and destination MAC addresses) can be prepended; then you just need to put the packet into the output ring buffer. After that, receiving it from ethernet is a trivial thing, trust me.


Also, if you could share how to extend the buffer to full memory, this would be interesting to know.
#1138 by Nils Roos
Sat Nov 15, 2014 7:27 pm
Hi Sampsa,
thank you very much for taking an interest, and for the constructive feedback. I spent most of today looking over the netmap code. Since I am new to linux virtual memory management, having it as a reference is invaluable.
Do you think it would be a good idea to implement some kind of ddr_next_addr register that could be loaded before the previous buffer finishes?

The ddr_[ab]_base registers already function in that way, they are only read when the ddr_[ab]_end mark is reached or an address (re)load is signaled via the ddr_control register. They can be preloaded with the next buffer address at any time in between.
Also, if you could share how to extend the buffer to full memory, this would be interesting to know.

This depends on whether you still want to run linux on the device or not. In the former case, it is infeasible to use all of the available external RAM; however, the kernel can be told to only use a certain portion of RAM via its command line parameters (but I guess you knew that already). Thus, a sizable chunk - say 256MB - could be set aside for the buffers while still running a fully fledged OS. If you really wanted to use all the available external RAM, a bare metal application with network support could run from OCM.
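To illustrate the command-line approach: the standard `mem=` kernel parameter caps the RAM that Linux manages, leaving the remainder untouched for capture buffers. On a 512MB board the U-Boot arguments might look like this (illustrative values; the console and root settings depend on your image):

```shell
# let Linux manage only the first 256MB; the range 0x10000000-0x1FFFFFFF
# stays outside kernel control and can be handed to the DDR Dump module
setenv bootargs "console=ttyPS0,115200 root=/dev/mmcblk0p2 rw mem=256M"
```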
Either way, the DDR Dump module accepts any addresses with 4kB alignment as buffer start and end, and will happily write to the specified range, no matter who is in possession from the MMU's point of view.
It loads the ddr_[ab]_base into ddr_[ab]_curr and then starts writing chunks of 2kB from its internal BRAM buffer into consecutive DDR locations until it reaches ddr_[ab]_end, whence it wraps around to ddr_[ab]_base again. ddr_[ab]_curr always points to the start of the currently written 2kB block and progresses in step with the BRAM buffer being filled from the decimated ADC data.
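A software model of that pointer arithmetic (a Python sketch of the behavior described above, not the actual HDL):

```python
CHUNK = 2048  # the module writes 2kB blocks from its internal BRAM buffer

def next_curr(curr, base, end):
    """Advance the write pointer by one 2kB chunk, wrapping at ddr_end."""
    curr += CHUNK
    # on reaching ddr_end, reload from ddr_base (which the driver may
    # have repointed to the next buffer in the meantime)
    return base if curr >= end else curr

# example: an 8kB buffer, i.e. four 2kB chunks, starting at 0x10000000
addr = 0x10000000
for _ in range(4):
    addr = next_curr(addr, 0x10000000, 0x10002000)
# after four chunks the pointer has wrapped back to ddr_base
```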

... and also would not require as much real time work to track the status of ddr_curr_addr.

One of the next things on my list is interrupt support, the tracking of ddr_curr was just a stop-gap measure to arrive at a working solution as quickly as possible.
Maintaining a scatter-gather list in BRAM is a very good suggestion and will certainly be considered on the way to achieving maximum performance. It also ties in nicely with the way the on-chip ENET controller organizes its data queues.

You wouldn't perchance be interested in donating some of your expertise to this? :mrgreen:

Again, many thanks for chiming in.
#1142 by SampsaRanta
Sat Nov 15, 2014 11:05 pm

Actually I don't know whether using BRAM is the best way to go or not; maybe you could load the address easily from DDR, too, or whatever is easy to implement on the FPGA. You might do well with a linked-list structure as well, having a pointer to the next buffer loaded from DDR, triggered at the time the current ddr_base is taken into use. In short, to achieve basic continuous capture functionality, just implement a mechanism that makes enough capture memory available for the capture. And maybe a signal from the FPGA capture part that tells when it runs out of buffers, to see if this is the case (unless you already have one). If you use megabyte-sized buffers, wait a bit for the capture to start, put the next buffer in the queue, and try to have the scheduler run your code soon enough to check ddr_curr again, to see whether capture into the next buffer has started and another buffer needs to be queued; this might work for a start. Having a bigger buffer, or more spare buffers in a place where the FPGA knows to find them, simply helps in case of scheduling jitter.

Now that I think of it, just having the continuous buffer working would be a good step forward. And if your code can already load a new target address after the previous one, it might not be that far away.

As for network packet size, this is quite small compared to the data stream. The packet size on ethernet is normally 1500 bytes, and gigabit ethernet has jumbo frames of around 9000 bytes (if the hardware supports jumbo frames; the data sheet for the Zynq did not highlight this feature). So to stream data directly to the network, the capture buffer would need to change very often. You would also have to align the buffers so that there is room for the network headers when handing the packet to the network driver; I mean, there might be an alignment requirement you need to follow on the ethernet chip, too. Maybe it is easier to first test whether the CPU has enough power to send the data out packet by packet from the megabyte stream.

As a next step towards continuous capture, it might also be good to implement an additional way of interfacing the capture buffers between user land and kernel space. Using shared, memory-mapped user-land buffers instead of the current device-read approach might work well. That would allow zero-copy transfer to user land right away and let you prototype continuous capture. There is a way to translate a user-land memory address to a kernel memory address and then hand that over directly as a buffer; there is something like this in the netmap code.

As an alternative to a HW interrupt, there is also a thing called softirq, and something called tasklets. These might help to implement an alternative to the while loop that tracks the buffer, and to prototype with the current FPGA code.

Hope this helps.
#1143 by SampsaRanta
Sun Nov 16, 2014 11:45 am
Giving it some more thought...

Feeding small buffers to the capture part might not be worthwhile, as the capture part might be able to split the data into network-packet-sized pieces that the userland consumer can then feed in smaller chunks to the ethernet rings.

Allowing the capture buffer to be organized like this:
header (12 bytes ethernet + udp) + 2 bytes (running packet counter to track losses)
1498 bytes data
(unused part)

So this would mean that when the capture starts, you skip the first, say, 14 bytes, then capture 1498 bytes, mark the capture for this packet finished, and move to the next 2k alignment.

There might be some other header stuff you need to take into account, I'm not sure. But there are a few bytes left to allow for that.
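That block layout can be modeled as follows (Python sketch; the 14-byte header reservation and 1498-byte payload are the illustrative numbers from above, not a verified packet format):

```python
BLOCK = 2048        # capture blocks stay 2kB-aligned
HDR = 14            # space reserved for headers plus the 2-byte counter
PAYLOAD = 1498      # sample bytes captured per block
PAD = BLOCK - HDR - PAYLOAD  # unused tail of each block

def payload_range(block_index):
    """Byte range inside the capture buffer that holds sample data."""
    start = block_index * BLOCK + HDR
    return start, start + PAYLOAD

# block 0 carries samples in bytes 14..1511, block 1 in 2062..3559, etc.
```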
