FPGA Based Beamforming in Simulink: Part 1 - Algorithm Design

This tutorial is the first of a two-part series that guides you through developing a beamformer in Simulink suitable for implementation on hardware, such as a Field Programmable Gate Array (FPGA). It also shows how to compare the results of the implementation model with those of a behavioral model. The second part of the tutorial, FPGA Based Beamforming in Simulink: Part 2 - Code Generation, shows how to generate HDL code from the implementation model and verify that the generated HDL code produces the correct results compared to the behavioral model.

The tutorial shows how to design an FPGA implementation-ready beamformer to match a corresponding behavioral model in Simulink® using the Phased Array System Toolbox™, DSP System Toolbox™, and Fixed-Point Designer™. To verify the implementation model, it compares the simulation output of the implementation model with the output of the behavioral model.

The Phased Array System Toolbox™ is used to design and verify the floating-point functional algorithm, which provides the behavioral reference model. The behavioral model is then used to verify the results of the fixed-point implementation model used to generate HDL code.

Fixed-Point Designer™ provides data types and tools for developing fixed-point and single-precision algorithms to optimize performance on embedded hardware. You can perform bit-true simulations to observe the impact of limited range and precision without implementing the design on hardware.

Partitioning Model for FPGA

There are three key modeling concepts to keep in mind when preparing a Simulink® model to target FPGAs:

  • Sample-based processing: Also commonly referred to as serial processing, this is an efficient data processing technique in hardware designs that enables you to trade off resources against throughput.

  • Subsystem targeted for HDL code generation: In order to generate HDL code from a model, the implementation algorithm must be inside a Simulink subsystem.

  • Time-aligned outputs of behavioral and implementation models: For comparing the outputs of the behavioral and FPGA implementation models, you must time align their outputs by adding latency to the behavioral model.

Beamforming Algorithm

In this example we use a Phase-Shift Beamformer as the behavioral algorithm, which is re-implemented in the HDL Algorithm subsystem using Simulink blocks that support HDL code generation. The beamformer's job is to calculate the phase shift required for each of the ten channels to maximize the received signal power in the direction of the incident angle. Below is the Simulink model with the behavioral algorithm and its corresponding implementation algorithm for an FPGA.
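As a rough illustration of what a phase-shift beamformer computes, the following sketch builds a steering vector for a 10-element, half-wavelength ULA and uses its conjugate as the beamforming weights. The variable names, the 30-degree incident angle, the normalized wavelength, the test waveform, and the sign convention are all assumptions made for illustration; none of them are taken from the model.

```matlab
% Minimal phase-shift beamformer sketch (illustrative assumptions throughout).
N      = 10;                         % number of array elements (from the text)
lambda = 1;                          % normalized wavelength (assumed)
d      = lambda/2;                   % half-wavelength element spacing
theta  = 30*pi/180;                  % incident angle from broadside (assumed)

% Element positions measured outward from the array center.
pos = ((0:N-1) - (N-1)/2) * d;       % 1-by-N

% Steering vector: per-element phase of a plane wave arriving from theta.
sv = exp(-1j*2*pi*pos*sin(theta)/lambda);   % 1-by-N

% Synthesize a noiseless multi-channel signal: one column per element.
t = (0:299).';                       % 300 samples per frame (from the text)
s = exp(1j*2*pi*0.01*t);             % hypothetical baseband waveform
x = s * sv;                          % 300-by-N received signal

% Phase-shift beamforming: undo each channel's phase, then average.
w = sv.' / N;                        % N-by-1 weights
y = x * conj(w);                     % 300-by-1 beamformed output

maxErr = max(abs(y - s));            % close to zero when there is no noise
```

Without noise the beamformed output reproduces the waveform, because the weights cancel the phase of each channel before the channels are averaged.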

modelname = 'SimulinkBeamformingHDLWorkflowExample';

% Open the model and ensure it is visible and not obstructed by scopes.
open_system(modelname);

The Simulink model has two branches. The top branch is the behavioral, floating-point model of our algorithm, and the bottom branch is the functionally equivalent fixed-point version using blocks that support HDL code generation. Besides plotting the output of both branches to compare the two, we also calculate and plot the difference, or error, between the two outputs.

Notice that there's a delay ($Z^{-55}$) at the output of the behavioral model. This is necessary because the implementation algorithm uses 55 delays to enable pipelining, which creates latency that needs to be accounted for. Accounting for this latency is called delay balancing. It time-aligns the outputs of the behavioral model and the implementation model so that the results can be compared directly.
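The effect of delay balancing can be sketched outside Simulink with two vectors. The signals below are hypothetical stand-ins, not outputs of the model: `ref` plays the role of the behavioral output, and `impl` is the same signal arriving 55 samples late with a small perturbation in place of quantization error.

```matlab
% Delay-balancing sketch: align a reference signal with a pipelined output.
latency = 36 + 19;                       % Angle2SteeringVec + MAC delays = 55

n   = (0:299).';
ref = sin(2*pi*n/50);                    % hypothetical behavioral output

% Stand-in for the implementation output: same signal, 55 samples late,
% with a small perturbation that plays the role of quantization error.
impl = [zeros(latency,1); ref(1:end-latency)] + 1e-3*cos(2*pi*n/7);

% Apply the same Z^-55 delay to the reference before comparing.
refDelayed = [zeros(latency,1); ref(1:end-latency)];

errAligned   = max(abs(refDelayed - impl));   % small: only the perturbation
errUnaligned = max(abs(ref - impl));          % large: the signals are offset
```

Comparing without the matching delay mostly measures the time offset rather than the numerical difference between the two algorithms.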

Multi-channel Receive Signal

To synthesize a received signal at the phased array antenna, the model includes a subsystem that generates a multi-channel signal. The Baseband Multi-channel Signal subsystem models a transmitted waveform and the received target echo at the incident angle captured via a 10-element antenna array. The subsystem also includes a receiver pre-amp model to account for receiver noise. This subsystem generates the input stimulus for our behavioral and implementation models.

% Open subsystem that generates the received multi-channel signal.
open_system([modelname '/Baseband Multi-channel Signal']);

Serialization and Quantization

The model includes a Serialization & Quantization subsystem which converts floating-point, frame-based signals to fixed-point, sample-based signals necessary for modeling streaming data in hardware. Sample-based processing was chosen because our system will run at less than 400 MHz; therefore, we're optimizing for resources instead of throughput.

% Open subsystem that serializes and quantizes the received signal.
open_system([modelname '/Serialization & Quantization']);

The input signal to the serialization subsystem has 10 channels with 300 samples per channel or a 300x10 size signal. The subsystem serializes, or unbuffers, the signal producing a sample-based signal that's 1x10, i.e., one sample per channel, which is then quantized to meet the requirements of our system.
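A minimal sketch of the unbuffering step, assuming a dummy 300x10 frame: each loop iteration yields one 1x10 sample, which is what a sample-based model processes per time step. The variable names and the dummy data are illustrative only.

```matlab
% Unbuffering sketch: turn one 300x10 frame into 300 samples of size 1x10.
numSamples  = 300;                         % samples per channel (from the text)
numChannels = 10;                          % antenna channels (from the text)
frame = reshape(1:numSamples*numChannels, numSamples, numChannels); % dummy data

rebuilt = zeros(numSamples, numChannels);
for k = 1:numSamples
    sample = frame(k, :);                  % 1x10: one sample per channel
    rebuilt(k, :) = sample;                % stacking the samples restores the frame
end
```

Stacking the samples back into a matrix, as `rebuilt` does, is the reverse (deserialization) operation.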

The Quantize Signal block's output data type is set to:

  • Output data type = fixdt(1,12,19)

which is a signed data type with a 12-bit word length and a 19-bit fraction length. This precision was chosen because we're targeting a Xilinx® Virtex®-7 FPGA, which has a 12-bit ADC. We try to minimize the number of fractional bits used while keeping enough precision to meet the allowed quantization error for our application.
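To make the data type concrete, the sketch below emulates fixdt(1,12,19) with plain double arithmetic. A fraction length larger than the word length is valid: the 12-bit stored integer is scaled by 2^-19, so the representable range is about ±2^-8. The input values are hypothetical, and the round-to-nearest and saturation behavior are assumptions, since the text does not state how the Quantize Signal block is configured.

```matlab
% Quantization sketch for fixdt(1,12,19) using plain double arithmetic.
wordLength     = 12;
fractionLength = 19;
lsb    = 2^(-fractionLength);               % resolution: 2^-19
intMin = -2^(wordLength-1);                 % smallest stored integer: -2048
intMax =  2^(wordLength-1) - 1;             % largest stored integer:   2047
rangeMin = intMin * lsb;                    % -2^-8
rangeMax = intMax * lsb;                    %  2^-8 - 2^-19

x = [1e-3, -2.5e-3, 0.01];                  % hypothetical input samples

% Round to nearest and saturate (both behaviors are assumptions).
storedInt = min(max(round(x / lsb), intMin), intMax);
xq = storedInt * lsb;                       % quantized real-world values
```

The third sample lies outside the representable range and saturates at the largest stored integer, which shows why the input scaling has to match the chosen fraction length.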

Designing the Implementation Subsystem

The HDL Algorithm subsystem, which is targeted for HDL code generation, implements the beamformer using Simulink blocks that support HDL code generation.

The Angle2SteeringVec subsystem calculates the signal delay at each antenna element of a Uniform Linear Array (ULA). The delay is then fed to a multiply and accumulate (MAC) subsystem to perform beamforming.

% Open subsystem with HDL algorithm.
open_system([modelname '/HDL Algorithm']);

The algorithm in the HDL Algorithm subsystem is functionally equivalent to the phase-shift beamforming behavioral algorithm but can generate HDL code. There are three main differences that enable this subsystem to generate efficient HDL code:

  1. processing is performed serially, i.e., sample-based processing is used

  2. arithmetic is performed with fixed-point data types

  3. delays are added to enable pipelining by the HDL synthesis tool

To ensure proper clock timing, any delay added to one branch of the implementation model must be matched to all other parallel branches as seen above. The Angle2SteeringVec subsystem, for example, added 36 delays; therefore, the top branch of the HDL Algorithm subsystem includes a delay of 36 samples right before the MAC subsystem. Likewise, the MAC subsystem used 19 delays, which must be balanced by adding 19 delays to the output of the Angle2SteeringVec subsystem. Let's look inside the MAC subsystem to account for the 19 delays.

% Open the MAC subsystem.
open_system([modelname '/HDL Algorithm/MAC']);

Looking at the bottom branch of the MAC subsystem, we see a $Z^{-2}$, followed by the complex multiply block which contains a $Z^{-1}$, then there's a $Z^{-4}$, followed by 4 delays of $Z^{-3}$ for a total of 19 delays. The delay values are defined in the PreLoadFcn callback in Model Properties.
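The delay count can be tallied in a few lines. The values come from the description above; the variable names are illustrative.

```matlab
% Latency bookkeeping for the HDL Algorithm subsystem (values from the text).
macDelays  = [2, 1, 4, 3, 3, 3, 3];   % Z^-2, complex multiply Z^-1, Z^-4, four Z^-3
macLatency = sum(macDelays);          % 19

steeringLatency = 36;                 % Angle2SteeringVec subsystem
totalLatency = steeringLatency + macLatency;   % 55, matching the Z^-55 block
```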

Calculating the Steering Vector

The Angle2SteeringVec subsystem breaks the task into a few steps to calculate the steering vector from the signal's angle of arrival. It first calculates the signal's arrival delay at each sensor by matrix multiplying the antenna element position in the array by the signal's incident direction. The delays are then fed to the SinCos subsystem which calculates the trigonometric functions sine and cosine using the simple and efficient CORDIC algorithm.
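For readers unfamiliar with CORDIC, the following is a minimal rotation-mode sketch that approximates sine and cosine using only additions and scaling by powers of two. It runs in floating point for clarity. The iteration count and the input angle are assumptions, and nothing here reflects the actual word lengths or configuration of the SinCos subsystem.

```matlab
% CORDIC rotation-mode sketch for sine and cosine (floating point for clarity).
theta   = 0.6;                         % hypothetical angle in radians, |theta| <= pi/2
numIter = 16;                          % number of iterations (assumed)

k      = 0:numIter-1;
angles = atan(2.^(-k));                % elementary rotation angles
gain   = prod(1 ./ sqrt(1 + 2.^(-2*k)));   % compensates the CORDIC gain

x = gain;  y = 0;  z = theta;          % start on the x-axis, pre-scaled
for i = 1:numIter
    if z >= 0
        sigma = 1;                     % rotate counterclockwise
    else
        sigma = -1;                    % rotate clockwise
    end
    shift = 2^(-(i-1));                % a bit shift in hardware
    xNew = x - sigma * y * shift;
    y    = y + sigma * x * shift;
    x    = xNew;
    z    = z - sigma * angles(i);      % residual angle still to rotate
end

cosApprox = x;
sinApprox = y;
```

Each iteration adds roughly one bit of accuracy, so the iteration count trades latency for precision. This is one reason a CORDIC stage contributes several pipeline delays in hardware.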

% Open the Angle2SteeringVec subsystem.
open_system([modelname '/HDL Algorithm/Angle2SteeringVec']);

Because our design consists of a 10-element ULA with half-wavelength spacing, the antenna element positions are measured outward from the center of the antenna array. We can define the element positions as a vector of 10 numbers ranging from -6.7453 to 6.7453, i.e., with a spacing of 1/2 wavelength, which is roughly 2.99/2 (more precisely, 13.4906/9 ≈ 1.499). Given that we're using fixed-point arithmetic, the data type used for the element position vector is fixdt(1,8,4), i.e., a signed numeric data type with an 8-bit word length and a 4-bit fraction length.
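The element position vector and its fixed-point representation can be sketched as follows. The end points come from the text, while the round-to-nearest and saturation behavior are assumptions. The data type is emulated with double arithmetic rather than with a fixed-point object.

```matlab
% Element-position sketch: 10-element ULA, quantized to fixdt(1,8,4).
numElements = 10;
posEnd  = 6.7453;                               % outermost position (from the text)
pos     = linspace(-posEnd, posEnd, numElements);
spacing = pos(2) - pos(1);                      % about 1.499, i.e. half a wavelength

% fixdt(1,8,4): signed, 8-bit word, 4 fractional bits.
lsb    = 2^(-4);                                % resolution: 0.0625
intMin = -2^7;  intMax = 2^7 - 1;               % stored-integer limits
posQ   = min(max(round(pos / lsb), intMin), intMax) * lsb;

maxQuantErr = max(abs(posQ - pos));             % at most lsb/2 when nothing saturates
```

With 4 fractional bits the positions are stored to within 0.03125 of their ideal values, and the range of fixdt(1,8,4), from -8 to 7.9375, comfortably covers ±6.7453.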


To compare your sample-based, fixed-point implementation design with the floating-point, frame-based behavioral design, you need to deserialize the output of the implementation subsystem and convert it to a floating-point data type. Alternatively, you can compare the results directly with sample-based signals, but then you must unbuffer the output of the behavioral model to match the sample-based signal output from the implementation algorithm. In this case, you only need to convert the output of the HDL Algorithm subsystem to floating-point by setting the Data Type Conversion block's output data type to double.

Comparing Output of HDL Model to Behavioral Model

Run the model to display the results. You can run the Simulink model by clicking the Play button or by calling the sim command at the MATLAB command line. Use the scopes to compare the outputs visually.


As seen in the Time Scopes showing the Beamformed Signal and Beamformed Signal (HDL), the two signals are nearly identical. We can see the error on the order of $10^{-3}$ in the Error scope. This shows that the HDL Algorithm subsystem is producing the same results as the behavioral model within quantization error. This is an important first step before generating HDL code.

Because the HDL model used 55 delays, the scope titled HDL Beamformed Signal is delayed by 55 ms when compared to the original transmitted or beamformed signal shown on the Behavioral Beamformed Signal scope.


This example is the first of a two-part tutorial series on how to design an FPGA implementation-ready algorithm, automatically generate HDL code, and verify the HDL code in Simulink. This example showed how to use blocks from the Phased Array System Toolbox to create a behavioral model, to serve as a golden reference, and how to create a subsystem for implementation using Simulink blocks that support HDL code generation. It also compared the output of the implementation model to the output of the corresponding behavioral model to verify that the two algorithms are functionally equivalent.

Once you verify that your implementation algorithm is functionally equivalent to your golden reference, you can use HDL Coder™ to generate HDL code and a test bench for the HDL Algorithm subsystem (see Generate HDL Code from Simulink). You can also use HDL Verifier™ to generate a SystemVerilog DPI test bench.

The second part of this two-part tutorial series, FPGA Based Beamforming in Simulink: Part 2 - Code Generation, shows how to generate HDL code from the implementation model and verify that the generated HDL code produces the same results as both the behavioral model and the implementation model.