Systolic Matrix Solve Using QR Decomposition
Libraries:
Fixed-Point Designer HDL Support /
Matrices and Linear Algebra /
Linear System Solvers
Description
The Systolic Matrix Solve Using QR Decomposition block solves the system of linear equations AX = B using QR decomposition, where A and B are matrices. To compute X = A-1, set B to be the identity matrix.
The systolic implementation to minimizes system latency and increases the throughput. Systolic implementations require more hardware resources than burst or partial-systolic implementations.
Examples
How to Use Systolic Matrix Solve Using QR Decomposition Block
This example shows how to use the Systolic Matrix Solve Using QR Decomposition block to compute the solution X of the least-squares matrix equation AX = B.
Solve AX=B Using Economy Size QR Decomposition
The Systolic Matrix Solve Using QR Decomposition block first performs the QR decomposition using the Systolic QR Decomposition block which transforms A in-place to R and B in-place to C = Q'B. The block then solves the transformed system RX = C through back substitution, where QR is the orthogonal-triangular decomposition of A. For more information, see Systolic QR Decomposition.
I/O Interface
This block uses the AMBA AXI handshake protocol on both the input and the output side. On the input side, data transaction occurs when both validIn and ready signals are asserted. On the output side, data transaction occurs when both validOut and readyIn signals are asserted.
In this example, the validIn and readyIn signals remain asserted, which indicates the upstream source and downstream sinks are always available. When all matrices A and B are sent, the Data Handler loops back to the first A and B matrices.
Define Block Parameters
Specify the dimension of the sample matrices A and B, the number of input sample matrices, and the complexity of matrices A and B.
The Systolic Matrix Solve Using QR Decomposition block supports both real and complex inputs. Set the complexity of the input in the block mask accordingly.
m = 3;
n = 3;
p = 1;
numSamples = 3;
Complexity = "complex";
Generate Input A and B Matrices
Use the specified simulation parameters to generate the input matrices A
and
B
.
rng('default'); if strcmp(Complexity,'real') A = 2*(rand(m,n,numSamples)-0.5); B = 2*(rand(m,p,numSamples)-0.5); else A = complex(2*(rand(m,n,numSamples)-0.5),... 2*(rand(m,n,numSamples)-0.5)); B = complex(2*(rand(m,p,numSamples)-0.5),... 2*(rand(m,p,numSamples)-0.5)); end
Define and Cast to Input Datatype
Specify a data type for the inputs, and cast the inputs to that data type.
DT = "fixed";
For fixed-point data types, define the number of precision bits.
precisionBits = 18;
Cast the inputs to the selected data type.
if strcmp(Complexity,'real') T = fixed.realQRMatrixSolveFixedpointTypes(m,n,... max(abs(A(:))),max(abs(B(:))),precisionBits); else T = fixed.complexQRMatrixSolveFixedpointTypes(m,n,... max(abs(A(:))),max(abs(B(:))),precisionBits); end tv = castToDT(A,B,T,DT);
Open Model and Run Simulation
Open the SystolicQRSolverModel
model. Configure the model workspace and run the simulation.
model = 'SystolicQRSolverModel';
open_system(model);
fixed.example.setModelWorkspace(model,'A',tv.A,'B',tv.B,... 'm',m,'n',n,'p',p,'OutputType',tv.OutputType,'numSamples',numSamples); dutName = [model '/Systolic Matrix Solve Using QR Decomposition']; set_param(dutName,'complexity',Complexity); out = sim(model);
Verify Output Solutions
To verify the output solutions, compare the model output to the output of the MATLAB® qr()
function.
X = out.X; for i = 1:numSamples relative_error = norm(double(A(:,:,i)*X(:,:,i) - B(:,:,i)))/norm(double(B(:,:,i))) end
relative_error = 2.3054e-05
relative_error = 1.3028e-04
relative_error = 2.5335e-05
Verify Block Latency
The latency of the Systolic Matrix Solve Using QR Decomposition block depends on the datatype, dimension, and complexity of matrices A and B.
Use the embblk.latency.systolicQRMatrixSolverBlockTiming
function to calculate the expected throughput and latency of the block.
[expThroughput,expLatency] = embblk.latency.systolicQRMatrixSolverBlockTiming(tv.A,tv.B)
expThroughput = 25
expLatency = 183
Retrieve actual block throughput and latency from the simulation data.
tDataIn = find(out.logsout.get('Input Handshake').Values.Data == 1); tDataOut = find(out.logsout.get('Output Handshake').Values.Data == 1); actualthroughput = diff(tDataIn)
actualthroughput = 9×1
25
25
25
25
25
25
25
25
25
actualLatency = tDataOut - tDataIn(1:numSamples)
actualLatency = 3×1
183
183
183
Ports
Input
A — Matrix A
matrix
Matrix A, specified as a matrix. A is an m-by-n matrix where m ≥ 2 and m ≥ n. If B is single or double, A must be the same data type as B. If A is a fixed-point data type, A must be signed, use binary-point scaling, and have the same word length as B. Slope-bias representation is not supported for fixed-point data types.
Dependencies
Use the Signal type parameter to specify the complexity of the input data.
Data Types: single
| double
| fixed point
Complex Number Support: Yes
B — Matrix B
matrix
Matrix B, specified as a matrix. B is an m-by-p matrix where m ≥ 2. If A is single or double, B must be the same data type as A. If B is a fixed-point data type, B must be signed, use binary-point scaling, and have the same word length as A. Slope-bias representation is not supported for fixed-point data types.
Dependencies
Use the Signal type parameter to specify the complexity of the input data.
Data Types: single
| double
| fixed point
Complex Number Support: Yes
validIn — Whether inputs are valid
scalar
Whether inputs are valid, specified as a Boolean scalar. This control signal
indicates when the data from the A and B input ports are
valid. When this value is 1 (true
) and the value at
readyIn is 1 (true
), the block captures the values
on the A and B input ports. When this value is 0
(false
), the block ignores the input samples.
Tips
After sending a true
validIn signal, there may be some delay before readyIn is
set to false
. To ensure all data is processed, you must wait
until readyIn is set to false
before sending another
true
validIn signal.
Data Types: Boolean
readyIn — Whether downstream block is ready
scalar
Whether downstream block is ready, specified as a Boolean scalar. This control
signal monitors the ready port of the downstream block. When the
readyIn value is 1
(true
), and
the value at validOut is 1
(true
),
the block outputs data to the downstream block. When the readyIn value is
0
(false
), the downstream block is not ready
to accept data. The Systolic QR Decomposition block pauses on the
output stage and the ready signal remains 0
(false
) until the readyIn signal is high.
Data Types: Boolean
restart — Whether to clear internal states
scalar
Whether to clear internal states, specified as a Boolean scalar. When this value
is 1
(true
), the block stops the current
calculation and clears all internal states. When this value is 0
(false
) and the validIn value is 1
(true
), the block begins a new subframe.
Data Types: Boolean
Output
X — Matrix X
matrix
Matrix X, returned as a matrix.
Data Types: single
| double
| fixed point
Complex Number Support: Yes
validOut — Whether output data is valid
scalar
Whether output data is valid, returned as a Boolean scalar. This control signal
indicates when the data at output port X is valid. When this value is
1
(true
), the block has successfully computed
the X matrix. When this value is 0
(false
), the output data is not valid.
Data Types: Boolean
ready — Whether block is ready
scalar
Whether block is ready, returned as a Boolean scalar. This control signal that
indicates when the block is ready for new input data. When this value is
1
(true
) and the validIn value is
1
(true
), the block accepts input data in the
next time step. When this value is 0
(false
),
the block ignores input data in the next time step.
Tips
After sending a true
validIn signal, there may be some delay before ready is
set to false
. To ensure all data is processed, you must wait
until ready is set to false
before sending another
true
validIn signal.
Data Types: Boolean
Parameters
To edit block parameters interactively, use the Property Inspector. From the Simulink® Toolstrip, on the Simulation tab, in the Prepare gallery, select Property Inspector.
Number of rows in matrices A and B — Number of rows in matrices A and B
2
(default) | positive integer-valued scalar
Number of rows in input matrices A and B, specified as a positive integer-valued scalar.
Programmatic Use
To set the block parameter value programmatically, use
the set_param
function.
To get the block parameter value
programmatically, use the get_param
function.
Parameter: | m |
Values: | 2 (default) | positive integer-valued scalar |
Data Types: | char |
Number of columns in matrix A — Number of columns in matrix A
2
(default) | positive integer-valued scalar
Number of columns in input matrix A, specified as a positive integer-valued scalar.
Programmatic Use
To set the block parameter value programmatically, use
the set_param
function.
To get the block parameter value
programmatically, use the get_param
function.
Parameter: | n |
Values: | 2 (default) | positive integer-valued scalar |
Data Types: | char |
Number of columns in matrix B — Number of columns in matrix B
1
(default) | positive integer-valued scalar
Number of columns in input matrix B, specified as a positive integer-valued scalar.
Programmatic Use
To set the block parameter value programmatically, use
the set_param
function.
To get the block parameter value
programmatically, use the get_param
function.
Parameter: | p |
Values: | 1 (default) | positive integer-valued scalar |
Data Types: | char |
Output datatype — Data type of output matrix X
fixdt(1,18,4)
(default) | double
| single
| fixdt(1,16,0)
| <data type expression>
Data type of the output matrix X, specified as
fixdt(1,16,0)
, fixdt(1,18,14)
,
double
, single
, or as a user-specified data type
expression. The type can be specified directly, or expressed as a data type object such
as Simulink.NumericType
.
Programmatic Use
To set the block parameter value programmatically, use
the set_param
function.
To get the block parameter value
programmatically, use the get_param
function.
Parameter: | OutputType |
Values: | fixdt(1,18,4) (default) | double | single | fixdt(1,16,0) | <data type expression> |
Data Types: | char |
Signal type — Complexity of matrices A and B
real
(default) | complex
Complexity of matrices A and B, specified as
real
or complex
.
Programmatic Use
To set the block parameter value programmatically, use
the set_param
function.
To get the block parameter value
programmatically, use the get_param
function.
Parameter: | complexity |
Values: | real (default) | complex |
Data Types: | char |
Algorithms
Choosing the Implementation Method
Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.
Implementation | Throughput | Latency | Area |
---|---|---|---|
Systolic | C | O(n) | O(mn2) |
Partial-Systolic | C | O(m) | O(n2) |
Partial-Systolic with Forgetting Factor | C | O(n) | O(n2) |
Burst | O(n) | O(mn) | O(n) |
Where C is a constant proportional to the word length of the data, m is the number of rows in matrix A, and n is the number of columns in matrix A.
For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.
AMBA AXI Handshake Process
This block uses the AMBA AXI handshake protocol [1]. The valid/ready
handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid
signal indicates when data is available. The ready
signal indicates that the block can accept the data. Transfer of data occurs only when both the valid
and ready
signals are high.
Block Timing
The Systolic Matrix Solve Using QR Decomposition block accepts and process A and B as full matrices. After computation, the block outputs the X matrix. The systolic implementation uses a pipelined structure, so the block can accept new matrix inputs before outputting the result of the current matrix.
For example, assume that the input A and B matrices are 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.
In the figure,
A1
is the first A matrix,B1
is the first B matrix,X1
is the first X matrix, and so on.Throughput — From a successful matrix input to the block being ready to accept the next matrix.
Latency — From a successful matrix input to the block starting to output the corresponding solution.
The latency of the Systolic Matrix Solve Using QR Decomposition block depends on the size (m, n, p), complexity, and word length (wl) of matrices A and B.
If the data types of A and B are double, then wl is 53.
If the data type of A and B are single, then wl is 24.
If the data types of A and B are fixed point, then wl is the word length.
Use the embblk.latency.systolicQRMatrixSolverBlockTiming
function to calculate the
latency and
throughput.
[throughput,latency] = embblk.latency.systolicQRMatrixSolverBlockTiming(A,B)
Hardware Resource Utilization
This block supports HDL code generation using the Simulink HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).
This example data was generated by synthesizing the block on a Xilinx® Zynq®-7000 xc7z045 SoC. The synthesis tool was Vivado® v2023.1 (win64).
The following parameters were used for synthesis.
Block parameters:
m = 4
n = 4
p = 1
Signal type:
real
Input matrix
A
andB
type:sfix16_en10
Output type:
sfix16_en10
Target frequency: 150 MHz
Resource | Usage | Available | Utilization (%) |
---|---|---|---|
Slice LUTs | 10348 | 218600 | 4.73 |
Slice Registers | 2989 | 437200 | 0.68 |
DSPs | 6 | 900 | 0.67 |
Block RAM Tile | 0 | 545 | 0.00 |
URAM | 0 | 0 |
Value | |
---|---|
Requirement | 6.6667 ns (150 MHz) |
Data Path Delay | 5.535 ns |
Slack | 1.113 ns |
Clock Frequency | 180.06 MHz |
The following parameters were used for synthesis.
Block parameters:
m = 4
n = 4
p = 1
Signal type:
complex
Input matrix
A
andB
type:sfix16_en10
Output type:
sfix16_en10
Target frequency: 150 MHz
Resource | Usage | Available | Utilization (%) |
---|---|---|---|
Slice LUTs | 30832 | 218600 | 14.10 |
Slice Registers | 8533 | 437200 | 1.95 |
DSPs | 20 | 900 | 2.22 |
Block RAM Tile | 0 | 545 | 0.00 |
URAM | 0 | 0 |
Value | |
---|---|
Requirement | 6.6667 ns (150 MHz) |
Data Path Delay | 6.394 ns |
Slack | 0.254 ns |
Clock Frequency | 155.94 MHz |
References
[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/AMBA-AXI3-and-AXI4-Protocol-Specification/Single-Interface-Requirements/Basic-read-and-write-transactions/Handshake-process
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
Slope-bias representation is not supported for fixed-point data types.
HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.
HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
General | |
---|---|
ConstrainedOutputPipeline | Number of registers to place at
the outputs by moving existing delays within your design. Distributed
pipelining does not redistribute these registers. The default is
|
InputPipeline | Number of input pipeline stages
to insert in the generated code. Distributed pipelining and constrained
output pipelining can move these registers. The default is
|
OutputPipeline | Number of output pipeline stages
to insert in the generated code. Distributed pipelining and constrained
output pipelining can move these registers. The default is
|
Supports fixed-point data types only. Fixed-point data types must use signed binary-point scaling. Slope and bias scaled types are not supported.
Version History
Introduced in R2024a
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)