Real Partial-Systolic Q-less QR Decomposition

Q-less QR decomposition for real-valued matrices

Libraries:
Fixed-Point Designer HDL Support / Matrices and Linear Algebra / Matrix Factorizations

Description

The Real Partial-Systolic Q-less QR Decomposition block uses QR decomposition to compute the economy size upper-triangular R factor of the QR decomposition A = QR, where A is a real-valued matrix, without computing Q. Because A'A = R'R, the solution to A'Ax = b is x = R\(R'\b).
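The relationship between R and the solution of A'Ax = b can be sketched in plain Python. This is a floating-point illustration only: the use of Givens rotations and the names `qless_qr` and `solve_normal_equations` are assumptions made for this sketch and are not taken from the block's implementation or interface.

```python
import math

def qless_qr(A):
    """Return the n-by-n upper-triangular R with R'R = A'A, consuming A one row at a time."""
    n = len(A[0])
    R = [[0.0] * n for _ in range(n)]
    for row in A:
        v = [float(x) for x in row]           # incoming row, rotated into R
        for k in range(n):
            if v[k] == 0.0:
                continue                       # nothing to annihilate in this column
            r = math.hypot(R[k][k], v[k])
            c, s = R[k][k] / r, v[k] / r       # Givens rotation that zeroes v[k]
            for j in range(k, n):
                rkj, vj = R[k][j], v[j]
                R[k][j] = c * rkj + s * vj
                v[j] = -s * rkj + c * vj
    return R

def solve_normal_equations(R, b):
    """Solve R'R x = b, that is x = R\\(R'\\b), with two triangular solves."""
    n = len(R)
    y = [0.0] * n
    for i in range(n):                         # forward substitution: R' y = b
        y[i] = (b[i] - sum(R[j][i] * y[j] for j in range(i))) / R[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):               # back substitution: R x = y
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]       # m = 3, n = 2
b = [1.0, 2.0]                                 # right-hand side of A'Ax = b
R = qless_qr(A)
x = solve_normal_equations(R, b)
print(R)
print(x)
```

Q never appears in the sketch: each incoming row only updates R, which is why the factor can be built row by row.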

When Regularization parameter is nonzero, the Real Partial-Systolic Q-less QR Decomposition block computes the upper-triangular factor R of the economy size QR decomposition of $\begin{bmatrix} \lambda I_{n} \\ A \end{bmatrix}$, where λ is the regularization parameter.
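As a short check of what the stacked matrix implies, write the transpose A' as $A^{T}$. Because Q has orthonormal columns, the Gram matrix of the stacked matrix equals $R^{T}R$:

$$
R^{T}R = \begin{bmatrix} \lambda I_{n} & A^{T} \end{bmatrix} \begin{bmatrix} \lambda I_{n} \\ A \end{bmatrix} = \lambda^{2} I_{n} + A^{T}A
$$

Using this R in x = R\(R'\b) therefore solves $(\lambda^{2} I_{n} + A^{T}A)x = b$ rather than $A^{T}Ax = b$. With λ = 0 the expression reduces to the unregularized case.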

Examples

Implement Hardware-Efficient Real Partial-Systolic Q-less QR Decomposition

How to use the Real Partial-Systolic Q-less QR Decomposition block.

Determine Fixed-Point Types for Q-less QR Decomposition

Use fixed.qlessqrFixedpointTypes to determine fixed-point types for computation of Q-less QR decomposition.

Ports

Input

A(i,:) — Rows of real matrix A
vector

Rows of real matrix A, specified as a vector. A is an m-by-n matrix where m ≥ 2 and n ≥ 2. If A is a fixed-point data type, A must be signed and use binary-point scaling. Slope-bias representation is not supported for fixed-point data types.

Data Types: single | double | fixed point
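To illustrate what a signed, binary-point-scaled type represents, the following sketch quantizes a real value to a given word length and fraction length, for example 18 and 12 for a type named sfix18_En12. The function name `quantize_sfix`, the round-to-nearest behavior, and the saturation on overflow are assumptions chosen for this sketch; the block's actual rounding and overflow settings are not stated here.

```python
import math

def quantize_sfix(x, word_length, fraction_length):
    """Quantize x to a signed fixed-point value with binary-point scaling."""
    scale = 2 ** fraction_length
    lo = -(2 ** (word_length - 1))             # most negative stored integer
    hi = 2 ** (word_length - 1) - 1            # most positive stored integer
    stored = math.floor(x * scale + 0.5)       # assumed rounding: nearest, ties toward +inf
    stored = max(lo, min(hi, stored))          # assumed overflow handling: saturate
    return stored / scale

small = quantize_sfix(0.7071, 18, 12)          # representable range is about [-32, 32)
large = quantize_sfix(100.0, 18, 12)           # out of range, so it saturates
print(small, large)
```

With 18 bits and 12 fractional bits, the resolution is 2^-12 and the largest representable value is (2^17 - 1)/2^12.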

validIn — Whether inputs are valid
`Boolean` scalar

Whether inputs are valid, specified as a Boolean scalar. This control signal indicates when the data from the A(i,:) input port is valid. When this value is 1 (true) and the value of ready is 1 (true), the block captures the values at the A(i,:) input port. When this value is 0 (false), the block ignores the input samples.

After sending a true validIn signal, there may be some delay before ready is set to false. To ensure all data is processed, you must wait until ready is set to false before sending another true validIn signal.

Data Types: Boolean

restart — Whether to clear internal states
`Boolean` scalar

Whether to clear internal states, specified as a Boolean scalar. When this value is 1 (true), the block stops the current calculation and clears all internal states. When this value is 0 (false) and the value at validIn is 1 (true), the block begins a new subframe.

Data Types: Boolean

Output

R — Upper-triangular matrix R
matrix

Economy size QR decomposition matrix R, returned as a vector or matrix. R is an upper triangular matrix. The size of matrix R is n-by-n. The output at R has the same data type as the input at A(i,:).

Data Types: single | double | fixed point

validOut — Whether output data is valid
`Boolean` scalar

Whether the output data is valid, returned as a Boolean scalar. This control signal indicates when the data at output port R is valid. When this value is 1 (true), the block has successfully computed the matrix R. When this value is 0 (false), the output data is not valid.

Data Types: Boolean

ready — Whether block is ready
`Boolean` scalar

Whether the block is ready, returned as a Boolean scalar. This control signal indicates when the block is ready for new input data. When this value is 1 (true) and validIn is 1 (true), the block accepts input data in the next time step. When this value is 0 (false), the block ignores input data in the next time step.

Data Types: Boolean

Parameters

Number of rows in matrix A — Number of rows in input matrix A
`4` (default) | positive integer-valued scalar

Number of rows in input matrix A, specified as a positive integer-valued scalar.

Programmatic Use

Block Parameter: m

Type: character vector

Values: positive integer-valued scalar

Default: 4

Number of columns in matrix A — Number of columns in input matrix A
`4` (default) | positive integer-valued scalar

Number of columns in input matrix A, specified as a positive integer-valued scalar.

Programmatic Use

Block Parameter: n

Type: character vector

Values: positive integer-valued scalar

Default: 4

Regularization parameter — Regularization parameter
0 (default) | real nonnegative scalar

Regularization parameter, specified as a nonnegative scalar. Small, positive values of the regularization parameter can improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of the estimate often results in a smaller mean squared error when compared to least-squares estimates.
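The effect on conditioning can be seen on a small rank-deficient example. The sketch below forms λ²I + A'A explicitly and takes its Cholesky factor, which has the same R'R as the factor of the stacked matrix. This is a stand-in chosen to keep the example short: the block itself uses QR decomposition, and the name `regularized_r` is hypothetical.

```python
import math

def regularized_r(A, lam):
    """Upper-triangular R with R'R = lam^2*I + A'A, via Cholesky of the Gram matrix."""
    m, n = len(A), len(A[0])
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam * lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # d shrinks toward zero when A is rank deficient and lam is 0
        d = G[i][i] - sum(R[k][i] ** 2 for k in range(i))
        R[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            R[i][j] = (G[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

A = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]       # identical columns, so A'A is singular
R = regularized_r(A, lam=0.1)
print(R[1][1])                                 # strictly positive because lam > 0
```

For this A, the last diagonal entry of R is zero in exact arithmetic when λ = 0, so the triangular solves would divide by zero. A small positive λ moves that entry away from zero.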

Programmatic Use

Block Parameter: regularizationParameter

Type: character vector

Values: real nonnegative scalar

Default: 0

Algorithms

Choosing the Implementation Method

Systolic implementations prioritize speed of computations over space constraints, while burst implementations prioritize space constraints at the expense of speed of the operations. The following table illustrates the tradeoffs between the implementations available for matrix decompositions and solving systems of linear equations.

Implementation	Throughput	Latency	Area
Systolic	High	O(nlog2(m))	O(mn²)
Partial-Systolic	Medium	O(mn)	O(n²)
Burst	Low	O(mn)	O(n)

In the table, m is the number of rows in matrix A and n is the number of columns in matrix A. Regardless of architecture, a larger word length results in lower throughput, larger latency, and larger area.

For additional considerations in selecting a block for your application, see Choose a Block for HDL-Optimized Fixed-Point Matrix Operations.

AMBA AXI Handshake Process

This block uses the AMBA AXI handshake protocol [1]. The valid/ready handshake process is used to transfer data and control information. This two-way control mechanism allows both the manager and subordinate to control the rate at which information moves between manager and subordinate. A valid signal indicates when data is available. The ready signal indicates that the block can accept the data. Transfer of data occurs only when both the valid and ready signals are high.
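The transfer rule can be modeled in a few lines. The sketch below is a behavioral toy, not the block's logic: a source keeps valid asserted while it has rows pending, and a row moves only in cycles where ready is also high. The function name `run_transfer` and the ready pattern are made up for illustration.

```python
def run_transfer(rows, ready_pattern):
    """Return (cycle, row) pairs for every cycle in which valid and ready are both high."""
    pending = list(rows)
    accepted = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)                  # source holds valid high while data remains
        if valid and ready:
            accepted.append((cycle, pending.pop(0)))
    return accepted

rows = ["A1r1", "A1r2", "A1r3"]
ready_pattern = [1, 0, 0, 1, 0, 0, 1]          # sink is ready only every third cycle
accepted = run_transfer(rows, ready_pattern)
print(accepted)  # [(0, 'A1r1'), (3, 'A1r2'), (6, 'A1r3')]
```

Here the source is faster than the sink, so the ready signal sets the pace of the transfer.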

Block Timing

The Partial-Systolic Q-less QR Decomposition blocks accept and process the matrix A row by row. After accepting m rows, the block outputs the R matrices as single vectors. The partial-systolic implementation uses a pipelined structure, so the block can accept new matrix inputs before outputting the result of the current matrix.

For example, assume that the input A matrix is 3-by-3. Additionally assume that validIn asserts before ready, meaning that the upstream data source is faster than the QR decomposition.

Timing diagram for the Partial-Systolic Q-less QR Decomposition blocks.

In the figure:

- A1r1 is the first row of the first A matrix, R1 is the first R matrix, and so on.
- validIn to ready: from a successful row input to the block being ready to accept the next row.
- Last row validIn to validOut: from the last row input to the block starting to output the solution.

The following table provides details of the timing for the Partial-Systolic Q-less QR Decomposition blocks.

Block	`validIn` to `ready` (cycles)	Last Row `validIn` to `validOut` (cycles)
Real Partial-Systolic Q-less QR Decomposition	wl + 7	(wl + 6)*n + 3
Complex Partial-Systolic Q-less QR Decomposition	wl + 9	(wl + 7.5)*2*n + 3

In the table, m represents the number of rows in matrix A, and n is the number of columns in matrix A. wl represents the word length.

If the data type of A is double, then wl is 53.
If the data type of A is single, then wl is 24.
If the data type of A is fixed point, then wl is the word length.
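As a worked example of the table entries for the Real Partial-Systolic Q-less QR Decomposition block, the sketch below evaluates both cycle counts for a given word length and number of columns. The helper name `real_block_timing` and the `WORD_LENGTH` lookup are illustrative only.

```python
# Word lengths stated above for floating-point inputs
WORD_LENGTH = {"double": 53, "single": 24}

def real_block_timing(n, wl):
    """Return (validIn to ready, last row validIn to validOut) in cycles."""
    valid_in_to_ready = wl + 7
    last_row_to_valid_out = (wl + 6) * n + 3
    return valid_in_to_ready, last_row_to_valid_out

fixed_18bit = real_block_timing(n=10, wl=18)
single_4col = real_block_timing(n=4, wl=WORD_LENGTH["single"])
print(fixed_18bit)   # (25, 243)
print(single_4col)   # (31, 123)
```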

Hardware Resource Utilization

This block supports HDL code generation using the Simulink® HDL Workflow Advisor. For an example, see HDL Code Generation and FPGA Synthesis from Simulink Model (HDL Coder) and Implement Digital Downconverter for FPGA (DSP HDL Toolbox).

In R2023a: The table below shows a summary of the resource utilization results.

This example data was generated by synthesizing the block on a Xilinx® Zynq®-7 ZC706 evaluation board (-2 speed grade).

The following parameters were used for synthesis.

Block parameters:
- m = 10
- n = 10
- p = 1
- Matrix A dimension: 10-by-10
- Matrix B dimension: 10-by-1
Input data type: sfix18_En12

Resource	Usage
LUT	29896
LUTRAM	994
Flip Flop	18953

In R2022b: The tables below show the post place-and-route resource utilization results and timing summary, respectively.

This example data was generated by synthesizing the block on a Xilinx Zynq UltraScale+™ RFSoC ZCU111 evaluation board. The synthesis tool was Vivado® v.2020.2 (win64).

The following parameters were used for synthesis.

Block parameters:
- m = 16
- n = 16
- p = 1
- Matrix A dimension: 16-by-16
- Matrix B dimension: 16-by-1
Input data type: sfix16_En14
Target frequency: 300 MHz

Resource	Usage	Available	Utilization (%)
CLB LUTs	96911	425280	22.79
CLB Registers	77355	850560	9.09
DSPs	0	4272	0.00
Block RAM Tile	0	1080	0.00
URAM	0	80	0.00

	Value
Requirement	3.3333 ns
Data Path Delay	3.221 ns
Slack	0.095 ns
Clock Frequency	308.80 MHz
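The figures in the two R2022b tables can be cross-checked with simple arithmetic. The sketch below recomputes the utilization percentages from the usage and available counts, and recomputes the clock frequency on the assumption that it is the reciprocal of the requirement minus the slack. That relationship is an assumption of this sketch, although it reproduces the reported value.

```python
requirement_ns = 3.3333
slack_ns = 0.095

# Assumed relationship: achievable period = requirement - slack
fmax_mhz = 1000.0 / (requirement_ns - slack_ns)

# Utilization = usage / available, as a percentage
lut_pct = 100.0 * 96911 / 425280
reg_pct = 100.0 * 77355 / 850560

print(f"{fmax_mhz:.2f} MHz, {lut_pct:.2f}% LUTs, {reg_pct:.2f}% registers")
```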

References

[1] "AMBA AXI and ACE Protocol Specification Version E." https://developer.arm.com/documentation/ihi0022/e/

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Slope-bias representation is not supported for fixed-point data types.

HDL Code Generation
Generate VHDL, Verilog and SystemVerilog code for FPGA and ASIC designs using HDL Coder™.

HDL Coder™ provides additional configuration options that affect HDL implementation and synthesized logic.

HDL Architecture

This block has one default HDL architecture.

HDL Block Properties

General
ConstrainedOutputPipeline	Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is `0`. For more details, see ConstrainedOutputPipeline (HDL Coder).
InputPipeline	Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see InputPipeline (HDL Coder).
OutputPipeline	Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is `0`. For more details, see OutputPipeline (HDL Coder).

Restrictions

Supports fixed-point data types only.

Version History

Introduced in R2020b

R2023a: Smart unrolling for improved resource utilization

When you update the diagram, the loop that makes up the partial-systolic pipeline is unrolled. This updated internal architecture removes dead operations in simulation and generated code, resulting in a significant decrease in the number of hardware resources required. The block simulates with clock and bit-true fidelity with respect to the library versions of the block in previous releases.

Resource	R2022b	R2023a
LUT	54305	29896
LUTRAM	1090	994
Flip Flop	33901	18953

This example data was generated by synthesizing the block on a Xilinx Zynq-7 ZC706 evaluation board (-2 speed grade).

The following parameters were used for synthesis.

Block parameters:
- m = 10
- n = 10
- p = 1
- Matrix A dimension: 10-by-10
- Matrix B dimension: 10-by-1
Input data type: sfix18_En12
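To put the R2022b and R2023a columns in relative terms, the sketch below computes the percentage reduction for each resource from the table values. The dictionary names are illustrative.

```python
r2022b = {"LUT": 54305, "LUTRAM": 1090, "Flip Flop": 33901}
r2023a = {"LUT": 29896, "LUTRAM": 994, "Flip Flop": 18953}

# Percentage of each resource saved in R2023a relative to R2022b
reduction = {name: 100.0 * (r2022b[name] - r2023a[name]) / r2022b[name]
             for name in r2022b}

for name, pct in reduction.items():
    print(f"{name}: {pct:.1f}% fewer")
```

For this 10-by-10 configuration, LUT and flip-flop usage each drop by roughly 45 percent, while LUTRAM usage drops by roughly 9 percent.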

R2022a: Support for Tikhonov regularization parameter

The Real Partial-Systolic Q-less QR Decomposition block now supports Tikhonov regularization through the Regularization parameter.
