KDTreeSearcher

Create Kd-tree nearest neighbor searcher

Description

`KDTreeSearcher` model objects store the results of a nearest neighbor search that uses the Kd-tree algorithm. Results include the training data, distance metric and its parameters, and maximum number of data points in each leaf node (that is, the bucket size). The Kd-tree algorithm partitions an n-by-K data set by recursively splitting n points in K-dimensional space into a binary tree.

Once you create a `KDTreeSearcher` model object, you can search the stored tree to find all neighboring points to the query data by performing a nearest neighbor search using `knnsearch` or a radius search using `rangesearch`. The Kd-tree algorithm is more efficient than the exhaustive search algorithm when K is small (that is, K ≤ 10), the training and query sets are not sparse, and the training and query sets have many observations.

Creation

Use either the `createns` function or the `KDTreeSearcher` function (described here) to create a `KDTreeSearcher` model object. Both functions use the same syntax except that the `createns` function has the `'NSMethod'` name-value pair argument, which you use to choose the nearest neighbor search method. The `createns` function also creates an `ExhaustiveSearcher` object. Specify `'NSMethod','kdtree'` to create a `KDTreeSearcher` object. The default is `'kdtree'` if K ≤ 10, the training data is not sparse, and the distance metric is Euclidean, city block, Chebychev, or Minkowski.

Syntax

``Mdl = KDTreeSearcher(X)``
``Mdl = KDTreeSearcher(X,Name,Value)``

Description

example

````Mdl = KDTreeSearcher(X)` grows a default Kd-tree (`Mdl`) using the n-by-K numeric matrix of training data (`X`).```

example

````Mdl = KDTreeSearcher(X,Name,Value)` specifies additional options using one or more name-value pair arguments. You can specify the maximum number of data points in each leaf node (that is, the bucket size) and the distance metric, and set the distance metric parameter (`DistParameter`) property. For example, `KDTreeSearcher(X,'Distance','minkowski','BucketSize',10)` specifies to use the Minkowski distance when searching for nearest neighbors and to use `10` for the bucket size. To specify `DistParameter`, use the `P` name-value pair argument.```

Input Arguments

expand all

Training data that grows the Kd-tree, specified as a numeric matrix. `X` has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

Data Types: `single` | `double`

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `'Distance','minkowski','P',3,'BucketSize',10` specifies to use the following when searching for nearest neighbors: the Minkowski distance, `3` for the Minkowski distance metric exponent, and `10` for the bucket size.

Distance metric used when you call `knnsearch` or `rangesearch` to find nearest neighbors for future query points, specified as the comma-separated pair consisting of `'Distance'` and one of these values.

ValueDescription
`'chebychev'`Chebychev distance (maximum coordinate difference).
`'cityblock'`City block distance.
`'euclidean'`Euclidean distance.
`'minkowski'`Minkowski distance. The default exponent is 2. To specify a different exponent, use the `'P'` name-value pair argument.

For more details, see Distance Metrics.

The software does not use the distance metric for creating a `KDTreeSearcher` model object, so you can alter the distance metric by using dot notation after creating the object.

Example: `'Distance','minkowski'`

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of `'P'` and a positive scalar. This argument is valid only if `'Distance'` is `'minkowski'`.

Example: `'P',3`

Data Types: `single` | `double`

Maximum number of data points in each leaf node of the Kd-tree, specified as the comma-separated pair consisting of `'BucketSize'` and a positive integer.

Example: `'BucketSize',10`

Data Types: `single` | `double`

Properties

expand all

Training data that grows the Kd-tree, specified as a numeric matrix. `X` has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

The input argument `X` of `createns` or `KDTreeSearcher` sets this property.

Data Types: `single` | `double`

Distance metric used when you call `knnsearch` or `rangesearch` to find nearest neighbors for future query points, specified as `'chebychev'`, `'cityblock'`, `'euclidean'`, or `'minkowski'`.

The `'Distance'` name-value pair argument of `createns` or `KDTreeSearcher` sets this property.

The software does not use the distance metric for creating a `KDTreeSearcher` model object, so you can alter it by using dot notation.

Distance metric parameter values, specified as empty (`[]`) or a positive scalar.

If `Distance` is `'minkowski'`, then `DistParameter` is the exponent in the Minkowski distance formula. Otherwise, `DistParameter` is `[]`, indicating that the specified distance metric formula has no parameters.

The `'P'` name-value pair argument of `createns` or `KDTreeSearcher` sets this property.

You can alter `DistParameter` by using dot notation, for example, `Mdl.DistParameter = PNew`, where `PNew` is a positive scalar.

Data Types: `single` | `double`

Maximum number of data points in each leaf node of the Kd-tree, specified as a positive integer.

The `'BucketSize'` name-value pair argument of `createns` or `KDTreeSearcher` sets this property.

Data Types: `single` | `double`

Object Functions

 `knnsearch` Find k-nearest neighbors using searcher object `rangesearch` Find all neighbors within specified distance using searcher object

Examples

collapse all

Grow a four-dimensional Kd-tree that uses the Euclidean distance.

```load fisheriris X = meas; [n,k] = size(X)```
```n = 150 ```
```k = 4 ```

`X` has 150 observations and 4 predictors.

Grow a four-dimensional Kd-tree using the entire data set as training data.

`Mdl1 = KDTreeSearcher(X)`
```Mdl1 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

`Mdl1` is a `KDTreeSearcher` model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can grow a Kd-tree by using `createns`.

`Mdl2 = createns(X)`
```Mdl2 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

`Mdl2` is also a `KDTreeSearcher` model object, and it is equivalent to `Mdl1`. Because `X` has four columns and the default distance metric is Euclidean, `createns` creates a `KDTreeSearcher` model by default.

To find the nearest neighbors in `X` to a batch of query data, pass the `KDTreeSearcher` model object and the query data to `knnsearch` or `rangesearch`.

Load Fisher's iris data. Focus on the petal dimensions.

```load fisheriris X = meas(:,[3 4]); % Predictors```

Grow a two-dimensional Kd-tree using `createns` and the training data. Specify the Minkowski distance metric.

`Mdl = createns(X,'Distance','Minkowski')`
```Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 2 X: [150x2 double] ```

Because `X` has two columns and the distance metric is Minkowski, `createns` creates a `KDTreeSearcher` model object by default.

Access properties of `Mdl` by using dot notation. For example, use `Mdl.DistParameter` to access the Minkowski distance exponent.

`Mdl.DistParameter`
```ans = 2 ```

You can pass query data and `Mdl` to:

Create a `KDTreeSearcher` model object and alter the `Distance` property by using dot notation.

```load fisheriris X = meas;```

Grow a default four-dimensional Kd-tree using the entire data set as training data.

`Mdl = KDTreeSearcher(X)`
```Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

Specify that the neighbor searcher use the Minkowski metric to compute the distances between the training and query data.

`Mdl.Distance = 'minkowski'`
```Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 2 X: [150x4 double] ```

You can pass `Mdl` and the query data to either `knnsearch` or `rangesearch` to find the nearest neighbors to the points in the query data based on the Minkowski distance.

Grow a Kd-tree nearest neighbor searcher object by using the `createns` function. Pass the object and query data to the `knnsearch` function to find k-nearest neighbors.

`load fisheriris`

Remove five irises randomly from the predictor data to use as a query set.

```rng(1) % For reproducibility n = size(meas,1); % Sample size qIdx = randsample(n,5); % Indices of query data tIdx = ~ismember(1:n,qIdx); % Indices of training data Q = meas(qIdx,:); X = meas(tIdx,:);```

Grow a four-dimensional Kd-tree using the training data. Specify the Minkowski distance for finding nearest neighbors.

`Mdl = createns(X,'Distance','minkowski')`
```Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 2 X: [145x4 double] ```

Because `X` has four columns and the distance metric is Minkowski, `createns` creates a `KDTreeSearcher` model object by default. The Minkowski distance exponent is `2` by default.

Find the indices of the training data (`Mdl.X`) that are the two nearest neighbors of each point in the query data (`Q`).

`IdxNN = knnsearch(Mdl,Q,'K',2)`
```IdxNN = 5×2 17 4 6 2 1 12 89 66 124 100 ```

Each row of `IdxNN` corresponds to a query data observation, and the column order corresponds to the order of the nearest neighbors, with respect to ascending distance. For example, based on the Minkowski distance, the second nearest neighbor of `Q(3,:)` is `X(12,:)`.

Version History

Introduced in R2010a