
subclust

Find cluster centers using subtractive clustering

Description


centers = subclust(data,clusterInfluenceRange) clusters input data using subtractive clustering with the specified cluster influence range, and returns the computed cluster centers. The subtractive clustering algorithm estimates the number of clusters in the input data.


centers = subclust(data,clusterInfluenceRange,Name,Value) clusters data using algorithm options specified by one or more name-value arguments.


[centers,sigma] = subclust(___) returns the sigma values specifying the range of influence of a cluster center in each of the data dimensions.

Examples


Load data set.

load clusterDemo.dat

Find cluster centers using the same range of influence for all dimensions.

C = subclust(clusterDemo,0.6);

Each row of C contains one cluster center.

C
C = 3×3

    0.5779    0.2355    0.5133
    0.7797    0.8191    0.1801
    0.1959    0.6228    0.8363

Load data set.

load clusterDemo.dat

Define minimum and maximum normalization bounds for each data dimension. Use the same bounds for each dimension.

dataScale = [-0.2 -0.2 -0.2;
              1.2  1.2  1.2];

Find cluster centers.

C = subclust(clusterDemo,0.5,'DataScale',dataScale);

Load data set.

load clusterDemo.dat

Specify the following clustering options:

  • Squash factor of 2.0 - Only find clusters that are far from each other.

  • Accept ratio of 0.8 - Only accept data points with a strong potential for being cluster centers.

  • Reject ratio of 0.7 - Reject data points if they do not have a strong potential for being cluster centers.

  • Verbosity flag of 0 - Do not print progress information to the command window.

options = [2.0 0.8 0.7 0];

Find cluster centers, using a different range of influence for each dimension and the specified options.

C = subclust(clusterDemo,[0.5 0.25 0.3],'Options',options);

Load data set.

load clusterDemo.dat

Cluster data, returning cluster sigma values, S.

[C,S] = subclust(clusterDemo,0.5);

Cluster sigma values indicate the range of influence of the computed cluster centers in each data dimension.

Input Arguments


data — Data to be clustered, specified as an M-by-N array, where M is the number of data points and N is the number of data dimensions.

clusterInfluenceRange — Range of influence of the cluster center for each input and output, assuming the data falls within a unit hyperbox, specified as one of the following:

  • Scalar value in the range [0 1] — Use the same influence range for all inputs and outputs.

  • Vector — Use different influence ranges for each input and output.

Specifying a smaller range of influence usually creates more and smaller data clusters, producing more fuzzy rules.
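As a rough illustration of this effect, the following sketch clusters the clusterDemo.dat data with a wide and a narrow influence range and compares the number of returned clusters. The specific counts depend on the data and the chosen ranges.

load clusterDemo.dat
Cwide   = subclust(clusterDemo,0.9);   % wide influence range: typically fewer clusters
Cnarrow = subclust(clusterDemo,0.3);   % narrow influence range: typically more clusters
fprintf('Range 0.9 -> %d clusters, range 0.3 -> %d clusters\n', ...
    size(Cwide,1),size(Cnarrow,1));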

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.


Example: centers = subclust(data,0.5,DataScale="auto")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: centers = subclust(data,0.5,"DataScale",10)

DataScale — Data scale factors for normalizing input and output data into a unit hyperbox, specified as a 2-by-N array, where N is the total number of inputs and outputs. Each column of DataScale specifies the minimum value in the first row and the maximum value in the second row for the corresponding input or output data set.

When DataScale is "auto", the subclust function uses the actual minimum and maximum values in the data to be clustered.
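For example, the following sketch builds a DataScale array from the per-column minimum and maximum of the data, which matches the behavior of the "auto" setting. The explicit array form is useful when you want normalization bounds other than those of the training data.

load clusterDemo.dat
dataScale = [min(clusterDemo); max(clusterDemo)];   % 2-by-N: row 1 = minima, row 2 = maxima
C = subclust(clusterDemo,0.5,'DataScale',dataScale);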

Options — Clustering options, specified as a vector with the following elements:

  • options(1) — Squash factor for scaling the range of influence of cluster centers, specified as a positive scalar. A smaller squash factor reduces the potential for outlying points to be considered as part of a cluster, which usually creates more and smaller data clusters.

  • options(2) — Acceptance ratio, defined as a fraction of the potential of the first cluster center, above which another data point is accepted as a cluster center, specified as a scalar value in the range [0, 1]. The acceptance ratio must be greater than the rejection ratio.

  • options(3) — Rejection ratio, defined as a fraction of the potential of the first cluster center, below which another data point is rejected as a cluster center, specified as a scalar value in the range [0, 1]. The rejection ratio must be less than the acceptance ratio.

  • options(4) — Information display flag indicating whether to display progress information during clustering, specified as one of the following:

      • false — Do not display progress information.

      • true — Display progress information.
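For reference, the following short sketch passes an options vector to subclust. The values are reused from the earlier example to show the element order; they are illustrative, not default values.

% Options vector elements: [squashFactor acceptRatio rejectRatio verbose]
load clusterDemo.dat
options = [2.0 0.8 0.7 0];
C = subclust(clusterDemo,0.5,'Options',options);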

Output Arguments


centers — Cluster centers, returned as a J-by-N array, where J is the number of clusters and N is the number of data dimensions.

sigma — Range of influence of cluster centers for each data dimension, returned as an N-element row vector. All cluster centers have the same set of sigma values.
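As a quick check of the returned values, the following sketch plots the clusterDemo.dat data together with the computed cluster centers and displays the sigma values. It assumes three data dimensions, as in that data set.

load clusterDemo.dat
[C,S] = subclust(clusterDemo,0.5);
scatter3(clusterDemo(:,1),clusterDemo(:,2),clusterDemo(:,3),10,'.')
hold on
scatter3(C(:,1),C(:,2),C(:,3),100,'r','filled')   % cluster centers
hold off
disp(S)   % one sigma value per data dimension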

Tips

  • To generate a fuzzy inference system using subtractive clustering, use the genfis command. For example, suppose you cluster your data using the following syntax:

    C = subclust(data,clusterInfluenceRange,"DataScale",dataScale,"Options",options);

    where the first M columns of data correspond to input variables and the remaining columns correspond to output variables. (Here, M is the number of input variables, not the number of data points.)

    You can generate a fuzzy system using the same training data and subtractive clustering configuration. To do so:

    1. Configure clustering options.

      opt = genfisOptions("SubtractiveClustering");
      opt.ClusterInfluenceRange = clusterInfluenceRange;
      opt.DataScale = dataScale;
      opt.SquashFactor = options(1);
      opt.AcceptRatio = options(2);
      opt.RejectRatio = options(3);
      opt.Verbose = options(4);
    2. Extract input and output variable data.

      inputData = data(:,1:M);
      outputData = data(:,M+1:end);
      
    3. Generate FIS structure.

      fis = genfis(inputData,outputData,opt);

    The fuzzy system, fis, contains one fuzzy rule for each cluster, and each input and output variable has one membership function per cluster. You can generate only Sugeno fuzzy systems using subtractive clustering. For more information, see genfis and genfisOptions.

Algorithms

Subtractive clustering assumes that each data point is a potential cluster center. The algorithm does the following:

  1. Calculate the likelihood that each data point would define a cluster center, based on the density of surrounding data points.

  2. Choose the data point with the highest potential to be the first cluster center.

  3. Remove all data points near the first cluster center. The vicinity is determined using clusterInfluenceRange.

  4. Choose the remaining point with the highest potential as the next cluster center.

  5. Repeat steps 3 and 4 until all the data is within the influence range of a cluster center.

The subtractive clustering method is an extension of the mountain clustering method proposed in [2].
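The following MATLAB sketch illustrates these steps; it is not the toolbox implementation. It assumes data already normalized to the unit hyperbox, a scalar influence range, and the Gaussian potential function from [1], and it uses only the rejection ratio as a stopping rule (the accept-ratio handling of borderline candidates is omitted). The function name subclustSketch is hypothetical.

function centers = subclustSketch(data,ra,squashFactor,rejectRatio)
% Minimal sketch of the subtractive clustering steps described above.
% Assumes data is already normalized to the unit hyperbox and ra is a
% scalar influence range. Not the toolbox implementation.
alpha = 4/ra^2;                        % weight for computing point potentials
beta  = 4/(squashFactor*ra)^2;         % weight for suppressing nearby potentials

% Step 1: potential of each point, based on the density of surrounding points.
M = size(data,1);
P = zeros(M,1);
for i = 1:M
    d2 = sum((data - data(i,:)).^2,2);
    P(i) = sum(exp(-alpha*d2));
end

% Step 2: the point with the highest potential becomes the first center.
[P1,idx] = max(P);
Pk = P1;
centers = [];
while Pk > rejectRatio*P1              % Step 5: stop when remaining potential is low
    centers = [centers; data(idx,:)];
    % Step 3: suppress the potential of points near the new center.
    d2 = sum((data - data(idx,:)).^2,2);
    P  = P - Pk*exp(-beta*d2);
    % Step 4: the remaining point with the highest potential is the next candidate.
    [Pk,idx] = max(P);
end
end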

References

[1] Chiu, Stephen L. “Fuzzy Model Identification Based on Cluster Estimation.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 267–78. https://doi.org/10.3233/IFS-1994-2306.

[2] Yager, Ronald R., and Dimitar P. Filev. “Generation of Fuzzy Rules by Mountain Clustering.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 209–19. https://doi.org/10.3233/IFS-1994-2301.

Version History

Introduced before R2006a
