
subclust

Find cluster centers using subtractive clustering

Description


centers = subclust(data,clusterInfluenceRange) clusters input data using subtractive clustering with the specified cluster influence range, and returns the computed cluster centers. The subtractive clustering algorithm estimates the number of clusters in the input data.


centers = subclust(data,clusterInfluenceRange,Name,Value) clusters data using algorithm options specified by one or more name-value arguments.


[centers,sigma] = subclust(___) returns the sigma values specifying the range of influence of a cluster center in each of the data dimensions.

Examples


Load data set.

load clusterDemo.dat

Find cluster centers using the same range of influence for all dimensions.

C = subclust(clusterDemo,0.6);

Each row of C contains one cluster center.

C
C = 3×3

    0.5779    0.2355    0.5133
    0.7797    0.8191    0.1801
    0.1959    0.6228    0.8363

Load data set.

load clusterDemo.dat

Define minimum and maximum normalization bounds for each data dimension. Use the same bounds for each dimension.

dataScale = [-0.2 -0.2 -0.2;
              1.2  1.2  1.2];

Find cluster centers.

C = subclust(clusterDemo,0.5,'DataScale',dataScale);

Load data set.

load clusterDemo.dat

Specify the following clustering options:

  • Squash factor of 2.0 - Only find clusters that are far from each other.

  • Accept ratio of 0.8 - Only accept data points with a strong potential for being cluster centers.

  • Reject ratio of 0.7 - Reject data points if they do not have a strong potential for being cluster centers.

  • Verbosity flag of 0 - Do not print progress information to the command window.

options = [2.0 0.8 0.7 0];

Find cluster centers, using a different range of influence for each dimension and the specified options.

C = subclust(clusterDemo,[0.5 0.25 0.3],'Options',options);

Load data set.

load clusterDemo.dat

Cluster data, returning cluster sigma values, S.

[C,S] = subclust(clusterDemo,0.5);

Cluster sigma values indicate the range of influence of the computed cluster centers in each data dimension.

Input Arguments


data — Data to be clustered, specified as an M-by-N array, where M is the number of data points and N is the number of data dimensions.

clusterInfluenceRange — Range of influence of the cluster center for each input and output, assuming the data falls within a unit hyperbox, specified as one of the following:

  • Scalar value in the range [0 1] — Use the same influence range for all inputs and outputs.

  • Vector — Use different influence ranges for each input and output.

Specifying a smaller range of influence usually creates more and smaller data clusters, producing more fuzzy rules.
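As a rough illustration of this effect, the following sketch clusters the clusterDemo.dat data with a wide and a narrow influence range and compares the number of returned clusters. The specific counts depend on the data and the chosen ranges.

load clusterDemo.dat
Cwide   = subclust(clusterDemo,0.9);   % wide influence range: typically fewer clusters
Cnarrow = subclust(clusterDemo,0.3);   % narrow influence range: typically more clusters
fprintf('Range 0.9 -> %d clusters, range 0.3 -> %d clusters\n', ...
    size(Cwide,1),size(Cnarrow,1));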

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.


Example: centers = subclust(data,0.5,DataScale="auto")

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: centers = subclust(data,0.5,"DataScale",10)

DataScale — Data scale factors for normalizing input and output data into a unit hyperbox, specified as a 2-by-N array, where N is the total number of inputs and outputs. Each column of DataScale specifies the minimum value in the first row and the maximum value in the second row for the corresponding input or output data set.

When DataScale is "auto", the subclust function uses the actual minimum and maximum values in the data to be clustered.
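For example, the following sketch builds a DataScale array from the per-column minimum and maximum of the data, which matches the behavior of the "auto" setting. The explicit array form is useful when you want normalization bounds other than those of the training data.

load clusterDemo.dat
dataScale = [min(clusterDemo); max(clusterDemo)];   % 2-by-N: row 1 = minima, row 2 = maxima
C = subclust(clusterDemo,0.5,'DataScale',dataScale);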

Options — Clustering options, specified as a vector with the following elements:

  • options(1) — Squash factor for scaling the range of influence of cluster centers, specified as a positive scalar. A smaller squash factor reduces the potential for outlying points to be considered as part of a cluster, which usually creates more and smaller data clusters.

  • options(2) — Acceptance ratio, defined as a fraction of the potential of the first cluster center, above which another data point is accepted as a cluster center, specified as a scalar value in the range [0, 1]. The acceptance ratio must be greater than the rejection ratio.

  • options(3) — Rejection ratio, defined as a fraction of the potential of the first cluster center, below which another data point is rejected as a cluster center, specified as a scalar value in the range [0, 1]. The rejection ratio must be less than the acceptance ratio.

  • options(4) — Information display flag indicating whether to display progress information during clustering, specified as one of the following:

      • false — Do not display progress information.

      • true — Display progress information.
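For reference, the following short sketch passes an options vector to subclust. The values are reused from the earlier example to show the element order; they are illustrative, not default values.

% Options vector elements: [squashFactor acceptRatio rejectRatio verbose]
load clusterDemo.dat
options = [2.0 0.8 0.7 0];
C = subclust(clusterDemo,0.5,'Options',options);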

Output Arguments


centers — Cluster centers, returned as a J-by-N array, where J is the number of clusters and N is the number of data dimensions.

sigma — Range of influence of cluster centers for each data dimension, returned as an N-element row vector. All cluster centers have the same set of sigma values.
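As a quick check of the returned values, the following sketch plots the clusterDemo.dat data together with the computed cluster centers and displays the sigma values. It assumes three data dimensions, as in that data set.

load clusterDemo.dat
[C,S] = subclust(clusterDemo,0.5);
scatter3(clusterDemo(:,1),clusterDemo(:,2),clusterDemo(:,3),10,'.')
hold on
scatter3(C(:,1),C(:,2),C(:,3),100,'r','filled')   % cluster centers
hold off
disp(S)   % one sigma value per data dimension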

Tips

  • To generate a fuzzy inference system using subtractive clustering, use the genfis command. For example, suppose you cluster your data using the following syntax:

    C = subclust(data,clusterInfluenceRange,"DataScale",dataScale,"Options",options);

    where the first M columns of data correspond to input variables and the remaining columns correspond to output variables. (Here, M is the number of input variables, not the number of data points.)

    You can generate a fuzzy system using the same training data and subtractive clustering configuration. To do so:

    1. Configure clustering options.

      opt = genfisOptions("SubtractiveClustering");
      opt.ClusterInfluenceRange = clusterInfluenceRange;
      opt.DataScale = dataScale;
      opt.SquashFactor = options(1);
      opt.AcceptRatio = options(2);
      opt.RejectRatio = options(3);
      opt.Verbose = options(4);
    2. Extract input and output variable data.

      inputData = data(:,1:M);
      outputData = data(:,M+1:end);
      
    3. Generate FIS structure.

      fis = genfis(inputData,outputData,opt);

    The fuzzy system, fis, contains one fuzzy rule for each cluster, and each input and output variable has one membership function per cluster. You can generate only Sugeno fuzzy systems using subtractive clustering. For more information, see genfis and genfisOptions.

Algorithms

Subtractive clustering assumes that each data point is a potential cluster center. The algorithm does the following:

  1. Calculate the likelihood that each data point would define a cluster center, based on the density of surrounding data points.

  2. Choose the data point with the highest potential to be the first cluster center.

  3. Remove all data points near the first cluster center. The vicinity is determined using clusterInfluenceRange.

  4. Choose the remaining point with the highest potential as the next cluster center.

  5. Repeat steps 3 and 4 until all the data is within the influence range of a cluster center.

The subtractive clustering method is an extension of the mountain clustering method proposed in [2].
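The following MATLAB sketch illustrates these steps; it is not the toolbox implementation. It assumes data already normalized to the unit hyperbox, a scalar influence range, and the Gaussian potential function from [1], and it uses only the rejection ratio as a stopping rule (the accept-ratio handling of borderline candidates is omitted). The function name subclustSketch is hypothetical.

function centers = subclustSketch(data,ra,squashFactor,rejectRatio)
% Minimal sketch of the subtractive clustering steps described above.
% Assumes data is already normalized to the unit hyperbox and ra is a
% scalar influence range. Not the toolbox implementation.
alpha = 4/ra^2;                        % weight for computing point potentials
beta  = 4/(squashFactor*ra)^2;         % weight for suppressing nearby potentials

% Step 1: potential of each point, based on the density of surrounding points.
M = size(data,1);
P = zeros(M,1);
for i = 1:M
    d2 = sum((data - data(i,:)).^2,2);
    P(i) = sum(exp(-alpha*d2));
end

% Step 2: the point with the highest potential becomes the first center.
[P1,idx] = max(P);
Pk = P1;
centers = [];
while Pk > rejectRatio*P1              % Step 5: stop when remaining potential is low
    centers = [centers; data(idx,:)];
    % Step 3: suppress the potential of points near the new center.
    d2 = sum((data - data(idx,:)).^2,2);
    P  = P - Pk*exp(-beta*d2);
    % Step 4: the remaining point with the highest potential is the next candidate.
    [Pk,idx] = max(P);
end
end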

References

[1] Chiu, Stephen L. “Fuzzy Model Identification Based on Cluster Estimation.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 267–78. https://doi.org/10.3233/IFS-1994-2306.

[2] Yager, Ronald R., and Dimitar P. Filev. “Generation of Fuzzy Rules by Mountain Clustering.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 209–19. https://doi.org/10.3233/IFS-1994-2301.

Version History

Introduced before R2006a
