You may have seen the Clustering tool under the Stats | Transformation ribbon. You may even have tried to run it on your data but been unsure where the results are written or what they mean. This article will help you run the tool and use the cluster results to further process your data.
This is where you'll find the Clustering tool:
This tool is designed to detect and identify different clusters in your data. This is useful for any exploration project, where drilling naturally produces clusters around the most valuable portions of the project. In this example, we look at some drillhole assays focusing on gold values.
I want to understand what statistical clusters may be hiding in this data. If we open the Clustering tool, we get a dialog box like this:
The Input section is straightforward. Choose the data file you want to run clustering on, then add the data fields you want to analyse. You can run clustering on an individual field or on several fields together.
The Parameters section requires you to specify the Clustering Method and Number of Clusters.
| Method | Description |
|---|---|
| K-Means | A distance-based algorithm is used to partition n points into k clusters. Each point belongs to the cluster with the nearest mean (cluster centroid), which serves as a prototype of that cluster. |
| Gaussian Mixture | A mixture of normal distributions that represent the overall probability distribution of the data points. |
| Self-Organising Map | An artificial neural network (ANN) is trained to perform a dimensionality reduction to create a discretised representation (neighbourhood map) of the input data. |
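
To make the K-Means description more concrete, here is a minimal sketch of the general algorithm in plain Python. It is not the tool's own implementation: the function name `k_means`, the starting centroids, the iteration limit and the assay values are all assumptions chosen for illustration.

```python
import math
import random


def k_means(points, k, iterations=50, seed=0):
    """Partition points into k clusters using plain K-Means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the labels are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical (Au g/t, Cu %) assay pairs: three low-grade and three high-grade.
assays = [(0.1, 0.02), (0.2, 0.03), (0.15, 0.01),
          (5.1, 0.90), (4.8, 1.10), (5.5, 1.00)]
labels, centroids = k_means(assays, k=2)
print(labels)
```

The label numbers themselves are arbitrary. What matters is which samples share a label.
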
For the Number of Clusters, you may want to start with an arbitrary number, then review the results. If you notice that later clusters only have a few samples in them, consider decreasing the number of clusters.
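
One way to check for sparsely populated clusters is to count the samples per cluster once the results have been exported. The sketch below assumes the CLUSTER_ID column has already been read into a Python list. The values and the `min_samples` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical CLUSTER_ID values read from the output file.
cluster_ids = [1, 1, 2, 1, 3, 2, 1, 2, 1, 4, 1, 2]

counts = Counter(cluster_ids)  # number of samples in each cluster
min_samples = 2                # assumed threshold; choose one that suits your data

# Clusters with fewer samples than the threshold suggest that k may be too high.
sparse = sorted(cid for cid, n in counts.items() if n < min_samples)
print(sparse)
```

If several clusters turn up in `sparse`, rerunning the tool with a smaller Number of Clusters is a reasonable next step.
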
For Transformation Method, you have the two choices described below. The Centred Log-Ratio method also lets you control how zero values are treated.
| Method | Description |
|---|---|
| Z-Score | Data is transformed by subtracting the mean value for each field from the values in the compositional data and then dividing by the standard deviation of each field, resulting in data with a mean of zero and a standard deviation of one. |
| Centred Log-Ratio (CLR) | A centred log-ratio transformation is undertaken to remove the effects of closure in the compositional data. |
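
The two transformations can be sketched in a few lines of Python. This is an illustration of the standard formulas, not the tool's code: the function names are made up for this example, the Z-score uses the population standard deviation (the tool may use the sample standard deviation), and the CLR function assumes every value is strictly positive, which is why zero values need special treatment.

```python
import math
import statistics


def z_score(values):
    """Subtract the field mean and divide by the field standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]


def clr(composition):
    """Centred log-ratio of one sample: log of each part over the geometric mean."""
    logs = [math.log(x) for x in composition]  # fails on zeros, hence zero handling
    mean_log = sum(logs) / len(logs)           # log of the geometric mean
    return [v - mean_log for v in logs]


gold = [0.5, 1.0, 1.5, 2.0, 5.0]   # hypothetical Au values for one field
sample = [60.0, 30.0, 10.0]        # hypothetical three-part composition (%)

print(z_score(gold))
print(clr(sample))
```

The Z-scored field has a mean of zero and a standard deviation of one, and the CLR values of a single sample sum to zero.
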
For Sample Weight you can select the desired data field from your input data that contains a weighting value. You could use something like the length of each assay interval here.
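
To see why an interval length makes a sensible weight, compare a plain mean with a length-weighted mean. This sketch only illustrates weighting in general. How the tool applies the weight internally is not described here, and the grades and lengths are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


grades = [1.0, 3.0, 10.0]   # hypothetical Au grades (g/t)
lengths = [2.0, 1.0, 0.5]   # hypothetical assay interval lengths (m)

print(sum(grades) / len(grades))       # unweighted: the short high-grade interval dominates
print(weighted_mean(grades, lengths))  # length-weighted: long intervals count for more
```
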
The final part of this setup is the Output section. It allows you to write the cluster results to an entirely new output file, or have the tool create a new column in the input data file called "CLUSTER_ID."
When you run the tool, there may appear to be very little change. Open the input or output file, depending on where you chose to store the results, and you'll find a column called CLUSTER_ID populated with the cluster numbers calculated by the tool:
Now that we have this data, we can take things further by displaying it next to our drillhole traces. In this image, the CLUSTER_ID results are on the left and gold assays are on the right:
We can also use this cluster data to refine our other statistical plots. For example, you can now use CLUSTER_ID as a filter and create gold Histograms for individual clusters.
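
If you export the results, the same per-cluster filtering can be done outside the application. The sketch below groups gold values by CLUSTER_ID and summarises each group. The rows and the `AU` field name are hypothetical stand-ins for your own file.

```python
import statistics
from collections import defaultdict

# Hypothetical rows from the output file; AU is an assumed gold field name.
rows = [
    {"CLUSTER_ID": 1, "AU": 0.12},
    {"CLUSTER_ID": 1, "AU": 0.20},
    {"CLUSTER_ID": 2, "AU": 4.80},
    {"CLUSTER_ID": 1, "AU": 0.16},
    {"CLUSTER_ID": 2, "AU": 5.20},
]

# Filter by cluster: collect the gold values that belong to each CLUSTER_ID.
by_cluster = defaultdict(list)
for row in rows:
    by_cluster[row["CLUSTER_ID"]].append(row["AU"])

# Summarise each cluster separately, as you would with one histogram per cluster.
summary = {
    cid: {"count": len(vals), "mean": statistics.fmean(vals), "max": max(vals)}
    for cid, vals in sorted(by_cluster.items())
}
for cid, stats in summary.items():
    print(cid, stats)
```
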
There are many ways to apply Clustering results to your input data using various filters and statistical charts. Now that you know the basics, you can use this information to better understand distributions in your project.