papaya
Class Distance

java.lang.Object
  extended by papaya.Distance

public final class Distance
extends Object

Contains methods for computing various "distance" metrics for multidimensional scaling.

For an m-by-n input matrix with m observations and n variables, the output D is the symmetric m-by-m matrix with zeros along the diagonal and element D[i][j] specifying the distance between rows i and j.


Method Summary
static float[][] chebychev(float[][] X)
          Computes the Chebychev distance matrix between pairs of objects in the m-by-n data matrix X.
static float[][] cityblock(float[][] X)
          Computes the cityblock (or Manhattan) distance matrix between pairs of objects in the m-by-n data matrix X.
static float[][] correlation(float[][] X)
          Computes the correlation distance matrix between pairs of objects in the m-by-n data matrix X.
static float[][] cosine(float[][] X)
          Computes the cosine distance matrix between pairs of objects in the m-by-n data matrix X.
static float[][] euclidean(float[][] X)
          Computes the Euclidean distance between pairs of objects in the m-by-n data matrix X.
static float[][] mahalanobis(float[][] X)
          Computes the Mahalanobis distance matrix of the m-by-n input matrix X.
static float[][] minkowski(float[][] X, int exp)
          Returns the Minkowski distance matrix of the m-by-n input matrix X.
static float[][] seuclidean(float[][] X)
          Computes the standardized Euclidean distance between pairs of objects in the m-by-n data matrix X by standardizing X before computing the distances.
static float[][] spearman(float[][] X)
          Computes the Spearman distance matrix of the m-by-n input matrix X.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

cityblock

public static float[][] cityblock(float[][] X)
Computes the cityblock (or Manhattan) distance matrix between pairs of objects in the m-by-n data matrix X. That is,
 Dist[i][j] = |X[i][1] - X[j][1]| + |X[i][2] - X[j][2]| + ... + |X[i][n] - X[j][n]|.
 

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the cityblock (or Manhattan) distance between rows r1 and r2 of the input X.
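
The cityblock distance sums the absolute coordinate differences between two rows. A minimal plain-Java sketch of that computation follows; it is not papaya's source, and the class name CityblockSketch is hypothetical.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class CityblockSketch {
    /** Returns the m-by-m matrix of L1 distances between the rows of X. */
    static float[][] cityblock(float[][] X) {
        int m = X.length;
        float[][] D = new float[m][m];          // diagonal stays 0
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                float sum = 0f;
                for (int k = 0; k < X[i].length; k++) {
                    sum += Math.abs(X[i][k] - X[j][k]);
                }
                D[i][j] = sum;                  // fill both triangles so D is symmetric
                D[j][i] = sum;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {0f, 0f}, {3f, 4f}, {-1f, 2f} };
        // prints [[0.0, 7.0, 3.0], [7.0, 0.0, 6.0], [3.0, 6.0, 0.0]]
        System.out.println(java.util.Arrays.deepToString(cityblock(X)));
    }
}
```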


chebychev

public static float[][] chebychev(float[][] X)
Computes the Chebychev distance matrix between pairs of objects in the m-by-n data matrix X. That is,
 Dist[i][j] = max( |X[i][1] - X[j][1]|, |X[i][2] - X[j][2]|, ..., |X[i][n] - X[j][n]| ).
 

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the Chebychev distance between rows r1 and r2 of the input X.
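
The Chebychev distance keeps only the largest absolute coordinate difference between two rows. The following is a hedged plain-Java sketch of that rule, independent of papaya; the class name ChebychevSketch is hypothetical.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class ChebychevSketch {
    /** Returns the m-by-m matrix of L-infinity distances between the rows of X. */
    static float[][] chebychev(float[][] X) {
        int m = X.length;
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                float max = 0f;
                for (int k = 0; k < X[i].length; k++) {
                    max = Math.max(max, Math.abs(X[i][k] - X[j][k]));
                }
                D[i][j] = max;
                D[j][i] = max;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {0f, 0f}, {3f, 4f}, {-1f, 2f} };
        // prints [[0.0, 4.0, 2.0], [4.0, 0.0, 4.0], [2.0, 4.0, 0.0]]
        System.out.println(java.util.Arrays.deepToString(chebychev(X)));
    }
}
```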


correlation

public static float[][] correlation(float[][] X)
Computes the correlation distance matrix between pairs of objects in the m-by-n data matrix X. E.g., if y1 and y2 correspond to observations (or rows) 1 and 2 of X, then
 Dist[1][2] = 1 - sum_{j=1}^{n} ( ( y1_j - mean(y1) ) * ( y2_j - mean(y2) ) ) / ( n * std(y1) * std(y2) ),
 
where n is the number of columns.

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the correlation distance between rows r1 and r2 of the input X.
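
The correlation distance is one minus the Pearson correlation between two rows. The sketch below is a plain-Java illustration, not papaya's source; the class name CorrelationSketch is hypothetical. It assumes std is the population standard deviation (divisor n), in which case the factors of n cancel and the correlation reduces to a normalized dot product of the centered rows.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class CorrelationSketch {
    /** Returns 1 - Pearson correlation for every pair of rows of X. */
    static float[][] correlation(float[][] X) {
        int m = X.length, n = X[0].length;
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                double meanI = 0.0, meanJ = 0.0;
                for (int k = 0; k < n; k++) {
                    meanI += X[i][k];
                    meanJ += X[j][k];
                }
                meanI /= n;
                meanJ /= n;
                double dot = 0.0, ssI = 0.0, ssJ = 0.0;
                for (int k = 0; k < n; k++) {
                    double a = X[i][k] - meanI;
                    double b = X[j][k] - meanJ;
                    dot += a * b;
                    ssI += a * a;
                    ssJ += b * b;
                }
                // A constant row has zero variance, so the result is NaN there.
                float d = (float) (1.0 - dot / Math.sqrt(ssI * ssJ));
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {1f, 2f, 3f}, {2f, 4f, 6f}, {3f, 2f, 1f} };
        System.out.println(java.util.Arrays.deepToString(correlation(X)));
    }
}
```

Perfectly correlated rows give a distance of 0 and perfectly anti-correlated rows give 2, so the values lie in [0, 2].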


cosine

public static float[][] cosine(float[][] X)
Computes the cosine distance matrix between pairs of objects in the m-by-n data matrix X. E.g., if y1 and y2 correspond to observations (or rows) 1 and 2 of X, then
 Dist[1][2] = 1 - ( sum_{j=1}^{n} y1_j * y2_j ) / ( mag(y1) * mag(y2) ),
 
where n is the number of columns and mag(y1) = sqrt( sum_{j=1}^{n} y1_j * y1_j ), and likewise for mag(y2).

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the cosine distance between rows r1 and r2 of the input X.
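
The cosine distance is one minus the cosine of the angle between two rows, i.e. their dot product divided by the product of their Euclidean norms. A minimal plain-Java sketch under that textbook definition follows; it is not papaya's source, and the class name CosineSketch is hypothetical.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class CosineSketch {
    /** Returns 1 - cos(angle) for every pair of rows of X. */
    static float[][] cosine(float[][] X) {
        int m = X.length, n = X[0].length;
        double[] norm = new double[m];          // Euclidean norm of each row
        for (int i = 0; i < m; i++) {
            double ss = 0.0;
            for (int k = 0; k < n; k++) ss += (double) X[i][k] * X[i][k];
            norm[i] = Math.sqrt(ss);
        }
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                double dot = 0.0;
                for (int k = 0; k < n; k++) dot += (double) X[i][k] * X[j][k];
                // An all-zero row has norm 0, so the result is NaN there.
                float d = (float) (1.0 - dot / (norm[i] * norm[j]));
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {1f, 0f}, {0f, 1f}, {2f, 0f} };
        System.out.println(java.util.Arrays.deepToString(cosine(X)));
    }
}
```

Orthogonal rows give a distance of 1, and rows pointing in the same direction give 0 regardless of their lengths.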


euclidean

public static float[][] euclidean(float[][] X)
Computes the Euclidean distance between pairs of objects in the m-by-n data matrix X. That is,
 Dist[i][j]^2 = (X[i][1] - X[j][1])^2 + (X[i][2] - X[j][2])^2 + ... + (X[i][n] - X[j][n])^2.
 

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the Euclidean distance between rows r1 and r2 of the input X.
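
Since the formula is stated for the squared distance, the returned value is the square root of the summed squared differences. The plain-Java sketch below illustrates this; it is not papaya's source, and the class name EuclideanSketch is hypothetical.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class EuclideanSketch {
    /** Returns the m-by-m matrix of L2 distances between the rows of X. */
    static float[][] euclidean(float[][] X) {
        int m = X.length;
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                double ss = 0.0;                // accumulate in double for accuracy
                for (int k = 0; k < X[i].length; k++) {
                    double diff = (double) X[i][k] - X[j][k];
                    ss += diff * diff;
                }
                float d = (float) Math.sqrt(ss);
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {0f, 0f}, {3f, 4f}, {-1f, 2f} };
        System.out.println(java.util.Arrays.deepToString(euclidean(X)));
    }
}
```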


seuclidean

public static float[][] seuclidean(float[][] X)
Computes the standardized Euclidean distance between pairs of objects in the m-by-n data matrix X by standardizing X before computing the distances. The standardized X is computed using
 standardized value = (original value - mean) / standard deviation
 
where the mean and standard deviation are those of the column containing the value.

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the standardized Euclidean distance between rows r1 and r2 of the input X.
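
The two steps (standardize each column, then take Euclidean distances) can be sketched in plain Java as follows. This is not papaya's source: the class name SeuclideanSketch is hypothetical, and the use of the sample standard deviation (divisor m - 1) is an assumption, since the text does not say which variant is used.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class SeuclideanSketch {
    /** Standardizes the columns of X, then returns pairwise Euclidean distances. */
    static float[][] seuclidean(float[][] X) {
        int m = X.length, n = X[0].length;
        double[][] Z = new double[m][n];        // standardized copy of X
        for (int k = 0; k < n; k++) {
            double mean = 0.0;
            for (int i = 0; i < m; i++) mean += X[i][k];
            mean /= m;
            double ss = 0.0;
            for (int i = 0; i < m; i++) ss += (X[i][k] - mean) * (X[i][k] - mean);
            // Assumption: sample standard deviation. A constant column gives
            // std = 0 and therefore NaN entries.
            double std = Math.sqrt(ss / (m - 1));
            for (int i = 0; i < m; i++) Z[i][k] = (X[i][k] - mean) / std;
        }
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    double diff = Z[i][k] - Z[j][k];
                    sum += diff * diff;
                }
                float d = (float) Math.sqrt(sum);
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        // The second column is 10x the first; after standardizing, both weigh equally.
        float[][] X = { {1f, 10f}, {2f, 20f}, {3f, 30f} };
        System.out.println(java.util.Arrays.deepToString(seuclidean(X)));
    }
}
```

Standardizing first keeps a variable measured on a large scale from dominating the distance.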


mahalanobis

public static float[][] mahalanobis(float[][] X)
Computes the Mahalanobis distance matrix of the m-by-n input matrix X. That is,
 Dist[i][j]^2 = (X[i] - X[j]) * invcov * (X[i] - X[j])',
 
where X[i] is row i of X, ' denotes the transpose, and invcov is the inverse of the n-by-n covariance matrix of X.
 
Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the Mahalanobis distance between rows r1 and r2 of the input X.
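
For reference, the textbook Mahalanobis distance between two rows x and y is sqrt( (x - y) * S^-1 * (x - y)' ), where S is the covariance matrix of the columns of X. The plain-Java sketch below follows that definition; it is not papaya's source. The class name MahalanobisSketch, the sample covariance (divisor m - 1), and the Gauss-Jordan inversion are all assumptions made for illustration.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class MahalanobisSketch {
    /** Sample covariance matrix (divisor m - 1) of the columns of X. */
    static double[][] covariance(float[][] X) {
        int m = X.length, n = X[0].length;
        double[] mean = new double[n];
        for (float[] row : X) for (int k = 0; k < n; k++) mean[k] += row[k] / (double) m;
        double[][] S = new double[n][n];
        for (float[] row : X)
            for (int a = 0; a < n; a++)
                for (int b = 0; b < n; b++)
                    S[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / (m - 1);
        return S;
    }

    /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting. */
    static double[][] invert(double[][] A) {
        int n = A.length;
        double[][] aug = new double[n][2 * n];  // [A | I]
        for (int i = 0; i < n; i++) {
            System.arraycopy(A[i], 0, aug[i], 0, n);
            aug[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            if (Math.abs(aug[pivot][col]) < 1e-12)
                throw new ArithmeticException("covariance matrix is singular");
            double[] tmp = aug[col]; aug[col] = aug[pivot]; aug[pivot] = tmp;
            double p = aug[col][col];
            for (int c = 0; c < 2 * n; c++) aug[col][c] /= p;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int c = 0; c < 2 * n; c++) aug[r][c] -= f * aug[col][c];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(aug[i], n, inv[i], 0, n);
        return inv;
    }

    /** Returns the m-by-m matrix of Mahalanobis distances between the rows of X. */
    static float[][] mahalanobis(float[][] X) {
        int m = X.length, n = X[0].length;
        double[][] invcov = invert(covariance(X));
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                double q = 0.0;                 // quadratic form (x - y) invcov (x - y)'
                for (int a = 0; a < n; a++)
                    for (int b = 0; b < n; b++)
                        q += (X[i][a] - X[j][a]) * invcov[a][b] * (X[i][b] - X[j][b]);
                float d = (float) Math.sqrt(q);
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {0f, 0f}, {2f, 0f}, {0f, 2f}, {2f, 2f} };
        System.out.println(java.util.Arrays.deepToString(mahalanobis(X)));
    }
}
```

Unlike the standardized Euclidean distance, this form also accounts for correlation between variables, so the distances are unchanged when the data undergo an invertible linear transformation.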


minkowski

public static float[][] minkowski(float[][] X,
                                  int exp)
Returns the Minkowski distance matrix of the m-by-n input matrix X. That is,
 Dist[i][j]^p = |X[i][1] - X[j][1]|^p + |X[i][2] - X[j][2]|^p + ... + |X[i][n] - X[j][n]|^p,
 
where p = exp.
 
Notice that for the special case of p = 1, the Minkowski metric gives the city block metric; for p = 2, it gives the Euclidean distance; and in the limit p → ∞, it gives the Chebychev distance. Also notice that the larger the value of p, the higher the probability of causing overflow errors (which are, in turn, highly correlated with headaches and overall feelings of malaise).

Parameters:
exp - the Minkowski exponent (positive).
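
One common way to sidestep the overflow mentioned above is to divide every absolute difference by the largest one before raising it to the power p, then multiply the result back. The plain-Java sketch below shows that rescaling; it is an assumption for illustration, not a description of how papaya computes the metric, and the class name MinkowskiSketch is hypothetical.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class MinkowskiSketch {
    /** Returns the m-by-m Minkowski distance matrix of X for exponent exp > 0. */
    static float[][] minkowski(float[][] X, int exp) {
        if (exp <= 0) throw new IllegalArgumentException("exp must be positive");
        int m = X.length;
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                // Largest absolute difference: the Chebychev distance of the pair.
                double max = 0.0;
                for (int k = 0; k < X[i].length; k++) {
                    max = Math.max(max, Math.abs((double) X[i][k] - X[j][k]));
                }
                double d = 0.0;
                if (max > 0.0) {
                    // Every ratio lies in [0, 1], so raising it to exp cannot overflow.
                    double sum = 0.0;
                    for (int k = 0; k < X[i].length; k++) {
                        sum += Math.pow(Math.abs((double) X[i][k] - X[j][k]) / max, exp);
                    }
                    d = max * Math.pow(sum, 1.0 / exp);
                }
                D[i][j] = (float) d;
                D[j][i] = (float) d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {0f, 0f}, {3f, 4f} };
        System.out.println(minkowski(X, 1)[0][1]);   // city block case
        System.out.println(minkowski(X, 2)[0][1]);   // Euclidean case
        System.out.println(minkowski(X, 50)[0][1]);  // close to the Chebychev value
    }
}
```

For the pair (0, 0) and (3, 4), exp = 1 reproduces the city block distance 7 and exp = 2 the Euclidean distance 5, while a large exponent approaches the Chebychev distance 4.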

spearman

public static float[][] spearman(float[][] X)
Computes the Spearman distance matrix of the m-by-n input matrix X. E.g., if y1 and y2 correspond to observations (or rows) 1 and 2 of X with corresponding ranks
r11, r12, ... , r1n
and
r21, r22, ... , r2n,
then

 Dist[1][2] = 1 - sum_{j=1}^{n} ( ( r1_j - mean(r1) ) * ( r2_j - mean(r2) ) ) / ( n * std(r1) * std(r2) ),
 
and similarly for the remaining rows.

Rows of X correspond to observations, and columns correspond to variables. Returns the m-by-m matrix D with D[r1][r2] corresponding to the Spearman distance between rows r1 and r2 of the input X.
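
The Spearman distance is the correlation distance applied to the ranks of each row rather than to the raw values. The plain-Java sketch below illustrates this; it is not papaya's source. The class name SpearmanSketch is hypothetical, and giving tied values the average of their ranks is an assumption, since the text does not say how ties are handled.

```java
/** Illustrative sketch only; not the papaya implementation. */
final class SpearmanSketch {
    /** 1-based ranks of v; tied values share the average of their ranks (assumption). */
    static double[] ranks(float[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        java.util.Arrays.sort(order, (a, b) -> Float.compare(v[a], v[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && v[order[j + 1]] == v[order[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;   // mean of positions i+1 .. j+1
            for (int k = i; k <= j; k++) r[order[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** 1 - Pearson correlation of two equal-length vectors. */
    static double correlationDistance(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0, meanB = 0.0;
        for (int k = 0; k < n; k++) { meanA += a[k]; meanB += b[k]; }
        meanA /= n;
        meanB /= n;
        double dot = 0.0, ssA = 0.0, ssB = 0.0;
        for (int k = 0; k < n; k++) {
            dot += (a[k] - meanA) * (b[k] - meanB);
            ssA += (a[k] - meanA) * (a[k] - meanA);
            ssB += (b[k] - meanB) * (b[k] - meanB);
        }
        return 1.0 - dot / Math.sqrt(ssA * ssB);
    }

    /** Returns the m-by-m matrix of Spearman distances between the rows of X. */
    static float[][] spearman(float[][] X) {
        int m = X.length;
        double[][] R = new double[m][];
        for (int i = 0; i < m; i++) R[i] = ranks(X[i]);
        float[][] D = new float[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = i + 1; j < m; j++) {
                float d = (float) correlationDistance(R[i], R[j]);
                D[i][j] = d;
                D[j][i] = d;
            }
        }
        return D;
    }

    public static void main(String[] args) {
        float[][] X = { {10f, 20f, 30f, 40f}, {1f, 5f, 100f, 1000f}, {4f, 3f, 2f, 1f} };
        System.out.println(java.util.Arrays.deepToString(spearman(X)));
    }
}
```

Because only ranks matter, two rows that increase together give a distance of 0 even when their values are on very different scales.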



Processing library papaya by Adila Faruk. (C) 2014