papaya
Class Correlation

java.lang.Object
  extended by papaya.Correlation
All Implemented Interfaces:
PapayaConstants

public final class Correlation
extends Object
implements PapayaConstants

Contains utilities related to computing covariances, as well as linear and rank correlation. Methods relating to computing

Whatever it is, remember that correlation does not always imply causation.


Nested Class Summary
static class Correlation.Significance
          Contains methods used to compute the significance, or pvalue of the input correlations.
static class Correlation.Weighted
          Contains methods related to computing the correlation and covariance of weighted datasets.
 
Field Summary
 
Fields inherited from interface papaya.PapayaConstants
BASELINE, big, biginv, BOTTOM, CENTER, CORNER, FONTNAME, GRAY, INDEX_NOT_FOUND, INDICES_NOT_FOUND, LEFT, LOGPI, MACHEP, MAXGAM, MAXLOG, MINLOG, RIGHT, SQRTH, SQTPI, STROKEWEIGHT, TEXTSIZE, TOP
 
Method Summary
static float auto(float[] data, int lag, float mean, float variance)
          Computes the sample autocorrelation by removing the sample mean from the input series, then normalizing the sequence by the sample variance.
static float autoLag1(float[] data, float mean)
          Returns the lag-1 autocorrelation of a dataset. Note that this method uses computations different from auto(data, 1, mean, variance).
static float[][] cov(float[][] data, boolean unbiased)
          Returns the covariance matrix of P data sequences, each of length N.
static float cov(float[] data1, float[] data2, boolean unbiased)
          Returns the covariance of two data sequences data1 and data2, each of length N.
static float[][] linear(float[][] data, boolean unbiased)
          Returns the (Pearson Product Moment) correlation matrix between multiple columns of a matrix with each column corresponding to a dataset, and each row an observation.
static float linear(float[] data1, float[] data2, boolean unbiased)
          Returns the (Pearson Product Moment) linear correlation of two data sequences.
static float[][] spearman(float[][] data, boolean unbiased)
          Computes Spearman's rank-correlation, or rho, between multiple columns of a matrix with each column corresponding to a dataset, and each row an observation.
static float spearman(float[] x, float[] y, boolean unbiased)
          Computes Spearman's rank-correlation, or rho.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

auto

public static float auto(float[] data,
                         int lag,
                         float mean,
                         float variance)
Computes the sample autocorrelation by removing the sample mean from the input series, then normalizing the sequence by the sample variance. That is, it returns
 R(lag) = E[ (X[t] - mu) * (X[t+lag] - mu) ] / variance(X),
 where
 E[ (X[t] - mu) * (X[t+lag] - mu) ] = 1/size(X) * Sum_(t=0 to size-lag-1)( (X[t] - mu) * (X[t+lag] - mu) ).
 
Reference: Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 1994.

Parameters:
data - the array of data
lag - the lag value. Has to be smaller than the data sequence length
mean - the data mean
variance - the data variance
Returns:
the autocorrelation value
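
The formula above can be sketched in a few lines of standalone Java. This is an illustrative reimplementation under stated assumptions, not papaya's source: the class name AutoCorrSketch is hypothetical, the sum runs over every t for which t+lag is still inside the array, and the result is divided by the full series length as the formula states.

```java
// Illustrative sketch only; AutoCorrSketch is a hypothetical name, not part of papaya.
final class AutoCorrSketch {

    /**
     * R(lag) = [ (1/n) * sum over t of (x[t] - mean) * (x[t+lag] - mean) ] / variance,
     * where t ranges over all indices with t + lag < n.
     */
    static float auto(float[] data, int lag, float mean, float variance) {
        int n = data.length;
        if (lag < 0 || lag >= n) {
            throw new IllegalArgumentException("lag must be in [0, data.length)");
        }
        double sum = 0.0;
        for (int t = 0; t + lag < n; t++) {
            sum += (data[t] - mean) * (data[t + lag] - mean);
        }
        // Normalize by the series length, then by the supplied variance.
        return (float) ((sum / n) / variance);
    }
}
```

In this sketch, passing the biased (divide-by-N) variance makes the lag-0 value equal to 1. For the series 1, 2, 3, 4, 5 with mean 3 and variance 2, the lag-1 value is 0.4.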

autoLag1

public static float autoLag1(float[] data,
                             float mean)
Returns the lag-1 autocorrelation of a dataset. Note that this method uses computations different from auto(data, 1, mean, variance).


linear

public static float[][] linear(float[][] data,
                               boolean unbiased)
Returns the (Pearson Product Moment) correlation matrix between multiple columns of a matrix, with each column corresponding to a dataset and each row an observation. That is, given P columns, data1, data2, ... , dataP, each of length n, it computes and returns the P-by-P correlation matrix C, with each element C_JK given by
 C_JK = C_KJ = corr(dataJ, dataK, unbiasedValue).
 

Parameters:
data - The input data. Each column corresponds to a dataset; each row corresponds to an observation
unbiased - set to true to return the unbiased correlation, false to return the biased version.
Returns:
the correlation matrix (symmetric, by definition)
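
As a rough illustration of the matrix form, the standalone sketch below computes column covariances and scales each entry by the two column standard deviations. It is not papaya's implementation: the class name CorrMatrixSketch is hypothetical, and the data[row][column] indexing is an assumption based on the parameter description.

```java
// Illustrative sketch only; CorrMatrixSketch is a hypothetical name, not part of papaya.
final class CorrMatrixSketch {

    /** Assumes data[i][j] holds observation i (row) of dataset j (column). */
    static float[][] linear(float[][] data, boolean unbiased) {
        int n = data.length;      // number of observations (rows)
        int p = data[0].length;   // number of datasets (columns)

        // Column means.
        double[] mean = new double[p];
        for (float[] row : data) {
            for (int j = 0; j < p; j++) mean[j] += row[j];
        }
        for (int j = 0; j < p; j++) mean[j] /= n;

        // Covariance of every column pair.
        int divisor = (unbiased && n > 1) ? n - 1 : n;
        double[][] s = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = j; k < p; k++) {
                double sum = 0.0;
                for (float[] row : data) sum += (row[j] - mean[j]) * (row[k] - mean[k]);
                s[j][k] = s[k][j] = sum / divisor;
            }
        }

        // C_JK = S_JK / sqrt(S_JJ * S_KK)
        float[][] c = new float[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                c[j][k] = (float) (s[j][k] / Math.sqrt(s[j][j] * s[k][k]));
            }
        }
        return c;
    }
}
```

A constant column has zero variance, so its row and column in this sketch come out as NaN.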

linear

public static float linear(float[] data1,
                           float[] data2,
                           boolean unbiased)
Returns the (Pearson Product Moment) linear correlation of two data sequences. It is related to the cov(float[], float[], boolean) function via
 corr = cov(x,y,unbiasedValue)/sqrt( cov(x,x, unbiasedValue)*cov(y,y,unbiasedValue) )
 

Parameters:
unbiased - set to true to return the unbiased correlation, false to return the biased version.
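
The relation between correlation and covariance given above can be sketched as follows. This is a standalone illustration, not papaya's source: PearsonSketch and its private-style cov helper are hypothetical names, and the helper simply follows the N - 1 versus N normalization described for cov.

```java
// Illustrative sketch only; PearsonSketch is a hypothetical name, not part of papaya.
final class PearsonSketch {

    /** Covariance helper: divides by N - 1 when unbiased and N > 1, otherwise by N. */
    static double cov(float[] x, float[] y, boolean unbiased) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("sequences must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += (x[i] - meanX) * (y[i] - meanY);
        return sum / ((unbiased && n > 1) ? n - 1 : n);
    }

    /** corr = cov(x, y) / sqrt( cov(x, x) * cov(y, y) ) */
    static float linear(float[] x, float[] y, boolean unbiased) {
        double cxy = cov(x, y, unbiased);
        double cxx = cov(x, x, unbiased);
        double cyy = cov(y, y, unbiased);
        return (float) (cxy / Math.sqrt(cxx * cyy));
    }
}
```

In this sketch the same divisor appears in the numerator and under the square root, so it cancels and both settings of the flag give the same value. For x = 1, 2, 3, 4 and y = 1, 3, 2, 4 the result is 0.8 either way.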

spearman

public static float[][] spearman(float[][] data,
                                 boolean unbiased)
Computes Spearman's rank-correlation, or rho, between multiple columns of a matrix, with each column corresponding to a dataset and each row an observation. That is, each pair of columns is first converted to ranks rX_J, rX_K, and the correlation between the ranks is then computed using the Pearson correlation coefficient formula.

Parameters:
unbiased - set to true to return the unbiased correlation, false to return the biased version.

spearman

public static float spearman(float[] x,
                             float[] y,
                             boolean unbiased)
Computes Spearman's rank-correlation, or rho. That is, the raw data X_i, Y_i are first converted to ranks rX_i, rY_i, and the correlation between the ranks is then computed using the Pearson correlation coefficient formula.

Parameters:
unbiased - set to true to return the unbiased correlation, false to return the biased version.
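
The rank-then-correlate procedure described above might look like the standalone sketch below. It is not papaya's implementation: SpearmanSketch and its helpers are hypothetical names, and giving tied values the average of their rank positions is an assumption, since the documentation does not say how ties are handled.

```java
import java.util.Arrays;

// Illustrative sketch only; SpearmanSketch is a hypothetical name, not part of papaya.
final class SpearmanSketch {

    /** 1-based ranks; tied values share the average of their positions (an assumption). */
    static float[] ranks(float[] v) {
        int n = v.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Float.compare(v[a], v[b]));

        float[] r = new float[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to v[idx[i]].
            while (j + 1 < n && v[idx[j + 1]] == v[idx[i]]) j++;
            float avg = (i + j) / 2.0f + 1.0f;
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /** Pearson correlation of two equal-length sequences. */
    static float pearson(float[] x, float[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (float) (sxy / Math.sqrt(sxx * syy));
    }

    /** Spearman's rho: Pearson correlation of the ranks. */
    static float spearman(float[] x, float[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        return pearson(ranks(x), ranks(y));
    }
}
```

Because only the ordering matters, any strictly increasing relationship (for example y = x squared on positive x) gives rho = 1 in this sketch, even though the relationship is not linear.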

cov

public static float cov(float[] data1,
                        float[] data2,
                        boolean unbiased)
Returns the covariance of two data sequences data1 and data2, each of length N. That is,
 
 cov(data1, data2) = E(  (data1[i] - mean(data1))* (data2[i] - mean(data2)) ),
 
where E is the mathematical expectation.

cov(x,y,true) normalizes by N - 1 if N > 1, where N is the number of observations. This makes cov(x,y,true) the best unbiased estimate of the covariance if the observations are from a normal distribution. For N = 1, cov(x,y,true) normalizes by N.

cov(x,y,false) normalizes by N and produces the second moment of the observations about their mean.

Parameters:
data1 - the first data sequence, x
data2 - the second data sequence, y
unbiased - set to true to return the unbiased covariance (division by N-1), false to return the biased version (division by N).
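
The two normalizations described above can be compared with a short standalone sketch. It is not papaya's source: the class name CovSketch is hypothetical, and the code only follows the rules stated in this section (divide by N - 1 when unbiased is true and N > 1, otherwise by N).

```java
// Illustrative sketch only; CovSketch is a hypothetical name, not part of papaya.
final class CovSketch {

    static float cov(float[] data1, float[] data2, boolean unbiased) {
        if (data1.length != data2.length || data1.length == 0) {
            throw new IllegalArgumentException("sequences must be non-empty and of equal length");
        }
        int n = data1.length;

        // Sample means of both sequences.
        double mean1 = 0.0, mean2 = 0.0;
        for (int i = 0; i < n; i++) { mean1 += data1[i]; mean2 += data2[i]; }
        mean1 /= n;
        mean2 /= n;

        // Sum of products of deviations from the means.
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += (data1[i] - mean1) * (data2[i] - mean2);

        // N - 1 for the unbiased estimate when N > 1; N otherwise.
        int divisor = (unbiased && n > 1) ? n - 1 : n;
        return (float) (sum / divisor);
    }
}
```

For data1 = 1, 2, 3, 4 and data2 = 2, 4, 6, 8 the sum of products is 10, so this sketch returns 10/3 with unbiased set to true and 2.5 with it set to false.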

cov

public static float[][] cov(float[][] data,
                            boolean unbiased)
Returns the covariance matrix of P data sequences, each of length N. That is, given P columns, data1, data2, ... , dataP, each of length N, it computes and returns the P-by-P covariance matrix S, with each element S_JK given by
 S_JK = S_KJ = cov(dataJ, dataK, unbiased).
 
cov(data,true) normalizes by N - 1 if N > 1, where N is the number of observations (the number of rows in the input matrix). This makes cov(data,true) the best unbiased estimate of the covariance matrix if the observations are from a normal distribution. For N = 1, cov(data,true) normalizes by N.

cov(data,false) normalizes by N and produces the second moment matrix of the observations about their mean.

Parameters:
data - Each column corresponds to a dataset; each row corresponds to an observation
unbiased - set to true to return the unbiased covariance matrix, false to return the biased version.
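
A minimal standalone sketch of the matrix form follows. It is not papaya's implementation: the class name CovMatrixSketch is hypothetical, and the data[row][column] indexing is an assumption based on the parameter description above.

```java
// Illustrative sketch only; CovMatrixSketch is a hypothetical name, not part of papaya.
final class CovMatrixSketch {

    /** Assumes data[i][j] holds observation i (row) of dataset j (column). */
    static float[][] cov(float[][] data, boolean unbiased) {
        int n = data.length;      // N observations (rows)
        int p = data[0].length;   // P datasets (columns)

        // Column means.
        double[] mean = new double[p];
        for (float[] row : data) {
            for (int j = 0; j < p; j++) mean[j] += row[j];
        }
        for (int j = 0; j < p; j++) mean[j] /= n;

        int divisor = (unbiased && n > 1) ? n - 1 : n;

        // Fill the upper triangle and mirror it, since S_JK = S_KJ.
        float[][] s = new float[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = j; k < p; k++) {
                double sum = 0.0;
                for (float[] row : data) sum += (row[j] - mean[j]) * (row[k] - mean[k]);
                s[j][k] = s[k][j] = (float) (sum / divisor);
            }
        }
        return s;
    }
}
```

The diagonal entries of the result are the variances of the individual columns. For the rows (1, 2), (2, 4), (3, 6) with unbiased set to true, this sketch gives S = [[1, 2], [2, 4]].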


Processing library papaya by Adila Faruk. (C) 2014