function _recommender_fast_correlation_coefficient in Recommender API 5

Same name and namespace in other branches
  1. 6 recommender.module _recommender_fast_correlation_coefficient()

Fast in-memory calculation of the correlation matrix between the rows of a matrix.

Parameters

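$matrix: A two-dimensional array of m rows with n values each, indexed with consecutive integer keys starting at 0. It is passed by reference and WILL BE MODIFIED in place to save memory: each row is replaced by its deviations from the row mean.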
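
Because the argument is modified in place, callers that still need the raw values may want to pass a copy. The following is a minimal sketch of that pattern. It assumes a hypothetical stand-in, example_center_rows(), which mimics only the centering step and is not part of recommender.module.

```php
<?php
// Hypothetical stand-in for illustration only: it subtracts each row's
// mean in place, as the centering loop of the documented function does.
function example_center_rows(array &$matrix) {
  foreach ($matrix as &$vector) {
    $n = count($vector);
    $mean = array_sum($vector) / $n;
    for ($i = 0; $i < $n; $i++) {
      $vector[$i] = $vector[$i] - $mean;
    }
  }
  unset($vector); // Break the reference left by the foreach.
}

$ratings = array(
  array(1, 2, 3),
  array(4, 4, 4),
);

// PHP arrays are copied by value on assignment, so $work is independent.
$work = $ratings;
example_center_rows($work);

// $ratings still holds the raw values; $work now holds the deviations.
```

The copy costs the memory that the in-place design was meant to save, so it only makes sense when the original values are needed afterwards.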

Return value

array: The correlation matrix, or NULL if $matrix is not an array or its rows differ in length.
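
For reference, the entry for rows a and b, as read off from the loops in the code listing, can be written as follows. The symbols x and \bar{x} are notation introduced here, not names used in the module.

```latex
r_{ab} = \frac{\sum_{i=0}^{n-1} (x_{a,i} - \bar{x}_a)(x_{b,i} - \bar{x}_b)}
              {\sqrt{\sum_{i=0}^{n-1} (x_{a,i} - \bar{x}_a)^2}\;
               \sqrt{\sum_{i=0}^{n-1} (x_{b,i} - \bar{x}_b)^2}},
\qquad
\bar{x}_a = \frac{1}{n} \sum_{i=0}^{n-1} x_{a,i}
```

When either square root in the denominator is zero (a constant row), the code stores 0 instead of dividing.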

1 call to _recommender_fast_correlation_coefficient()
_recommender_similarity_classical_in_memory in ./recommender.module
In-memory matrix computation. Fast, but requires a lot of memory. Limitation: only $missing='zero' is supported; other cases are planned for a future release.

File

./recommender.module, line 106
Providing generic recommender system algorithms.

Code

function &_recommender_fast_correlation_coefficient(&$matrix) {
  if (!isset($matrix) || !is_array($matrix)) {
    return NULL;
  }
  $m = count($matrix);
  $n = count($matrix[0]);

  // Every row must have the same width $n; otherwise NULL is returned.
  $variance = array();
  foreach ($matrix as &$vector) {
    if (count($vector) != $n) {
      return NULL;
    }

    // Center each element on the row mean and accumulate the sum of squared deviations.
    $mean = array_sum($vector) / $n;
    $sigma = 0;
    for ($i = 0; $i < $n; $i++) {
      $vector[$i] = $vector[$i] - $mean;
      $sigma += pow($vector[$i], 2);
    }

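    // Because of floating-point error, a sum that should be exactly 0 may
    // come out as a tiny nonzero number. This has not caused problems in
    // practice, so the thresholding below is left commented out.

    //$vector_diagnal[] = sqrt($sigma)<0.00001 ? 0 : sqrt($sigma);
    $variance[] = sqrt($sigma);

    // Note: $sigma is not divided by $n, so this is the norm of the centered
    // vector rather than a variance. The factor cancels in the correlation.
  }
  unset($vector); // Break the reference left behind by the foreach above.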
  $cor_matrix = array();
  $product = array();
  for ($v1 = 0; $v1 < $m; $v1++) {
    for ($v2 = $v1; $v2 < $m; $v2++) {
      $vector_1 =& $matrix[$v1];
      $vector_2 =& $matrix[$v2];
      if ($variance[$v1] == 0 || $variance[$v2] == 0) {
        // A constant row has no defined correlation; store 0 in both halves.
        $cor_matrix[$v1][$v2] = 0;
        $cor_matrix[$v2][$v1] = 0;
      }
      else {
        for ($i = 0; $i < $n; $i++) {
          $product[$i] = $vector_1[$i] * $vector_2[$i];
        }
        $cor = array_sum($product) / ($variance[$v1] * $variance[$v2]);
        $cor_matrix[$v1][$v2] = $cor;
        $cor_matrix[$v2][$v1] = $cor;
      }
    }
  }
  return $cor_matrix;
}
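
The listing above trades safety for memory by overwriting its input. As a point of comparison, here is a minimal, non-mutating sketch of the same calculation. The name example_correlation_matrix() is hypothetical and not part of recommender.module, and the sketch assumes every row is an array.

```php
<?php
// Hypothetical, non-mutating sketch of the same computation.
function example_correlation_matrix(array $matrix) {
  $m = count($matrix);
  if ($m == 0) {
    return array();
  }
  $n = count($matrix[0]);
  if ($n == 0) {
    return NULL; // Empty rows have no mean.
  }
  $centered = array();
  $norm = array();
  foreach ($matrix as $row) {
    if (count($row) != $n) {
      return NULL; // Rows must share one width.
    }
    $row = array_values($row);
    $mean = array_sum($row) / $n;
    $sum_sq = 0;
    for ($i = 0; $i < $n; $i++) {
      $row[$i] -= $mean;
      $sum_sq += $row[$i] * $row[$i];
    }
    $centered[] = $row;      // Centered copy; the input is untouched.
    $norm[] = sqrt($sum_sq); // Norm of the centered row.
  }
  $cor = array();
  for ($a = 0; $a < $m; $a++) {
    for ($b = $a; $b < $m; $b++) {
      $value = 0; // Used when either row is constant.
      if ($norm[$a] != 0 && $norm[$b] != 0) {
        $dot = 0;
        for ($i = 0; $i < $n; $i++) {
          $dot += $centered[$a][$i] * $centered[$b][$i];
        }
        $value = $dot / ($norm[$a] * $norm[$b]);
      }
      // Fill both halves so the result is symmetric.
      $cor[$a][$b] = $value;
      $cor[$b][$a] = $value;
    }
  }
  return $cor;
}

$cor = example_correlation_matrix(array(
  array(1, 2, 3),
  array(2, 4, 6),
  array(3, 2, 1),
  array(5, 5, 5),
));
// $cor[0][1] is close to 1, $cor[0][2] is close to -1,
// and every entry involving the constant row 3 is 0.
```

Keeping a centered copy roughly doubles the memory held during the call, which is the cost the original avoids by modifying $matrix directly.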