function _recommender_fast_correlation_coefficient in Recommender API 5
Same name and namespace in other branches
- 6 recommender.module _recommender_fast_correlation_coefficient()
Fast calculation of the Pearson correlation matrix: entry [a][b] is the correlation coefficient between rows a and b of $matrix.
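Written out, the quantity the code computes for two rows a and b, each with n entries, is the Pearson correlation coefficient. This formula is reconstructed from the code below rather than taken from the module's own documentation:

```latex
\bar{x}_a = \frac{1}{n}\sum_{i=0}^{n-1} x_{a,i}, \qquad
r_{ab} = \frac{\sum_{i=0}^{n-1} (x_{a,i}-\bar{x}_a)(x_{b,i}-\bar{x}_b)}
              {\sqrt{\sum_{i=0}^{n-1} (x_{a,i}-\bar{x}_a)^2}\;\sqrt{\sum_{i=0}^{n-1} (x_{b,i}-\bar{x}_b)^2}}
```

Dividing the numerator and each square root by n gives the covariance-over-standard-deviations form with the same value, which is why the code never divides by n. If either square root is 0 (a constant row), the function reports 0 instead of dividing by zero.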
Parameters
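$matrix: The input matrix, passed by reference. It must be indexed by [0..m-1][0..n-1], i.e. m rows that all have the same n values. It WILL BE MODIFIED in place (each row is replaced by its deviations from the row mean) to save memory.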
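Because $matrix is passed by reference, the caller's own array is mean-centered by the call. The minimal sketch below illustrates that side effect with a hypothetical helper, example_center_rows(), which reproduces only the centering step and is not part of the module. It also shows the usual workaround of keeping a copy when the raw values are still needed:

```php
<?php
// Hypothetical helper (not part of recommender.module): replaces each row of
// $matrix, in place, by its deviations from the row mean.
function example_center_rows(array &$matrix) {
  foreach ($matrix as &$row) {
    $mean = array_sum($row) / count($row);
    foreach ($row as &$value) {
      $value -= $mean;
    }
    // Drop the reference foreach leaves on the last element.
    unset($value);
  }
  unset($row);
}

$ratings = array(array(1, 2, 3), array(4, 4, 7));
// PHP arrays are copied on assignment, so $original keeps the raw values.
$original = $ratings;
example_center_rows($ratings);
// $ratings[0] is now array(-1, 0, 1); $original[0] is still array(1, 2, 3).
```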
Return value
array: The m-by-m correlation matrix, or NULL if $matrix is not an array or its rows are not all the same length.
1 call to _recommender_fast_correlation_coefficient()
- _recommender_similarity_classical_in_memory in ./recommender.module - Matrix computation in memory. Fast, but requires a lot of memory. Limitation: only supports $missing='zero'; other cases will be covered in a future release.
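Assuming $missing='zero' means that a missing value is treated as 0, the dense input this function expects could be built as in the sketch below. The input shape (row, column, value) triples and the helper name example_dense_matrix() are hypothetical illustrations, not the module's API:

```php
<?php
// Hypothetical helper: builds a dense, zero-filled matrix indexed by
// [0..m-1][0..n-1] from sparse (row, column, value) triples.
function example_dense_matrix(array $triples, $m, $n) {
  // Each of the $m rows is an independent copy of a row of $n zeros.
  $matrix = array_fill(0, $m, array_fill(0, $n, 0));
  foreach ($triples as $t) {
    list($row, $col, $value) = $t;
    $matrix[$row][$col] = $value;
  }
  return $matrix;
}

// Two users, three items; user 1 has not rated item 2, so that cell stays 0.
$triples = array(
  array(0, 0, 5), array(0, 1, 3), array(0, 2, 4),
  array(1, 0, 4), array(1, 1, 2),
);
$matrix = example_dense_matrix($triples, 2, 3);
// $matrix is now array(array(5, 3, 4), array(4, 2, 0)).
```

Filling with 0 treats a missing value as a real 0, which also lowers that row's mean before centering.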
File
- ./recommender.module, line 106 - Providing generic recommender system algorithms.
Code
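function &_recommender_fast_correlation_coefficient(&$matrix) {
  // A function that returns by reference must return a variable, so NULL is
  // returned through $null rather than as a literal.
  $null = NULL;
  if (!isset($matrix) || !is_array($matrix) || !isset($matrix[0]) || !is_array($matrix[0])) {
    return $null;
  }
  $m = count($matrix);
  $n = count($matrix[0]);
  if ($n == 0) {
    return $null;
  }
  // Every row must have the same width. Check all rows before modifying any,
  // so that invalid input is not left half-centered.
  foreach ($matrix as $row) {
    if (!is_array($row) || count($row) != $n) {
      return $null;
    }
  }
  $norm = array();
  foreach ($matrix as &$vector) {
    // Replace each element by its deviation from the row mean, and accumulate
    // the sum of squared deviations.
    $mean = array_sum($vector) / $n;
    $sigma = 0;
    for ($i = 0; $i < $n; $i++) {
      $vector[$i] = $vector[$i] - $mean;
      $sigma += $vector[$i] * $vector[$i];
    }
    // Due to floating-point error, a norm that should be exactly 0 can come out
    // as a tiny positive number. Clamping it to 0 (e.g. below 0.00001) did not
    // seem to matter in practice, so it is not done here.
    // Note that the sum is not divided by n: the 1/n factors would cancel in
    // the correlation below.
    $norm[] = sqrt($sigma);
  }
  // Drop the reference that foreach leaves on the last row.
  unset($vector);
  $cor_matrix = array();
  for ($v1 = 0; $v1 < $m; $v1++) {
    for ($v2 = $v1; $v2 < $m; $v2++) {
      if ($norm[$v1] == 0 || $norm[$v2] == 0) {
        // A constant row has no defined correlation; report 0 in both cells.
        $cor_matrix[$v1][$v2] = 0;
        $cor_matrix[$v2][$v1] = 0;
      }
      else {
        $vector_1 =& $matrix[$v1];
        $vector_2 =& $matrix[$v2];
        // Dot product of the two centered rows.
        $dot = 0;
        for ($i = 0; $i < $n; $i++) {
          $dot += $vector_1[$i] * $vector_2[$i];
        }
        $cor = $dot / ($norm[$v1] * $norm[$v2]);
        $cor_matrix[$v1][$v2] = $cor;
        $cor_matrix[$v2][$v1] = $cor;
      }
    }
  }
  return $cor_matrix;
}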
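To try the behavior described above on a small input without loading the module, the sketch below reimplements the same algorithm under the hypothetical name example_correlation_matrix(). It applies the function to rows that are perfectly correlated, perfectly anti-correlated and constant. It is an illustration under those assumptions, not the module's code:

```php
<?php
// Hypothetical standalone copy of the algorithm, for experimentation.
// Returns NULL for empty or ragged input; $matrix is mean-centered in place.
function example_correlation_matrix(array &$matrix) {
  if (!isset($matrix[0]) || !is_array($matrix[0]) || count($matrix[0]) == 0) {
    return NULL;
  }
  $m = count($matrix);
  $n = count($matrix[0]);
  foreach ($matrix as $row) {
    if (!is_array($row) || count($row) != $n) {
      return NULL;
    }
  }
  // Center each row on its mean and record the root of its sum of squares.
  $norm = array();
  foreach ($matrix as $k => $row) {
    $mean = array_sum($row) / $n;
    $sigma = 0;
    for ($i = 0; $i < $n; $i++) {
      $matrix[$k][$i] = $row[$i] - $mean;
      $sigma += $matrix[$k][$i] * $matrix[$k][$i];
    }
    $norm[] = sqrt($sigma);
  }
  // Fill the upper triangle and mirror it; constant rows get 0.
  $cor = array();
  for ($a = 0; $a < $m; $a++) {
    for ($b = $a; $b < $m; $b++) {
      $r = 0;
      if ($norm[$a] != 0 && $norm[$b] != 0) {
        $dot = 0;
        for ($i = 0; $i < $n; $i++) {
          $dot += $matrix[$a][$i] * $matrix[$b][$i];
        }
        $r = $dot / ($norm[$a] * $norm[$b]);
      }
      $cor[$a][$b] = $r;
      $cor[$b][$a] = $r;
    }
  }
  return $cor;
}

$rows = array(
  array(1, 2, 3),  // increasing
  array(2, 4, 6),  // row 0 scaled by 2: correlation 1 with row 0
  array(3, 2, 1),  // row 0 reversed: correlation -1 with row 0
  array(5, 5, 5),  // constant: norm 0, so reported as 0
);
$cor = example_correlation_matrix($rows);
```

Row 1 differs from row 0 in every value but still has correlation 1, because Pearson correlation ignores positive scaling and shifting. Floating-point rounding can leave such "perfect" values slightly off 1 or -1, so compare them with a tolerance.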