
function PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test in Porter-Stemmer 7

Same name and namespace in other branches
  1. 6.2 porterstemmer.test \PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test()

Runs a unit test for a portion of words in the test file.

The output of the porterstemmer_stem() function is tested against known input/output pairs from http://snowball.tartarus.org/algorithms/english/stemmer.html.

If $use_pecl is TRUE, the PECL stem_english() function is tested instead.
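The choice between the two stemmers can be sketched as a small standalone PHP function. This is an illustration, not module code: choose_stemmer() is a hypothetical helper, and it assumes PECL availability can be detected with function_exists('stem_english'). How the test class actually sets $this->has_pecl_stem is not shown on this page.

```php
<?php
// Sketch of the backend choice made by _run_porterstemmer_stem_test().
// choose_stemmer() is a hypothetical helper, not part of the module.

/**
 * Returns the stemmer to test, or NULL when PECL was requested but the
 * stem extension is not loaded (the real test aborts in that case).
 */
function choose_stemmer(bool $use_pecl, callable $module_stemmer): ?callable {
  if ($use_pecl) {
    // Assumption: stem_english() exists only when the PECL stem extension is loaded.
    return function_exists('stem_english') ? 'stem_english' : NULL;
  }
  return $module_stemmer;
}

// In Drupal the module stemmer would be 'porterstemmer_stem'; this
// lowercasing stand-in only lets the sketch run without Drupal.
$stand_in = function (string $word): string {
  return strtolower($word);
};

$stemmer = choose_stemmer(FALSE, $stand_in);
echo $stemmer('Ponies'), "\n";  // prints "ponies" (stand-in, not a real stem)
```

Returning NULL rather than silently falling back keeps the behavior described above: a PECL run without the extension tests nothing instead of testing the wrong stemmer.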

Parameters

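$skipto: Line of the file to start on, counting from 0. Short lines (blank lines, lines without both an input word and an expected stem, and lines where either word is shorter than 2 characters) are not counted.

$runto: Number of lines to test, not counting short lines.

$use_pecl: If TRUE, test the PECL stem_english() function instead of porterstemmer_stem(). If the PECL stem library is not installed, a single placeholder assertion is recorded and no stemming tests are performed.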
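The line window defined by $skipto and $runto can be sketched as a standalone PHP function that reads "word stem" lines from any stream. This is a sketch under stated assumptions: select_test_pairs() and utf8_length() are hypothetical helpers, utf8_length() stands in for drupal_strlen(), and the function skips exactly $skipto qualifying lines, as the parameter description states.

```php
<?php
// Sketch of the $skipto/$runto window. select_test_pairs() and
// utf8_length() are hypothetical helpers written for illustration.

// Counts UTF-8 characters (roughly what drupal_strlen() does) using PCRE,
// which is always bundled with PHP.
function utf8_length(string $s): int {
  return (int) preg_match_all('/./su', $s);
}

/**
 * Reads "word stem" lines from $handle and returns up to $runto pairs,
 * after skipping the first $skipto qualifying lines.
 */
function select_test_pairs($handle, int $skipto = 0, int $runto = 2000): array {
  $pairs = [];
  $seen = 0;  // qualifying lines read so far
  while (count($pairs) < $runto && ($line = fgets($handle, 4096)) !== FALSE) {
    $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) < 2) {
      continue;  // blank or incomplete line: not counted
    }
    [$in, $expected] = $words;
    if (utf8_length($in) < 2 || utf8_length($expected) < 2) {
      continue;  // too short for Porter stemming: not counted
    }
    if ($seen++ < $skipto) {
      continue;  // inside the skipped window
    }
    $pairs[] = [$in, $expected];
  }
  return $pairs;
}

$handle = fopen('php://memory', 'r+');
fwrite($handle, "caresses caress\na a\nponies poni\n\nties ti\ncats cat\n");
rewind($handle);
$pairs = select_test_pairs($handle, 1, 2);  // skip 1 qualifying line, test 2
fclose($handle);
print_r($pairs);
```

Here "a a" and the blank line are ignored without being counted, so skipping one line drops "caresses" and the two tested pairs are "ponies" and "ties".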

30 calls to PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test()
PorterStemmerOutput10UnitTest::testStemmingUnitTest in ./porterstemmer.test
Tests the next 2000 words in the file for stemming accuracy.
PorterStemmerOutput11UnitTest::testStemmingUnitTest in ./porterstemmer.test
Tests the next 2000 words in the file for stemming accuracy.
PorterStemmerOutput12UnitTest::testStemmingUnitTest in ./porterstemmer.test
Tests the next 2000 words in the file for stemming accuracy.
PorterStemmerOutput13UnitTest::testStemmingUnitTest in ./porterstemmer.test
Tests the next 2000 words in the file for stemming accuracy.
PorterStemmerOutput14UnitTest::testStemmingUnitTest in ./porterstemmer.test
Tests the next 2000 words in the file for stemming accuracy.


File

./porterstemmer.test, line 164
Tests for the Porter Stemmer module. By Jennifer Hodgdon of Poplar ProductivityWare, www.poplarware.com Unit tests are based on sample words from http://snowball.tartarus.org/algorithms/english/stemmer.html which are stored in a separate file…

Class

PorterStemmerOutput1UnitTest
Unit tests for Porter Stemmer - Stemming output part 1.

Code

function _run_porterstemmer_stem_test($skipto = 0, $runto = 2000, $use_pecl = FALSE) {
  if ($use_pecl && !$this->has_pecl_stem) {
    $this
      ->assertTrue(1, "No PECL stem library found, aborting test");
    return;
  }

  // Open word file
  $file = drupal_get_path('module', 'porterstemmer') . '/testwords.txt';
  $handle = @fopen($file, "r");
  $this
    ->assertTrue($handle, "Open file containing words to test", 'Startup');
  if (!$handle) {

    // no point doing rest of test...
    return;
  }

  // Set min characters for search to 2 so we test whole word list
  variable_set('minimum_word_size', 2);
  porterstemmer_too_short('', TRUE);
  $ran = 0;
  $skipped = 0;
  $start = time();
  $mstart = 0;
  if (function_exists('microtime')) {
    $mstart = microtime(TRUE);
  }
  while (!feof($handle) && $ran < $runto) {

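    // Read a line of the file; fgets() returns FALSE at end of file.
    $line = fgets($handle, 4096);
    if ($line === FALSE) {
      break;
    }

    // Split the line into the input word and its expected stem.
    $words = preg_split("/\\s+/", $line, -1, PREG_SPLIT_NO_EMPTY);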
    if (count($words) < 2) {
      continue;
    }
    $in = $words[0];
    $right = $words[1];

    // Skip words shorter than 2 characters, the minimum for Porter stemming.
    if (drupal_strlen($in) < 2 || drupal_strlen($right) < 2) {
      continue;
    }

    // Skip the first $skipto qualifying lines (count starts at 0).
    if ($skipped < $skipto) {
      $skipped++;
      continue;
    }

    // Stem the word
    if ($use_pecl && $this->has_pecl_stem) {
      $stem = stem_english($in);
    }
    else {
      $stem = porterstemmer_stem($in);
    }

    // Test correctness
    $this
      ->assertEqual($right, $stem, "Stemming {$in} results in {$right} (was {$stem})", t('Stemming test'));
    $ran++;
  }
  $elapsed = time() - $start;
  $melapsed = 0;
  if (function_exists('microtime')) {
    $melapsed = microtime(TRUE) - $mstart;
  }
  $this
    ->assertTrue(1, "Elapsed time: {$elapsed} seconds ({$melapsed} seconds by microtime())");
  fclose($handle);
}
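To try a word list against a stemmer outside SimpleTest, the comparison and timing steps of the loop above can be sketched in plain PHP. check_stems() and the lookup-table stub are hypothetical stand-ins; inside Drupal the callable would be porterstemmer_stem or stem_english, and the pairs would come from testwords.txt.

```php
<?php
// Sketch: compare stemmer output with expected stems and time the run.
// check_stems() and $stub below are hypothetical stand-ins.

function check_stems(array $pairs, callable $stem): array {
  $failures = [];
  $start = microtime(TRUE);
  foreach ($pairs as [$in, $expected]) {
    $actual = $stem($in);
    if ($actual !== $expected) {
      // Same wording as the test's assertion message.
      $failures[] = "Stemming {$in} results in {$expected} (was {$actual})";
    }
  }
  return [
    'failures' => $failures,
    'elapsed' => microtime(TRUE) - $start,  // seconds, as a float
  ];
}

// Stub stemmer: looks words up in a fixed table, else returns them unchanged.
$table = ['caresses' => 'caress', 'ponies' => 'poni'];
$stub = function (string $word) use ($table): string {
  return $table[$word] ?? $word;
};

$result = check_stems([['caresses', 'caress'], ['ties', 'ti']], $stub);
foreach ($result['failures'] as $message) {
  echo $message, "\n";  // prints "Stemming ties results in ti (was ties)"
}
```

Collecting failures instead of asserting each pair keeps the sketch framework-free; a test runner can then assert that the list is empty.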