function PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test in Porter-Stemmer 7
Same name and namespace in other branches
- 6.2 porterstemmer.test \PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test()
Runs a unit test for a portion of the words in the test file.
The output of the porterstemmer_stem() function is tested against known input/output pairs from http://snowball.tartarus.org/algorithms/english/stemmer.html
If $use_pecl is set to TRUE, the PECL stem_english() function is tested instead of porterstemmer_stem().
Parameters
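$skipto: Line of the file to start on (count starts at 0), not counting short lines. A line is short if it has fewer than two words, or if the input word or the expected stem is shorter than 2 characters.
$runto: Number of lines to test, not counting short lines.
$use_pecl: If TRUE, use the PECL library if it is installed. If it is not installed, no stemming tests are performed.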
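The way $skipto and $runto select a window of lines can be sketched outside Drupal. The helper below is hypothetical (select_test_pairs() is not part of the module); it copies the filtering and counting of the loop in the Code section, takes an array of made-up sample lines instead of testwords.txt, and assumes plain strlen() as a stand-in for drupal_strlen().

```php
<?php
// Hypothetical helper, not part of the Porter Stemmer module. It mirrors
// the filtering and skip/run counting of _run_porterstemmer_stem_test().
function select_test_pairs(array $lines, $skipto = 0, $runto = 2000) {
  $pairs = array();
  $skipped = 0;
  foreach ($lines as $line) {
    // Stop once $runto qualifying lines have been collected.
    if (count($pairs) >= $runto) {
      break;
    }
    // Split the line into words; lines with fewer than two words are short.
    $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) < 2) {
      continue;
    }
    $in = $words[0];
    $right = $words[1];
    // Assumption: strlen() stands in for drupal_strlen() here.
    if (strlen($in) < 2 || strlen($right) < 2) {
      continue;
    }
    // Only qualifying lines are counted toward $skipto.
    $skipped++;
    if ($skipped < $skipto) {
      continue;
    }
    $pairs[] = array($in, $right);
  }
  return $pairs;
}

// Made-up sample lines in "word expected-stem" format.
$lines = array('a a', '', 'caresses caress', 'ponies poni', 'ties ti', 'single');
$first = select_test_pairs($lines, 0, 2);
$later = select_test_pairs($lines, 2, 2000);
print_r($first);
print_r($later);
```

Here $first holds the caresses and ponies pairs, and $later holds the ponies and ties pairs. With the counting as listed, values of 0 and 1 for $skipto both start at the first qualifying line.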
30 calls to PorterStemmerOutput1UnitTest::_run_porterstemmer_stem_test()
- PorterStemmerOutput10UnitTest::testStemmingUnitTest in ./porterstemmer.test - Tests the next 2000 words in the file for stemming accuracy.
- PorterStemmerOutput11UnitTest::testStemmingUnitTest in ./porterstemmer.test - Tests the next 2000 words in the file for stemming accuracy.
- PorterStemmerOutput12UnitTest::testStemmingUnitTest in ./porterstemmer.test - Tests the next 2000 words in the file for stemming accuracy.
- PorterStemmerOutput13UnitTest::testStemmingUnitTest in ./porterstemmer.test - Tests the next 2000 words in the file for stemming accuracy.
- PorterStemmerOutput14UnitTest::testStemmingUnitTest in ./porterstemmer.test - Tests the next 2000 words in the file for stemming accuracy.
File
- ./porterstemmer.test, line 164 - Tests for the Porter Stemmer module. By Jennifer Hodgdon of Poplar ProductivityWare, www.poplarware.com. Unit tests are based on sample words from http://snowball.tartarus.org/algorithms/english/stemmer.html which are stored in a separate file…
Class
- PorterStemmerOutput1UnitTest - Unit tests for Porter Stemmer - Stemming output part 1.
Code
function _run_porterstemmer_stem_test($skipto = 0, $runto = 2000, $use_pecl = FALSE) {
  if ($use_pecl && !$this->has_pecl_stem) {
    $this->assertTrue(1, "No PECL stem library found, aborting test");
    return;
  }

  // Open word file
  $file = drupal_get_path('module', 'porterstemmer') . '/testwords.txt';
  $handle = @fopen($file, "r");
  $this->assertTrue($handle, "Open file containing words to test", 'Startup');
  if (!$handle) {
    // no point doing rest of test...
    return;
  }

  // Set min characters for search to 2 so we test whole word list
  variable_set('minimum_word_size', 2);
  porterstemmer_too_short('', TRUE);

  $ran = 0;
  $skipped = 0;
  $start = time();
  $mstart = 0;
  if (function_exists('microtime')) {
    $mstart = microtime(TRUE);
  }

  while (!feof($handle) && $ran < $runto) {
    // Read a line of the file, and split into words
    $line = fgets($handle, 4096);
    $words = preg_split("/\\s+/", $line, -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) < 2) {
      continue;
    }
    $in = $words[0];
    $right = $words[1];

    // Skip words less than 2 characters, which is minimum for Porter
    if (drupal_strlen($in) < 2 || drupal_strlen($right) < 2) {
      continue;
    }
    $skipped++;
    if ($skipped < $skipto) {
      continue;
    }

    // Stem the word
    if ($use_pecl && $this->has_pecl_stem) {
      $stem = stem_english($in);
    }
    else {
      $stem = porterstemmer_stem($in);
    }

    // Test correctness
    $this->assertEqual($right, $stem, "Stemming {$in} results in {$right} (was {$stem})", t('Stemming test'));
    $ran++;
  }

  $start = time() - $start;
  if (function_exists('microtime')) {
    $mstart = microtime(TRUE) - $mstart;
  }
  $this->assertTrue(1, "Elapsed time: {$start} seconds ({$mstart})");
  fclose($handle);
}
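The comparison step and the choice between the two stemmers can also be tried without a Drupal test runner. The following is a minimal sketch under stated assumptions: choose_stemmer() and check_stems() are hypothetical helpers, the toy stemmer is only a stand-in for porterstemmer_stem(), the word pairs are made up for illustration, and function_exists('stem_english') is assumed as the availability check (the listing does not show how has_pecl_stem is set).

```php
<?php
// Hypothetical helper: picks the stemming callable along the lines of the
// $use_pecl branch. Returns NULL when PECL is requested but unavailable.
function choose_stemmer($use_pecl, $fallback) {
  if ($use_pecl) {
    // Assumption: the PECL stem extension is detected via function_exists().
    return function_exists('stem_english') ? 'stem_english' : NULL;
  }
  return $fallback;
}

// Hypothetical helper: returns one message per pair whose stem differs
// from the expected value, using the message format of the listing.
function check_stems(array $pairs, $stemmer) {
  $failures = array();
  foreach ($pairs as $pair) {
    list($in, $right) = $pair;
    $stem = call_user_func($stemmer, $in);
    if ($stem !== $right) {
      $failures[] = "Stemming {$in} results in {$right} (was {$stem})";
    }
  }
  return $failures;
}

// Toy stand-in for porterstemmer_stem(): it only strips a trailing "s".
$toy_stemmer = function ($word) {
  return preg_replace('/s$/', '', $word);
};

// Made-up pairs in (input word, expected stem) form.
$pairs = array(array('cats', 'cat'), array('ponies', 'poni'));
$failures = check_stems($pairs, choose_stemmer(FALSE, $toy_stemmer));
print_r($failures);
```

The toy stemmer gets "cats" right and "ponies" wrong, so $failures contains the single message "Stemming ponies results in poni (was ponie)". Passing the stemmer as a callable keeps the comparison loop identical for both implementations.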