
class Stemmer in Search API 8

Stems search terms.

Plugin annotation


@SearchApiProcessor(
  id = "stemmer",
  label = @Translation("Stemmer"),
  description = @Translation("Stems search terms (for example, <em>talking</em> to <em>talk</em>). Currently, this only acts on English language content. It uses the Porter 2 stemmer algorithm (<a href=""https://wikipedia.org/wiki/Stemming"">More information</a>). For best results, use after tokenizing."),
  stages = {
    "pre_index_save" = 0,
    "preprocess_index" = 0,
    "preprocess_query" = 0,
  }
)
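
The English-only restriction in the description is applied at three points in the class source. `supportsIndex()` checks whether any site language ID starts with `en`. `preprocessIndexItems()` skips items whose language is not exactly `en`. `preprocessSearchQuery()` skips a query only when its language list is set and does not contain `en`. The standalone sketch below mirrors those three checks with plain strings and arrays. The function names are hypothetical, and nothing here calls Drupal.

```php
<?php
// Hypothetical standalone mirrors of the Stemmer's three language checks.
// Plain strings stand in for Drupal language objects, items and queries.

/** Mirrors supportsIndex(): TRUE if any site language ID starts with "en". */
function stemmer_supports_languages(array $site_language_ids): bool {
  foreach ($site_language_ids as $id) {
    if (substr($id, 0, 2) === 'en') {
      return TRUE;
    }
  }
  return FALSE;
}

/** Mirrors preprocessIndexItems(): only items whose language is exactly "en". */
function stemmer_processes_item(string $item_language): bool {
  return $item_language === 'en';
}

/** Mirrors preprocessSearchQuery(): NULL or empty means "any language". */
function stemmer_processes_query(?array $query_languages): bool {
  return !$query_languages || in_array('en', $query_languages);
}

var_dump(stemmer_supports_languages(['de', 'en-gb'])); // bool(true)
var_dump(stemmer_processes_item('en-gb'));            // bool(false)
var_dump(stemmer_processes_query(NULL));              // bool(true)
var_dump(stemmer_processes_query(['de', 'fr']));      // bool(false)
```

As the sketch shows, the index check accepts any `en` prefix, while the item check requires the exact code `en`. A site whose only English variant is `en-gb` can therefore enable the processor, but its `en-gb` items are not stemmed.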


1 file declares its use of Stemmer
StemmerTest.php in tests/src/Unit/Processor/StemmerTest.php

File

src/Plugin/search_api/processor/Stemmer.php, line 25

Namespace

Drupal\search_api\Plugin\search_api\processor
class Stemmer extends FieldsProcessorPluginBase {

  /**
   * Static cache for already-generated stems.
   *
   * @var string[]
   */
  protected $stems = [];

  /**
   * {@inheritdoc}
   */
  public function defaultConfiguration() {
    $configuration = parent::defaultConfiguration();
    $configuration += [
      'exceptions' => [
        'texan' => 'texa',
        'mexican' => 'mexic',
      ],
    ];
    return $configuration;
  }

  /**
   * {@inheritdoc}
   */
  public static function supportsIndex(IndexInterface $index) : bool {
    $languages = \Drupal::languageManager()->getLanguages();

    // Make processor available only if English is one of the site languages.
    foreach ($languages as $language) {
      if (substr($language->getId(), 0, 2) === 'en') {
        return TRUE;
      }
    }
    return FALSE;
  }

  /**
   * {@inheritdoc}
   */
  public function buildConfigurationForm(array $form, FormStateInterface $form_state) {
    $form = parent::buildConfigurationForm($form, $form_state);
    $description = $this->t('If the <a href="http://snowball.tartarus.org/algorithms/english/stemmer.html">algorithm</a> does not stem words in your dataset in the desired way, you can enter specific exceptions in the form of WORD=STEM, where "WORD" is the original word in the text and "STEM" is the resulting stem. List each exception on a separate line.');

    // Convert the keyed array into WORD=STEM lines, one pair per line.
    $default_value = http_build_query($this->configuration['exceptions'], '', "\n");
    $form['exceptions'] = [
      '#type' => 'textarea',
      '#title' => $this->t('Exceptions'),
      '#description' => $description,
      '#default_value' => $default_value,
    ];
    return $form;
  }

  /**
   * {@inheritdoc}
   */
  public function validateConfigurationForm(array &$form, FormStateInterface $form_state) {
    parent::validateConfigurationForm($form, $form_state);
    $exceptions = $form_state->getValue('exceptions');
    if (($parsed = parse_ini_string($exceptions)) === FALSE) {
      $el = $form['exceptions'];
      $form_state->setError($el, $el['#title'] . ': ' . $this->t('The entered text is not in valid WORD=STEM format.'));
    }
    else {
      $form_state->setValue('exceptions', $parsed);
    }
  }

  /**
   * {@inheritdoc}
   */
  public function preprocessIndexItems(array $items) {
    foreach ($items as $item) {

      // Limit this processor to English language data.
      if ($item->getLanguage() !== 'en') {
        continue;
      }
      foreach ($item->getFields() as $name => $field) {
        if ($this->testField($name, $field)) {
          $this->processField($field);
        }
      }
    }
  }

  /**
   * {@inheritdoc}
   */
  public function preprocessSearchQuery(QueryInterface $query) {

    // Only process queries that can (also) return English language content.
    $languages = $query->getLanguages();
    if ($languages && !in_array('en', $languages)) {
      return;
    }
    parent::preprocessSearchQuery($query);
  }

  /**
   * {@inheritdoc}
   */
  protected function testType($type) {
    return $this->getDataTypeHelper()->isTextType($type);
  }

  /**
   * {@inheritdoc}
   */
  protected function process(&$value) {

    // If the Tokenizer processor has not already split the text, split it
    // into words here.
    $words = preg_split('/[^\\p{L}\\p{N}]+/u', strip_tags($value), -1, PREG_SPLIT_NO_EMPTY);
    $stemmed = [];
    foreach ($words as $word) {

      // Cache generated stems in $this->stems so each distinct word is
      // stemmed only once per processor instance.
      if (!isset($this->stems[$word])) {
        $stem = new Porter2($word, $this->configuration['exceptions']);
        $this->stems[$word] = $stem->stem();
      }
      $stemmed[] = $this->stems[$word];
    }
    $value = implode(' ', $stemmed);
  }

}
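
`process()` strips HTML tags, splits the value on every run of characters that are neither letters (`\p{L}`) nor numbers (`\p{N}`), stems each distinct word once, and caches the result in `$stems`. The sketch below reproduces that split-and-cache flow outside Drupal, assuming PHP 8.0 or later. It does not implement Porter2: the hypothetical `toyStem()` only drops a trailing `ing` so the example stays self-contained. Consulting the exception map before the stemmer is also an assumption of this sketch, standing in for the `exceptions` argument passed to `Porter2`.

```php
<?php
// Sketch of the split-and-cache flow in Stemmer::process().
// toyStem() is a hypothetical stand-in and is NOT the Porter2 algorithm.

final class CachingStemmerSketch {

  /** @var array<string, string> Cache of word => stem, like Stemmer::$stems. */
  private array $stems = [];

  /** @var int How many times a stem was actually computed. */
  public int $computed = 0;

  /**
   * @param array<string, string> $exceptions
   *   WORD => STEM overrides (assumed here to take priority over the stemmer).
   */
  public function __construct(private array $exceptions = []) {}

  public function process(string $value): string {
    // Strip tags, then split on runs of characters that are not letters/numbers.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strip_tags($value), -1, PREG_SPLIT_NO_EMPTY);
    $stemmed = [];
    foreach ($words as $word) {
      // Compute each distinct word's stem only once, then reuse the cache.
      if (!isset($this->stems[$word])) {
        $this->stems[$word] = $this->exceptions[$word] ?? $this->toyStem($word);
        $this->computed++;
      }
      $stemmed[] = $this->stems[$word];
    }
    return implode(' ', $stemmed);
  }

  // Toy rule: drop a trailing "ing" when at least 3 characters remain.
  private function toyStem(string $word): string {
    if (strlen($word) > 5 && substr($word, -3) === 'ing') {
      return substr($word, 0, -3);
    }
    return $word;
  }

}

$stemmer = new CachingStemmerSketch(['texan' => 'texa']);
echo $stemmer->process('<p>talking, texan talking!</p>'), "\n"; // talk texa talk
echo $stemmer->computed, "\n"; // 2
```

The second `talking` is served from the cache, which is why only two stems are computed for three words.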

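`buildConfigurationForm()` serializes the stored `exceptions` array into the textarea with `http_build_query()` and a newline separator, and `validateConfigurationForm()` parses the submitted text back with `parse_ini_string()`, treating a `FALSE` return as invalid input. The sketch below runs that round trip on the default exceptions, passing `''` as the numeric prefix. One caveat follows from PHP's own behaviour: `http_build_query()` URL-encodes keys and values and `parse_ini_string()` does not decode them, so words containing non-ASCII characters would not round-trip unchanged.

```php
<?php
// Round trip of the exceptions setting between the stored array and the
// textarea text, mirroring buildConfigurationForm() and
// validateConfigurationForm().

$exceptions = ['texan' => 'texa', 'mexican' => 'mexic'];

// Array -> textarea text: WORD=STEM pairs separated by newlines.
$text = http_build_query($exceptions, '', "\n");
echo $text, "\n";
// texan=texa
// mexican=mexic

// Textarea text -> array; FALSE would signal invalid input.
$parsed = parse_ini_string($text);
var_dump($parsed === $exceptions); // bool(true)
```
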
Members

Name Modifiers Type Description Overrides
ConfigurablePluginBase::calculateDependencies public function Calculates dependencies for the configured plugin. Overrides DependentPluginInterface::calculateDependencies 6
ConfigurablePluginBase::calculatePluginDependencies Deprecated protected function Calculates and adds dependencies of a specific plugin instance.
ConfigurablePluginBase::getConfiguration public function Gets this plugin's configuration. Overrides ConfigurableInterface::getConfiguration
ConfigurablePluginBase::getDescription public function Returns the plugin's description. Overrides ConfigurablePluginInterface::getDescription
ConfigurablePluginBase::getPluginDependencies Deprecated protected function Calculates and returns dependencies of a specific plugin instance.
ConfigurablePluginBase::label public function Returns the label for use on the administration pages. Overrides ConfigurablePluginInterface::label
ConfigurablePluginBase::moduleHandler Deprecated protected function Wraps the module handler.
ConfigurablePluginBase::onDependencyRemoval public function Informs the plugin that some of its dependencies are being removed. Overrides ConfigurablePluginInterface::onDependencyRemoval 5
ConfigurablePluginBase::setConfiguration public function Sets the configuration for this plugin instance. Overrides ConfigurableInterface::setConfiguration 3
ConfigurablePluginBase::themeHandler Deprecated protected function Wraps the theme handler.
DependencySerializationTrait::$_entityStorages protected property An array of entity type IDs keyed by the property name of their storages.
DependencySerializationTrait::$_serviceIds protected property An array of service IDs keyed by property name used for serialization.
DependencySerializationTrait::__sleep public function 1
DependencySerializationTrait::__wakeup public function 2
DependencyTrait::$dependencies protected property The object's dependencies.
DependencyTrait::addDependencies protected function Adds multiple dependencies.
DependencyTrait::addDependency protected function Adds a dependency.
FieldsProcessorPluginBase::$dataTypeHelper protected property The data type helper. 1
FieldsProcessorPluginBase::$elementInfoManager protected property The element info manager.
FieldsProcessorPluginBase::create public static function Creates an instance of the plugin. Overrides ProcessorPluginBase::create 1
FieldsProcessorPluginBase::getDataTypeHelper public function Retrieves the data type helper. 1
FieldsProcessorPluginBase::getElementInfoManager public function Retrieves the element info manager.
FieldsProcessorPluginBase::preIndexSave public function Preprocesses the search index entity before it is saved. Overrides ProcessorPluginBase::preIndexSave
FieldsProcessorPluginBase::preRenderFieldsCheckboxes public static function Preprocesses the "fields" checkboxes before rendering.
FieldsProcessorPluginBase::processConditions protected function Preprocesses the query conditions.
FieldsProcessorPluginBase::processConditionValue protected function Processes a single condition value. 1
FieldsProcessorPluginBase::processField protected function Processes a single field's value. 3
FieldsProcessorPluginBase::processFieldValue protected function Processes a single text element in a field. 3
FieldsProcessorPluginBase::processKey protected function Processes a single search keyword. 1
FieldsProcessorPluginBase::processKeys protected function Preprocesses the search keywords.
FieldsProcessorPluginBase::setDataTypeHelper public function Sets the data type helper. 1
FieldsProcessorPluginBase::setElementInfoManager public function Sets the element info manager.
FieldsProcessorPluginBase::shouldProcess protected function Determines whether a single value (not an array) should be processed. 1
FieldsProcessorPluginBase::testField protected function Tests whether a certain field should be processed. 1
FieldsProcessorPluginBase::trustedCallbacks public static function Lists the trusted callbacks provided by the implementing class. Overrides TrustedCallbackInterface::trustedCallbacks
IndexPluginBase::$index protected property The index this processor is configured for.
IndexPluginBase::getIndex public function Retrieves the index this plugin is configured for. Overrides IndexPluginInterface::getIndex
IndexPluginBase::setIndex public function Sets the index this plugin is configured for. Overrides IndexPluginInterface::setIndex
IndexPluginBase::__construct public function Constructs a \Drupal\Component\Plugin\PluginBase object. Overrides ConfigurablePluginBase::__construct 2
MessengerTrait::$messenger protected property The messenger. 29
MessengerTrait::messenger public function Gets the messenger. 29
MessengerTrait::setMessenger public function Sets the messenger.
PluginBase::$configuration protected property Configuration information passed into the plugin. 1
PluginBase::$pluginDefinition protected property The plugin implementation definition. 1
PluginBase::$pluginId protected property The plugin_id.
PluginBase::DERIVATIVE_SEPARATOR constant A string which is used to separate base plugin IDs from the derivative ID.
PluginBase::getBaseId public function Gets the base_plugin_id of the plugin instance. Overrides DerivativeInspectionInterface::getBaseId
PluginBase::getDerivativeId public function Gets the derivative_id of the plugin instance. Overrides DerivativeInspectionInterface::getDerivativeId
PluginBase::getPluginDefinition public function Gets the definition of the plugin implementation. Overrides PluginInspectionInterface::getPluginDefinition 3
PluginBase::getPluginId public function Gets the plugin_id of the plugin instance. Overrides PluginInspectionInterface::getPluginId
PluginBase::isConfigurable public function Determines if the plugin is configurable.
PluginDependencyTrait::calculatePluginDependencies protected function Calculates and adds dependencies of a specific plugin instance. Aliased as: traitCalculatePluginDependencies 1
PluginDependencyTrait::getPluginDependencies protected function Calculates and returns dependencies of a specific plugin instance. Aliased as: traitGetPluginDependencies
PluginDependencyTrait::moduleHandler protected function Wraps the module handler. Aliased as: traitModuleHandler 1
PluginDependencyTrait::themeHandler protected function Wraps the theme handler. Aliased as: traitThemeHandler 1
PluginFormTrait::submitConfigurationForm public function Form submission handler. 7
ProcessorInterface::STAGE_ADD_PROPERTIES constant Processing stage: add properties.
ProcessorInterface::STAGE_ALTER_ITEMS constant Processing stage: alter indexed items.
ProcessorInterface::STAGE_POSTPROCESS_QUERY constant Processing stage: postprocess query.
ProcessorInterface::STAGE_PREPROCESS_INDEX constant Processing stage: preprocess index.
ProcessorInterface::STAGE_PREPROCESS_QUERY constant Processing stage: preprocess query.
ProcessorInterface::STAGE_PRE_INDEX_SAVE constant Processing stage: pre-index save.
ProcessorPluginBase::$fieldsHelper protected property The fields helper. 1
ProcessorPluginBase::addFieldValues public function Adds the values of properties defined by this processor to the item. Overrides ProcessorInterface::addFieldValues 8
ProcessorPluginBase::alterIndexedItems public function Alter the items to be indexed. Overrides ProcessorInterface::alterIndexedItems 3
ProcessorPluginBase::ensureField protected function Ensures that a field with certain properties is indexed on the index.
ProcessorPluginBase::findField protected function Finds a certain field in the index.
ProcessorPluginBase::getFieldsHelper public function Retrieves the fields helper. 1
ProcessorPluginBase::getPropertyDefinitions public function Retrieves the properties this processor defines for the given datasource. Overrides ProcessorInterface::getPropertyDefinitions 8
ProcessorPluginBase::getWeight public function Returns the weight for a specific processing stage. Overrides ProcessorInterface::getWeight
ProcessorPluginBase::isHidden public function Determines whether this plugin should be hidden in the UI. Overrides HideablePluginBase::isHidden
ProcessorPluginBase::isLocked public function Determines whether this processor should always be enabled. Overrides ProcessorInterface::isLocked
ProcessorPluginBase::postprocessSearchResults public function Postprocess search results before they are returned by the query. Overrides ProcessorInterface::postprocessSearchResults 2
ProcessorPluginBase::requiresReindexing public function Determines whether re-indexing is required after a settings change. Overrides ProcessorInterface::requiresReindexing
ProcessorPluginBase::setFieldsHelper public function Sets the fields helper. 1
ProcessorPluginBase::setWeight public function Sets the weight for a specific processing stage. Overrides ProcessorInterface::setWeight
ProcessorPluginBase::supportsStage public function Checks whether this processor implements a particular stage. Overrides ProcessorInterface::supportsStage 2
Stemmer::$stems protected property Static cache for already-generated stems.
Stemmer::buildConfigurationForm public function Form constructor. Overrides FieldsProcessorPluginBase::buildConfigurationForm
Stemmer::defaultConfiguration public function Gets default configuration for this plugin. Overrides FieldsProcessorPluginBase::defaultConfiguration
Stemmer::preprocessIndexItems public function Preprocesses search items for indexing. Overrides FieldsProcessorPluginBase::preprocessIndexItems
Stemmer::preprocessSearchQuery public function Preprocesses a search query. Overrides FieldsProcessorPluginBase::preprocessSearchQuery
Stemmer::process protected function Processes a single string value. Overrides FieldsProcessorPluginBase::process
Stemmer::supportsIndex public static function Checks whether this processor is applicable for a certain index. Overrides ProcessorPluginBase::supportsIndex
Stemmer::testType protected function Determines whether a field of a certain type should be preprocessed. Overrides FieldsProcessorPluginBase::testType
Stemmer::validateConfigurationForm public function Form validation handler. Overrides FieldsProcessorPluginBase::validateConfigurationForm
StringTranslationTrait::$stringTranslation protected property The string translation service. 1
StringTranslationTrait::formatPlural protected function Formats a string containing a count of items.
StringTranslationTrait::getNumberOfPlurals protected function Returns the number of plurals supported by a given language.
StringTranslationTrait::getStringTranslation protected function Gets the string translation service.
StringTranslationTrait::setStringTranslation public function Sets the string translation service to use. 2
StringTranslationTrait::t protected function Translates a string to the current language or to a given language.
TrustedCallbackInterface::THROW_EXCEPTION constant Untrusted callbacks throw exceptions.
TrustedCallbackInterface::TRIGGER_SILENCED_DEPRECATION constant Untrusted callbacks trigger silenced E_USER_DEPRECATION errors.
TrustedCallbackInterface::TRIGGER_WARNING constant Untrusted callbacks trigger E_USER_WARNING errors.