
class SolrExtractor in Search API attachments 9.0.x

Same name and namespace in other branches
  1. 8 src/Plugin/search_api_attachments/SolrExtractor.php \Drupal\search_api_attachments\Plugin\search_api_attachments\SolrExtractor

Provides a Solr extractor.

Plugin annotation


@SearchApiAttachmentsTextExtractor(
  id = "solr_extractor",
  label = @Translation("Solr Extractor"),
  description = @Translation("Adds Solr extractor support."),
)

Hierarchy

Expanded class hierarchy of SolrExtractor

1 file declares its use of SolrExtractor
ExtractBody.php in tests/src/Unit/ExtractBody.php

File

src/Plugin/search_api_attachments/SolrExtractor.php, line 26

Namespace

Drupal\search_api_attachments\Plugin\search_api_attachments
View source
class SolrExtractor extends TextExtractorPluginBase {

  /**
   * Entity type manager service.
   *
   * @var \Drupal\Core\Entity\EntityTypeManagerInterface
   */
  protected $entityTypeManager;

  /**
   * {@inheritdoc}
   */
  public function __construct(array $configuration, $plugin_id, array $plugin_definition, ConfigFactoryInterface $config_factory, StreamWrapperManagerInterface $stream_wrapper_manager, MimeTypeGuesserInterface $mime_type_guesser, MessengerInterface $messenger, FileSystemInterface $file_system, EntityTypeManagerInterface $entity_type_manager) {
    parent::__construct($configuration, $plugin_id, $plugin_definition, $config_factory, $stream_wrapper_manager, $mime_type_guesser, $messenger, $file_system);
    $this->entityTypeManager = $entity_type_manager;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    return new static(
      $configuration,
      $plugin_id,
      $plugin_definition,
      $container->get('config.factory'),
      $container->get('stream_wrapper_manager'),
      $container->get('file.mime_type.guesser'),
      $container->get('messenger'),
      $container->get('file_system'),
      $container->get('entity_type.manager')
    );
  }

  /**
   * Extract file with a search api solr backend.
   *
   * @param \Drupal\file\Entity\File $file
   *   A file object.
   *
   * @return string
   *   The text extracted from the file.
   */
  public function extract(File $file) {
    $filepath = $this->getRealpath($file->getFileUri());

    // Load the chosen Solr server entity.
    $conditions = [
      'status' => TRUE,
      'id' => $this->configuration['solr_server'],
    ];
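    $servers = $this->entityTypeManager
      ->getStorage('search_api_server')
      ->loadByProperties($conditions);
    $server = reset($servers);
    // reset() returns FALSE when no enabled server matches the configured ID.
    if (!$server) {
      throw new \Exception('The configured Solr server could not be loaded.');
    }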

    // Get the Solr backend.
    /** @var \Drupal\search_api_solr\Plugin\search_api\backend\SearchApiSolrBackend $backend */
    $backend = $server->getBackend();
    if (!$backend->isAvailable()) {
      throw new \Exception('Solr Extractor is not available.');
    }

    // Extract the content.
    $xml_data = $backend->extractContentFromFile($filepath);
    return self::extractBody($xml_data);
  }

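  /**
   * Extract the body from XML response.
   *
   * @param string $xml_data
   *   The XML response returned by the Solr backend.
   *
   * @return string
   *   The content of the body element, or the tag-stripped response if no
   *   body element is found.
   */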
  public static function extractBody($xml_data) {
    if (!preg_match(',<body[^>]*>(.*)</body>,sim', $xml_data, $matches)) {
      // If the body can't be found, return just the text. This is safe and
      // still contains any text to index.
      return strip_tags($xml_data);
    }

    // Return the full content of the body, including tags that can optionally
    // be used for index weight.
    return $matches[1];
  }

  /**
   * {@inheritdoc}
   */
  public function buildConfigurationForm(array $form, FormStateInterface $form_state) {
    $form = [];
    $conditions = [
      'status' => TRUE,
    ];
    $search_api_solr_servers = $this->entityTypeManager
      ->getStorage('search_api_server')
      ->loadByProperties($conditions);
    $options = [];
    foreach ($search_api_solr_servers as $solr_server) {
      if ($solr_server->hasValidBackend() && $solr_server->getBackend() instanceof SearchApiSolrBackend) {
        $options[$solr_server->id()] = $solr_server->label();
      }
    }
    $form['solr_server'] = [
      '#type' => 'select',
      '#title' => $this->t('Solr server'),
      '#description' => $this->t('Select the solr server you want to use.'),
      '#empty_value' => '',
      '#options' => $options,
      '#default_value' => $this->configuration['solr_server'],
      '#required' => TRUE,
    ];
    return $form;
  }

  /**
   * {@inheritdoc}
   */
  public function submitConfigurationForm(array &$form, FormStateInterface $form_state) {
    $this->configuration['solr_server'] = $form_state->getValue([
      'text_extractor_config',
      'solr_server',
    ]);
    parent::submitConfigurationForm($form, $form_state);
  }

}
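
Example

The body-extraction step in extractBody() can be tried outside Drupal. The following is a minimal sketch, not part of the module: extract_body_sketch() is a hypothetical standalone copy of the same regular expression and fallback, and both sample strings are invented for illustration rather than real Solr responses.

```php
<?php

/**
 * Hypothetical standalone copy of the SolrExtractor::extractBody() logic.
 *
 * Returns the inner content of the <body> element, or the tag-stripped
 * input when no <body> element is present.
 */
function extract_body_sketch(string $xml_data): string {
  if (!preg_match(',<body[^>]*>(.*)</body>,sim', $xml_data, $matches)) {
    // No body element: fall back to plain text.
    return strip_tags($xml_data);
  }
  // Body found: keep the inner markup as-is.
  return $matches[1];
}

// Invented sample inputs (assumptions, not real backend output).
$with_body = '<html><head><title>Doc</title></head><body class="x"><p>Hello</p></body></html>';
$without_body = '<response><str>Plain <b>text</b></str></response>';

echo extract_body_sketch($with_body), PHP_EOL;    // <p>Hello</p>
echo extract_body_sketch($without_body), PHP_EOL; // Plain text
```

When a body element is present, the inner markup is returned unchanged, so the caller decides whether to strip or use the tags. The fallback path always returns tag-free text.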

Members

Name Modifiers Type Description Overrides
DependencySerializationTrait::$_entityStorages protected property
DependencySerializationTrait::$_serviceIds protected property
DependencySerializationTrait::__sleep public function 2
DependencySerializationTrait::__wakeup public function 2
MessengerTrait::messenger public function Gets the messenger. 27
MessengerTrait::setMessenger public function Sets the messenger.
PluginBase::$configuration protected property Configuration information passed into the plugin. 1
PluginBase::$pluginDefinition protected property The plugin implementation definition. 1
PluginBase::$pluginId protected property The plugin_id.
PluginBase::DERIVATIVE_SEPARATOR constant A string which is used to separate base plugin IDs from the derivative ID.
PluginBase::getBaseId public function Gets the base_plugin_id of the plugin instance. Overrides DerivativeInspectionInterface::getBaseId
PluginBase::getDerivativeId public function Gets the derivative_id of the plugin instance. Overrides DerivativeInspectionInterface::getDerivativeId
PluginBase::getPluginDefinition public function Gets the definition of the plugin implementation. Overrides PluginInspectionInterface::getPluginDefinition 2
PluginBase::getPluginId public function Gets the plugin_id of the plugin instance. Overrides PluginInspectionInterface::getPluginId
PluginBase::isConfigurable public function Determines if the plugin is configurable.
SolrExtractor::$entityTypeManager protected property Entity type manager service.
SolrExtractor::buildConfigurationForm public function Form constructor. Overrides PluginFormInterface::buildConfigurationForm
SolrExtractor::create public static function Creates an instance of the plugin. Overrides TextExtractorPluginBase::create
SolrExtractor::extract public function Extract file with a search api solr backend. Overrides TextExtractorPluginBase::extract
SolrExtractor::extractBody public static function Extract the body from XML response.
SolrExtractor::submitConfigurationForm public function Form submission handler. Overrides TextExtractorPluginBase::submitConfigurationForm
SolrExtractor::__construct public function Constructs a \Drupal\Component\Plugin\PluginBase object. Overrides TextExtractorPluginBase::__construct
StringTranslationTrait::$stringTranslation protected property The string translation service. 4
StringTranslationTrait::formatPlural protected function Formats a string containing a count of items.
StringTranslationTrait::getNumberOfPlurals protected function Returns the number of plurals supported by a given language.
StringTranslationTrait::getStringTranslation protected function Gets the string translation service.
StringTranslationTrait::setStringTranslation public function Sets the string translation service to use. 2
StringTranslationTrait::t protected function Translates a string to the current language or to a given language.
TextExtractorPluginBase::$configFactory protected property Config factory service.
TextExtractorPluginBase::$messenger protected property The messenger. Overrides MessengerTrait::$messenger
TextExtractorPluginBase::$mimeTypeGuesser protected property Mime type guesser service.
TextExtractorPluginBase::$streamWrapperManager protected property Stream wrapper manager service.
TextExtractorPluginBase::calculateDependencies public function
TextExtractorPluginBase::CONFIGNAME constant Name of the config being edited.
TextExtractorPluginBase::defaultConfiguration public function Gets default configuration for this plugin. Overrides ConfigurableInterface::defaultConfiguration
TextExtractorPluginBase::getConfiguration public function Gets this plugin's configuration. Overrides ConfigurableInterface::getConfiguration
TextExtractorPluginBase::getmessenger public function
TextExtractorPluginBase::getPdfMimeTypes public function Helper method to get the PDF MIME types.
TextExtractorPluginBase::getRealpath public function Helper method to get the real path from an uri.
TextExtractorPluginBase::setConfiguration public function Sets the configuration for this plugin instance. Overrides ConfigurableInterface::setConfiguration
TextExtractorPluginBase::validateConfigurationForm public function Form validation handler. Overrides PluginFormInterface::validateConfigurationForm 5