public function DocconvExtractor::extract in Search API attachments 9.0.x

Same name and namespace in other branches
  1. 8 src/Plugin/search_api_attachments/DocconvExtractor.php \Drupal\search_api_attachments\Plugin\search_api_attachments\DocconvExtractor::extract()

Extracts the text content of a file by running the docconv command-line tool.

Parameters

\Drupal\file\Entity\File $file: The file object.

Return value

string The content extracted from the file.

Overrides TextExtractorPluginBase::extract

File

src/Plugin/search_api_attachments/DocconvExtractor.php, line 23

Class

DocconvExtractor
Provides docconv extractor.

Namespace

Drupal\search_api_attachments\Plugin\search_api_attachments

Code

public function extract(File $file) {
  $output = '';
  $docconv_path = $this->configuration['docconv_path'];
  $filepath = $this->getRealpath($file->getFileUri());

  // UTF-8 multibyte characters will be stripped by escapeshellarg() for
  // the default C-locale.
  // So temporarily set the locale to UTF-8 so that the filepath remains
  // valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  $cmd = escapeshellarg($docconv_path) . ' -input ' . escapeshellarg($filepath);
  setlocale(LC_CTYPE, $backup_locale);

  // Support UTF-8 commands.
  // @see http://www.php.net/manual/en/function.shell-exec.php#85095
  shell_exec("LANG=en_US.utf-8");
  $output = shell_exec($cmd);
  if (is_null($output)) {
    throw new \Exception('Docconv Extractor is not available.');
  }
  return $output;
}
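
The comments in the code above describe switching LC_CTYPE to a UTF-8 locale around escapeshellarg() so that multibyte characters in the file path survive escaping. A minimal standalone sketch of that pattern, outside Drupal, might look like the following. The helper name build_docconv_command(), the binary path, and the file path are hypothetical, and the locale names are assumptions, since which UTF-8 locales are installed depends on the system.

```php
<?php

/**
 * Hypothetical helper: builds a docconv command line for a file path.
 *
 * Not part of Search API attachments; it only illustrates the
 * save/switch/restore locale pattern around escapeshellarg().
 */
function build_docconv_command(string $binary, string $path): string {
  // Passing '0' queries the current LC_CTYPE locale without changing it.
  $backup_locale = setlocale(LC_CTYPE, '0');
  // Try common UTF-8 locale names in order; availability is system-dependent,
  // and setlocale() leaves the locale unchanged if none of them exists.
  setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');
  try {
    return escapeshellarg($binary) . ' -input ' . escapeshellarg($path);
  }
  finally {
    // Restore the original locale even if escaping throws.
    if ($backup_locale !== FALSE) {
      setlocale(LC_CTYPE, $backup_locale);
    }
  }
}

$before = setlocale(LC_CTYPE, '0');
$cmd = build_docconv_command('/usr/local/bin/docconv', '/tmp/my report.pdf');
echo $cmd, PHP_EOL;
```

Wrapping the restore in finally keeps the locale change scoped to the escaping step, so later code in the same request is not affected by it.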