protected function SearchApiAttachmentsAlterSettings::extractPdftotext in Search API attachments 7

Extracts PDF file content using pdftotext.

Parameters

object $file: The file.

Return value

string The file content.

1 call to SearchApiAttachmentsAlterSettings::extractPdftotext()
SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc
Extracts the file content.

File

includes/callback_attachments_settings.inc, line 468
Search API data alteration callback.

Class

SearchApiAttachmentsAlterSettings
Indexes files content.

Code

protected function extractPdftotext($file) {
  $filepath = $this
    ->getRealpath($file);

  // UTF-8 multibyte characters will be stripped by escapeshellarg() for the
  // default C-locale.
  // So temporarily set the locale to UTF-8 so that the filepath remains
  // valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');

  // The pdftotext documentation states that passing '-' as the text file
  // sends the text to stdout.
  $cmd = escapeshellcmd('pdftotext') . ' ' . escapeshellarg($filepath) . ' -';

  // Restore the locale.
  setlocale(LC_CTYPE, $backup_locale);

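  // Support UTF-8 commands:
  // http://www.php.net/manual/en/function.shell-exec.php#85095
  // A separate shell_exec("LANG=...") call runs in its own shell and does not
  // affect the next command, so export LANG to child processes instead.
  putenv('LANG=en_US.UTF-8');
  return shell_exec($cmd);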
}
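
The locale handling above can be tried in isolation. The following is a minimal sketch, not part of Search API attachments: the function name example_build_pdftotext_command() and the sample path are hypothetical. It assumes a Unix-like system where escapeshellarg() wraps its argument in single quotes, and that the en_US.UTF-8 locale is installed (setlocale() returns FALSE and changes nothing if it is not). It only builds the command string and does not run pdftotext.

```php
<?php

/**
 * Builds a pdftotext command line that writes the extracted text to stdout.
 *
 * Hypothetical helper for illustration only.
 */
function example_build_pdftotext_command($filepath) {
  // Passing '0' leaves the locale unchanged and returns the current setting.
  $backup_locale = setlocale(LC_CTYPE, '0');

  // Assumption: en_US.UTF-8 is available on this system.
  setlocale(LC_CTYPE, 'en_US.UTF-8');

  // Escape the path while the UTF-8 locale is active so that multibyte
  // characters are preserved.
  $arg = escapeshellarg($filepath);

  // Restore the previous locale before returning.
  setlocale(LC_CTYPE, $backup_locale);

  // A trailing '-' tells pdftotext to send the text to stdout.
  return 'pdftotext ' . $arg . ' -';
}

$locale_before = setlocale(LC_CTYPE, '0');
$cmd = example_build_pdftotext_command('/tmp/report.pdf');
$locale_after = setlocale(LC_CTYPE, '0');
```

Escaping the path while the UTF-8 locale is active and restoring the previous locale straight afterwards limits the locale change to the one call that needs it.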