protected function SearchApiAttachmentsAlterSettings::extractPythonPdf2txt in Search API attachments 7

Extracts PDF file content using the Python pdf2txt script.

Parameters

object $file: The file.

Return value

string The file content.

1 call to SearchApiAttachmentsAlterSettings::extractPythonPdf2txt()
SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc
Extracts the file content.

File

includes/callback_attachments_settings.inc, line 496
Search API data alteration callback.

Class

SearchApiAttachmentsAlterSettings
Indexes the content of files.

Code

protected function extractPythonPdf2txt($file) {
  $filepath = $this->getRealpath($file);

  // Resolve the pdf2txt directory and script name from the module settings.
  $pdf2txt_path = realpath(variable_get('search_api_attachments_python_pdf2txt_path', '/usr/bin'));
  $pdf2txt = realpath($pdf2txt_path . '/' . variable_get('search_api_attachments_python_pdf2txt_script', 'pdf2txt'));

  // UTF-8 multibyte characters will be stripped by escapeshellarg() for the
  // default C-locale.
  // So temporarily set the locale to UTF-8 so that the filepath remains
  // valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  $cmd = escapeshellcmd('python') . ' ' . escapeshellarg($pdf2txt) . ' -C -t text ' . escapeshellarg($filepath);

  // Restore the locale.
  setlocale(LC_CTYPE, $backup_locale);

  // Support UTF-8 commands:
  // http://www.php.net/manual/en/function.shell-exec.php#85095
  // A variable set in one shell_exec() call does not carry over to the next,
  // so export LANG through the process environment instead.
  putenv('LANG=en_US.utf-8');
  return shell_exec($cmd);
}
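
The locale handling described in the comments above can be sketched in isolation. This is a minimal illustration, not the module's code: build_pdf2txt_command() is a hypothetical helper, and the script path, file path, and en_US.UTF-8 locale are assumptions chosen for the example. It shows the intended order of operations: remember the current locale, switch LC_CTYPE to UTF-8, escape the arguments, then restore the locale.

```php
<?php
/**
 * Hypothetical helper: builds the pdf2txt command line while a UTF-8
 * locale is active, so escapeshellarg() keeps multibyte characters.
 */
function build_pdf2txt_command($script, $filepath, $locale = 'en_US.UTF-8') {
  // Passing '0' queries the current LC_CTYPE locale without changing it.
  $backup_locale = setlocale(LC_CTYPE, '0');

  // setlocale() returns FALSE if the locale is not installed; in that case
  // the escaping below simply runs under the unchanged locale.
  setlocale(LC_CTYPE, $locale);

  $cmd = escapeshellcmd('python') . ' ' . escapeshellarg($script)
    . ' -C -t text ' . escapeshellarg($filepath);

  // Restore the previous locale so the rest of the request is unaffected.
  setlocale(LC_CTYPE, $backup_locale);

  return $cmd;
}

// Example paths are assumptions for illustration only.
$cmd = build_pdf2txt_command('/usr/bin/pdf2txt', '/tmp/report.pdf');
echo $cmd, PHP_EOL;
```

Keeping the command construction separate from shell_exec() makes the escaping easy to inspect or log before anything is executed.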