protected function SearchApiAttachmentsAlterSettings::extractPythonPdf2txt in Search API attachments 7
Extracts PDF file content using the Python pdf2txt script.
Parameters
object $file: The file.
Return value
string The file content.
1 call to SearchApiAttachmentsAlterSettings::extractPythonPdf2txt()
- SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc - Extracts the file content.
File
- includes/callback_attachments_settings.inc, line 496 - Search API data alteration callback.
Class
- SearchApiAttachmentsAlterSettings - Indexes files content.
Code
protected function extractPythonPdf2txt($file) {
  $filepath = $this->getRealpath($file);
  $pdf2txt_path = realpath(variable_get('search_api_attachments_python_pdf2txt_path', '/usr/bin'));
  $pdf2txt = realpath($pdf2txt_path . '/' . variable_get('search_api_attachments_python_pdf2txt_script', 'pdf2txt'));
  // UTF-8 multibyte characters will be stripped by escapeshellarg() for the
  // default C-locale.
  // So temporarily set the locale to UTF-8 so that the filepath remains
  // valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  // Support UTF-8 commands:
  // http://www.php.net/manual/en/function.shell-exec.php#85095
  // LANG is set on the command itself; a separate shell_exec() call would
  // run in its own shell and have no effect on the next one.
  $cmd = 'LANG=en_US.utf-8 ' . escapeshellcmd('python') . ' ' . escapeshellarg($pdf2txt) . ' -C -t text ' . escapeshellarg($filepath);
  // Restore the locale.
  setlocale(LC_CTYPE, $backup_locale);
  return shell_exec($cmd);
}
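The comments in the listing describe a pattern: back up the locale, switch to UTF-8, escape the arguments, then restore the locale. The following is a minimal standalone sketch of that pattern, not the module's own code. The helper name build_pdf2txt_command(), the sample paths, and the assumption that the en_US.UTF-8 locale is installed on the host are all illustrative.

```php
<?php
/**
 * Hypothetical helper (not part of the module): builds the pdf2txt command
 * line while a UTF-8 LC_CTYPE is in effect for argument escaping.
 */
function build_pdf2txt_command($script, $filepath, $locale = 'en_US.UTF-8') {
  // Passing '0' returns the current locale without changing it.
  $backup_locale = setlocale(LC_CTYPE, '0');
  // Switch to UTF-8 so escapeshellarg() keeps multibyte characters.
  // Assumption: the locale is installed; setlocale() returns FALSE otherwise.
  setlocale(LC_CTYPE, $locale);
  // LANG is set on the command itself so that it applies to the same shell
  // that runs the script.
  $cmd = 'LANG=' . escapeshellarg($locale) . ' python ' . escapeshellarg($script)
    . ' -C -t text ' . escapeshellarg($filepath);
  // Restore the previous locale so the rest of the request is unaffected.
  setlocale(LC_CTYPE, $backup_locale);
  return $cmd;
}

// Example paths are placeholders.
$cmd = build_pdf2txt_command('/usr/bin/pdf2txt', '/tmp/report.pdf');
```

The resulting string can be passed to shell_exec(). Building the command separately from running it makes the escaping step easy to inspect or log.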