protected function SearchApiAttachmentsAlterSettings::extractTika in Search API attachments 7
Extracts file content using a local Tika executable.
Parameters
object $file: The file.
Return value
string The file content.
Throws
Exception
1 call to SearchApiAttachmentsAlterSettings::extractTika()
- SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc - Extracts the file content.
File
- includes/callback_attachments_settings.inc, line 395 - Search API data alteration callback.
Class
- SearchApiAttachmentsAlterSettings
- Indexes file content.
Code
protected function extractTika($file) {
  $filepath = $this->getRealpath($file);
  $tika_path = realpath(variable_get('search_api_attachments_tika_path', ''));
  $tika = realpath($tika_path . '/' . variable_get('search_api_attachments_tika_jar', 'tika-app-1.6.jar'));
  if (!$tika || !is_file($tika)) {
    throw new Exception(t('Invalid path or filename for tika application jar.'));
  }
  // UTF-8 multibyte characters will be stripped by escapeshellarg() for the
  // default C-locale.
  // So temporarily set the locale to UTF-8 so that the filepath
  // remains valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  $param = '';
  if ($file['filemime'] != 'audio/mpeg') {
    $param = ' -Dfile.encoding=UTF8 -cp ' . escapeshellarg($tika_path);
  }
  // Force running the Tika jar headless.
  $param = ' -Djava.awt.headless=true ' . $param;
  $cmd = variable_get('search_api_attachments_java', 'java') . $param . ' -jar ' . escapeshellarg($tika) . ' -t ' . escapeshellarg($filepath);
  if (strpos(ini_get('extension_dir'), 'MAMP/')) {
    $cmd = 'export DYLD_LIBRARY_PATH=""; ' . $cmd;
  }
  // Restore the locale.
  setlocale(LC_CTYPE, $backup_locale);
  // Support UTF-8 commands:
  // http://www.php.net/manual/en/function.shell-exec.php#85095
  shell_exec("LANG=en_US.utf-8");
  return shell_exec($cmd);
}
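
The command-building steps in the listing above can be tried outside Drupal with a small standalone sketch. It is not part of the module: the function name build_tika_command() and its parameters are hypothetical, the variable_get() lookups are replaced by plain arguments, and the paths in the usage line are made up. It only builds the command string and does not run Java, and it assumes a POSIX-style shell.

```php
<?php

/**
 * Hypothetical helper (not in the module): builds the Tika command string.
 *
 * Mirrors the option handling of extractTika() without executing anything.
 */
function build_tika_command($java, $tika_dir, $tika_jar, $filepath, $filemime) {
  // escapeshellarg() can strip multibyte characters under the C locale, so
  // switch LC_CTYPE to a UTF-8 locale while quoting, then restore it.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');

  // Always run headless; add encoding and classpath options unless the file
  // is an MP3, as in the original listing.
  $param = ' -Djava.awt.headless=true';
  if ($filemime != 'audio/mpeg') {
    $param .= ' -Dfile.encoding=UTF8 -cp ' . escapeshellarg($tika_dir);
  }

  $cmd = $java . $param
    . ' -jar ' . escapeshellarg($tika_dir . '/' . $tika_jar)
    . ' -t ' . escapeshellarg($filepath);

  // Restore whatever locale was active before.
  setlocale(LC_CTYPE, $backup_locale);
  return $cmd;
}

// Example paths are assumptions for illustration only.
echo build_tika_command('java', '/opt/tika', 'tika-app-1.6.jar', '/tmp/report.pdf', 'application/pdf'), "\n";
```

Keeping the string construction separate from shell_exec() makes it possible to inspect or log the exact command before running it, which can help when Tika returns no output.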