protected function SearchApiAttachmentsAlterSettings::extractTika in Search API attachments 7

Extracts file content using local tika executable.

Parameters

object $file: The file.

Return value

string The file content.

Throws

Exception
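
Because the method throws a plain Exception when the Tika jar cannot be located, a caller may want to catch it and fall back to empty content rather than abort indexing. The following is a minimal sketch of that idea, not the module's own code: `safe_extract()` and the `$extractor` callable are hypothetical names introduced for illustration.

```php
<?php
/**
 * Hypothetical wrapper (not part of the module): runs an extractor callable
 * and falls back to an empty string when extraction fails.
 */
function safe_extract($extractor, $file) {
  try {
    $text = call_user_func($extractor, $file);
    // shell_exec() does not return a string when the command fails or
    // produces no output, so normalize anything else to ''.
    return is_string($text) ? $text : '';
  }
  catch (Exception $e) {
    // Log the failure and continue with empty content.
    error_log('Tika extraction failed: ' . $e->getMessage());
    return '';
  }
}

// Usage with a stand-in extractor that always fails.
$failing = function ($file) {
  throw new Exception('Invalid path or filename for tika application jar.');
};
$content = safe_extract($failing, array('filemime' => 'application/pdf'));
```

Whether swallowing the exception is appropriate depends on the caller; logging and rethrowing may be preferable where a misconfigured Tika path should stop indexing.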

1 call to SearchApiAttachmentsAlterSettings::extractTika()
SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc
Extracts the file content.

File

includes/callback_attachments_settings.inc, line 395
Search API data alteration callback.

Class

SearchApiAttachmentsAlterSettings
Indexes file contents.

Code

protected function extractTika($file) {
  $filepath = $this->getRealpath($file);
  $tika_path = realpath(variable_get('search_api_attachments_tika_path', ''));
  $tika = realpath($tika_path . '/' . variable_get('search_api_attachments_tika_jar', 'tika-app-1.6.jar'));
  if (!$tika || !is_file($tika)) {
    throw new Exception(t('Invalid path or filename for tika application jar.'));
  }

  // escapeshellarg() strips UTF-8 multibyte characters under the default
  // C locale, so temporarily switch LC_CTYPE to a UTF-8 locale to keep the
  // file path valid.
  $backup_locale = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  $param = '';
  if ($file['filemime'] != 'audio/mpeg') {
    $param = ' -Dfile.encoding=UTF8 -cp ' . escapeshellarg($tika_path);
  }

  // Force running the Tika jar headless.
  $param = ' -Djava.awt.headless=true ' . $param;
  $cmd = variable_get('search_api_attachments_java', 'java') . $param . ' -jar ' . escapeshellarg($tika) . ' -t ' . escapeshellarg($filepath);
  if (strpos(ini_get('extension_dir'), 'MAMP/') !== FALSE) {
    $cmd = 'export DYLD_LIBRARY_PATH=""; ' . $cmd;
  }

  // Restore the locale.
  setlocale(LC_CTYPE, $backup_locale);

  // Support UTF-8 commands:
  // http://www.php.net/manual/en/function.shell-exec.php#85095
  shell_exec("LANG=en_US.utf-8");
  return shell_exec($cmd);
}
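
The command-building and locale-handling steps above can be sketched in isolation, without the Drupal `variable_get()` calls. This is an illustrative sketch only: `build_tika_command()` and its parameters are hypothetical, and it assumes PHP 5.5 or later for `finally`, which guarantees the locale is restored even if something throws.

```php
<?php
/**
 * Hypothetical helper (not part of the module): builds the Tika command line.
 */
function build_tika_command($java, $tika_dir, $tika_jar, $filepath, $filemime) {
  // Passing '0' queries the current LC_CTYPE locale without changing it.
  $backup_locale = setlocale(LC_CTYPE, '0');
  // Switch to a UTF-8 locale so escapeshellarg() keeps multibyte characters.
  // If en_US.UTF-8 is not installed, setlocale() returns FALSE and the
  // locale is left unchanged.
  setlocale(LC_CTYPE, 'en_US.UTF-8');
  try {
    // Always run Java headless.
    $param = ' -Djava.awt.headless=true';
    // Mirror the listing: skip the encoding and classpath flags for MP3 files.
    if ($filemime != 'audio/mpeg') {
      $param .= ' -Dfile.encoding=UTF8 -cp ' . escapeshellarg($tika_dir);
    }
    return $java . $param . ' -jar ' . escapeshellarg($tika_jar) . ' -t ' . escapeshellarg($filepath);
  }
  finally {
    // Restore the previous locale.
    if ($backup_locale !== FALSE) {
      setlocale(LC_CTYPE, $backup_locale);
    }
  }
}

// Example paths are placeholders.
$cmd = build_tika_command('java', '/opt/tika', '/opt/tika/tika-app-1.6.jar', '/tmp/report.pdf', 'application/pdf');
```

Note that each `shell_exec()` call starts its own shell, so an environment variable assigned in one call does not carry over to the next. If the Java process needs a UTF-8 `LANG`, one option is to call `putenv('LANG=en_US.UTF-8')` before running the command.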