protected function SearchApiAttachmentsAlterSettings::extractSolr in Search API attachments 7
Extract data using Solr.
This is done via the ExtractingRequestHandler or using the remote Tika servlet.
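As a rough illustration of what an extract-only request to the ExtractingRequestHandler looks like, the sketch below serializes the same parameters the method uses into a query string. The file name report.pdf is a made-up example, and the way the Solr connection class actually assembles the final URL is an assumption here, not something this page confirms.

```php
<?php
// Parameters mirroring those passed by extractSolr(); 'report.pdf' is a
// hypothetical file name used only for illustration.
$params = array(
  'extractOnly' => 'true',
  'resource.name' => 'report.pdf',
  'extractFormat' => 'text',
  'wt' => 'json',
  'hl' => 'on',
);

// Default servlet path used by the module when no variable is set.
$servlet_path = 'update/extract';

// Assumption: the parameters end up as an ordinary URL query string.
$query = http_build_query($params, '', '&');
$request_path = $servlet_path . '?' . $query;

echo $request_path, "\n";
```

With extractOnly set to true, Solr returns the extracted text in the response instead of indexing the document.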
Parameters
object $file: The file.
Return value
string|false The file content, or FALSE if extraction failed.
Throws
See also
http://wiki.apache.org/solr/ExtractingRequestHandler
http://wiki.apache.org/tika/TikaJAXRS
1 call to SearchApiAttachmentsAlterSettings::extractSolr()
- SearchApiAttachmentsAlterSettings::getFileContent in includes/callback_attachments_settings.inc - Extracts the file content.
File
- includes/callback_attachments_settings.inc, line 531 - Search API data alteration callback.
Class
- SearchApiAttachmentsAlterSettings
- Indexes files content.
Code
protected function extractSolr($file) {
  $extraction = FALSE;
  $filepath = $this->getRealpath($file);
  try {
    $filename = basename($filepath);
    // Server name is stored in the index.
    $server_name = $this->index->server;
    $server = search_api_server_load($server_name, TRUE);
    // Make sure this is a solr server.
    $class_info = search_api_get_service_info($server->class);
    $classes = class_parents($class_info['class']);
    $classes[$class_info['class']] = $class_info['class'];
    if (!in_array('SearchApiSolrService', $classes)) {
      throw new SearchApiException(t('Server %server is not a Solr server, unable to extract file.', array(
        '%server' => $server_name,
      )));
    }
    // Open a connection to the server.
    $solr_connection = $server->getSolrConnection();
    // Path for our servlet request.
    $servlet_path = variable_get('search_api_attachments_extracting_servlet_path', 'update/extract');
    // Parameters for the extraction request.
    $params = array(
      'extractOnly' => 'true',
      'resource.name' => $filename,
      // Matches the -t command for the tika CLI app.
      'extractFormat' => 'text',
      'wt' => 'json',
      'hl' => 'on',
    );
    // Heavily inspired by apachesolr_file.
    // @see apachesolr_file_extract().
    // Construct a multi-part form-data POST body in $data.
    $boundary = '--' . md5(uniqid(REQUEST_TIME));
    $data = "--{$boundary}\r\n";
    // The 'filename' used here becomes the property name in the response.
    $data .= 'Content-Disposition: form-data; name="file"; filename="extracted"';
    $data .= "\r\nContent-Type: application/octet-stream\r\n\r\n";
    $data .= file_get_contents($filepath);
    $data .= "\r\n--{$boundary}--\r\n";
    $headers = array(
      'Content-Type' => 'multipart/form-data; boundary=' . $boundary,
    );
    $options = array(
      'method' => 'POST',
      'headers' => $headers,
      'data' => $data,
    );
    // Make a servlet request using the solr connection.
    $response = $solr_connection->makeServletRequest($servlet_path, $params, $options);
    // If we have an extracted response, all is well.
    if (isset($response->extracted)) {
      $extraction = $response->extracted;
    }
  }
  catch (Exception $e) {
    // Log the exception to watchdog. Exceptions from Solr may be transient,
    // or indicate a problem with a specific file.
    watchdog('search_api_attachments', 'Exception occurred sending %filepath to Solr.', array(
      '%filepath' => $file['uri'],
    ));
    watchdog_exception('search_api_attachments', $e);
  }
  return $extraction;
}
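The multipart body built in the method above can be tried outside Drupal. The following is a minimal sketch, not the module's code: example_build_multipart_body() is a hypothetical helper name, time() stands in for Drupal's REQUEST_TIME constant, and the file contents are a placeholder string.

```php
<?php
/**
 * Builds a multipart/form-data body containing a single file part.
 *
 * Hypothetical helper for illustration only; it mirrors the string
 * concatenation done in extractSolr().
 */
function example_build_multipart_body($contents, $boundary, $filename = 'extracted') {
  // Each part starts with "--" followed by the boundary.
  $body = "--{$boundary}\r\n";
  $body .= 'Content-Disposition: form-data; name="file"; filename="' . $filename . '"';
  // A blank line separates the part headers from the part content.
  $body .= "\r\nContent-Type: application/octet-stream\r\n\r\n";
  $body .= $contents;
  // The closing delimiter has "--" on both sides of the boundary.
  $body .= "\r\n--{$boundary}--\r\n";
  return $body;
}

// time() replaces REQUEST_TIME, which only exists inside Drupal.
$boundary = '--' . md5(uniqid((string) time()));
$body = example_build_multipart_body('Hello, Tika.', $boundary);

// The header must carry the same boundary string used in the body.
$headers = array(
  'Content-Type' => 'multipart/form-data; boundary=' . $boundary,
);

echo $headers['Content-Type'], "\n", $body;
```

Note that the boundary value itself begins with two dashes, so the delimiter lines in the body start with four. This is consistent as long as the Content-Type header advertises the same boundary string.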