function apachesolr_attachments_extract_using_solr in Apache Solr Attachments 6
Same name and namespace in other branches
- 6.3 apachesolr_attachments.index.inc \apachesolr_attachments_extract_using_solr()
- 6.2 apachesolr_attachments.admin.inc \apachesolr_attachments_extract_using_solr()
- 7 apachesolr_attachments.index.inc \apachesolr_attachments_extract_using_solr()
For a file path, try to extract text using Solr 1.4.
Throws
Exception
1 call to apachesolr_attachments_extract_using_solr()
- apachesolr_attachments_get_attachment_text in ./apachesolr_attachments.admin.inc - Parse the attachment getting just the raw text.
File
- ./apachesolr_attachments.admin.inc, line 503 - Provides a file attachment search implementation for use with the Apache Solr module
Code
function apachesolr_attachments_extract_using_solr($filepath) {
  // Extract using Solr.
  // We allow Solr to throw exceptions; they will be caught
  // by apachesolr.module.
  $solr = apachesolr_get_solr();
  $filename = basename($filepath);
  $params = array(
    'resource.name' => $filename,
    'extractFormat' => 'text',
  );
  // Construct a multi-part form-data POST body in $data.
  $boundary = '--' . md5(uniqid(time()));
  $data = "--{$boundary}\r\n";
  // The 'filename' used here ("extracted") becomes the property name in the
  // response, which is why $response->extracted is read below.
  $data .= 'Content-Disposition: form-data; name="file"; filename="extracted"';
  $data .= "\r\nContent-Type: application/octet-stream\r\n\r\n";
  $data .= file_get_contents($filepath);
  $data .= "\r\n--{$boundary}--\r\n";
  $headers = array(
    'Content-Type' => 'multipart/form-data; boundary=' . $boundary,
  );
  $response = $solr->makeServletRequest(EXTRACTING_SERVLET, $params, 'POST', $headers, $data);
  return array(
    $response->extracted,
    $response->extracted_metadata,
  );
}
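The multipart body assembly in the function can be isolated and inspected without a Solr server. The following is a minimal sketch in plain PHP, not part of the module: the helper name example_build_multipart_body() is hypothetical, and it simply repeats the same concatenation steps so the framing can be checked. Per RFC 2046, each part is introduced by "--" followed by the boundary value, and the body is closed by that delimiter plus a trailing "--". Because the boundary value itself is prefixed with "--", the delimiter lines on the wire begin with four hyphens, while the Content-Type header carries the boundary with only two.

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr_attachments): builds a
 * multipart/form-data body with a single file part, mirroring the string
 * concatenation used in apachesolr_attachments_extract_using_solr().
 *
 * Returns array(Content-Type header value, request body).
 */
function example_build_multipart_body($contents, $part_filename = 'extracted') {
  // Random boundary value: '--' followed by a 32-character md5 hex digest.
  $boundary = '--' . md5(uniqid((string) time()));
  // Opening delimiter line for the one and only part.
  $body = "--{$boundary}\r\n";
  // Part headers; the filename is what Solr uses to name the response property.
  $body .= 'Content-Disposition: form-data; name="file"; filename="' . $part_filename . '"';
  $body .= "\r\nContent-Type: application/octet-stream\r\n\r\n";
  // Raw file bytes follow the blank line that ends the part headers.
  $body .= $contents;
  // Closing delimiter: boundary plus two trailing hyphens.
  $body .= "\r\n--{$boundary}--\r\n";
  $content_type = 'multipart/form-data; boundary=' . $boundary;
  return array($content_type, $body);
}

// Usage: build a body for a small in-memory payload.
list($content_type, $body) = example_build_multipart_body("Hello, Solr");
echo $content_type, "\n";
```

Because the boundary is a random md5 hash, a collision with the file contents is unlikely but not impossible; a stricter implementation could confirm that the boundary does not occur in the payload before sending the request.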