function apachesolr_clean_text in Apache Solr Search 6

Same name and namespace in other branches
  1. 8 apachesolr.module \apachesolr_clean_text()
  2. 5.2 apachesolr.index.inc \apachesolr_clean_text()
  3. 6.3 apachesolr.module \apachesolr_clean_text()
  4. 6.2 apachesolr.index.inc \apachesolr_clean_text()
  5. 7 apachesolr.module \apachesolr_clean_text()

Strip HTML tags and also control characters that cause Jetty/Solr to fail.

3 calls to apachesolr_clean_text()
apachesolr_add_tags_to_document in ./apachesolr.index.inc
Extract HTML tag contents from $text and add to boost fields.
apachesolr_add_taxonomy_to_document in ./apachesolr.index.inc
Extract taxonomy from $node and add to dynamic fields.
apachesolr_node_to_document in ./apachesolr.index.inc
Given a node ID, return a document representing that node.

File

./apachesolr.index.inc, line 20
Functions used when indexing content to Apache Solr.

Code

function apachesolr_clean_text($text) {

  // Add spaces before stripping tags to avoid running words together.
  $text = filter_xss(str_replace(array(
    '<',
    '>',
  ), array(
    ' <',
    '> ',
  ), $text), array());

  // Decode entities and then make safe any < or > characters.
  return htmlspecialchars(html_entity_decode($text, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}
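
Example

The following is a minimal sketch of the same three steps (pad the angle brackets, strip the tags, then decode and re-escape entities) that can run outside Drupal. The function name example_clean_text() is hypothetical, and strip_tags() is used only as a rough stand-in for filter_xss($text, array()), which is not available without Drupal; the two are not guaranteed to behave identically on malformed markup.

```php
<?php
// Hypothetical stand-alone approximation of apachesolr_clean_text().
// Assumption: strip_tags() stands in for Drupal's filter_xss($text, array()).
function example_clean_text($text) {
  // Pad < and > with spaces so that words in adjacent tags do not run together.
  $padded = str_replace(array('<', '>'), array(' <', '> '), $text);

  // Remove all tags (stand-in for filter_xss() with an empty allowed-tag list).
  $stripped = strip_tags($padded);

  // Decode entities, then re-escape so any literal < or > characters are safe.
  return htmlspecialchars(html_entity_decode($stripped, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

// Usage: the words from the two elements stay separated by whitespace.
$clean = example_clean_text('<b>foo</b><i>bar</i>');
```

Without the padding step, the input above would collapse to "foobar" once the tags are removed, which would index the two words as a single term.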