function apachesolr_clean_text in Apache Solr Search 6

Same name and namespace in other branches

8 apachesolr.module \apachesolr_clean_text()
5.2 apachesolr.index.inc \apachesolr_clean_text()
6.3 apachesolr.module \apachesolr_clean_text()
6.2 apachesolr.index.inc \apachesolr_clean_text()
7 apachesolr.module \apachesolr_clean_text()

Strip HTML tags and also control characters that cause Jetty/Solr to fail.
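One common reason control characters break indexing is that Solr update requests are sent as XML, and XML 1.0 forbids most C0 control characters. The sketch below assumes that is the failure being guarded against. The helper name `example_strip_xml_invalid_controls()` is hypothetical and is not part of the module, which relies on `filter_xss()` for its cleanup.

```php
<?php
/**
 * Hypothetical helper (not part of apachesolr): remove the C0 control
 * characters that XML 1.0 does not allow, i.e. everything below U+0020
 * except tab (\x09), newline (\x0A) and carriage return (\x0D).
 */
function example_strip_xml_invalid_controls($text) {
  // Works byte-wise; safe on UTF-8 because these byte values never occur
  // inside a multibyte UTF-8 sequence.
  return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
}

// Usage: the vertical tab and NUL byte are removed, the newline is kept.
echo json_encode(example_strip_xml_invalid_controls("a\x0Bb\x00c\nd")), "\n";
```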

3 calls to apachesolr_clean_text()

apachesolr_add_tags_to_document in ./apachesolr.index.inc: Extract HTML tag contents from $text and add to boost fields.
apachesolr_add_taxonomy_to_document in ./apachesolr.index.inc: Extract taxonomy from $node and add to dynamic fields.
apachesolr_node_to_document in ./apachesolr.index.inc: Given a node ID, return a document representing that node.

File

./apachesolr.index.inc, line 20: Functions used when indexing content to Apache Solr.

Code

function apachesolr_clean_text($text) {

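  // Pad each '<' and '>' with a space before stripping tags, to avoid
  // running words together. Passing filter_xss() an empty list of allowed
  // tags removes every tag.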
  $text = filter_xss(str_replace(array(
    '<',
    '>',
  ), array(
    ' <',
    '> ',
  ), $text), array());

  // Decode entities and then make safe any < or > characters.
  return htmlspecialchars(html_entity_decode($text, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}
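To see the steps outside a Drupal site, here is a minimal, Drupal-free approximation. It is a sketch, not the module's code: the name `example_clean_text()` is hypothetical, and PHP's `strip_tags()` stands in for `filter_xss($text, array())`. The two are not identical; `filter_xss()` also removes NUL bytes and rejects invalid UTF-8.

```php
<?php
/**
 * Hypothetical approximation of apachesolr_clean_text() without Drupal.
 * strip_tags() is swapped in for filter_xss($text, array()).
 */
function example_clean_text($text) {
  // Pad every angle bracket with a space so adjacent elements such as
  // "<p>one</p><p>two</p>" do not collapse into "onetwo".
  $text = str_replace(array('<', '>'), array(' <', '> '), $text);
  // Remove all tags; no tags are allowed through.
  $text = strip_tags($text);
  // Decode entities (&amp;, &eacute;, ...), then re-escape <, >, & and quotes
  // so the result is safe to embed in XML.
  return htmlspecialchars(html_entity_decode($text, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

echo example_clean_text('<p>Fish &amp; chips</p><p>caf&eacute;</p>'), "\n";
```

The padding leaves extra whitespace around former tags, which is harmless for a tokenized full-text field, whereas missing whitespace would merge two words into one token.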