function _spamspan_filter_process in SpamSpan filter 7

SpamSpan filter process callback.

Scans text and replaces email addresses with span tags.

The aim is to replace each email address with markup like this: <span class="spamspan"> <span class="u">user</span> [at] <span class="d">example [dot] com</span> <span class="t">tag contents</span></span>
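For reference, the following is a minimal sketch that builds this target markup from an address already split into its user and domain parts. spamspan_markup_sketch() is a hypothetical helper, not the module's own output function. It assumes that the link text passed in $contents is trusted HTML.

```php
<?php
// Hypothetical helper (not part of the SpamSpan module) that builds the
// target markup shown above from an already-split address.
function spamspan_markup_sketch($user, $domain, $contents = '') {
  // Spell out each dot in the domain: "example.com" -> "example [dot] com".
  $obfuscated_domain = str_replace('.', ' [dot] ', $domain);

  $html = '<span class="spamspan">'
    . '<span class="u">' . htmlspecialchars($user, ENT_QUOTES) . '</span>'
    . ' [at] '
    . '<span class="d">' . htmlspecialchars($obfuscated_domain, ENT_QUOTES) . '</span>';

  // The "t" span carries the original link text. It is assumed to be
  // trusted HTML and is therefore not escaped here.
  if ($contents !== '') {
    $html .= ' <span class="t">' . $contents . '</span>';
  }
  return $html . '</span>';
}

echo spamspan_markup_sketch('user', 'example.com', 'Contact us'), "\n";
```

Only the domain has its dots spelled out. The user part is left as is, apart from HTML escaping.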

1 call to _spamspan_filter_process()
spamspan in ./spamspan.module
A simple utility function wrapping the main processing callback. This function may be called by other modules and themes.
1 string reference to '_spamspan_filter_process'
spamspan_filter_info in ./spamspan.module
Implements hook_filter_info(). This function is called on every page so keep it fast and simple.
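The string reference listed above comes from hook_filter_info(). In Drupal 7, a filter registers its process callback by function name, and the filter system calls that function later. For this reason, the API page lists a string reference here rather than a direct call. The minimal sketch below assumes a hypothetical module called mymodule. The real spamspan_filter_info() may define further keys, which are not reproduced here.

```php
<?php
// Hypothetical module "mymodule". Drupal 7 code would normally wrap the
// title in t(); that call is left out so this sketch runs outside Drupal.
function mymodule_filter_info() {
  $filters = array();
  $filters['mymodule_spamspan'] = array(
    'title' => 'Hide email addresses using the SpamSpan technique',
    // The callback is stored by name only. The filter system calls it later
    // with the text and the filter object as its first two arguments.
    'process callback' => '_spamspan_filter_process',
  );
  return $filters;
}

print_r(mymodule_filter_info());
```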

File

./spamspan.module, line 101
This module implements the SpamSpan technique (http://www.spamspan.com) for hiding email addresses from spambots.

Code

function _spamspan_filter_process($text, $filter) {

  // preg_replace_callback() cannot pass any additional arguments to the
  // callbacks below, so the relevant settings are stored in spamspan_admin().
  spamspan_admin()->filter_set($filter);

  // Top and tail the email regexp so that it is case-insensitive and
  // ignores whitespace. The second pattern also captures any options given
  // in square brackets directly after the address.
  $emailpattern = '!' . SPAMSPAN_EMAIL . '!ix';
  $emailpattern_with_options = '!' . SPAMSPAN_EMAIL . '\\[(.*?)\\]!ix';

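  // Next, set up a regex for mailto: URLs
  // (see http://www.faqs.org/rfcs/rfc2368.html).
  // Attributes before the href go into the first group and the opening
  // quote character into the second. The whole mailto: URL is captured into
  // the third group, followed by the groups of SPAMSPAN_EMAIL (the name,
  // then the domain). The last two groups hold any attributes after the
  // href and the tag contents.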
  $mailtopattern = "!<a\\s+                                # opening <a and spaces
      ((?:\\w+\\s*=\\s*)(?:\\w+|\"[^\"]*\"|'[^']*'))*?        # any attributes
      \\s*                                                 # whitespace
      href\\s*=\\s*(['\"])(mailto:" . SPAMSPAN_EMAIL . "(?:\\?[A-Za-z0-9_= %\\.\\-\\~\\_\\&;\\!\\*\\(\\)\\'#&]*)?)" . "\\2                                                # the relevant quote
                                                          # character
      ((?:\\s+\\w+\\s*=\\s*)(?:\\w+|\"[^\"]*\"|'[^']*'))*?     # any more attributes
      >                                                   # end of the first tag
      (.*?)                                               # tag contents.  NB this
                                                          # will not work properly
                                                          # if there is a nested
                                                          # <a>, but this is not
                                                          # valid xhtml anyway.
      </a>                                                # closing tag
      !ix";

  // Inline images need to be handled separately, because base64-encoded
  // image data can slow the email regexes down considerably.
  // Therefore, replace all inline image data with a placeholder and restore
  // it afterwards. See https://drupal.org/node/1243042 for details.
  $images = array(array());
  $inline_image_pattern = '/data\\:(?:.+?)base64(?:.+?)["|\']/';
  preg_match_all($inline_image_pattern, $text, $images);
  $text = preg_replace($inline_image_pattern, '__spamspan_img_placeholder__', $text);

  // Now convert all mailto: URLs.
  $text = preg_replace_callback($mailtopattern, '_spamspan_callback_mailto', $text);

  // Then convert bare email addresses followed by options in square brackets.
  $text = preg_replace_callback($emailpattern_with_options, '_spamspan_email_addresses_with_options', $text);

  // Finally, convert all remaining bare email addresses.
  $text = preg_replace_callback($emailpattern, '_spamspan_bare_email_addresses', $text);

  // Restore the original image data, one placeholder at a time.
  foreach ($images[0] as $image) {
    $text = preg_replace('/__spamspan_img_placeholder__/', $image, $text, 1);
  }
  return $text;
}
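The strip, process and restore sequence in _spamspan_filter_process() can be sketched outside Drupal as follows. This is a minimal illustration under several assumptions. obfuscate_emails_sketch() is a hypothetical name. Its email regex is a simplified stand-in for SPAMSPAN_EMAIL. It converts only bare addresses, whereas the module first handles mailto: links and addresses with bracketed options.

The sketch also substitutes two techniques of its own. First, a closure with a use clause carries the settings into the callback, in place of the spamspan_admin() object. Second, the image data is restored with strpos() and substr_replace(), so characters such as "$1" in the stored data can never be read as preg_replace() backreferences.

```php
<?php
// Minimal sketch of the strip, process and restore sequence; not the
// module's code. Assumptions: the email regex is a simplified, hypothetical
// stand-in for SPAMSPAN_EMAIL, and only bare addresses are converted.
function obfuscate_emails_sketch($text, array $settings) {
  $placeholder = '__spamspan_img_placeholder__';
  // Same idea as the module's pattern, but without the "|" that a class
  // such as ["|\'] would treat as a literal character.
  $inline_image_pattern = '/data:(?:.+?)base64(?:.+?)["\']/';

  // 1. Stash each inline base64 payload and swap in a placeholder, so the
  //    email regex never has to scan the (possibly huge) image data.
  preg_match_all($inline_image_pattern, $text, $images);
  $text = preg_replace($inline_image_pattern, $placeholder, $text);

  // 2. Convert bare addresses. The closure's "use" clause carries the
  //    settings into the callback, which preg_replace_callback() cannot
  //    pass as an extra argument.
  $email_pattern = '!([-.\w+]+)@((?:[-\w]+\.)+[A-Z]{2,63})!i';
  $text = preg_replace_callback($email_pattern, function ($m) use ($settings) {
    return $m[1] . $settings['at'] . str_replace('.', $settings['dot'], $m[2]);
  }, $text);

  // 3. Restore the stashed data in the order it was found. strpos() and
  //    substr_replace() insert it literally, so a "$1" or "\1" in the data
  //    is never treated as a backreference.
  foreach ($images[0] as $image) {
    $pos = strpos($text, $placeholder);
    if ($pos === false) {
      break;
    }
    $text = substr_replace($text, $image, $pos, strlen($placeholder));
  }
  return $text;
}

$settings = array('at' => ' [at] ', 'dot' => ' [dot] ');
echo obfuscate_emails_sketch('Mail user@example.com <img src="data:image/png;base64,iVBORw0KGgo=">', $settings), "\n";
```

In the real filter, the order of the three passes matters. mailto: links are converted first, presumably so that the address inside an href attribute is not rewritten by the bare-address passes before the whole anchor can be replaced.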