function _invisimail_email_matching_regexes in Invisimail 7

Generates the two pattern matching regexes for finding email addresses.

The patterns are built in a separate function for cleanliness, and because the regex is complex enough that it is worth encapsulating on its own.

Return value

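array An associative array with the keys 'diff_link' and 'same_link', each holding a complete PCRE pattern string.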

1 call to _invisimail_email_matching_regexes()
invisimail_encode_string in ./invisimail.module
Encodes all email addresses in a string using the specified encoder.

File

./invisimail.module, line 188
This module provides a filter that searches content for email addresses and replaces them with their ASCII equivalents before display. This is not complete protection from spam harvesters, but it helps somewhat.
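The search-and-replace step described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `example_entity_encode()` is a hypothetical helper, the choice of decimal HTML entities is an assumption about what "ASCII equivalents" means, and the address pattern is a deliberately simplified stand-in for the regexes built by `_invisimail_email_matching_regexes()`.

```php
<?php
// Hypothetical encoder (not part of Invisimail): turns every byte of the
// input into a decimal HTML entity. Only meaningful for ASCII input.
function example_entity_encode($string) {
  $encoded = '';
  foreach (str_split($string) as $char) {
    $encoded .= '&#' . ord($char) . ';';
  }
  return $encoded;
}

// Deliberately simplified address pattern, NOT the module's regex.
$pattern = '/[a-z0-9._+\\-]+@(?:[a-z0-9\\-]+\\.)+[a-z]{2,6}/i';

// Replace each matched address with its entity-encoded form.
$html = preg_replace_callback($pattern, function ($m) {
  return example_entity_encode($m[0]);
}, 'Mail a@b.io now');

echo $html, "\n";
// Prints: Mail &#97;&#64;&#98;&#46;&#105;&#111; now
```

A browser renders the entities as the original address, while a harvester scanning the raw markup for a literal `@` finds none.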

Code

function _invisimail_email_matching_regexes() {

  // The check for the user/name portion of the email address. This is an
  // encapsulable regex that matches a run of one or more valid characters
  // that is not immediately preceded by a valid character (in most cases,
  // the preceding character is a space).
  $valid_user_chars = 'a-zA-Z0-9_\\-\\.\\+\\^!#\\$%&*+\\/\\=\\?\\`\\|\\{\\}~\'';
  $user = "(?<![{$valid_user_chars}])[{$valid_user_chars}]+";

  // For the domain portion of an email address, you can have a string domain,
  // an IPv4 address, or an IPv6 address. These three regexes match each of
  // those possibilities, respectively.
  $domain = '(?:[a-zA-Z0-9](?:[a-zA-Z0-9\\-]*[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}';
  $ipv4 = '[0-9]{1,3}(?:\\.[0-9]{1,3}){3}';
  $ipv6 = '[0-9a-fA-F]{1,4}(?:\\:[0-9a-fA-F]{1,4}){7}';

  // Now we stick it all together into a generalized, encapsulated, portable,
  // and non-subitem-capturing (hence all the '(?:', which mark subpatterns as
  // non-capturing) regex for grabbing email addresses.
  $mail = "(?:{$user})+\\@(?:{$domain}|(?:\\[(?:{$ipv4}|{$ipv6})\\]))";

  // PCRE pattern modifiers: 'i' enables case-insensitivity, and 'S' enables
  // additional pattern analysis, which our regex can benefit from (it is a
  // non-anchored pattern without a single fixed starting character; see
  // http://us2.php.net/manual/en/reference.pcre.pattern.modifiers.php).
  // Global case insensitivity is a little sloppy to use, but selectively
  // toggling it within only some of the subpatterns isn't really worth the
  // effort.
  $modifiers = 'iS';

  // The final pattern. We deal with these as an entire group because invisimail
  // allows options that require us to deal with both an href and its text
  // in relation to one another.
  $pattern = "@(?:(<a [^>]*href=['\"](mailto:{$mail})['\"][^>]*>)?((?" . ">(?<!mailto:)){$mail}))|(<a [^>]*href=['\"](mailto:{$mail})['\"][^>]*>(.*?)</a>)@{$modifiers}";
  $pattern_diff_link_text = "@(<a [^>]*href=['\"](mailto:{$mail})['\"][^>]*>(.*?)</a>)@{$modifiers}";
  $pattern_same_link_text = "@(?:(<a [^>]*href=['\"](mailto:{$mail})['\"][^>]*>)?((?" . ">(?<!mailto:)){$mail}))@{$modifiers}";
  return array(
    'diff_link' => $pattern_diff_link_text,
    'same_link' => $pattern_same_link_text,
  );
}
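
To illustrate how the capture groups of the 'same_link' pattern line up, here is a minimal sketch. It assumes a shortened user character class and omits the bracketed IP-literal domains, so `$mail` below is a simplified stand-in rather than the module's full regex. The `$same_link`, `$plain`, and `$linked` variable names are chosen only for this example.

```php
<?php
// Simplified stand-ins for the subpatterns above (assumption: a shorter
// character class and no IP-literal domains, to keep the sketch readable).
$valid_user_chars = 'a-zA-Z0-9_.+\\-';
$user = "(?<![{$valid_user_chars}])[{$valid_user_chars}]+";
$domain = '(?:[a-zA-Z0-9](?:[a-zA-Z0-9\\-]*[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}';
$mail = "(?:{$user})+\\@(?:{$domain})";

// Same shape as the 'same_link' pattern: an optional opening <a> tag
// (group 1), its mailto: target (group 2), then a bare address (group 3).
// The '(?' . '>' concatenation is kept in the same form as the source.
$same_link = "@(?:(<a [^>]*href=['\"](mailto:{$mail})['\"][^>]*>)?((?" . ">(?<!mailto:)){$mail}))@iS";

// A bare address in running text: groups 1 and 2 stay empty.
preg_match($same_link, 'Write to jane@example.com today.', $plain);

// A mailto: link whose text is the address itself: all three groups fill.
preg_match($same_link, '<a href="mailto:jane@example.com">jane@example.com</a>', $linked);

print_r($plain);
print_r($linked);
```

In the first call only `$plain[3]` holds the address. In the second, `$linked[1]` holds the opening tag, `$linked[2]` the `mailto:` target, and `$linked[3]` the link text. Having all three available is what lets a caller treat an href and its text in relation to one another.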