function views_natural_sort_encode in Views Natural Sort 7

Same name and namespace in other branches
  1. 6 views_natural_sort.module \views_natural_sort_encode()

Encodes a string into an ASCII-sortable string such that:

  • Leading articles in common languages are ignored: The A An El La Le Il
  • Unimportant punctuation is ignored: # ' " ( )
  • Unimportant words are ignored: and of or
  • Embedded numbers will sort in numerical order. The following possibilities are supported:

    • A leading dash indicates a negative number, unless it is preceded by a non-whitespace character, in which case it is treated as just a dash.
    • Leading zeros are properly ignored so as to not influence sort order
    • Decimal numbers are supported using a period as the decimal character
    • Thousands separators are ignored, using the comma as the thousands character
    • Numbers may have up to 99 digits before the decimal point, up to the precision of the processor.
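
The number rules above can be explored in isolation. The following is a minimal sketch, not part of the module: it reuses the regular expression that views_natural_sort_encode() passes to preg_replace_callback() and only reports which substrings would be treated as numbers. The helper name vns_demo_find_numbers() is hypothetical, and the encoding applied by the module's callback is not reproduced here.

```php
<?php
// Hypothetical helper for illustration only; it is not part of the module.
// It lists the substrings that the number pattern would hand to the callback.
function vns_demo_find_numbers($string) {
  // Same pattern as in views_natural_sort_encode().
  $pattern = '/(\\s-|^-)?(?:(\\d[\\d,]*)?\\.(\\d+)|(\\d[\\d,]*))/';
  preg_match_all($pattern, $string, $matches);
  // $matches[0] holds the full text of every match, in order.
  return $matches[0];
}

echo json_encode(vns_demo_find_numbers('Chapter 007')), "\n";      // ["007"]
echo json_encode(vns_demo_find_numbers('Temp -5 degrees')), "\n";  // [" -5"]
echo json_encode(vns_demo_find_numbers('Catch-22')), "\n";         // ["22"]
echo json_encode(vns_demo_find_numbers('1,234,567.89')), "\n";     // ["1,234,567.89"]
```

Note that in 'Catch-22' the dash follows a non-whitespace character, so only the digits are matched, while in 'Temp -5 degrees' the match includes the whitespace and the dash.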

Parameters

$string string: The string to be encoded

Return value

string The encoded string

File

./views_natural_sort.module, line 155
Provides a views filter that sorts titles in a more natural manner by ignoring articles like "The" and "A".

Code

function views_natural_sort_encode($string) {
  $words = variable_get('views_natural_sort_words_remove', array());
  $beginning_words = variable_get('views_natural_sort_beginning_words_remove', array());
  $symbols = variable_get('views_natural_sort_symbols_remove', '');

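  // Escape the words so they can be used safely inside the regexes below.
  foreach ($beginning_words as $key => $word) {
    $beginning_words[$key] = preg_quote($word, '/');
  }
  foreach ($words as $key => $word) {
    $words[$key] = preg_quote($word, '/');
  }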
  $regex = array();
  $replace = array();

  // Remove words from the beginning only!
  if (!empty($beginning_words)) {
    $regex[] = '/^(' . implode('|', $beginning_words) . ')\\s+/i';
    $replace[] = '';
  }

  // Remove words regardless of where they are, as long as they are whole words.
  if (!empty($words)) {
    $regex[] = '/\\s(' . implode('|', $words) . ')\\s+/i';
    $replace[] = ' ';
    $regex[] = '/^(' . implode('|', $words) . ')\\s+/i';
    $replace[] = '';
  }

  // Remove symbols.
  if (strlen($symbols) != 0) {
    $regex[] = '/[' . preg_quote($symbols, '/') . ']/';
    $replace[] = '';
  }
  if (!empty($regex) && !empty($replace)) {
    $string = preg_replace($regex, $replace, $string);
  }

  // Find an optional leading dash (either preceded by whitespace or the first
  // character) followed by either:
  //   - an optional series of digits (with optional embedded commas), then a
  //     period, then a series of digits, OR
  //   - a series of digits (with optional embedded commas).
  $string = preg_replace_callback('/(\\s-|^-)?(?:(\\d[\\d,]*)?\\.(\\d+)|(\\d[\\d,]*))/', '_views_natural_sort_number_encode_match_callback', $string);

  // It is unclear why data that has gone through preg_replace() sometimes
  // comes back without UTF-8 encoding. This has been known to make MySQL
  // choke, so encode here. This isn't seen anywhere else in Drupal though.
  // @see http://drupal.org/node/1914098
  $string = utf8_encode($string);

  // The size limit on the content field for views_natural_sort is sometimes
  // not enough, so truncate all data down to that size. The resulting
  // inaccuracy is considered an acceptable loss.
  return substr($string, 0, 255);
}
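
To see the word and symbol stripping steps with concrete input, here is a standalone sketch that can be run outside Drupal. It assumes PHP 5.3 or later (for the closure). The function name vns_demo_strip_words() and the three configuration values are hypothetical stand-ins for what the module reads through variable_get(); number encoding, UTF-8 encoding and truncation are left out.

```php
<?php
// Hypothetical stand-in for the stripping steps; not part of the module.
function vns_demo_strip_words($string) {
  // Assumed configuration, mirroring the examples in the description.
  $beginning_words = array('the', 'a', 'an', 'el', 'la', 'le', 'il');
  $words = array('and', 'of', 'or');
  $symbols = "#\"'()";

  // Escape every word, including the "/" regex delimiter.
  $quote = function ($word) {
    return preg_quote($word, '/');
  };
  $beginning_words = array_map($quote, $beginning_words);
  $words = array_map($quote, $words);

  // The patterns are applied in order, each to the result of the previous one.
  $regex = array(
    '/^(' . implode('|', $beginning_words) . ')\\s+/i',
    '/\\s(' . implode('|', $words) . ')\\s+/i',
    '/^(' . implode('|', $words) . ')\\s+/i',
    '/[' . preg_quote($symbols, '/') . ']/',
  );
  $replace = array('', ' ', '', '');
  return preg_replace($regex, $replace, $string);
}

echo vns_demo_strip_words('The Lord of the Rings'), "\n";  // Lord the Rings
echo vns_demo_strip_words('Of Mice and Men'), "\n";        // Mice Men
echo vns_demo_strip_words("(Don't) Stop #1"), "\n";        // Dont Stop 1
```

Under these assumed settings, an article is removed only at the start of the string, which is why the second "the" in the first example survives, and a word such as "Anchor" is left intact because the patterns require whitespace after the matched word.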