Recommender API Documentation in Recommender API 7.5

Authors: Daniel Xiaodan Zhou (http://michiza.com/)
Version: For Recommender API 7.x-3.x/4.x
Note: HTML version is generated by docutils ('rst2html').

What is Recommender API?

Recommender API provides easy-to-understand, easy-to-use, fully-documented APIs to write Drupal content recommendation modules (e.g., Fivestar Recommender, Ubercart Product Recommender, etc.). It also provides a unified approach to configure & execute the recommender algorithms, and to display results to end users.

What's new in 7.x-3.x/4.x?

The 3.x release is a complete rewrite of the API code and is not compatible with earlier releases. Major changes and features are:

  • All complex computation is now done under Apache Mahout (http://mahout.apache.org/), which is much faster than the original PHP implementation. See http://drupal.org/node/1180000 for more details.
  • All end user display is now through customizable Views rather than hard coded blocks. See http://drupal.org/node/673786 for more details.
  • You can isolate the resource intensive recommender computation from the Drupal production site.
  • It now supports more algorithms (e.g. SVD) provided by Apache Mahout.

The 4.x release is the same as the 3.x release, except that it uses the Async Command 2.x release.

Installation, Configuration & Execution

How does Recommender API work?

Conceptually, you need two computers to run Recommender API: the Drupal server for your Drupal site, and the recommender server to compute recommendations. You can run the recommender program on the same machine as the Drupal server if you like, but note that the recommender computation could easily consume all of its resources. On the Drupal server, you simply issue a command to run recommenders. On the recommender server, you run the actual recommender program (written in Java), which takes the command, computes the results, and saves them back to the Drupal server database for display.

Out-of-the-box Installation

This simplified installation process assumes you have Recommender API installed under "sites/all/modules/recommender" on your Drupal server. If you want to offload recommendation computations to a separate recommender server, please follow the generic installation process in the next section.

Step 1. Install the Recommender API module to "sites/all/modules/recommender".

Step 2. Download Apache Mahout v.0.6+ from http://mahout.apache.org/, and extract it to "sites/all/libraries/mahout". Make sure you have Java 6 or greater installed.

Step 3. Install the Async Command module (http://drupal.org/project/async_command) to "sites/all/modules/async_command".

Step 4. Install any helper modules to your Drupal server, such as Browsing History Recommender, Fivestar Recommender, Ubercart Product Recommender, etc.

Step 5. Periodically execute "run.sh" (Linux/Mac/Unix) or "run.bat" (Windows) under "sites/all/modules/recommender" to compute recommendations. You might need to edit "run.sh" or "run.bat" and change the DRUPAL_HOME parameter in case of errors.

Installation (Generic Process)

(You can skip this section if you use the out-of-the-box installation process.)

Step 1. Download Apache Mahout v.0.6+ from http://mahout.apache.org/, and extract it to any directory on the recommender server. Make sure you have Java 6 or greater installed.

Step 2. Install the Async Command module (http://drupal.org/project/async_command) to your Drupal server under sites/all/modules/async_command. Copy the 'lib' sub-directory and async-command.jar to any folder on the recommender server.

Step 3. Install the Recommender API module (http://drupal.org/project/recommender) to your Drupal server under sites/all/modules/recommender. Copy the 'recommender.jar' file to any folder on the recommender server.

Step 4. Install any helper modules to your Drupal server, such as Browsing History Recommender, Fivestar Recommender, Ubercart Product Recommender, etc.

Step 5. On your recommender server, create a 'config.properties' file modeled on 'config.properties.example'. This file specifies how the recommender program ("recommender.jar") accesses your Drupal database. The database account doesn't need full privileges. The minimum privileges are:

SELECT/UPDATE on {async_command}, {recommender_app}
SELECT on {node} (nid, created), {users} (uid, created), or other tables (see documentation of relevant helper modules)
SELECT/INSERT/UPDATE/DELETE on {recommender_similarity}, {recommender_prediction}, {recommender_preference_staging}

Step 6. Copy 'run.sh' (Linux/Mac/Unix) or 'run.bat' (Windows) to any directory on the recommender server, and manually set the parameters to point to the correct directories.

Step 7. Periodically execute 'run.sh' or 'run.bat' on your recommender server to compute recommendations.

Configuration & Execution

On your Drupal server, go to admin->configure->recommender to run various recommenders. Note that this doesn't do any computation; it merely adds commands to the {async_command} queue to be executed on your recommender server. Remember to grant the "administer recommender" permission too.

On your recommender server, you need to periodically execute 'run.sh' or 'run.bat' to actually compute the recommendations. If you use Linux, you might want to run 'run.sh' as a cron job on your recommender server. This is different from cron on the Drupal server, which only issues the commands. The crontab entry could look like this (run every 30 minutes):

# in crontab -e, add the following line.
*/30 * * * * flock -n /tmp/recommender.lock recommender/run.sh >> /tmp/recommender.log 2>&1

Similarity vs. prediction

Recommender API offers two types of recommendations, similarity-based and prediction-based, although a given algorithm might implement both or only one (e.g., the SlopeOne algorithm only has prediction-based recommendations).

One type of recommendation is based on the similarity among nodes (or users, or other types of entities). For example, if you are viewing a node, it will recommend other similar nodes. The recommended nodes are the same for this particular node regardless of which user is viewing it. The similarity scores are computed from facts such as these: if two nodes are usually viewed together, or two products are usually purchased together, then the two nodes/products are similar. The helper modules define what information to use to compute the similarity scores. The similarity scores range from -1 (completely dissimilar) to +1 (completely similar), and are directional: that A is similar to B doesn't mean that B is similar to A.

The other type of recommendation is based on "prediction scores", which predict how much a user would like a node. The recommendations are personalized: different users see different recommendations. But each user sees the same recommendations regardless of which page she is viewing. The prediction scores are computed based on the user's personal history. For example, if a user purchased products A and B, she might be interested in purchasing C, which is similar to A and B. Exactly what "personal history" to use is defined by the helper modules. (A side note: If you treat users as nodes and nodes as users, you can then predict how much a node would "like" a user. This is useful when you want to promote a node to the most interested users.)

You need to understand the distinction between similarity and prediction in order to work with Views.
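
To make the distinction concrete, here is a minimal PHP sketch. The arrays and the helper name top_by_source() are hypothetical stand-ins for rows of similarity and prediction results; this is not the module's code, only an illustration of which ID each type of recommendation is keyed by.

```php
<?php
// Hypothetical in-memory stand-ins for the two kinds of result rows:
// array(source_id, target_id, score).
$similarity = array(
  array(10, 11, 0.9),  // node 10 -> node 11
  array(10, 12, 0.4),  // node 10 -> node 12
  array(11, 10, 0.7),  // node 11 -> node 10 (scores are directional)
);
$prediction = array(
  array(1, 12, 4.5),   // user 1 -> node 12
  array(1, 11, 3.0),   // user 1 -> node 11
  array(2, 10, 5.0),   // user 2 -> node 10
);

/**
 * Returns array(target_id => score) for one source, highest score first.
 */
function top_by_source(array $rows, $source_id, $limit = 10) {
  $result = array();
  foreach ($rows as $row) {
    list($source, $target, $score) = $row;
    if ($source == $source_id) {
      $result[$target] = $score;
    }
  }
  arsort($result);  // sort by score, descending, keeping target ids as keys
  return array_slice($result, 0, $limit, TRUE);
}

// Similarity: keyed by the node being viewed; the same for every user.
$similar_to_node_10 = top_by_source($similarity, 10);
// Prediction: keyed by the current user; the same on every page.
$for_user_1 = top_by_source($prediction, 1);
```

In Views terms, the "source" in this sketch plays the role of the contextual filter value: the current node for similarity, the current user for prediction.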

Views support

Recommender API supports Views 3, which is the preferred way to display recommendations. Most helper modules would create default Views, and you can just customize those.

However, if you do want to create your own recommender views, here's how:

Step 1: Choose the views base table, either Recommender Similarity or Recommender Prediction, depending on which type of recommendations you would show.

Step 2: In "filter criteria", you need to select which recommender application provides the recommendations. Usually you just need the "Application ID" filter (if you are a helper module developer, please use the "Application Name" filter). Use other filters if you want.

Step 3: In "relationships", add a new "Entity ID (Target)" relationship. On the next page, select the entity type of the recommended items. For example, if your recommendations are nodes, then use "Content". Also check "Require this relationship".

Step 4: In "contextual filters" (a.k.a. "Arguments"), add "Entity ID (Source)". This is the entity for which the recommendations are made. If your recommendations are made for the current user, then this is the user ID of the current user. Usually you want to provide a default value of either the current node or the current user.

Step 5: Add "fields", "sort criteria", or make other Views settings as you see fit. When you sort by similarity scores or prediction scores, choose "descending".

Recommender Algorithms Explained

User-user vs. item-item

The two most popular recommender algorithms are user-user and item-item. The user-user algorithm first computes similarities among users based on the users' history records (such as purchasing history, node browsing history, etc.), and then predicts how much a user likes an item based on how much the user's similar users like the item. The item-item algorithm first computes similarities among items based on some information (e.g., the items are always purchased together, the items are always given the same ratings, etc.), and then predicts how much a user likes an item based on how much the user likes the item's similar items.

Academic research shows that the item-item algorithm usually works better than user-user. Amazon.com uses the item-item algorithm in its recommender system.
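
As a rough illustration of the item-item idea, the PHP sketch below computes a cosine similarity between two items and then predicts a score as a similarity-weighted average of the user's own ratings. The data, function names, and the choice of cosine similarity are assumptions made for this example; the actual computation is done by Apache Mahout and may differ.

```php
<?php
// Hypothetical ratings: $ratings[user][item] = score (1-5).
$ratings = array(
  'alice' => array('A' => 5, 'B' => 4),
  'bob'   => array('A' => 4, 'B' => 5, 'C' => 4),
  'carol' => array('A' => 1, 'C' => 2),
);

/**
 * Cosine similarity between two items over the users who rated both.
 */
function item_similarity(array $ratings, $item1, $item2) {
  $dot = $norm1 = $norm2 = 0.0;
  foreach ($ratings as $scores) {
    if (isset($scores[$item1]) && isset($scores[$item2])) {
      $dot += $scores[$item1] * $scores[$item2];
      $norm1 += $scores[$item1] * $scores[$item1];
      $norm2 += $scores[$item2] * $scores[$item2];
    }
  }
  return ($norm1 > 0 && $norm2 > 0) ? $dot / sqrt($norm1 * $norm2) : 0.0;
}

/**
 * Item-item prediction: the user's own ratings, weighted by how similar
 * each rated item is to the target item. Returns NULL if nothing is similar.
 */
function predict_item_item(array $ratings, $user, $target) {
  $weighted = $weights = 0.0;
  foreach ($ratings[$user] as $item => $score) {
    if ($item === $target) {
      continue;
    }
    $sim = item_similarity($ratings, $item, $target);
    $weighted += $sim * $score;
    $weights += abs($sim);
  }
  return $weights > 0 ? $weighted / $weights : NULL;
}

// Alice has not rated item C; estimate it from her ratings of A and B.
$predicted = predict_item_item($ratings, 'alice', 'C');
```

A user-user variant would follow the same shape with the roles swapped: compute similarities between users, then weight the ratings that similar users gave to the target item.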

SlopeOne

The advantage of SlopeOne is performance. But it doesn't compute similarity scores, and I don't know of many real systems that use this algorithm. (Note: This algorithm will be added later.)

SVD

This algorithm worked really well in the Netflix Prize (http://www.netflixprize.com/). It is especially useful when you have sparse datasets. (Note: This algorithm will be added later.)

For Developers

In the simplest case, there are two steps to use Recommender API. First, use "recommender_app_register()" to register your recommender application with the system (see the parameter descriptions in the sub-section below). Second, use "recommender_create_command()" to issue an execution command to compute recommendations.

You can use "recommender_app_update()" to update your recommender application's parameters after the app is registered. In "hook_disable()" or "hook_uninstall()" of your module, you would use "recommender_app_unregister()" to clean up the data for your recommender application.

You would use "$app_name", a machine readable string, to identify your recommender application for the recommender_app_*() functions. You can use "recommender_app_load()" to retrieve the numeric ID used internally for the recommender applications.
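
The register/unregister lifecycle could be sketched as follows for a hypothetical module named "mymodule". The module name, the app name 'mymodule_rec', the choice of hook_enable() as the place to register, and the assumption that recommender_app_unregister() takes the $app_name string are all illustrative; check the comments in recommender.module and the "rec_example" module for the actual signatures. The call to recommender_create_command() is left out because its parameters are not described here.

```php
<?php
/**
 * Returns the app definitions for the hypothetical module "mymodule".
 */
function mymodule_recommender_apps() {
  return array(
    'mymodule_rec' => array(
      'title' => 'My Recommender App',
      'params' => array(
        'algorithm' => 'item2item',
        'table' => '{history}',
        'fields' => array('uid', 'nid'),
        'preference' => 'boolean',
      ),
    ),
  );
}

/**
 * Implements hook_enable(): register the app when the module is enabled.
 */
function mymodule_enable() {
  recommender_app_register(mymodule_recommender_apps());
}

/**
 * Implements hook_disable(): clean up the app's data.
 */
function mymodule_disable() {
  // Assumption: recommender_app_unregister() accepts the $app_name string.
  recommender_app_unregister('mymodule_rec');
}
```

Keeping the definitions in one helper function means the same array can be passed to "recommender_app_update()" later if the parameters change.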

To display the recommendations, the suggested approach is to use Views. You can create default Views for your module using the "export" UI command in the Views module, and then copy the exported code to "yourmodule_views_default_views()". You can also use "recommender_default_view_template()" to easily generate default views for your module. In addition, you can use "recommender_top_similarity()" and "recommender_top_prediction()" to retrieve a list of recommendations programmatically.

The best way to learn how to program with Recommender API is through examples. You can read the code of the "rec_example" module that comes with this module, or read the code of other recommender modules (e.g., history_rec, fivestar_rec, etc.). To see how to use the API functions, read the comments in recommender.module. If you need further support, please create issues in the module's issue queue.

Parameters for recommender_app_register($apps)

$apps is an array of recommender applications, where the key is a machine readable string to identify a recommender application:

$apps = array(
        $app_name1 => $app_definition1,
        $app_name2 => $app_definition2,
        ...
);

$app_definition is an array defined as follows:

$app_definition = array(
        'title' => 'My Recommender App', // human readable name for the app
        'params' => array(...),          // see the list of fields below
);

$params is an array for your recommender application's parameters. The fields are as follows:

  • 'algorithm' (required): You can choose from the following recommender algorithms:
    • item2item: The most commonly used algorithm.
    • item2item_increment: Incremental update to a base item2item application.
    • user2user: The user-user algorithm.
    • svd: The SVD algorithm.
  • 'table' (required): This specifies the user-item preference table. For example, in history_rec, this is the {history} table that logs which users browsed which nodes. The table name needs to be enclosed in {}. Alternatively, you can use a SQL statement to generate the preference table dynamically. The SQL statement should start with "SELECT" (case sensitive). If it is equal to '<BUILTIN>', the API will automatically use records in the {recommender_preference} table with the same recommender app ID. If it is equal to '<FILE>', the API will use an external file as input.

  • 'sql' (obsolete): This is for backward compatibility. Use 'table' instead even if you use a SQL statement.

  • 'fields' (required if 'table' is a database table or SQL statement): This is an array that specifies the required column names for 'table' or 'sql': array(user_id, item_id, [preference], [timestamp]), where [preference] and [timestamp] can be omitted.

  • 'preference_file' (optional): You can specify the preference file if table=<FILE>. This can be overridden in the async_command "string1" field.

  • 'similarity' (optional): You can select which similarity algorithm to use: 'auto' (default), 'cityblock', 'euclidean', 'loglikelihood', 'pearson', 'spearman', 'tanimoto', and 'cosine'. To see the explanation of each algorithm, go to http://goo.gl/E85TR

  • 'performance' (optional): You can choose 'database' to run the recommender application in the database (saves RAM), or 'memory' to pre-load all data into memory (improves speed), or the default 'auto' to let the program decide.

  • 'preference' (optional): Specifies whether the preference is a 'boolean' (true or false) or a 'score' (1-5 score). In history_rec where preference is whether a user browsed a node or not, this should be 'boolean'. In fivestar_rec where preference is a 1-5 score, this should be 'score'.

  • 'max_keep' (optional): Specifies the maximum number of similarity/prediction records to keep in the database for each user or item. Default is 100.

  • 'base_app_name' (required if 'algorithm' is 'item2item_increment'): Specifies the $app_name of the base item2item application on which the incremental update is based.

  • 'staging' (optional): If set to 'view', a database view is created instead of loading data into {recommender_preference_staging}.

  • 'prediction_source_filter' (optional): Specifies whether you want to limit computation of predictions to certain sources (usually users). E.g., '<10000' in history_rec means compute predictions only for users with uid<10000. Currently only the '<' operator is supported.

  • 'prediction_score_filter' (optional): Specifies whether you want to filter prediction scores. E.g., '>0.1' means you want to keep those predictions that have scores>0.1. Currently only the '>' operator is supported.

  • 'similarity_score_filter' (optional): Specifies whether you want to filter similarity scores. E.g., '>0.1' means you want to keep those similarity pairs that have scores>0.1. Currently only the '>' operator is supported.
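
Putting the fields above together, a complete $apps array might look like the sketch below. The app names, the {mymodule_rating} table, and its column names are hypothetical; the parameter keys and values are the ones listed above. The call is guarded so that the snippet also runs outside Drupal.

```php
<?php
$apps = array(
  // Boolean preferences from the browsing history table.
  'example_history_rec' => array(
    'title' => 'Example history recommender',
    'params' => array(
      'algorithm' => 'item2item',
      'table' => '{history}',
      'fields' => array('uid', 'nid'),
      'preference' => 'boolean',
      'similarity' => 'loglikelihood',
      'performance' => 'auto',
      'max_keep' => 50,
      'similarity_score_filter' => '>0.1',
    ),
  ),
  // 1-5 scores from a hypothetical {mymodule_rating} table with the
  // hypothetical columns uid, nid, rating, updated.
  'example_rating_rec' => array(
    'title' => 'Example rating recommender',
    'params' => array(
      'algorithm' => 'svd',
      'table' => '{mymodule_rating}',
      'fields' => array('uid', 'nid', 'rating', 'updated'),
      'preference' => 'score',
      'prediction_source_filter' => '<10000',
      'prediction_score_filter' => '>0.1',
    ),
  ),
);

// Inside Drupal with Recommender API enabled, register both apps.
if (function_exists('recommender_app_register')) {
  recommender_app_register($apps);
}
```

Optional fields that are left out, such as 'max_keep' in the second app, fall back to the defaults described above.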

FAQ

Why not using REST to access Apache Mahout?

Apache Mahout provides REST access. However, this module chooses not to use it for the following reasons:

  • Each recommender application (Fivestar Recommender, Browsing History Recommender, etc.) would require an independent Mahout REST instance, which involves lots of administration overhead.
  • Even though we can use the REST interface to query recommendations, Mahout still requires direct database access through its JDBCDataModel.
  • The recommender algorithms usually require access to the entire database tables all at once. It's much more efficient to use direct database access than REST.

What happens to the mouse/cheese metaphor used in the 6.x-2.x release?

The mouse/cheese metaphor was used for two reasons. First, it's more lively than the user/item terminology. Second, from a programming perspective, users and items are usually interchangeable. So a "mouse" can act as a user at one time but as an item at another time, and the same goes for "cheese". But Mahout adopts the user/item terminology, and it handles the user/item interchangeability through its class hierarchy. To avoid confusion in 3.x, I'm not using the mouse/cheese metaphor anymore.

Is there a cloud service alternative?

We will launch a cloud service shortly.

Where to find more documentation and support?

The HTML version of this documentation is posted at http://drupal.org/node/1207634, but will not get updated as frequently as this one. You can use rst2html to generate HTML too.

For bug reports, feature requests, and all other requests, please submit issues at http://drupal.org/project/issues/recommender.

If you need customization or consulting services, please contact the author at danithaca@gmail.com.

File

README.html
View source
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.7: http://docutils.sourceforge.net/" />
<title>Recommender API Documentation</title>
<meta name="authors" content="Daniel Xiaodan Zhou (http://michiza.com/)" />
<style type="text/css">

/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 6253 2010-03-02 00:24:53Z milde $
:Copyright: This stylesheet has been placed in the public domain.

Default cascading style sheet for the HTML output of Docutils.

See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/

/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
  border: 0 }

table.borderless td, table.borderless th {
  /* Override padding for "table.docutils td" with "! important".
     The right padding separates the table cells. */
  padding: 0 0.5em 0 0 ! important }

.first {
  /* Override more specific margin styles with "! important". */
  margin-top: 0 ! important }

.last, .with-subtitle {
  margin-bottom: 0 ! important }

.hidden {
  display: none }

a.toc-backref {
  text-decoration: none ;
  color: black }

blockquote.epigraph {
  margin: 2em 5em ; }

dl.docutils dd {
  margin-bottom: 0.5em }

/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
  font-weight: bold }
*/

div.abstract {
  margin: 2em 5em }

div.abstract p.topic-title {
  font-weight: bold ;
  text-align: center }

div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
  margin: 2em ;
  border: medium outset ;
  padding: 1em }

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
  font-weight: bold ;
  font-family: sans-serif }

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title {
  color: red ;
  font-weight: bold ;
  font-family: sans-serif }

/* Uncomment (and remove this text!) to get reduced vertical space in
   compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
  margin-bottom: 0.5em }

div.compound .compound-last, div.compound .compound-middle {
  margin-top: 0.5em }
*/

div.dedication {
  margin: 2em 5em ;
  text-align: center ;
  font-style: italic }

div.dedication p.topic-title {
  font-weight: bold ;
  font-style: normal }

div.figure {
  margin-left: 2em ;
  margin-right: 2em }

div.footer, div.header {
  clear: both;
  font-size: smaller }

div.line-block {
  display: block ;
  margin-top: 1em ;
  margin-bottom: 1em }

div.line-block div.line-block {
  margin-top: 0 ;
  margin-bottom: 0 ;
  margin-left: 1.5em }

div.sidebar {
  margin: 0 0 0.5em 1em ;
  border: medium outset ;
  padding: 1em ;
  background-color: #ffffee ;
  width: 40% ;
  float: right ;
  clear: right }

div.sidebar p.rubric {
  font-family: sans-serif ;
  font-size: medium }

div.system-messages {
  margin: 5em }

div.system-messages h1 {
  color: red }

div.system-message {
  border: medium outset ;
  padding: 1em }

div.system-message p.system-message-title {
  color: red ;
  font-weight: bold }

div.topic {
  margin: 2em }

h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
  margin-top: 0.4em }

h1.title {
  text-align: center }

h2.subtitle {
  text-align: center }

hr.docutils {
  width: 75% }

img.align-left, .figure.align-left, object.align-left {
  clear: left ;
  float: left ;
  margin-right: 1em }

img.align-right, .figure.align-right, object.align-right {
  clear: right ;
  float: right ;
  margin-left: 1em }

img.align-center, .figure.align-center, object.align-center {
  display: block;
  margin-left: auto;
  margin-right: auto;
}

.align-left {
  text-align: left }

.align-center {
  clear: both ;
  text-align: center }

.align-right {
  text-align: right }

/* reset inner alignment in figures */
div.align-right {
  text-align: left }

/* div.align-center * { */
/*   text-align: left } */

ol.simple, ul.simple {
  margin-bottom: 1em }

ol.arabic {
  list-style: decimal }

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

ol.lowerroman {
  list-style: lower-roman }

ol.upperroman {
  list-style: upper-roman }

p.attribution {
  text-align: right ;
  margin-left: 50% }

p.caption {
  font-style: italic }

p.credits {
  font-style: italic ;
  font-size: smaller }

p.label {
  white-space: nowrap }

p.rubric {
  font-weight: bold ;
  font-size: larger ;
  color: maroon ;
  text-align: center }

p.sidebar-title {
  font-family: sans-serif ;
  font-weight: bold ;
  font-size: larger }

p.sidebar-subtitle {
  font-family: sans-serif ;
  font-weight: bold }

p.topic-title {
  font-weight: bold }

pre.address {
  margin-bottom: 0 ;
  margin-top: 0 ;
  font: inherit }

pre.literal-block, pre.doctest-block {
  margin-left: 2em ;
  margin-right: 2em }

span.classifier {
  font-family: sans-serif ;
  font-style: oblique }

span.classifier-delimiter {
  font-family: sans-serif ;
  font-weight: bold }

span.interpreted {
  font-family: sans-serif }

span.option {
  white-space: nowrap }

span.pre {
  white-space: pre }

span.problematic {
  color: red }

span.section-subtitle {
  /* font-size relative to parent (h1..h6 element) */
  font-size: 80% }

table.citation {
  border-left: solid 1px gray;
  margin-left: 1px }

table.docinfo {
  margin: 2em 4em }

table.docutils {
  margin-top: 0.5em ;
  margin-bottom: 0.5em }

table.footnote {
  border-left: solid 1px black;
  margin-left: 1px }

table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
  padding-left: 0.5em ;
  padding-right: 0.5em ;
  vertical-align: top }

table.docutils th.field-name, table.docinfo th.docinfo-name {
  font-weight: bold ;
  text-align: left ;
  white-space: nowrap ;
  padding-left: 0 }

h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
  font-size: 100% }

ul.auto-toc {
  list-style-type: none }

</style>
</head>
<body>
<div class="document" id="recommender-api-documentation">
<h1 class="title">Recommender API Documentation</h1>
<table class="docinfo" frame="void" rules="none">
<col class="docinfo-name" />
<col class="docinfo-content" />
<tbody valign="top">
<tr><th class="docinfo-name">Authors:</th>
<td>Daniel Xiaodan Zhou (http://michiza.com/)</td></tr>
<tr><th class="docinfo-name">Version:</th>
<td>For Recommender API 7.x-3.x/4.x</td></tr>
<tr class="field"><th class="docinfo-name">Note:</th><td class="field-body">HTML version is generated by docutils ('rst2html').</td>
</tr>
</tbody>
</table>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#what-is-recommender-api" id="id1">What is Recommender API?</a><ul>
<li><a class="reference internal" href="#what-s-new-in-7-x-3-x-4-x" id="id2">What's new in 7.x-3.x/4.x?</a></li>
</ul>
</li>
<li><a class="reference internal" href="#installation-configuration-execution" id="id3">Installation, Configuration &amp; Execution</a><ul>
<li><a class="reference internal" href="#how-does-recommender-api-work" id="id4">How does Recommender API work?</a></li>
<li><a class="reference internal" href="#out-of-the-box-installation" id="id5">Out-of-the-box Installation</a></li>
<li><a class="reference internal" href="#installation-generic-process" id="id6">Installation (Generic Process)</a></li>
<li><a class="reference internal" href="#configuration-execution" id="id7">Configuration &amp; Execution</a></li>
<li><a class="reference internal" href="#similarity-vs-prediction" id="id8">Similarity vs. prediction</a></li>
<li><a class="reference internal" href="#views-support" id="id9">Views support</a></li>
</ul>
</li>
<li><a class="reference internal" href="#recommender-algorithms-explained" id="id10">Recommender Algorithms Explained</a><ul>
<li><a class="reference internal" href="#user-user-vs-item-item" id="id11">User-user vs. item-item</a></li>
<li><a class="reference internal" href="#slopeone" id="id12">SlopeOne</a></li>
<li><a class="reference internal" href="#svd" id="id13">SVD</a></li>
</ul>
</li>
<li><a class="reference internal" href="#for-developers" id="id14">For Developers</a><ul>
<li><a class="reference internal" href="#parameters-for-recommender-app-register-apps" id="id15">Parameters for recommender_app_register($apps)</a></li>
</ul>
</li>
<li><a class="reference internal" href="#faq" id="id16">FAQ</a><ul>
<li><a class="reference internal" href="#why-not-using-rest-to-access-apache-mahout" id="id17">Why not using REST to access Apache Mahout?</a></li>
<li><a class="reference internal" href="#what-happens-to-the-mouse-cheese-metaphor-used-in-the-6-x-2-x-release" id="id18">What happens to the mouse/cheese metaphor used in the 6.x-2.x release?</a></li>
<li><a class="reference internal" href="#is-there-a-cloud-service-alternative" id="id19">Is there a cloud service alternative?</a></li>
<li><a class="reference internal" href="#where-to-find-more-documentation-and-support" id="id20">Where to find more documentation and support?</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="what-is-recommender-api">
<h1><a class="toc-backref" href="#id1">What is Recommender API?</a></h1>
<p>Recommender API provides easy-to-understand, easy-to-use, fully-documented APIs to write Drupal content recommendation modules (e.g., Fivestar Recommender, Ubercart Product Recommender, etc.). It also provides a unified approach to configure &amp; execute the recommender algorithms, and to display results to end users.</p>
<div class="section" id="what-s-new-in-7-x-3-x-4-x">
<h2><a class="toc-backref" href="#id2">What's new in 7.x-3.x/4.x?</a></h2>
<p>The 3.x release is a complete rewrite of the API code and is not compatible with earlier releases. Major changes and features are:</p>
<blockquote>
<ul class="simple">
<li>All complex computation is now done under Apache Mahout (<a class="reference external" href="http://mahout.apache.org/">http://mahout.apache.org/</a>), which is much faster than the original PHP implementation. See <a class="reference external" href="http://drupal.org/node/1180000">http://drupal.org/node/1180000</a> for more details.</li>
<li>All end user display is now through customizable Views rather than hard coded blocks. See <a class="reference external" href="http://drupal.org/node/673786">http://drupal.org/node/673786</a> for more details.</li>
<li>You can isolate the resource intensive recommender computation from the Drupal production site.</li>
<li>It now supports more algorithms (e.g. SVD) provided by Apache Mahout.</li>
</ul>
</blockquote>
<p>The 4.x release is the same as the 3.x release, except that it uses the Async Command 2.x release.</p>
</div>
</div>
<div class="section" id="installation-configuration-execution">
<h1><a class="toc-backref" href="#id3">Installation, Configuration &amp; Execution</a></h1>
<div class="section" id="how-does-recommender-api-work">
<h2><a class="toc-backref" href="#id4">How does Recommender API work?</a></h2>
<p>Conceptually, you need 2 computers to run Recommender API: the Drupal server for your Drupal site, and the recommender server to compute recommendations. Of course, you can run the recommender server on the same Drupal server if you like, but note that the recommender computation could easily consume all resources. On the Drupal server, you simply issue a command to run recommenders. On the recommender server, you would run the real recommender program (written in Java), which takes the command, computes results, and saves the results back to the Drupal server database for display.</p>
</div>
<div class="section" id="out-of-the-box-installation">
<h2><a class="toc-backref" href="#id5">Out-of-the-box Installation</a></h2>
<p>This simplified installation process assumes you have Recommender API installed under &quot;sites/all/modules/recommender&quot; on your Drupal server. If you want to offload recommendation computations to a separate recommender server, please follow the generic installation process in the next section.</p>
<p><strong>Step 1</strong>. Install the Recommender API module to &quot;sites/all/modules/recommender&quot;.</p>
<p><strong>Step 2</strong>. Download Apache Mahout v.0.6+ from <a class="reference external" href="http://mahout.apache.org/">http://mahout.apache.org/</a>, and extract it to &quot;sites/all/libraries/mahout&quot;. Make sure you have Java 6 or greater installed.</p>
<p><strong>Step 3</strong>. Install the Async Command module (<a class="reference external" href="http://drupal.org/project/async_command">http://drupal.org/project/async_command</a>) to &quot;sites/all/modules/async_command&quot;.</p>
<p><strong>Step 4</strong>. Install any helper modules to your Drupal server, such as Browsing History Recommender, Fivestar Recommender, Ubercart Product Recommender, etc.</p>
<p><strong>Step 5</strong>. Periodically execute &quot;run.sh&quot; (Linux/Mac/Unix) or &quot;run.bat&quot; (Windows) under &quot;sites/all/modules/recommender&quot; to compute recommendations. You might need to edit &quot;run.sh&quot; or &quot;run.bat&quot; and change the DRUPAL_HOME parameter in case of errors.</p>
</div>
<div class="section" id="installation-generic-process">
<h2><a class="toc-backref" href="#id6">Installation (Generic Process)</a></h2>
<p>(You can skip this section if you use the out-of-the-box installation process.)</p>
<p><strong>Step 1</strong>. Download Apache Mahout v.0.6+ from <a class="reference external" href="http://mahout.apache.org/">http://mahout.apache.org/</a>, and extract it to any directory on the recommender server. Make sure you have Java 6 or greater installed.</p>
<p><strong>Step 2</strong>. Install the Async Command module (<a class="reference external" href="http://drupal.org/project/async_command">http://drupal.org/project/async_command</a>) to your Drupal server under sites/all/modules/async_command. Copy the 'lib' sub-directory and async-command.jar to any folder on the recommender server.</p>
<p><strong>Step 3</strong>. Install the Recommender API module (<a class="reference external" href="http://drupal.org/project/recommender">http://drupal.org/project/recommender</a>) to your Drupal server under sites/all/modules/recommender. Copy the 'recommender.jar' file to any folder on the recommender server.</p>
<p><strong>Step 4</strong>. Install any helper modules to your Drupal server, such as Browsing History Recommender, Fivestar Recommender, Ubercart Product Recommender, etc.</p>
<p><strong>Step 5</strong>. On your recommender server, create the 'config.properties' file, using 'config.properties.example' as a template. This file specifies how the recommender program (&quot;recommender.jar&quot;) accesses your Drupal database. The database account doesn't need full DB privileges. The minimum privileges are:</p>
<pre class="literal-block">
SELECT/UPDATE on {async_command}, {recommender_app}
SELECT on {node} (nid, created), {users} (uid, created), or other tables (see documentation of relevant helper modules)
SELECT/INSERT/UPDATE/DELETE on {recommender_similarity}, {recommender_prediction}, {recommender_preference_staging}
</pre>
<p><strong>Step 6</strong>. Copy 'run.sh' (Linux/Mac/Unix) or 'run.bat' (Windows) to any directory on the recommender server, and manually set the parameters to point to the correct directories.</p>
<p><strong>Step 7</strong>. Periodically execute 'run.sh' or 'run.bat' on your recommender server to compute recommendations.</p>
</div>
<div class="section" id="configuration-execution">
<h2><a class="toc-backref" href="#id7">Configuration &amp; Execution</a></h2>
<p>On your Drupal server, you can go to admin-&gt;configure-&gt;recommender to run various recommenders. Note that this doesn't do any computation, but merely adds commands to the {async_command} queue to be executed on your recommender server. Remember to grant the &quot;administer recommender&quot; permission too.</p>
<p>On your recommender server, you need to periodically execute 'run.sh' or 'run.bat' to actually compute the recommendations. If you use Linux, you might want to run 'run.sh' as a cron job on your recommender server. This is different from the cron settings on the Drupal server, which just issue the commands. The cron entry could look like this (run every 30 minutes):</p>
<pre class="literal-block">
# in crontab -e, add the following line.
*/30 * * * * flock -n /tmp/recommender.lock recommender/run.sh &gt;&gt; /tmp/recommender.log 2&gt;&amp;1
</pre>
</div>
<div class="section" id="similarity-vs-prediction">
<h2><a class="toc-backref" href="#id8">Similarity vs. prediction</a></h2>
<p>Recommender API offers two types of recommendations, similarity-based and prediction-based. An algorithm may implement either or both (e.g. the SlopeOne algorithm only has prediction-based recommendations).</p>
<p>One type of recommendation is based on the similarity among nodes (or users, or other types of entities). For example, if you are viewing a node, it will recommend other similar nodes. The recommended nodes are the same for this particular node regardless of which user is viewing it. The similarity scores rest on co-occurrence: for example, if two nodes are usually viewed together, or two products are usually purchased together, then the two nodes/products are considered similar. The helper modules define what information is used to compute the similarity scores. The similarity scores range from -1 (completely dissimilar) to +1 (completely similar), and are directional: that A is similar to B doesn't mean that B is similar to A.</p>
<p>The other type of recommendation is based on &quot;prediction scores&quot;, which predict how much a user would like a node. The recommendations are personalized: different users see different recommendations. But each user sees the same recommendations regardless of which page she is viewing. The prediction scores are computed based on the user's personal history. For example, if a user purchased products A and B, she might be interested in purchasing C, which is similar to A and B. Exactly what &quot;personal history&quot; to use is defined by the helper modules. (A side note: If you treat users as nodes and nodes as users, you can then predict how much a node would &quot;like&quot; a user. This is useful when you want to promote a node to the most interested users.)</p>
<p>You need to understand the distinction between similarity and prediction in order to work with Views.</p>
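<p>To make the distinction concrete, here is a minimal sketch in plain PHP. It does not call Recommender API; the arrays, the scores, and the helper toy_top() are all invented for illustration. Like the other snippets in this document, it omits the opening PHP tag.</p>

```php
// Toy records with invented scores; real scores come from the recommender server.
// Similarity records: source node ID => (target node ID => score). Directional.
$similarity = array(
  1 => array(2 => 0.8, 3 => 0.3),
  2 => array(1 => 0.5),
);
// Prediction records: user ID => (node ID => score).
$prediction = array(
  7 => array(2 => 0.4, 3 => 0.9),
  8 => array(1 => 0.7),
);

// Return up to $limit IDs, highest score first.
function toy_top(array $scores, $limit) {
  arsort($scores);
  return array_slice(array_keys($scores), 0, $limit);
}

// Similarity-based: the same list for every user who views node 1.
$similar_to_node_1 = toy_top($similarity[1], 2);
// Prediction-based: the same list for user 7 on every page.
$recommended_for_user_7 = toy_top($prediction[7], 2);
```

<p>In this toy data node 1 rates node 2 at 0.8 while node 2 rates node 1 at only 0.5, which mirrors the directional similarity scores described above.</p>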
</div>
<div class="section" id="views-support">
<h2><a class="toc-backref" href="#id9">Views support</a></h2>
<p>Recommender API supports Views 3, which is the preferred way to display recommendations. Most helper modules create default Views, which you can simply customize.</p>
<p>However, if you do want to create your own recommender views, here's how:</p>
<p><strong>Step 1</strong>: Choose the views base table, either Recommender Similarity or Recommender Prediction, depending on which type of recommendations you would show.</p>
<p><strong>Step 2</strong>: In &quot;filter criteria&quot;, you need to select which recommender application to provide the recommendations. Usually you just need the &quot;Application ID&quot; filter (if you are a helper module developer, please use &quot;Application Name&quot; filter). Use other filters if you want.</p>
<p><strong>Step 3</strong>: In &quot;relationships&quot;, add a new &quot;Entity ID (Target)&quot; relationship. On the next page, select the entity type of the recommended items. For example, if your recommendations are nodes, then use &quot;Content&quot;. Also check &quot;Require this relationship&quot;.</p>
<p><strong>Step 4</strong>: In &quot;contextual filters&quot; (a.k.a. &quot;Arguments&quot;), add &quot;Entity ID (Source)&quot;. This is the entity the recommendations are made for. If your recommendations are made for the current user, then this is the user ID of the current user. Usually you want to provide a default value of either the current node or the current user.</p>
<p><strong>Step 5</strong>: Add &quot;fields&quot;, &quot;sort criteria&quot;, or make other Views settings as you see fit. When you sort by similarity scores or prediction scores, choose &quot;descending&quot;.</p>
</div>
</div>
<div class="section" id="recommender-algorithms-explained">
<h1><a class="toc-backref" href="#id10">Recommender Algorithms Explained</a></h1>
<div class="section" id="user-user-vs-item-item">
<h2><a class="toc-backref" href="#id11">User-user vs. item-item</a></h2>
<p>The two most popular recommender algorithms are user-user and item-item. The user-user algorithm first computes similarities among users based on the users' history records (such as purchasing history, node browsing history, etc.), and then predicts how much a user likes an item based on how much the user's similar users like the item. The item-item algorithm first computes similarities among items based on some information (e.g., the items are always purchased together, the items are always given the same ratings, etc.), and then predicts how much a user likes an item based on how much the user likes the item's similar items.</p>
<p>Academic research shows that the item-item algorithm usually works better than user-user. Amazon.com uses the item-item algorithm in its recommender system.</p>
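<p>The two steps of the item-item algorithm can be sketched in plain PHP as follows. This is a deliberately simplified illustration, not the computation Apache Mahout performs: it assumes boolean purchase data, uses the Tanimoto coefficient as the similarity measure, and predicts by summing similarities. The data and the toy_*() function names are invented for this example, and the opening PHP tag is omitted.</p>

```php
// Toy purchase history: user ID => list of purchased item names.
$history = array(
  1 => array('A', 'B', 'C'),
  2 => array('A', 'B'),
  3 => array('A', 'C'),
  4 => array('B'),
);

// Invert the history: item => set of users who bought it.
function toy_buyers(array $history) {
  $buyers = array();
  foreach ($history as $uid => $items) {
    foreach ($items as $item) {
      $buyers[$item][$uid] = TRUE;
    }
  }
  return $buyers;
}

// Step 1: item-item similarity. Tanimoto coefficient: the number of shared
// buyers divided by the number of users who bought either item.
function toy_tanimoto(array $a, array $b) {
  $shared = count(array_intersect_key($a, $b));
  $either = count($a + $b);
  return $either > 0 ? $shared / $either : 0.0;
}

// Step 2: prediction. Score every item the user does not own by summing its
// similarity to each item the user owns; best candidates come first.
function toy_predict(array $history, $uid) {
  $buyers = toy_buyers($history);
  $owned = array_flip($history[$uid]);
  $scores = array();
  foreach ($buyers as $candidate => $candidate_buyers) {
    if (isset($owned[$candidate])) {
      continue;
    }
    $scores[$candidate] = 0.0;
    foreach (array_keys($owned) as $item) {
      $scores[$candidate] += toy_tanimoto($candidate_buyers, $buyers[$item]);
    }
  }
  arsort($scores);
  return $scores;
}

$for_user_4 = toy_predict($history, 4);
```

<p>User 4 owns only B. Item A shares two of four buyers with B (0.5) and item C shares one of four (0.25), so A is ranked ahead of C. A user-user variant would instead compare the rows of $history with each other.</p>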
</div>
<div class="section" id="slopeone">
<h2><a class="toc-backref" href="#id12">SlopeOne</a></h2>
<p>The advantage of SlopeOne is performance. But it doesn't compute similarity scores, and I don't know of many real systems that use this algorithm. (Note: This algorithm will be added later.)</p>
</div>
<div class="section" id="svd">
<h2><a class="toc-backref" href="#id13">SVD</a></h2>
<p>This algorithm worked really well in the Netflix Prize (<a class="reference external" href="http://www.netflixprize.com/">http://www.netflixprize.com/</a>). It is especially useful when you have sparse datasets. (Note: This algorithm will be added later.)</p>
</div>
</div>
<div class="section" id="for-developers">
<h1><a class="toc-backref" href="#id14">For Developers</a></h1>
<p>In the simplest case, there are 2 steps to use Recommender API. First, use &quot;recommender_app_register()&quot; to register your recommender application with the system (see the parameter descriptions in the sub-section below). Second, use &quot;recommender_create_command()&quot; to issue an execution command to compute recommendations.</p>
<p>You can use &quot;recommender_app_update()&quot; to update your recommender application's parameters after the app is registered. In &quot;hook_disable()&quot; or &quot;hook_uninstall()&quot; of your module, you would use &quot;recommender_app_unregister()&quot; to clean up the data for your recommender application.</p>
<p>You would use &quot;$app_name&quot;, a machine readable string, to identify your recommender application for the recommender_app_*() functions. You can use &quot;recommender_app_load()&quot; to retrieve the numeric ID used internally for the recommender applications.</p>
<p>To display the recommendations, the suggested approach is to use Views. You can create default Views for your module using the &quot;export&quot; UI command in the Views module, and then copy the exported code to &quot;yourmodule_views_default_views()&quot;. You can also use &quot;recommender_default_view_template()&quot; to easily generate default views for your module. In addition, you can use &quot;recommender_top_similarity()&quot; and &quot;recommender_top_prediction()&quot; to retrieve a list of recommendations programmatically.</p>
<p>The best way to learn how to program with Recommender API is through examples. You can read the code of the &quot;rec_example&quot; module that comes with this module, or read the code of other recommender modules (e.g., history_rec, fivestar_rec, etc.). To see how to use the API functions, read the comments in recommender.module. If you need further support, please create issues in the module's issue queue.</p>
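<p>As a rough sketch of the register/unregister life cycle described above, a helper module might be organized like this. The module name &quot;mymodule&quot;, the app name &quot;mymodule_rec&quot; and the parameter values are hypothetical. The sketch also assumes that the definition keys are the literal strings 'title' and 'params' and that recommender_app_unregister() accepts the machine-readable app name; check recommender.module for the actual signatures. The call to recommender_create_command() is left out because its arguments are not described in this document. The opening PHP tag is omitted.</p>

```php
// Hypothetical helper: returns the app definitions for "mymodule".
function mymodule_recommender_apps() {
  return array(
    'mymodule_rec' => array(
      'title' => 'My module recommender',
      'params' => array(
        'algorithm' => 'item2item',
        'table' => '{history}',
        'fields' => array('uid', 'nid'),
        'preference' => 'boolean',
      ),
    ),
  );
}

// Implements hook_enable(): register the app with Recommender API.
function mymodule_enable() {
  recommender_app_register(mymodule_recommender_apps());
}

// Implements hook_disable(): clean up the data of the app.
function mymodule_disable() {
  // Assumption: the function takes the machine-readable app name.
  foreach (array_keys(mymodule_recommender_apps()) as $app_name) {
    recommender_app_unregister($app_name);
  }
}
```

<p>Keeping the definitions in one function means the same app names are used for registering and for cleaning up.</p>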
<div class="section" id="parameters-for-recommender-app-register-apps">
<h2><a class="toc-backref" href="#id15">Parameters for recommender_app_register($apps)</a></h2>
<p>$apps is an array of recommender applications, where the key is a machine readable string to identify a recommender application:</p>
<pre class="literal-block">
$apps = array(
        $app_name1 =&gt; $app_definition1,
        $app_name2 =&gt; $app_definition2,
        ...
);
</pre>
<p>$app_definition is an array defined as follows:</p>
<pre class="literal-block">
$app_definition = array(
        'title' =&gt; 'My Recommender App', // human readable name for the app
        'params' =&gt; array(...),
);
</pre>
<p>$params is an array for your recommender application's parameters. The fields are as follows:</p>
<ul>
<li><dl class="first docutils">
<dt>'algorithm' (required): You can choose from the following recommender algorithms:</dt>
<dd><ul class="first last simple">
<li>item2item: The most commonly used algorithm.</li>
<li>item2item_increment: Incremental update to a base item2item application.</li>
<li>user2user: The user-user algorithm.</li>
<li>svd: The SVD algorithm.</li>
</ul>
</dd>
</dl>
</li>
<li><p class="first">'table' (required): This specifies the user-item preference table. For example, in history_rec, this is the {history} table that logs which users browsed which nodes. The table name needs to be enclosed in {}. Alternatively, you can use a SQL statement to generate the preference table dynamically. The SQL statement should start with &quot;SELECT&quot; (case sensitive). If it equals '&lt;BUILTIN&gt;', the API automatically uses the records in the {recommender_preference} table with the same recommender app ID. If it equals '&lt;FILE&gt;', the API uses an external file as input.</p>
</li>
<li><p class="first">'sql' (obsolete): This is for backward compatibility. Use 'table' instead, even if you use a SQL statement.</p>
</li>
<li><p class="first">'fields' (required if 'table' is a database table or SQL statement): This is an array that specifies the required column names for 'table' or 'sql': array(user_id, item_id, [preference], [timestamp]), where [preference] and [timestamp] can be omitted.</p>
</li>
<li><p class="first">'preference_file' (optional): You can specify the preference file if table=&lt;FILE&gt;. This can be overridden in the async_command &quot;string1&quot; field.</p>
</li>
<li><p class="first">'similarity' (optional): You can select which similarity algorithm to use: 'auto' (default), 'cityblock', 'euclidean', 'loglikelihood', 'pearson', 'spearman', 'tanimoto', and 'cosine'. To see the explanation of each algorithm, go to <a class="reference external" href="http://goo.gl/E85TR">http://goo.gl/E85TR</a></p>
</li>
<li><p class="first">'performance' (optional): You can choose 'database' to run the recommender application in the database (saves RAM), or 'memory' to pre-load all data into memory (improves speed), or the default 'auto' to let the program decide.</p>
</li>
<li><p class="first">'preference' (optional): Specifies whether the preference is a 'boolean' (true or false) or a 'score' (1-5 score). In history_rec where preference is whether a user browsed a node or not, this should be 'boolean'. In fivestar_rec where preference is a 1-5 score, this should be 'score'.</p>
</li>
<li><p class="first">'max_keep' (optional): Specifies the maximum number of similarity/prediction records to keep in the database for each user or item. Default is 100.</p>
</li>
<li><p class="first">'base_app_name' (required if 'algorithm' is 'item2item_increment'): Specifies the $app_name of the base item2item application that the incremental update applies to.</p>
</li>
<li><p class="first">'staging' (optional): If set to 'view', a database view is created instead of loading data into {recommender_preference_staging}.</p>
</li>
<li><p class="first">'prediction_source_filter' (optional): Specifies whether to limit the computation of predictions to certain sources (usually users). E.g., '&lt;10000' in history_rec means compute predictions only for users with uid&lt;10000. Currently only the '&lt;' operator is supported.</p>
</li>
<li><p class="first">'prediction_score_filter' (optional): Specifies whether to filter prediction scores. E.g., '&gt;0.1' means you want to keep only those predictions that have scores&gt;0.1. Currently only the '&gt;' operator is supported.</p>
</li>
<li><p class="first">'similarity_score_filter' (optional): Specifies whether to filter similarity scores. E.g., '&gt;0.1' means you want to keep only those similarity pairs that have scores&gt;0.1. Currently only the '&gt;' operator is supported.</p>
</li>
</ul>
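<p>Putting several of these parameters together, a definition for a ratings-based recommender plus its incremental companion might look like the sketch below. Everything named &quot;mymodule&quot; is hypothetical, including the {mymodule_rating} table, its columns, and the mymodule_check_app_params() helper, which is not part of Recommender API and only restates the &quot;required&quot; rules from the list above. The sketch assumes the definition keys are the literal strings 'title' and 'params', and that an incremental app takes the same 'table' and 'fields' as its base app. The opening PHP tag is omitted.</p>

```php
$apps = array(
  'mymodule_rating_rec' => array(
    'title' => 'Rating recommender',
    'params' => array(
      'algorithm' => 'item2item',
      'table' => '{mymodule_rating}',
      // Column order: user_id, item_id, [preference], [timestamp].
      'fields' => array('uid', 'nid', 'rating', 'created'),
      'preference' => 'score',
      'similarity' => 'pearson',
      'max_keep' => 50,
      'similarity_score_filter' => '>0.1',
    ),
  ),
  'mymodule_rating_rec_increment' => array(
    'title' => 'Rating recommender (incremental)',
    'params' => array(
      'algorithm' => 'item2item_increment',
      'base_app_name' => 'mymodule_rating_rec',
      'table' => '{mymodule_rating}',
      'fields' => array('uid', 'nid', 'rating', 'created'),
      'preference' => 'score',
    ),
  ),
);

// Illustrative pre-flight check; returns a list of problems (empty if none).
function mymodule_check_app_params(array $params) {
  $problems = array();
  foreach (array('algorithm', 'table') as $key) {
    if (empty($params[$key])) {
      $problems[] = "'" . $key . "' is required";
    }
  }
  $table = isset($params['table']) ? (string) $params['table'] : '';
  // 'fields' is needed for a {table} name or a statement starting with SELECT.
  $is_db_source = substr($table, 0, 1) === '{' || strpos($table, 'SELECT') === 0;
  if ($is_db_source) {
    if (empty($params['fields'])) {
      $problems[] = "'fields' is required for a database table or SQL statement";
    }
  }
  $algorithm = isset($params['algorithm']) ? $params['algorithm'] : '';
  if ($algorithm === 'item2item_increment') {
    if (empty($params['base_app_name'])) {
      $problems[] = "'base_app_name' is required for item2item_increment";
    }
  }
  return $problems;
}

foreach ($apps as $app_name => $app) {
  $problems = mymodule_check_app_params($app['params']);
  print $app_name . ': ' . (count($problems) ? implode('; ', $problems) : 'ok') . "\n";
}
```

<p>Once the definitions pass such a check, the $apps array is what you would hand to recommender_app_register($apps).</p>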
</div>
</div>
<div class="section" id="faq">
<h1><a class="toc-backref" href="#id16">FAQ</a></h1>
<div class="section" id="why-not-using-rest-to-access-apache-mahout">
<h2><a class="toc-backref" href="#id17">Why not using REST to access Apache Mahout?</a></h2>
<p>Apache Mahout provides REST access. However, this module chooses not to use it for the following reasons:</p>
<ul class="simple">
<li>Each recommender application (Fivestar Recommender, Browsing History Recommender, etc.) would require an independent Mahout REST instance, which involves lots of administration overhead.</li>
<li>Even though we can use the REST interface to query recommendations, Mahout still requires direct database access through its JDBCDataModel.</li>
<li>The recommender algorithms usually require access to entire database tables all at once. Direct database access is much more efficient for this than REST.</li>
</ul>
</div>
<div class="section" id="what-happens-to-the-mouse-cheese-metaphor-used-in-the-6-x-2-x-release">
<h2><a class="toc-backref" href="#id18">What happens to the mouse/cheese metaphor used in the 6.x-2.x release?</a></h2>
<p>The mouse/cheese metaphor was used for two reasons. First, it's more lively than the user/item terminology. Second, from a programming perspective, users and items are usually inter-changeable. So a &quot;mouse&quot; can act as a user at one time but as an item at another time, same for &quot;cheese&quot;. But Mahout adopts the user/item terminology, and it handles the user/item inter-changeability by class hierarchy. To avoid confusion in 3.x, I'm not using the mouse/cheese metaphor anymore.</p>
</div>
<div class="section" id="is-there-a-cloud-service-alternative">
<h2><a class="toc-backref" href="#id19">Is there a cloud service alternative?</a></h2>
<p>We will launch a cloud service shortly.</p>
</div>
<div class="section" id="where-to-find-more-documentation-and-support">
<h2><a class="toc-backref" href="#id20">Where to find more documentation and support?</a></h2>
<p>The HTML version of this documentation is posted at <a class="reference external" href="http://drupal.org/node/1207634">http://drupal.org/node/1207634</a>, but will not get updated as frequently as this one. You can use rst2html to generate HTML too.</p>
<p>For bug reports, feature requests, and all other requests, please submit issues at <a class="reference external" href="http://drupal.org/project/issues/recommender">http://drupal.org/project/issues/recommender</a>.</p>
<p>If you need customization or consulting services, please contact the author at <a class="reference external" href="mailto:danithaca&#64;gmail.com">danithaca&#64;gmail.com</a>.</p>
</div>
</div>
</div>
</body>
</html>