FEED IMPORT

Project page: http://drupal.org/sandbox/SorinSarca/1331632

------------------------------
Features
------------------------------

  -easy to use interface
  -alternative xpaths support and default value
  -ignore field & skip item import
  -multi value fields support
  -pre-filters & filters
  -some useful provided filters
  -auto-import/delete at cron
  -import/export feed configuration
  -reports
  -add taxonomy terms to field (can add new terms)

------------------------------
About Feed Import
------------------------------

The Feed Import module lets you import content from XML files into entities
(node, user, ...) using XPATH to fetch whatever you need.
You can create a new feed using PHP code in your module or you can use the
provided UI (recommended). If you have regular imports you can enable import
to run at cron. Feed Import provides two methods to process an XML file:
    Normal  - loads the XML file with simplexml_load_file() and parses
              its content. This method isn't good for huge files because
              it needs a lot of memory.
    Chunked - reads the XML file in chunks and recomposes each item. This is
              a good method for importing huge XML files.

------------------------------
How Feed Import works
------------------------------

Step 1: Downloading xml file and creating items

  -if the processFeedNormal function is selected for this feed, the whole
   XML file is loaded. We apply the parent xpath, create entity objects and
   end up with all items in an array.
  -if the processFeedChunked function is selected, the XML file is read in
   chunks. When a complete item is available we create the SimpleXMLElement
   object and then the entity object. We free the content read so far and
   repeat until all XML content is processed.
  -if another process function is selected, see that function for details.
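
The idea of handling one item at a time can be sketched with PHP's XMLReader.
This is not how processFeedChunked is implemented (the module reads raw chunks
and recomposes items itself); XMLReader is swapped in here only because it
shows the same memory-saving pattern in a few lines. The element names Items,
Item and Name are made up for the example.

```php
<?php
// Inline sample data so the sketch is self-contained; a real feed would be
// opened with $reader->open($url) instead.
$source = '<Items>'
  . '<Item><Id>1</Id><Name>Tom</Name></Item>'
  . '<Item><Id>2</Id><Name>Jerry</Name></Item>'
  . '</Items>';

$reader = new XMLReader();
$reader->XML($source);

// Move forward to the first <Item> element.
while ($reader->read() && $reader->name !== 'Item') {
  // Nothing to do, just advancing the cursor.
}

$names = array();
while ($reader->name === 'Item') {
  // Build a SimpleXMLElement for this item only, not for the whole file.
  $item = new SimpleXMLElement($reader->readOuterXml());
  $names[] = (string) $item->Name;
  // Jump to the next sibling <Item>, skipping the current item's subtree.
  $reader->next('Item');
}
$reader->close();
```

Only one item is held as a SimpleXMLElement at any time, which is what keeps
memory usage low for huge files.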

Step 2: Creating entities

This step actually happens during Step 1: entity objects are created from
SimpleXMLElement objects using the feed info.
We generate a unique hash for each item using the unique xpath from the feed.
Then, for each field in the feed, we apply its xpaths in order until one passes
the pre-filter. If an xpath passed, we take its value and filter it. If the
filtered value is empty (or isn't a value) we use the default action/value.
This is how alternative xpaths work. Example:

<Friends>
  <Friend type="bestfriend">Jerry</Friend>
  <Friend type="normal">Tom</Friend>
</Friends>

Here we can use the following xpaths to get the friend's name:
Friends/Friend[@type="bestfriend"]
Friends/Friend[@type="normal"]

If the bestfriend is missing, we fall back to the normal friend. If the normal
friend is missing too, we can specify a default value like "Forever alone".
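
This fallback can be sketched in plain PHP with SimpleXML. It is only an
illustration, not the module's code: the function first_matching_value() is
hypothetical, and the pre-filter step is reduced to a simple "not empty" check.

```php
<?php
/**
 * Hypothetical helper: returns the value of the first xpath that matches,
 * or the default value when none of them does.
 */
function first_matching_value(SimpleXMLElement $item, array $xpaths, $default) {
  foreach ($xpaths as $xpath) {
    $result = $item->xpath($xpath);
    // xpath() gives an array of matches, which is empty when nothing matched.
    if (!empty($result)) {
      $value = trim((string) $result[0]);
      if ($value !== '') {
        return $value;
      }
    }
  }
  return $default;
}

$item = new SimpleXMLElement(
  '<Item><Friends><Friend type="normal">Tom</Friend></Friends></Item>'
);
$xpaths = array(
  'Friends/Friend[@type="bestfriend"]',
  'Friends/Friend[@type="normal"]',
);
// The bestfriend is missing here, so the second xpath supplies the value.
$name = first_matching_value($item, $xpaths, 'Forever alone');
```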

Step 3: Saving/Updating entities

First we look up the IDs for the generated hashes to see whether we need to
create a new entity or just update an existing one.
For each object filled with data earlier we check the hash:
  -if the hash is in the IDs list, we check whether the entity data changed to
   decide between saving the changes and just updating the expire time.
  -if the hash isn't in the list, we create a new entity and insert the hash
   into the database.
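
The decision above can be sketched as follows. The function decide_action(),
the feed name and the use of md5() are assumptions made for this example; the
module may build and store its hashes differently.

```php
<?php
/**
 * Hypothetical sketch: decides what to do with one imported item.
 *
 * @param string $hash
 *   Hash generated for the item.
 * @param array $known_ids
 *   Entity ids keyed by hash, as they would be loaded from the database.
 * @param bool $changed
 *   Whether the item data differs from the stored entity.
 *
 * @return string
 *   'create', 'update' or 'touch' (only refresh the expire time).
 */
function decide_action($hash, array $known_ids, $changed) {
  if (!isset($known_ids[$hash])) {
    return 'create';
  }
  return $changed ? 'update' : 'touch';
}

// md5() over the feed name and the unique value is an assumption.
$hash_tom = md5('friends_feed' . '/' . 'Tom');
$hash_jerry = md5('friends_feed' . '/' . 'Jerry');
// Pretend Tom was imported before as entity 42.
$known_ids = array($hash_tom => 42);

$actions = array(
  decide_action($hash_tom, $known_ids, FALSE),
  decide_action($hash_tom, $known_ids, TRUE),
  decide_action($hash_jerry, $known_ids, FALSE),
);
```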

Feed Import can add multiple values to fields which support this. In the
example above we need only one xpath:
Friends/Friend
and both Tom and Jerry will be inserted as field values.
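
A minimal sketch of why one xpath is enough: SimpleXML returns every matching
node, so each match can become one field value. The <Item> wrapper element is
made up for the example.

```php
<?php
$item = new SimpleXMLElement(
  '<Item><Friends>'
  . '<Friend type="bestfriend">Jerry</Friend>'
  . '<Friend type="normal">Tom</Friend>'
  . '</Friends></Item>'
);

$values = array();
foreach ($item->xpath('Friends/Friend') as $friend) {
  // Each matched node contributes one value.
  $values[] = (string) $friend;
}
```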

Expire time is used to automatically delete entities (at cron) if they have
been missing from the feed for more than X seconds.
Expire time is updated for each item in the feed. For performance reasons,
updates and inserts are done for several items in one query.
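
The expire logic and the batching can be sketched like this. Both helper
functions and the batch size of 2 are hypothetical; how the module stores
expire times is not shown in this README.

```php
<?php
/**
 * Hypothetical helper: the timestamp after which a missing item may be
 * deleted, or 0 when items are kept forever.
 */
function item_expire_time($now, $lifetime) {
  return $lifetime > 0 ? $now + $lifetime : 0;
}

/**
 * Hypothetical helper: TRUE when an item was not refreshed in time.
 */
function item_is_expired($expire, $now) {
  return $expire > 0 && $expire < $now;
}

$now = 1000000;
// Every item found in the feed gets its expire time pushed forward.
$expire = item_expire_time($now, 3600);

// Process hashes in groups so each query handles several items at once.
$hashes = array('a', 'b', 'c', 'd', 'e');
$batches = array_chunk($hashes, 2);
```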

------------------------------
Using Feed Import UI
------------------------------

First, navigate to admin/config/services/feed_import. You can change global
settings using the "Settings" link. To add a new feed click the "Add new feed"
link and fill in the form. After saving the feed, click the "Edit" link in the
operations column. At the bottom there is a fieldset with XPATH settings. Add
the XPATH for the required item parent and unique id (you can now save the
feed). To add a new field, choose one from the "Add new field" select and click
the "Add selected field" button. A fieldset with field settings appears, where
you can enter xpath(s) and the default action/value. You can add more fields
the same way; when you are done click the "Save feed" submit button.

Check that all fields are ok. If you want to (pre)filter values, select the
"Edit (pre)filter" tab. There is a fieldset for each selected field. Click the
"Add new filter" button for the desired field to add a new filter. Enter a
filter name that is unique per field (this can be anything that lets you
quickly identify the filter), enter the function name (any PHP function, even
static functions as ClassName::functionName) and enter the parameters for the
function, one per line. To send the field value as a parameter, enter [field]
on a line. There are some static filter functions in the
feed_import_filter.inc.php file (class FeedImportFilter) that you can use.
Please take a look. I'll add more soon.
If you want to replace [field] with something else, go to Settings.
You can add/remove any filters you want, but don't forget to click the
"Save filters" submit button to save them all.

Now you can enable the feed and test it.
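
To make the role of [field] more concrete, here is a minimal sketch of how a
filter and its parameters could be applied to a value. The function
apply_filter() is hypothetical, and it assumes that only a parameter equal to
the placeholder is replaced; the module's real behaviour may differ.

```php
<?php
/**
 * Hypothetical sketch of applying one filter to a field value.
 */
function apply_filter($function, array $params, $value, $placeholder = '[field]') {
  foreach ($params as &$param) {
    // Swap the placeholder for the current field value.
    if ($param === $placeholder) {
      $param = $value;
    }
  }
  unset($param);
  // Works for plain functions and for 'ClassName::functionName' strings.
  return call_user_func_array($function, $params);
}

// Parameters as they would be entered in the UI, one per line.
$trimmed = apply_filter('trim', array('[field]'), '  Tom  ');
$replaced = apply_filter('str_replace', array('Tom', 'Jerry', '[field]'), 'Hello Tom');
```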

------------------------------
Feed Import API
------------------------------

If you want, you can use your own function to parse content. To do that,
implement hook_feed_import_process_info(), which returns an array keyed by
function alias with the function name as value. If the function is a static
member of a class, the value is an array containing the class name and the
function name.
Please note that in a process function EVERY possible exception MUST BE CAUGHT!
Example:

function hook_feed_import_process_info() {
  return array(
    'processFeedSuperFast' => 'php_process_function_name',
    'processFeedByMyClass' => array('MyClassName', 'myProcessFunction'),
    // Other functions ...
  );
}

Every function is called with one parameter containing the feed info and must
return an array of objects (stdClass). For the example above we will have:

function php_process_function_name(array $feed) {
  $items = array();
  // ...
  // Here process feed items
  // ...
  return $items;
}

For the static function:

class MyClassName {
  // Class stuff
  // ...

  public static function myProcessFunction(array $feed) {
    $items = array();
    // ...
    // Here process feed items
    // ...
    return $items;
  }

  // Other class stuff
}

Concrete example (we assume that the module name is test_module):

/**
 * Implements hook_feed_import_process_info().
 */
function test_module_feed_import_process_info() {
  return array(
    'Test module process function' => 'test_module_process_function',
  );
}


/**
 * This function simulates FeedImport::processFeedNormal function
 *
 * @param array $feed
 *   An array containing feed info
 *
 * @return array
 *   An array containing objects
 */
function test_module_process_function(array $feed) {
  // Every possible warning or error must be caught!!!
  // simplexml_load_file() signals malformed XML with libxml warnings and a
  // FALSE return value rather than an exception, so collect those errors
  // internally instead of letting them be displayed.
  $previous = libxml_use_internal_errors(TRUE);
  try {
    // Load xml file from url
    $xml = simplexml_load_file($feed['url'], FeedImport::$simpleXMLElement,
                                LIBXML_NOCDATA);
  }
  catch (Exception $e) {
    // Error in xml file
    $xml = FALSE;
  }
  // Restore the previous libxml error handling
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  // If there is no SimpleXMLElement object
  if (!($xml instanceof FeedImport::$simpleXMLElement)) {
    return NULL;
  }
  // Now we are sure that $xml is an SimpleXMLElement object
  // Get items from root
  $xml = $xml->xpath($feed['xpath']['#root']);

  // Check if there are items (xpath() does not return an array on failure)
  if (empty($xml)) {
    return NULL;
  }

  // Check feed items
  foreach ($xml as &$item) {
    // Set this item value to entity, so all entities will be in $xml at end
    // You must use FeedImport::createEntity to get an object which will turn
    // into an entity at the end of import process
    $item = FeedImport::createEntity($feed, $item);
  }
  // Break the reference left over from the loop
  unset($item);
  // Return created entities
  return $xml;
}

Now you can go to edit your feed and select for processing your new function.

------------------------------
Feed info structure
------------------------------

Feed info is an array containing all info about feeds: name, url, xpath keyed
by feed name.
A feed is an array containing the following keys:

name => This is feed name

id => This is feed unique id

enabled => Shows if feed is enabled or not. Enabled feeds are processed at cron
           if import at cron option is activated from settings page.

url => URL to xml file. To avoid problems use an absolute url.

time => This contains feed items lifetime. If 0 then items are kept forever,
        otherwise items are deleted once this time has elapsed and they no
        longer exist in the xml file. On each import existing items are
        rescheduled.

entity_info => This is an array containing two elements

  #entity => Entity name like node, user, ...

  #table_pk => This is the primary key of the entity's table. For node it is
               nid, for user it is uid, ...


xpath => This is an array containing xpath info and fields

  #root => This is XPATH to parent item. Every xpath query will run in
           this context.

  #uniq => This is the XPATH (relative to the #root xpath) to a unique value
           that identifies the item. The resulting value is used to create
           a hash for the item, so it must be unique per item.

  #process_function => This is function alias used to process xml file.
                       See documentation above about process functions.

  #items => This is an array containing xpath for fields and filters keyed by
            field name.
    [field_name] => An array containing info about field, xpath, filters

      #field => This is field name

      #column => This is the column in the field table. For body it is value,
                 for taxonomy it is tid and so on. If this field is a column
                 of the entity table itself then this must be NULL.

      #xpath => This is an array containing xpaths for this field. Xpaths are
                tried from first to last until one passes the pre-filter
                functions. All xpaths are relative to #root.

      #default_value => This is the default value for the field if none of the
                        xpaths passes the pre-filter functions. This is used
                        only for the default_value and default_value_filtered
                        actions.

      #default_action => Can be one of (see FeedImport::getDefaultActions()):
          default_value           -field will have this value
          default_value_filtered  -field will have this value after it is filtered
          ignore_field            -field will have no value
          skip_item               -item will not be imported

      #filter => An array containing filters info keyed by filter name
        [filter_name] => An array containing filter function and params
          #function => This is function name. Can also be a public static
                       function from a class with value ClassName::functionName
          #params => An array of parameters which #function receives. You can
                     use [field] (this value can be changed from settings page)
                     to send current field value as parameter.

      #pre_filter => Same as filter, but these functions are used to pre-filter
                     values to see if we have to choose an alternative xpath.


To see a real feed array, first create some feeds using the UI, then use the
code below to print their structure:
$feeds = FeedImport::loadFeeds();
drupal_set_message('<pre>' . print_r($feeds, TRUE) . '</pre>');
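
For orientation, a feed array following the keys documented above might look
like the sketch below. Every value is made up for illustration (feed name, url,
xpaths, the types used for id and enabled, and the process function alias);
print a real feed as shown above to see the exact structure.

```php
<?php
$feeds = array(
  // Feeds are keyed by feed name.
  'Friends feed' => array(
    'name' => 'Friends feed',
    'id' => 1,
    'enabled' => TRUE,
    'url' => 'http://example.com/friends.xml',
    // Keep items for one day after they disappear from the feed.
    'time' => 86400,
    'entity_info' => array(
      '#entity' => 'node',
      '#table_pk' => 'nid',
    ),
    'xpath' => array(
      '#root' => '/Items/Item',
      '#uniq' => 'Id',
      '#process_function' => 'processFeedNormal',
      '#items' => array(
        'title' => array(
          '#field' => 'title',
          '#column' => NULL,
          '#xpath' => array(
            'Friends/Friend[@type="bestfriend"]',
            'Friends/Friend[@type="normal"]',
          ),
          '#default_value' => 'Forever alone',
          '#default_action' => 'default_value',
          '#filter' => array(
            'trim value' => array(
              '#function' => 'trim',
              '#params' => array('[field]'),
            ),
          ),
          '#pre_filter' => array(),
        ),
      ),
    ),
  ),
);
```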

------------------------------
Real example
------------------------------

Please check project page for an example.
http://drupal.org/sandbox/SorinSarca/1331632
