protected function GdprSqlDump::sanitizeData in General Data Protection Regulation 8

Same name and namespace in other branches
  1. 8.2 modules/gdpr_dump/src/Service/GdprSqlDump.php \Drupal\gdpr_dump\Service\GdprSqlDump::sanitizeData()
  2. 3.0.x modules/gdpr_dump/src/Service/GdprSqlDump.php \Drupal\gdpr_dump\Service\GdprSqlDump::sanitizeData()

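Goes through each table configured for anonymization (skipping those listed in tablesToSkip), runs every configured column value through its anonymizer plugin, and inserts the sanitized rows into the table's clone, named GDPR_TABLE_PREFIX followed by the table name.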

Throws

\Exception

1 call to GdprSqlDump::sanitizeData()
GdprSqlDump::prepare in modules/gdpr_dump/src/Service/GdprSqlDump.php
Prepare the database for the dump.

File

modules/gdpr_dump/src/Service/GdprSqlDump.php, line 276

Class

GdprSqlDump
Class GdprSqlDump.

Namespace

Drupal\gdpr_dump\Service

Code

protected function sanitizeData() {

  /* @todo
   * Remote API call optimization:
   *   Prefetch the required amount of data from remote APIs.
   *   Maybe do it on a table level.
   */

  /** @var array $anonymizationOptions */
  foreach ($this->tablesToAnonymize as $table => $anonymizationOptions) {
    if (\array_key_exists($table, $this->tablesToSkip)) {
      continue;
    }
    // Select every column of every row from the source table.
    $selectQuery = $this->database->select($table);
    $selectQuery->fields($table);
    $oldRows = $selectQuery->execute();
    if (NULL === $oldRows) {

      // @todo: notify
      continue;
    }
    // Sanitized rows are written to the cloned table, not the original.
    $clonedTable = self::GDPR_TABLE_PREFIX . $table;
    $tableColumns = $this->databaseManager->fetchColumnNames($table);
    $insertQuery = $this->database->insert($clonedTable);
    $insertQuery->fields($tableColumns);

    // Look up each column's maximum character length so anonymized values
    // can be validated against it.
    $query = $this->database->select('information_schema.columns', 'columns');
    $query->fields('columns', [
      'COLUMN_NAME',
      'CHARACTER_MAXIMUM_LENGTH',
    ]);
    $query->condition('TABLE_SCHEMA', $this->database->getConnectionOptions()['database']);
    $query->condition('TABLE_NAME', $table);
    $columnDetails = $query->execute()->fetchAllAssoc('COLUMN_NAME');
    while ($row = $oldRows->fetchAssoc()) {
      foreach ($anonymizationOptions as $column => $pluginId) {

        /* @todo
         * Maybe it would be better to use 'per table' sanitization,
         * so username, email, etc. can be consistent with each other.
         * E.g. myuser could have myuser@example.com as a mail, not
         * somethingelse@example.com.
         *
         * @todo
         * Also add a way to make exceptions,
         * e.g. an option for 'don't alter uid 1 name', etc.
         */
        // Re-run the anonymizer until the value fits the column's maximum
        // length. $tries only exceeds 50 when the initial attempt and all
        // 50 retries produced a value that was too long.
        $tries = 0;
        do {
          $isValid = TRUE;
          $value = $this->pluginFactory->get($pluginId)->anonymize($row[$column]);
          if (!empty($columnDetails[$column]->CHARACTER_MAXIMUM_LENGTH) && strlen($value) > $columnDetails[$column]->CHARACTER_MAXIMUM_LENGTH) {
            $isValid = FALSE;
          }
        } while (!$isValid && $tries++ < 50);
        if ($tries > 50) {
          throw new GdprDumpAnonymizationException("Too many retries for column '{$column}'.");
        }
        $row[$column] = $value;
      }
      $insertQuery->values($row);
    }
    // Insert all sanitized rows into the cloned table with a single query.
    $insertQuery->execute();
  }
}
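
The bounded retry loop in the listing above can be hard to read inline, so here is a minimal standalone sketch of the same pattern. It is an illustration only, not code from the module: the helper name anonymizeWithRetries(), its parameters, and the fake anonymizer are hypothetical, and \RuntimeException stands in for GdprDumpAnonymizationException. Like the listing, it measures length with strlen(), which counts bytes.

```php
<?php

/**
 * Hypothetical helper: re-runs an anonymizer until its output fits.
 *
 * @param callable $anonymize  Receives the original value, returns a replacement.
 * @param mixed    $original   The value to anonymize.
 * @param int|null $maxLength  Column length limit; NULL or 0 means no limit.
 * @param int      $maxRetries Retries allowed after the first attempt.
 */
function anonymizeWithRetries(callable $anonymize, $original, ?int $maxLength, int $maxRetries = 50): string {
  $tries = 0;
  do {
    $value = (string) $anonymize($original);
    // A value is valid when there is no limit or it fits within the limit.
    $isValid = empty($maxLength) || strlen($value) <= $maxLength;
    // $tries is only incremented after a failed attempt (short-circuit &&).
  } while (!$isValid && $tries++ < $maxRetries);

  // $tries exceeds $maxRetries only if every attempt was too long.
  if ($tries > $maxRetries) {
    throw new \RuntimeException('Too many retries.');
  }
  return $value;
}

// Usage: a fake anonymizer that returns an oversized value twice, then a short one.
$calls = 0;
$fakeAnonymizer = function ($original) use (&$calls) {
  $calls++;
  return $calls < 3 ? 'far-too-long-value' : 'short';
};

echo anonymizeWithRetries($fakeAnonymizer, 'alice', 10), PHP_EOL; // prints "short"
```

Because the counter is incremented inside the loop condition, the anonymizer runs at most $maxRetries + 1 times (one initial attempt plus the retries) before the exception is thrown.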