Interana Docs

About Interana privacy purges

The European Union (EU) General Data Protection Regulation (GDPR) was designed to protect the data privacy of EU citizens and to reshape the way organizations approach data privacy.

An Interana privacy purge lets you comply with GDPR and other privacy regulations, as well as any voluntary privacy policies your company adheres to. For more information about GDPR, see How to comply with GDPR in the Interana version 2 docs.

What happens in a privacy purge

An Interana privacy purge lets you protect the privacy of Interana users and users of services whose data resides in Interana, as illustrated below.

Behavioral information about your users: Interana as the repository of privacy data

[Figure: GDPR_scenario1.png]

Information about Interana users: Interana as the producer of privacy data

[Figure: GDPR_scenario2.png]

The following table lists the types of data affected during a purge, with an explanation of what happens to each data type in the process.

Type of purge data  Meaning
Source event data files

You are responsible for purging your source event data files of an individual's behavioral data.

If you do not sanitize your original material, we recommend that you maintain a cumulative set of purged IDs. Then if you have to reingest, you can rerun the purge.

Event data

When data stored in Interana contains references to purged user actions, the entire event record is purged when it matches the purge user in any of the purge identifier columns.

String data

Strings that exactly match a purge identifier are deleted from the string server. Other strings that are associated with deleted events are de-linked and not purged. 

IMPORTANT: The Interana admin must be careful to only pass purge identifier values that are Personally Identifiable Information (PII), such as email addresses and GUIDs. The purge utility removes whatever values the admin requests, from whatever columns.

Query result history

Dashboard caches are refreshed or aged out within 30 days.

Named expressions, global filters, dashboards, and derived columns

Named expressions, global filters, dashboards, and derived columns created by a purged user are considered the intellectual property of the company for which the purge user worked, and are not removed. 

The purge command entirely deletes named expressions, global filters, dashboards, and derived columns that reference the purged user in their filters or other query parameters. An Interana admin can run the purge in preview mode (without the --run flag), which prints a list of references without deleting them.

Interana user account

Named expressions, global filters, dashboards, and derived columns created by the user, and any audit history of changes to these objects edited by the purged user, are considered the intellectual property of the company for which the purge user worked and are left intact. Interana admins can remove them manually, as necessary. 

System backups

For system backups, you can set a policy of not retaining backups for longer than your governing policies allow.

NOTE: If you are subject to the EU's GDPR, a 30-day retention policy ensures compliance.

Human-readable and structured system logs

We recommend that you rotate the logs on the Interana cluster within seven days.

If you download these files from the cluster for longer-term storage, keep a cumulative list of purge user identifiers so that you can rerun privacy requests if the logs are needed for analysis later.
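
The cumulative list of purge identifiers recommended above, for both source event files and downloaded logs, can be kept as a plain text file. The following is a minimal sketch in Python, assuming one identifier per line. The file layout and the function names are hypothetical and are not part of Interana.

```python
from pathlib import Path


def load_purged_ids(ledger: Path) -> set[str]:
    """Read the cumulative ledger; a missing file means nothing was purged yet."""
    if not ledger.exists():
        return set()
    lines = ledger.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_purged_ids(ledger: Path, new_ids: list[str]) -> set[str]:
    """Merge new_ids into the ledger and rewrite it sorted, one ID per line."""
    merged = load_purged_ids(ledger) | {i.strip() for i in new_ids if i.strip()}
    ledger.write_text("".join(f"{i}\n" for i in sorted(merged)), encoding="utf-8")
    return merged
```

Calling record_purged_ids after every purge keeps the ledger free of duplicates, so the full set can be replayed if the source data is reingested.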

Available scheduling of privacy purges

You can perform an ad hoc privacy purge, use the automated purge pipeline available in Interana version 4.5, or combine both.

The ad hoc privacy purge is intended to be used infrequently, perhaps once a month. Some customers hosting Interana on multiple clusters run custom-built middleware that makes use of the ad hoc purges by queuing purge jobs and running them in order.

What you should know before scheduling a privacy purge

Before you schedule a privacy purge, it's important that you are aware of the downstream effects:

  • Deleted references to a purged user can cause queries to fail and dashboard charts to disappear.
  • A purge scans and deletes specified rows at a speed of 10 GB/hour per data node. Importing a large volume of data while the purge is running can reduce this rate. An example of such a load is one import node (with 4 CPUs) importing as much as it can into one data node (also 4 CPUs), or about 250 million events/day per data node CPU.
  • A privacy purge is a cluster-wide operation with resource-intensive processes. Although it doesn't affect the performance of most queries, longer-running queries can take up to 30% more time to complete while a purge is in progress. For this reason, we recommend that you schedule privacy purges at non-peak hours.
  • You can use query structured logging to determine which objects were deleted in a privacy purge.
  • If you do not purge your raw logs, we recommend that you maintain a cumulative set of purged IDs. That way, if the original source files are re-ingested, you can rerun the necessary privacy purges.
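
The scan rate quoted in the list above can be turned into a rough duration estimate when choosing a non-peak window. The following is a minimal sketch in Python. It assumes that the purge scans all data on the cluster, that data is spread evenly across data nodes, and that the nodes work in parallel; none of these assumptions is confirmed here, and the function name is hypothetical.

```python
PURGE_RATE_GB_PER_HOUR_PER_NODE = 10.0  # rate quoted in the list above


def estimate_purge_hours(total_data_gb: float, data_nodes: int) -> float:
    """Rough purge duration, assuming an even, parallel scan across data nodes."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    return total_data_gb / (data_nodes * PURGE_RATE_GB_PER_HOUR_PER_NODE)
```

Under these assumptions, estimate_purge_hours(2000, 8) returns 25.0, so a cluster holding 2,000 GB across 8 data nodes would need about 25 hours. Heavy imports during the purge would lengthen that figure.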

The dashboard cache is not included in a privacy purge. However, the cache refreshes every week, clearing out old data. For this reason, there might be a short time when dashboards still display privacy information that has been purged. 

An Interana privacy purge removes the following PII data:

  • Event data
  • String data
  • Query result history
  • Query definition history
  • References in derived columns
  • References in funnels

You are responsible for removing the following PII data:

  • PII data in original source logs
  • PII data in lookup files

Requirements for a privacy purge

This section covers the procedural and data structure requirements you must adhere to for a successful privacy purge, then outlines the information you should have on hand before you begin.

A privacy purge runs across all data available on the cluster at the time of the purge, including data that is in the process of being imported. Data that is imported after the purge pass completes is not scanned unless a new purge is run.

Procedural requirements and limitations

  • DO NOT launch a privacy purge while a cluster rebalance is in progress. Wait until the cluster rebalance is complete before starting a privacy purge.
  • DO NOT run a privacy purge on a lookup table. Privacy purge does not currently support lookup tables.
  • DO NOT attempt to use a file larger than 16 MB (16,000 KB) in a purge, or the job will hang.
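
Because an oversized file causes the job to hang, it can help to check the size before starting a purge and to split a long identifier list into several smaller files. The following is a minimal sketch in Python. It assumes the purge file holds one identifier per line and reads the limit conservatively as 16,000,000 bytes. The file format, the constant, and the function names are all hypothetical and are not part of Interana.

```python
from pathlib import Path

# Assumption: conservative reading of the 16 MB limit stated above.
MAX_PURGE_FILE_BYTES = 16_000_000


def check_purge_file_size(path: Path, limit: int = MAX_PURGE_FILE_BYTES) -> None:
    """Raise ValueError if the file on disk is larger than the limit."""
    size = path.stat().st_size
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the limit is {limit} bytes")


def split_ids_by_size(ids: list[str], limit: int = MAX_PURGE_FILE_BYTES) -> list[list[str]]:
    """Group IDs into batches whose newline-terminated UTF-8 size fits the limit."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for ident in ids:
        needed = len(ident.encode("utf-8")) + 1  # +1 for the trailing newline
        if needed > limit:
            raise ValueError(f"identifier alone exceeds the limit: {ident!r}")
        if current and current_bytes + needed > limit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(ident)
        current_bytes += needed
    if current:
        batches.append(current)
    return batches
```

Each batch returned by split_ids_by_size can then be written to its own file and purged separately.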

Data structure requirements

The following points are required for a successful privacy purge:

  • If a column name exists in multiple tables, the columns must be of the same type. 
  • If columns with the same name have different types, rename the columns in the Interana UI so that the names are unique.
  • Hexadecimal/Identifier columns must be in the format of the original ingested (raw source) data, such as the original value of a GUID: "30dd879c-ee2f-11db-8314-0800200c9a66". A privacy purge requires the original ingested (raw source) value.
  • String and integer sets are not deleted in a privacy purge.
  • Each string must be individually specified for a purge. For example, a "userID" and the "userID@mailaddress" must be individually specified to be deleted.
  • A userID that is in a column description (in the Interana UI) will not be deleted in a privacy purge. You must manually remove any privacy information that appears in column descriptions.
  • If a userID appears in a dashboard title, that dashboard will not be deleted in a privacy purge. However, the dashboard title will appear in the metadata delete preview, flagging it for manual deletion.
  • Advanced filters, titles, descriptions, and derived columns that contain decimal values or a plain text string that contains a space are not deleted in a privacy purge.
  • Only exact instances of a string are deleted. If the string appears with a letter or number adjacent, it is considered a different string (because it's not an exact match) and is not deleted.
  • Privacy purges do not currently support deleting non-ASCII (multibyte UTF-8) characters, such as kanji and emoji.
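
Several of the limitations above can be checked on the identifier list before a purge is started. The following is a minimal sketch in Python that flags identifiers likely to need manual follow-up: those with non-ASCII characters (the kanji and emoji case), with spaces, or that are decimal values. The function name and the exact checks are illustrative assumptions and do not describe how the purge utility itself works.

```python
import re

# Matches plain decimal values such as "3.14" or "-0.5".
_DECIMAL = re.compile(r"[+-]?\d*\.\d+")


def purge_identifier_warnings(identifier: str) -> list[str]:
    """Return reasons why an identifier may not be fully purged."""
    warnings = []
    if not identifier.isascii():
        warnings.append("contains non-ASCII characters, which purges do not delete")
    if any(ch.isspace() for ch in identifier):
        warnings.append("contains a space; such metadata references are not deleted")
    if _DECIMAL.fullmatch(identifier):
        warnings.append("is a decimal value; such metadata references are not deleted")
    return warnings
```

An identifier such as the GUID "30dd879c-ee2f-11db-8314-0800200c9a66" produces no warnings. Because only exact matches are deleted, each variant of an identifier, such as "userID" and "userID@mailaddress", still has to be listed separately.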

Perform an automated privacy purge

For details about creating an automated pipeline for GDPR or other privacy purges, see Perform an automated privacy purge.

Schedule an ad hoc privacy purge

For details about scheduling an ad hoc privacy purge, see Perform an ad hoc privacy purge.
