Interana

Admin Guide: Manage data rolloff and deletion

Rolling off and deleting old data efficiently minimizes storage usage across all tiers.

  • Data rolloff: Removes data after a specified time period. You can set this up as a continuous process or as a one-time event.
  • Data deletion: Removes a column of data, or data within a specified time interval. This is a one-time event.

This document covers the following topics:

  • Guidelines for data rolloff and deletion
  • Rolling off data
  • Deleting data
  • What's Next

Guidelines for data rolloff and deletion

In general, it's recommended that you have enough storage space to accommodate 90 days of data. If your company requires a larger data retention period, estimate the required storage capacity for that time period. The following are a few recommended guidelines for managing data rolloff and deletion:

Expert data management is available as an Interana Managed Service. For more information, contact help@interana.com.
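
The 90-day sizing guideline above can be turned into a rough back-of-the-envelope estimate. The following is a minimal sketch, not an Interana tool: the function name, the daily ingest figure, and the 20% headroom factor are all assumptions chosen for illustration, and the calculation ignores compression, replication, and differences between tiers.

```python
def estimate_storage_gb(daily_ingest_gb, retention_days=90, headroom=0.2):
    """Rough capacity estimate: daily stored volume x retention window, plus headroom.

    All parameters are illustrative assumptions; measure your own daily
    stored volume per tier before relying on the result.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return daily_ingest_gb * retention_days * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical cluster that stores 50 GB of new data per day.
    print(estimate_storage_gb(50))        # default 90-day window
    print(estimate_storage_gb(50, 365))   # one-year retention requirement
```

If your company requires a longer retention period, changing retention_days shows how quickly the required capacity grows.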

Rolling off data

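You can roll off data for a specified time range in the following ways:

  • Automatic data rolloff: Schedule a job that rolls off data older than a fixed number of days.
  • Manual data rolloff: Run the ia table delete-time-range command for a specific time range.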

Interana supports rolling off event data, but not filter sequence data.

Automatic data rolloff

You can run an Interana script to schedule automatic data rolloff for a fixed time interval, specifying values for the following script variables: 

  • <import-node-ip>: IP address of the import node on which the job will be scheduled
  • <table_name>: Name of the table from which the data will be rolled off
  • <#>: Number of days after which data will be rolled off

To schedule automatic data rolloff, do the following:
  1. Open a terminal window and log in to the config node of your cluster.
  2. Enter the following command, substituting your values for the variables.
/usr/share/python/interana-python/bin/python /opt/interana/backend/import_server/retention_window.pyo \
--host <import-node-ip> -t <table_name> -w <#>d -y

The following example schedules rolloff of data older than 90 days from the Event_Log_Data table, using the import host 10.10.10.10.

/usr/share/python/interana-python/bin/python /opt/interana/backend/import_server/retention_window.pyo \
--host 10.10.10.10 -t Event_Log_Data -w 90d -y
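
If you schedule rolloff for several tables, it can help to generate the command line rather than retype it. The following is a small sketch that only assembles the command shown above and validates its inputs; it does not run anything. The helper name retention_command is hypothetical, and the only flags used are the ones that appear in this section.

```python
import shlex

# Paths copied from the command in this section.
PYTHON = "/usr/share/python/interana-python/bin/python"
SCRIPT = "/opt/interana/backend/import_server/retention_window.pyo"


def retention_command(import_node_ip, table_name, days):
    """Build the argument list for the automatic rolloff command (hypothetical helper)."""
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    if not table_name or not import_node_ip:
        raise ValueError("import_node_ip and table_name are required")
    return [PYTHON, SCRIPT, "--host", import_node_ip,
            "-t", table_name, "-w", f"{days}d", "-y"]


if __name__ == "__main__":
    # Print a copy-and-paste-ready command for the example in this section.
    print(shlex.join(retention_command("10.10.10.10", "Event_Log_Data", 90)))
```

Printing the command first, and pasting it on the config node only after review, keeps the step easy to audit.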

Manual data rolloff

You can manually roll off data for a specified time range with the Interana CLI command ia table delete-time-range. The following guidelines apply:

  • The table_name and end_time arguments are required, while start_time is optional. 
  • This command can only be used for Event tables.
  • Omit the start_time if you are rolling off data.
  • The --im-feeling-lucky option is recommended only for rolling off old data.
  • Specify a start_time if you want to delete a time range of data in the middle of a dataset.
  • Both timestamps must be in milliseconds since the Unix epoch, such as: 1487106411644
  • The default is dry-run mode. Use the --run or -r option to execute the command.
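
Because both timestamps must be given in epoch milliseconds, a small conversion helper can reduce mistakes. The following sketch uses only the Python standard library; the function names are illustrative and are not part of the Interana CLI. It uses integer arithmetic on UTC datetimes to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """Convert epoch milliseconds back to a UTC datetime, for double-checking a value."""
    return EPOCH + timedelta(milliseconds=ms)


if __name__ == "__main__":
    # The sample value from the guidelines above, shown as a UTC date.
    print(from_epoch_ms(1487106411644))
    # An end_time that keeps the most recent 90 days of data.
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    print(to_epoch_ms(cutoff))
```

Converting a value back with from_epoch_ms before running a deletion is a cheap way to confirm that the range is the one you intended.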

The following table provides the command syntax and explains the options.

ia table delete-time-range table_name [start_time] end_time [--keep-strings] [--delete-import-records] [--im-feeling-lucky] [--run]
ia table delete-time-range
  table_name               Name given to the Event table.
  start_time               The start time of the deletion is optional. If specified, strings are not deleted. Use epoch or Unix time. For a list of valid parameters, see the CLI ingest Quick Start.
  end_time                 The end time of the deletion is required. Use epoch or Unix time. For a list of valid parameters, see the CLI ingest Quick Start.
  Optional arguments
  --delete-import-records  Deletes all import records for this table.
  --im-feeling-lucky       Expedites data deletion by only deleting folders that are completely within the specified time range. This method typically deletes less data than may be desired, but is a good choice if you are simply rolling off old data.
  --instance-name <name>   Name of the Interana cluster, in the event there are multiple configured clusters.
  --keep-strings           Forces strings to be kept in the deletion process.
  -r, --run                Executes the command. The default is dry-run mode.
  --unsafe                 Use when there is not a valid certificate. To acquire a valid certificate, see How to replace a self-signed certificate.
  -v                       Sets verbose mode, shows crash stack trace.
  --version                Displays the version of Interana and Interana CLI currently installed.
  -help, --help            Prints help for this command and then exits.

To manually roll off data with the CLI, do the following:
  1. Open a Terminal window and log in to the config node of the cluster.
  2. Enter the following command, substituting values for the variables.
ia table delete-time-range <table_name> <end_time> --im-feeling-lucky -r

In the following example, data older than 1400106411644 (epoch time in milliseconds) is rolled off from the IAexample table. Because no start_time is given, the range covers everything before that end time. The --im-feeling-lucky option rolls off only folders that fall completely within that range, which makes the command faster because it deletes whole folders.

The -r option executes the command. The default is dry-run mode.

ia table delete-time-range IAexample 1400106411644 --im-feeling-lucky -r
Time-range specified (IN UTC):
Start: Not specified, End: 2014-05-14 22:26:51
Note: Please do not close this prompt. To monitor the progress, tail the import-api-server logs on import-api node.
Table      # Folders Affected  # Events Deleted  String Delete Status                   Import Record Status
IAexample  approx. 48          4,042,839         Scheduled pruning of 2 string columns  Unaffected
Table IAexample has completed time-range delete.
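
Since the command defaults to dry-run mode, a common pattern is to review the dry-run result first and add the run flag only afterward. The sketch below assembles the argument list for both passes using only the options listed in the table above; the helper name delete_time_range_args is hypothetical, and the sketch does not execute the CLI.

```python
def delete_time_range_args(table_name, end_ms, start_ms=None, lucky=False, run=False):
    """Build arguments for ia table delete-time-range (hypothetical helper).

    run=False leaves the command in its default dry-run mode.
    """
    if start_ms is not None and start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    args = ["ia", "table", "delete-time-range", table_name]
    if start_ms is not None:
        args.append(str(start_ms))      # optional start of the range
    args.append(str(end_ms))            # required end of the range
    if lucky:
        args.append("--im-feeling-lucky")
    if run:
        args.append("--run")
    return args


if __name__ == "__main__":
    # First pass: dry run. Second pass: the same command with --run added.
    print(" ".join(delete_time_range_args("IAexample", 1400106411644, lucky=True)))
    print(" ".join(delete_time_range_args("IAexample", 1400106411644, lucky=True, run=True)))
```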

A precise delete-time-range (without the --im-feeling-lucky option) takes longer to execute than the --im-feeling-lucky option, which deletes entire folders. The precise time-range deletion parses every folder to protect buckets that contain events falling outside the specified time range.
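
The difference between the two modes can be illustrated with a toy model. This is not Interana's implementation: here a folder is modeled as a plain list of event timestamps, and the range is assumed to include the start time and exclude the end time, which is an assumption made only for this sketch.

```python
def in_range(ts, start_ms, end_ms):
    """True if ts falls in [start_ms, end_ms); a missing start means 'from the beginning'."""
    return (start_ms is None or ts >= start_ms) and ts < end_ms


def lucky_delete(folders, end_ms, start_ms=None):
    """Folder-level mode: drop only folders whose events ALL fall inside the range."""
    return [f for f in folders
            if not all(in_range(t, start_ms, end_ms) for t in f)]


def precise_delete(folders, end_ms, start_ms=None):
    """Precise mode: inspect every folder and remove each event inside the range."""
    trimmed = ([t for t in f if not in_range(t, start_ms, end_ms)] for f in folders)
    return [f for f in trimmed if f]    # folders left empty disappear


if __name__ == "__main__":
    folders = [[100, 200], [250, 400], [500, 600]]
    print(lucky_delete(folders, end_ms=300))    # the straddling folder is kept whole
    print(precise_delete(folders, end_ms=300))  # the straddling folder is trimmed
```

In this model the folder-level mode leaves the event at 250 in place because its folder also holds an event at 400, which is why that mode can delete less data than requested.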

Deleting data

There may be times when you want to delete the data from an entire table, or delete the table entirely. You can do this with the Interana CLI ia table delete command. You can use ia table delete to remove any table from your cluster, including lookup tables. Auto completion displays both types of tables (event and lookup) when applicable. 

The following table shows the ia table delete command syntax and explains the options. 

ia table delete table_name [--keep-metadata] [--run/-r]
ia table delete
  table_name              Name used for the table.
  Optional arguments
  --instance-name <name>  Name of the Interana cluster, in the event there are multiple configured clusters.
  --keep-metadata         Use to persist the metadata (table definitions).
  -r, --run               Use to execute the command. The default is dry-run mode.
  --unsafe                Use when there is not a valid certificate. To acquire a valid certificate, see How to replace a self-signed certificate.
  -v                      Sets verbose mode, shows crash stack trace.
  --version               Displays the version of Interana and Interana CLI currently installed.
  -help, --help           Prints help for this command and then exits.

Deleting data from a data or string tier

You can use the ia table delete and ia table delete-time-range commands to delete data from a data or string tier. The ia table delete-time-range command can delete data in the middle of a table as well as roll off old data.

  • The ia table delete default is dry-run mode. This allows you to verify that a specified table can be deleted. You then use the -r or --run option to execute the command.
  • The --keep-metadata parameter allows you to keep table definitions while deleting all of the existing data (events and strings) and import records.

Lookup tables will not have a # of Events column, since they do not contain event data.

To delete the data from a tier, do the following:
  1. Open a terminal window and log in to the Interana config node.
  2. Enter the following command, substituting the actual table name for the <table_name> variable.
ia table delete <table_name> --keep-metadata --run

In the following example, the data from the fashion2 table is deleted while keeping the table definitions.

ia table delete fashion2 --keep-metadata -r
Initiating deletion for table fashion2, please wait...
Table     Metadata   Import Record  # String Columns Deleted  # Folders Deleted  # Events Deleted
fashion2  Unchanged  Success        19                        11                 12,025
Table fashion2 has been deleted. Use ia table list to see the status of tables.

Deleting data from an overloaded string tier

There may be times when the number of processes running on the string tier is so high that an import becomes blocked. When this happens, it appears that data is not coming in, or is coming in very slowly. If you look in the import-pipeline log, you will see the "String tier is overloaded, back off" error message.
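
To confirm how often imports are being throttled, you can count occurrences of that message in the import-pipeline log. The following is a minimal sketch: the function name is illustrative, and the log file location is deliberately left as a command-line argument because it depends on your installation.

```python
import sys

OVERLOAD_MARKER = "String tier is overloaded, back off"


def count_overload_messages(lines):
    """Count log lines that contain the overload message quoted above."""
    return sum(1 for line in lines if OVERLOAD_MARKER in line)


if __name__ == "__main__":
    # Usage: python count_overload.py <path-to-import-pipeline-log>
    # The path is supplied by you; no default location is assumed here.
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as log:
            print(count_overload_messages(log))
```

A count that keeps rising while data arrives slowly suggests the overload condition described in this section.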

If the string tier is overloaded because of one or more very high-cardinality columns, you can try to delete the columns with the ia column delete command. However, the command will most likely time out. The solution is to stop the string tier, let the threads die, and then restart it. This section walks you through this process.

It's important that you schedule downtime for the cluster to resolve this issue.

 

To delete data when the string tier is overloaded, do the following:

  1. Log in to the Interana config node on the cluster, and pause all jobs with the following command.
ia job --all pause
  2. Log in to each import node on the cluster and run the following command.
iactl stop string-server
  3. Log in to the first string node and run the following command.
iactl stop string-server-leaf
  4. Before going on to the next string node, run the following command to verify that the string processes have stopped. The result should be zero.
pgrep string | wc -l
  5. Log in to each of the other string nodes and run the following command.
iactl stop string-server-leaf

Now that all the services are stopped and the string tier is cleared of the overloaded processes, you can start the services again.

  1. Log in to each import node and run the following command.
iactl start string-server
  2. Log in to each string node and run the following command.
iactl start string-server-leaf
  3. Verify the status of the string servers and leaf processes with the following command. The servers and processes should be running on all nodes.
iactl status
  4. You can now log in to the string tier and delete the desired column with the following command, substituting the actual table and column names for the variables.
ia column delete <table_name> <column_name> --run
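
Because the order of the stop and start steps matters, it can help to write the sequence out as a checklist before the scheduled downtime. The sketch below only generates an ordered list of (node, command) pairs from the iactl commands in this section; it does not connect to any node. The function name and host names are hypothetical, jobs are assumed to be paused already, and the process check is repeated on every string node, which slightly generalizes the steps above.

```python
def string_tier_restart_plan(import_nodes, string_nodes):
    """Return the stop, verify, start, status sequence as (node, command) pairs."""
    plan = []
    for node in import_nodes:
        plan.append((node, "iactl stop string-server"))
    for node in string_nodes:
        plan.append((node, "iactl stop string-server-leaf"))
        # Expect a count of zero before moving on to the next string node.
        plan.append((node, "pgrep string | wc -l"))
    for node in import_nodes:
        plan.append((node, "iactl start string-server"))
    for node in string_nodes:
        plan.append((node, "iactl start string-server-leaf"))
    for node in list(import_nodes) + list(string_nodes):
        plan.append((node, "iactl status"))
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for node, command in string_tier_restart_plan(["import-1"], ["string-1", "string-2"]):
        print(f"{node}: {command}")
```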

For more information on the ia column command, see the Interana CLI Reference.

What's Next

For more information on how to estimate your data usage and storage needs, see the Admin Guide. For more information on CLI commands, see the Interana CLI Reference.

 

 
