Skip to main content
Interania

Transformer library reference

0votes
1updates
14views
This applies tov2.22

This page details the different transformations available in the Interana Transformer Library. See Use Transformer Library to Transform Data Before Ingest for information on how to put these to use.

If you're looking for something in particular, you may want to try the table of contents to the right.

Before loading data

gunzip

Decompress gzip files.

Example

Decompress test.json.gz

Configuration Input Output
[
["gunzip"],
["decode"]
]
test.json.gz {"a": 1, "b": 2}
{"c": 3, "d": 4}
{"e": 1, "f": 2}

bunzip2

Decompress bz2 files.

Example

Decompress test.json.bz2

Configuration Input Output
[
["bunzip2"],
["decode"]
]

test.json.bz2

{"a": 1, "b": 2}
{"c": 3, "d": 4}
{"e": 1, "f": 2}

unpack_json_array

Split array of objects into individual objects. This is useful when event logs have multiple events in a single JSON array.

Example

Unpack three JSON objects within single JSON array into line separated objects.

Configuration Input Output
[
["unpack_json_array"],
["decode"]
]
[{"a": 1, "b": 2}, {"c": 3, "d": 4}, {"e": 1, "f": 2}] {"a": 1, "b": 2}
{"c": 3, "d": 4}
{"e": 1, "f": 2}

sample

Take a random sample of each input file.

Parameter Type Required? Description
sample_size int Yes Number of lines to sample
random_seed int Yes Seed for random number generator. Used to sample same random events across multiple runs.

Example

Sample two lines from three total.

Configuration Input Output
[
["sample", {"sample_size": 2}],
["decode"]
]
{"a": 1, "b": 1}
{"a": 2, "b": 2}
{"a": 3, "b": 3}
{"a": 3, "b": 3}
{"a": 2, "b": 2}

head

Read beginning of each input file.

Parameter Type Required? Description
sample_size int Yes Number of lines to read from beginning of each file

Example

Take first two lines of input files

Configuration Input Output
[
["head", {"sample_size": 2}],
["decode"]
]
{"a": 1, "b": 1}
{"a": 2, "b": 2}
{"a": 3, "b": 3}
{"a": 1, "b": 1}
{"a": 2, "b": 2}

tail

Read end of each input file.

Parameter Type Required? Description
sample_size int Yes Number of lines to read from the end of each file

Example

Take last two lines of input files.

Configuration Input Output
[
["tail", {"sample_size": 2}],
["decode"]
]
{"a": 1, "b": 1}
{"a": 2, "b": 2}
{"a": 3, "b": 3}
{"a": 2, "b": 2}
{"a": 3, "b": 3}

regex_replace_line

Perform regular expression replacement on the line. The replacement string uses Python-style backreferences.

Parameter Type Required? Description
regex string Yes Regular expression to match
repl_string string Yes Replacement string to substitute

Example

Fix unquoted JSON key (user) before JSON load step.

Configuration Input Output
[
["decode"],
["regex_replace_line", {"regex": "user:", "repl_string": "\"user\":"}]
]
{user: "u1", "trans": "x1", "val": 12}
{user: "u1", "trans": "x2", "val": 9}
{user: "u2", "trans": "x1", "val": 21}
{"user": "u1", "trans": "x1", "val": 12}
{"user": "u1", "trans": "x2", "val": 9}
{"user": "u2", "trans": "x1", "val": 21}

decode

Decode file using specified character encoding.

Parameter Type Required? Description
encoding string No The character encoding to use. This parameter is optional, and defaults to utf-8.

Example

Decode a file.

Configuration Input Output
[
["decode"]
]
n/a n/a

Load data into Python dictionary

json_load

Deserialize JSON and calculate a unique token that Interana uses to deduplicate lines.

Parameter Type Required? Description
unique_columns list No

List of columns to use to generate a unique token.

This parameter is optional. If not specified, the original line will be used.

Example

Specify columns user and trans to generate a unique token.

Configuration Input Output
[
["decode"],
["json_load", {"unique_columns": ["user", "trans"]}],
["json_dump"]
]
{"user": "u1", "trans": "x1", "val": 12}
{"user": "u1", "trans": "x2", "val": 9}
{"user": "u2", "trans": "x1", "val": 21}
{"user":"u1","__ia__unique__token__":6073597307750309666,"trans":"x1","val":12}
{"user":"u1","__ia__unique__token__":8385417071786987290,"trans":"x2","val":9}
{"user":"u2","__ia__unique__token__":2558069477487871788,"trans":"x1","val":21}

csv_load

Load CSV data into dictionaries.

By default, csv_load assumes the first line of the file is a primary header row specifying field names. If the CSV file does not contain a header row, or you wish to override the existing header, you can configure a custom "primary header" row.  

In the case where one or more CSV lines do not match the number of fields in the primary header, csv_load uses a best-effort algorithm:

  • If a row of the CSV file has fewer fields than the header, csv_load fills as many fields as possible in order.
  • If a row of the CSV file has more fields than the header, csv_load generates extra header names like field_N, field_N+1, starting with the number of fields already present in the header.

If you more control over handling non-standard CSV lines, you can specify additional header rows (each with a distinct number of fields), in which case csv_load will attempt to find a header with exactly the same number of fields as the current line and use that one. If neither the primary header nor any alternate header matches the number of fields in the current line, csv_load will use the "primary header" and fall back to its best-effort algorithm.

Parameter Type Required? Description
unique_columns list No List of columns to use to generate unique token. If not specified, the original line will be used.
delimiter string No Character to use as delimiter, e.g. a comma or a tab.
headers list No List of list of headers. Headers are chosen by number of elements in each CSV line.
header_row string No

One of use, skip, or no_header(the default value). 

If use, the header in the file is used if no headers match.

If no primary_header is set, the file header is used as the primary.

primary_header list No Headers to use if no other headers match.
doublequote boolean No

Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar.

The default value is True.

escapechar string No Removes any special meaning from the following character. It defaults to None, which disables escaping.
quotechar string No A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.
skipinitialspace boolean No

When True, whitespace immediately following the delimiter is ignored.

The default is False.

Example

Load CSV data into dictionary, dump into JSON.

Configuration Input Output
[
["decode"],
["csv_load"],
["json_dump"]
]
User,Event,Value
John,Login,0
Todd,Logout,1
{"Event":"Login","Value":"0","User":"John","__ia__unique__token__":6739147714641650777}
{"Event":"Logout","Value":"1","User":"Todd","__ia__unique__token__":311502209663632137}

After data load (dictionary-to-dictionary transformations)

hash

Calculate non-cryptographic murmurhash3.

Parameter Type Required? Description
column string Yes The column to hash

Example

Hash username

Configuration Input Output
[
["decode"],
["json_load"],
["hash", {"column": "username"}],
["json_dump"]
]
{"username": "BenHogan", "event": "putt", "dist": 3}
{"username": "ArnoldPalmer", "event": "chip", "dist": 15}
{"username": "JackNicklaus", "event": "drive", "dist": 267}
{"username":8545539730387850342,"dist":3,"__ia__unique__token__":4345827873645831930,"event":"putt"}
{"username":4800646705760021970,"dist":15,"__ia__unique__token__":7955354759641172706,"event":"chip"}
{"username":1304984085549885728,"dist":267,"__ia__unique__token__":3709088310454520588,"event":"drive"}

anonymize

Calculate 52-bit sha1 hash.

Parameter Type Required? Description
column string Yes The column to anonymize

Example

Anonymize username.

Configuration Input Output
[
["decode"],
["json_load"],
["anonymize", {"column": "username"}],
["json_dump"]
]
{"username": "BenHogan", "event": "putt", "dist": 3}
{"username": "ArnoldPalmer", "event": "chip", "dist": 15}
{"username": "JackNicklaus", "event": "drive", "dist": 267}
{"event":"putt","username":2528885808796705,"dist":3,"__ia__unique__token__":4345827873645831930}
{"event":"chip","username":1922980560516075,"dist":15,"__ia__unique__token__":7955354759641172706}
{"event":"drive","username":1542261379125033,"dist":267,"__ia__unique__token__":3709088310454520588}

lowercase

Convert all or some column names to lowercase strings. If a name in columns is not present, a lowercased column name will be created and set to null.

This is different from the lowercase_value() parameter. 

Parameter Type Required? Description
columns list No Columns to be renamed lowercase. Default is all columns.

Example

Change uppercase characters in USERNAME to lowercase.

Configuration Input Output
[
["decode"],
["json_load"],
["lowercase"],
["json_dump"]
]
{"USERNAME": "BenHogan", "event": "putt", "dist": 3}
{"USERNAME": "ArnoldPalmer", "event": "chip", "dist": 15}
{"USERNAME": "JackNicklaus", "event": "drive", "dist": 267}
{"dist":3,"username":"BenHogan","__ia__unique__token__":8497066893068604652,"event":"putt"}
{"dist":15,"username":"ArnoldPalmer","__ia__unique__token__":587546423006921598,"event":"chip"}
{"dist":267,"username":"JackNicklaus","__ia__unique__token__":5859163929152062153,"event":"drive"}

lowercase_value

Convert a column to a lowercase string.

Parameter Type Required? Description
column string Yes Column that contains values to be lowercased.

Example

Convert values of text field to lowercase.

Configuration Input Output
[
["decode"],
["json_load"],
["lowercase_value", {"column": "text"}],
["json_dump"]
]
{"user": "Sam", "text": "Wet Paint"}
{"user": "John", "text": "STOP"}
{"user": "Elyse", "text": "Do Not Enter"}
{"__ia__unique__token__":2233568130467114876,"text":"wet paint","user":"Sam"}
{"__ia__unique__token__":4048437015924438854,"text":"stop","user":"John"}
{"__ia__unique__token__":5175482217519506709,"text":"do not enter","user":"Elyse"}

add_label

Set a column to the value label for every event.

Parameter Type Required? Description
column string Yes Name of column to put label in.
label string Yes Value to put in column.

Example

Add category : electronics to every event.

Configuration Input Output
[
["decode"],
["json_load"],
["add_label", {"column": "category", "label": "electronics"}],
["json_dump"]
]
{"user": "Sam", "amt": 1}
{"user": "John", "amt": 2}
{"user": "Elyse", "amt": 3}
{"__ia__unique__token__":6792123659669767925,"category":"electronics","user":"Sam","amt":1}
{"__ia__unique__token__":371460198962001743,"category":"electronics","user":"John","amt":2}
{"__ia__unique__token__":2941371116376991439,"category":"electronics","user":"Elyse","amt":3}

deep_copy

Make a complete copy of a column.

Parameter Type Required? Description
column string Yes Column name to copy from.
new_name string Yes Column name to copy to.

Example

Copy user values to new column usercopy.

Configuration Input Output
[
["decode"],
["json_load"],
["deep_copy", {"column": "user", "new_name": "usercopy"}],
["json_dump"]
]
{"user": "Sam", "amt": 1}
{"user": "John", "amt": 2}
{"user": "Elyse", "amt": 3}
{"amt":1,"user":"Sam","__ia__unique__token__":6792123659669767925,"usercopy":"Sam"}
{"amt":2,"user":"John","__ia__unique__token__":371460198962001743,"usercopy":"John"}
{"amt":3,"user":"Elyse","__ia__unique__token__":2941371116376991439,"usercopy":"Elyse"}

rename

Rename a column.

Parameter Type Required? Description
column string Yes Column to rename
new_name string Yes New name for the column

Example

Rename column blah to event_type.

Configuration Input Output
[
["decode"],
["json_load"],
["rename", {"column": "blah", "new_name": "event_type"}],
["json_dump"]
]
{"user": "Sam", "amt": 1, "blah": "blah"}
{"user": "John", "amt": 2, "blah": "blah2"}
{"user": "Elyse", "amt": 3, "blah": "blah3"}
{"amt":1,"event_type":"blah","user":"Sam","__ia__unique__token__":3893872012624762980}
{"amt":2,"event_type":"blah2","user":"John","__ia__unique__token__":4673211942423344710}
{"amt":3,"event_type":"blah3","user":"Elyse","__ia__unique__token__":4791908289683167877}

omit

Omit specified columns.

Parameter Type Required? Description
columns list Yes List of column names to omit

Example

Omit column blah from events.

Configuration Input Output
[
["decode"],
["json_load"],
["omit", {"columns": ["blah"]}],
["json_dump"]
]
{"user": "Sam", "amt": 1, "blah": "blah"}
{"user": "John", "amt": 2, "blah": "blah2"}
{"user": "Elyse", "amt": 3, "blah": "blah3"}
{"user":"Sam","__ia__unique__token__":3893872012624762980,"amt":1}
{"user":"John","__ia__unique__token__":4673211942423344710,"amt":2}
{"user":"Elyse","__ia__unique__token__":4791908289683167877,"amt":3}

add_filename

Set column to the name of the file the event is from.

Parameter Type Required? Description
column string Yes Column to add the file name to.

Example

Add filename in.json to events.

Configuration Input Output
[
["decode"],
["json_load"],
["add_filename", {"column": "filename"}],
["json_dump"]
]
{"user": "Sam", "amt": 1}
{"user": "John", "amt": 2}
{"user": "Elyse", "amt": 3}
{"__ia__unique__token__":6792123659669767925,"filename":"in.json","user":"Sam","amt":1}
{"__ia__unique__token__":371460198962001743,"filename":"in.json","user":"John","amt":2}
{"__ia__unique__token__":2941371116376991439,"filename":"in.json","user":"Elyse","amt":3}

add_file_date

Set column to the datestamp of the file the event is from. If specifying input files as arguments on the command line, requires passing the --file-date flag in %Y-%m-%dT%H:%M:%S.%fZ format.

Parameter Type Required? Description
column string Yes Column to add the file date to.

Example

Add file date to events.

Configuration Input Output
[
["decode"],
["json_load"],
["add_file_date", {"column": "filedate"}],
["json_dump"]
]
{"user": "Sam", "amt": 1}
{"user": "John", "amt": 2}
{"user": "Elyse", "amt": 3}
{"amt":1,"filedate":"2016-05-05T12:12:12.123Z","user":"Sam","__ia__unique__token__":6792123659669767925}
{"amt":2,"filedate":"2016-05-05T12:12:12.123Z","user":"John","__ia__unique__token__":371460198962001743}
{"amt":3,"filedate":"2016-05-05T12:12:12.123Z","user":"Elyse","__ia__unique__token__":2941371116376991439}

flatten_dict

Flatten dictionary by concatenating successive keys with a . ("dot") character.

Parameter Type Required? Description
columns list No List of column names to flatten. Defaults to all columns.
depth_limit int No Recursion limit. Defaults to no limit.

Example

Flatten user column one level.

Configuration Input Output
[
["decode"],
["json_load"],
["flatten_dict", {"columns": ["user"], "depth_limit": 1}],
["json_dump"]
]
{"user": {"name": "John", "info": {"phone": 5551234, "city": "Redwood City"}}, "event": "login"} {"user.name":"John","event":"login","__ia__unique__token__":4975147593049633177,"user.info":{"phone":5551234,"city":"Redwood City"}}

shred_array

Change array of objects into an object of arrays.

Example

Shred array exp.

Configuration Input Output
[
["decode"],
["json_load"],
["shred_array", {"column": "exp"}],
["json_dump"]
]
{"user": "John", "exp": [{"var": "a", "val": "1"}, {"var": "b", "val": "4"}]}
{"user": "Fred", "exp": [{"var": "b", "val": "5"}, {"var": "c", "val": "4"}]}
{"__ia__unique__token__":7370096177795160827,"user":"John","exp":{"val":["1","4"],"var":["a","b"]}}
{"__ia__unique__token__":2876036297403557858,"user":"Fred","exp":{"val":["5","4"],"var":["b","c"]}}

split_array

Split a column containing an array into multiple columns.

Parameter Type Required? Description
column string Yes Split this column into new columns if it is an array.
output_columns string No Column names to use. Defaults to column suffixed with .N.

Example

Split values in array exp into three new columns.

Configuration Input Output
[
["decode"],
["json_load"],
["split_array", {"column": "exp", "output_columns": ["animal1", "animal2", "animal3"]}],
["json_dump"]
]
{"user": "John", "exp": ["cat","dog","hamster"]}
{"user": "Fred", "exp": ["lion","tiger","bear"]}
{"exp":["cat","dog","hamster"],"animal1":"cat","animal3":"hamster","user":"John","__ia__unique__token__":2032956903660321772,"animal2":"dog"}
{"exp":["lion","tiger","bear"],"animal1":"lion","animal3":"bear","user":"Fred","__ia__unique__token__":4045256004748629181,"animal2":"tiger"}

convert_to_array

Split column on separator into a list of values in output_column.

Parameter Type Required? Description
separator string Yes String to split on; e.g., ,  (comma).

Example

Split blurb into an array containing individual words in blurb.

Configuration Input Output
[
["decode"],
["json_load"],
["convert_to_array", {"column": "blurb", "separator": " "}],
["json_dump"]
]
{"user": "John", "blurb": "I like trains"}
{"user": "Sam", "blurb": "I hate pizza"}
{"user": "Dave", "blurb": "I like pizza"}
{"__ia__unique__token__":8715638442104830642,"user":"John","blurb":["I","like","trains"]}
{"__ia__unique__token__":3445418821193718572,"user":"Sam","blurb":["I","hate","pizza"]}
{"__ia__unique__token__":7497843545051763820,"user":"Dave","blurb":["I","like","pizza"]}

keep

Keep only the specified columns, dropping all others. Does not drop the Interana deduplication token.

Parameter Type Required? Description
columns list Yes List of column names to keep. 

Example

Keep only user and event columns.

Configuration Input Output
[
["decode"],
["json_load"],
["keep", {"columns": ["user", "event"]}],
["json_dump"]
]
{"user": "Sam", "event": "login", "blah": "blah"}
{"user": "John", "event": "expand", "blah": "blah2", "junk": 1}
{"user": "Elyse", "event": "sendmsg", "blah": "blah3"}
{"event":"login","user":"Sam","__ia__unique__token__":8468468302936289505}
{"event":"expand","user":"John","__ia__unique__token__":2346607208574296553}
{"event":"sendmsg","user":"Elyse","__ia__unique__token__":7644083907636303077}

whitelist_string_values

Remove values from a string column not in the given list.

Parameter Type Required? Description
column string Yes Column to operate on
whitelist list Yes List of allowed string values

Example

Allow only accepted values in the source column.

Configuration Input Output
[
["decode"],
["json_load"],
["whitelist_string_values", {"whitelist": ["frontend", "backend"], "column": "source"}],
["json_dump"]
]
{"user": "Sam", "event": "click", "source": "frontend"}
{"user": "John", "event": "purchase", "source": "fronted"}
{"user": "John", "event": "load", "source": "backend"}
{"user": "Elyse", "event": "flush", "source": "backeend"}
{"event":"click","user":"Sam","__ia__unique__token__":6897873284472877507,"source":"frontend"}
{"event":"purchase","user":"John","__ia__unique__token__":5406739099689257143,"source":null}
{"event":"load","user":"John","__ia__unique__token__":7635827136580490088,"source":"backend"}
{"event":"flush","user":"Elyse","__ia__unique__token__":2880053427446368098,"source":null}

merge_keys

Merge column_1 and column_2 into output_column using sha1.

Combine two mutually exclusive shard key columns into one, allowing them to share a table copy.

  • If column_1 is None and column_2 is not None, output_column will be set to digest of column_2 with LSB set to 0.
  • If column_2 is None and column_1 is not None, output_column will be set to digest of column_1 with LSB set to 1.
  • If both are set, or both are None, output_column will be set to None.
Parameter Type Required? Description
column_1 string Yes Column to merge and set LSB to 0.
column_2 string Yes Column to merge and set LSB to 1.
output_column string Yes Column to put result in.

Example

Merge columns web1 and web2 into column merge.

Configuration Input Output
[
["decode", {"encoding": "utf8"}],
["json_load"],
["merge_keys", {"column_1": "web1", "column_2": "web2", "output_column": "merge"}],
["json_dump"]
]
{"web1": null, "web2": "w1", "val": 12}
{"web1": "u1", "web2": null, "val": 9}
{"web1": "u2", "web2": "w1", "val": 10}
{"val":12,"__ia__unique__token__":7297373526901799213,"merge":724008111296322,"web2":"w1","web1":null}
{"val":9,"__ia__unique__token__":7187483173479303532,"merge":3481771430954869,"web2":null,"web1":"u1"}
{"val":10,"__ia__unique__token__":2818846474846197375,"merge":null,"web2":"w1","web1":"u2"}

join

Join columns together with separator between each. Null columns are skipped.

Parameter Type Required? Description
columns list Yes List of columns names to join
output_column string Yes Column in which to put the resulting string
separator string No String to put between each each column. Defaults to the empty string.

Example

Join first and last name columns into a fullname column.

Configuration Input Output
[
["decode"],
["json_load"],
["join", {"columns": ["first", "last"], "output_column": "fullname", "separator": " "}],
["json_dump"]
]
{"first": "Sam", "last": "Ryan"}
{"first": "John", "last": "Johnson"}
{"first": "Elyse", "last": "Jackson"}
{"__ia__unique__token__":6631810923594382537,"fullname":"Sam Ryan","first":"Sam","last":"Ryan"}
{"__ia__unique__token__":520329767215318629,"fullname":"John Johnson","first":"John","last":"Johnson"}
{"__ia__unique__token__":3271708847906864678,"fullname":"Elyse Jackson","first":"Elyse","last":"Jackson"}

multiply

Perform floating point multiplication.

  • If multiplier is a string, treat it as a column name.
  • If multiplier column is null, result will be null.
Parameter Type Required? Description
column string Yes Column to use as multiplicand
multiplier string Yes Value or column to use as multiplier
output_column string No Column name to put the output into. Defaults to column.

Example

Multiply the price column by the quantity column and put the results in the total column.

Configuration Input Output
[
["decode"],
["json_load"],
["multiply", {"column": "price", "multiplier": "quantity", "output_column": "total"}],
["json_dump"]
]
{"first": "Sam", "event": "purchase", "price": 2.13, "quantity": 4}
{"first": "John", "event": "purchase", "price": 5, "quantity": 2}
{"first": "Elyse", "event": "purchase", "price": 8.56, "quantity": 12}
{"event":"purchase","quantity":4,"price":2.13,"first":"Sam","total":8.52,"__ia__unique__token__":2308855450660799997}
{"event":"purchase","quantity":2,"price":5,"first":"John","total":10.0,"__ia__unique__token__":2479674756660365726}
{"event":"purchase","quantity":12,"price":8.56,"first":"Elyse","total":102.72,"__ia__unique__token__":5658422012269400335}

divide

Perform floating point division.

  • If divisor is a string, treat it as a column name.
  • If divisor column is null, result will be null.
Parameter Type Required? Description
column string Yes Column to use as multiplicand
divisor string Yes Value or column to use as divisor
output_column string No Column name to put the output into. Defaults to column.

Example

Divide CPM column by one million.

Configuration Input Output
[
["decode"],
["json_load"],
["divide", {"column": "CPM", "divisor": 1000000, "output_column": "actual_cost"}],
["json_dump"]
]
{"first": "Sam", "event": "viewad", "CPM": 2385}
{"first": "John", "event": "viewad", "CPM": 4729}
{"first": "Elyse", "event": "viewad", "CPM": 1122}
{"event":"viewad","actual_cost":0.002385,"first":"Sam","CPM":2385,"__ia__unique__token__":2322298538924664253}
{"event":"viewad","actual_cost":0.004729,"first":"John","CPM":4729,"__ia__unique__token__":3128606018585001222}
{"event":"viewad","actual_cost":0.001122,"first":"Elyse","CPM":1122,"__ia__unique__token__":8493916172843408209}

remove_punctuation

Remove all punctuation from a column. Assumes unicode characters.

Parameter Type Required? Description
column string Yes The column to remove punctuation from.
output_column string No Column name to put the output into. Defaults to column.

Example

Strip punctuation from the message column.

Configuration Input Output
[
["decode"],
["json_load"],
["remove_punctuation", {"column": "message"}],
["json_dump"]
]
{"user": "Punctuation", "message": "!@#%&*()_-{}[]:;,.?"}
{"user": "NotPunctuation", "message": "$^+=<>"}
{"user": "Combined", "message": "!@#%&*()_-{}[]:;,.?$^+=<>"}
{"message":"","user":"Punctuation","__ia__unique__token__":1267513633946236610}
{"message":"$^+=<>","user":"NotPunctuation","__ia__unique__token__":1684850124204466571}
{"message":"$^+=<>","user":"Combined","__ia__unique__token__":4132463954692381155}

time_convert

Reformat time column, or do nothing if none of the read formats match. Uses Python-style time format strings, or one of epoch_seconds, epoch_millis, epoch_micros.

If testing on the command line, the max_time_ago and max_time_hence parameters require passing the --file-date flag.

Parameter Type Required? Description
column string Yes The column to perform time conversion on.
output_column string No The column to put the result in.
read_formats list Yes List of datetime format strings to try, in order.
write_format string No Datetime format string for output. Defaults to %Y-%m-%dT%H:%M:%S.%fZ
max_time_ago string No Events older than the file date minus max_time_ago hours will be silently dropped.
max_time_hence string No Events more recent than the file date plus max_time_hence hours will be silently dropped.
max_abs_time string No Events stamped after this ISO-8601 timestamp will be silently dropped.
min_abs_time string No Events stamped before this ISO-8601 timestamp will be silently dropped.


Example

Convert a variety of time formats into %Y-%m-%d %H:%M:%S.

Configuration Input Output
[
["decode"],
["json_load"],
["time_convert", {"read_formats": ["%A, %B %d %Y", "%m/%d/%Y", "%b %d, %Y", "%Y-%m-%d %H:%M:%S"], "write_format": "%Y-%m-%d %H:%M:%S", "column": "ts"}],
["json_dump"]
]
{"ts": "Tuesday, March 22 2016"}
{"ts": "3/22/2016"}
{"ts": "Mar 22, 2016"}
{"ts": "2016-03-22 12:43:30"}
{"__ia__unique__token__":1033586384520328482,"ts":"2016-03-22 00:00:00"}
{"__ia__unique__token__":6737880881032506082,"ts":"2016-03-22 00:00:00"}
{"__ia__unique__token__":803264224078215379,"ts":"2016-03-22 00:00:00"}
{"__ia__unique__token__":2708765703165006780,"ts":"2016-03-22 12:43:30"}

url_parse

Parse URL into dictionary.

Parameter Type Required? Description
column string Yes Column containing the URL to be parsed
default_scheme string No The scheme to append if there is no scheme in the URL. The default value is http.

Example

Parse the referrer URL into dictionary.

Configuration Input Output
[
["decode"],
["json_load"],
["url_parse", {"column": "referrer", "default_scheme": "https"}],
["json_dump"]
]
{"user": "John", "referrer": "john.interana.co.uk:80/videoplay?docid=-7246927612831078230&hl=en&view=dark&loggedin=yes#00h02m30s"} {"__ia__unique__token__":7912114277290934867,"referrer":{"fragment":"00h02m30s","query":"docid=-7246927612831078230&hl=en&view=dark&loggedin=yes","scheme":"https","path":"/videoplay","params":"","port":80,"hostname":"john.interana.co.uk"},"user":"John"}

url_query_string_parse

Parse a url query string of the form key1=value1&key2=value2.

Parameter Type Required? Description
column string Yes The column to remove punctuation from.
output_column string No Column to put result in. The default value is column.
keep_parameters list Yes Only keep parameters in the query string from this list.

Example

Further parse referrer.query into selected parameters.

Configuration Input Output
[
["decode"],
["json_load"],
["url_parse", {"column": "referrer", "default_scheme": "https"}],
["flatten_dict", {"columns": ["referrer"]}],
["url_query_string_parse", {"column": "referrer.query", "keep_parameters": ["hl", "view", "loggedin"]}],
["json_dump"]
]
{"user": "John", "referrer": "john.interana.co.uk:80/videoplay?docid=-7246927612831078230&hl=en&view=dark&loggedin=yes#00h02m30s"} {"user":"John","__ia__unique__token__":7912114277290934867,"referrer.port":80,"referrer.params":"","referrer.path":"/videoplay","referrer.fragment":"00h02m30s","referrer.scheme":"https","referrer.hostname":"john.interana.co.uk","referrer.query":{"hl":"en","view":"dark","loggedin":"yes"}}

name_value_extract

Extract list of generic properties into one object.

Parameter Type Required? Description
column string Yes The list column to extract values from.
name_field string Yes Name of field to treat as generic property name.
value_field string Yes Name of field to treat as generic property value.

Example

Extract price and category info from the item field.

Configuration Input Output
[
["decode"],
["json_load"],
["name_value_extract", {"column": "items", "name_field": "category", "value_field": "price"}],
["json_dump"]
]
{"user": "John", "items": [{"label":"tab.abc","category":"abc","price":17.80},{"label":"tab.f","category":"food","price":41.00}]} {"items":{"food":41.0,"abc":17.8},"user":"John","__ia__unique__token__":1918698498990518500}

regex_extract

Extract matching regular expression subgroups into new columns.

Parameter Type Required? Description
column string Yes Name of column to extract from.
regex string Yes Regular expression to match with. Regex must match the entire string.
output_columns list Yes List of column names to put regex subgroups into in order.

Example

Extract plan query parameter from non-URL field uri.

Configuration Input Output
[
["decode"],
["json_load"],
["regex_extract", {"column": "uri", "output_columns": ["plan"], "regex": "(?:(?:.*)plan=([^&]*)?(?:.*)|.*)"}],
["json_dump"]
]
{"user": "John", "uri": "/channel/videoplay/?vidid=7246927612831078230&plan=pro&view=dark&loggedin=yes"} {"user":"John","uri":"/channel/videoplay/?vidid=7246927612831078230&plan=pro&view=dark&loggedin=yes","__ia__unique__token__":6723934658596588009,"plan":"pro"}

regex_replace

Perform regular expression replacement on column. The replacement string uses Python-style backreferences.

Parameter Type Required? Description
column string Yes Column to perform the replacement with.
regex string Yes Regular expression to match.
repl_string string Yes Replacement string to substitute.
output_column string No Column name to put the output into. Defaults to column.

Example

Use regex_replace to prefix the uri field with scheme and host.

Configuration Input Output
[
["decode"],
["json_load"],
["regex_replace", {"column": "uri", "regex": "^(\/)", "repl_string": "https://john.interana.com/"}],
["json_dump"]
]
{"user": "John", "uri": "/channel/videoplay/?vidid=7246927612831078230&plan=pro&view=dark&loggedin=yes"} {"user":"John","__ia__unique__token__":6723934658596588009,"uri":"https://john.interana.com/channel/vi...k&loggedin=yes"}

regex_array_extract

For each element in array, extract matching regular expression subgroups into new columns.

Parameter Type Required? Description
column string Yes Name of the column to extract from.
regex string Yes Regular expression to match.
output_columns list Yes List of column names to put regex subgroups into, in order.

Example

Extract categories from list of products.

Configuration Input Output
[
["decode"],
["json_load"],
["regex_array_extract", {"column": "products", "regex": "^([a-z]*)-.*", "output_columns": ["categories"]}],
["json_dump"]
]
{"user": "John", "products": ["produce-hdy438ed", "cleaning-or944dus", "seafood-pd33sk2q"]} {"user":"John","__ia__unique__token__":3040505961674288811,"categories":["produce","cleaning","seafood"],"products":["produce-hdy438ed","cleaning-or944dus","seafood-pd33sk2q"]}

regex_array_replace

Perform regular expression replacement on each element in an array. Replacement string uses Python-style backreferences.

Do nothing if column is not an array.

Parameter Type Required? Description
column string Yes Column to perform replacement with.
regex string Yes Regular expression to match.
repl_string string Yes Replacement string to substitute.
output_column string No Column name to put the output into. Defaults to column.

Example

Use regex_array_replace to remove trailing identifiers from items in the products list.

Configuration Input Output
[
["decode"],
["json_load"],
["regex_array_replace", {"column": "products", "regex": "-[\S]*", "repl_string": ""}],
["json_dump"]
]
{"user": "John", "products": ["produce-hdy438ed", "cleaning-or944dus", "seafood-pd33sk2q"]} {"user":"John","products":["produce","cleaning","seafood"],"__ia__unique__token__":3040505961674288811}

dictionary_replace

Apply dictionary replacement to a column.

Parameter Type Required? Description
mapping dict Yes Dictionary mapping strings to replacements.

Example

Use dictionary_replace to change product codes into product names.

Configuration Input Output
[
["decode"],
["json_load"],
["dictionary_replace", {"mapping": {"xkdruw67": "mp3player", "skr885jc": "dvdcleaner", "kk284jds": "cartridge"},"column": "product"}],
["json_dump"]
]
{"first": "Sam", "product": "xkdruw67", "event": "purchase"}
{"first": "John", "product": "skr885jc", "event": "purchase"}
{"first": "Elyse", "product": "kk284jds", "event": "purchase"}
{"__ia__unique__token__":6913206265407409391,"event":"purchase","product":"mp3player","first":"Sam"}
{"__ia__unique__token__":496410646764207962,"event":"purchase","product":"dvdcleaner","first":"John"}
{"__ia__unique__token__":9151368650616356276,"event":"purchase","product":"cartridge","first":"Elyse"}

json_load_column

Deserialize a column containing a JSON encoded string. This is useful for CSV files containing JSON encoded strings.

Example

Load value of embed as JSON.

Configuration Input Output
[
["decode"],
["json_load"],
["json_load_column", {"column": "embed"}],
["json_dump"]
]
{"user": John, "embed": "{\"action\": \"click\"}"} {"__ia__unique__token__":673996973727761320,"embed":{"action":"click"},"user":"John"}

if_then_otherwise

Run steps conditionally. If condition evaluates to True for an individual event, the then list of steps will be run. Otherwise, the otherwise list of steps will be run (if provided).

Parameter Type Required? Description
condition step Yes Condition to determine which steps to run.
then list Yes Steps to use if condition is True.
otherwise list No Steps to use if condition is False.

Conditions 

Condition Parameter 1 Type Parameter 2 Type
regex_match column string pattern string
regex_search column string pattern string
contains column string    
starts column string prefix string
ends column string suffix string
in_list column string value string
not_in_list column string value string
equals column string value string
not_equals column string value string
lt (less than) column string value string
gt (greater than) column string value string
lte (less than or equal) column string value string
gte (greater than or equal) column string value string
bool_eval code string    
and_ conditions list    
or_ conditions list    

Example

If the URL path indicates a payment, add the page = pay label.

Configuration Input Output
[
["decode"],
["json_load"],
["if_then_otherwise", {"condition": ["regex_match", {"column": "url.path", "pattern": "^\/pay\/.*"}], "then": [["add_label", {"column": "page", "label": "pay"}]]}],
["json_dump"]
]
{"user": "John", "url.path": "/pay/addcard/"}
{"user": "John", "url.path": "/dashboard/configure/"}
{"user":"John","__ia__unique__token__":6826495190851080943,"page":"pay","url.path":"/pay/addcard/"}
{"user":"John","__ia__unique__token__":4857886812416484204,"url.path":"/dashboard/configure/"}

code_snippet

Write custom Python code to transform each event.

Parameter Type Required? Description
code string Yes Python statements operating on the variable line.
import_list list No Module names to import for use in this code snippet.

Example

Use code_snippet to copy the event column to the action column.

Configuration Input Output
[
["decode"],
["json_load"],
["code_snippet", {"code": '''
line['action'] = line['event']
'''}],
["json_dump"]
]
{"user": "John", "event": "click"}
{"user": "John", "event": "purchase"}
{"user":"John","event":"click","__ia__unique__token__":8820840440868449619,"action":"click"}
{"user":"John","event":"purchase","__ia__unique__token__":1537361993526075803,"action":"purchase"}

After Transformations

json_dump

Returns a dictionary as a JSON-encoded object. This is the only dictionary-to-line step. This should generally be the last step.

Example

Transform Python dictionary to JSON for ingest.

Configuration Input Output
["json_dump"] n/a n/a
  • Was this article helpful?