Skip to main content
Interania

Derived column syntax and examples

0votes
3updates
88views
This applies tov2.21

Use derived columns to define new calculated columns after data has been imported. You can define the derived columns with D language functions, and they must return integer (long) or decimal (double) type values.

This document describes how to create a derived column, provides a detailed overview of the language features available when coding a derived column function, and gives some examples. 

Language features available for derived column functions

Derived Columns must return an Integer or Decimal

Derived columns must return an integer (technically a long) or decimal (double) value. 

Derived Columns use the D programming language

The D programming language is a compiled language with a feel similar to C/C++. When coding a D function for use in an Interana derived column, keep in mind the following limitations:

  • Your function must comply with the @safe annotation (statically checked to exhibit no possibility of undefined behavior)
  • Your function must comply with the pure keyword (cannot access global or static, mutable state except through its arguments)

important_icon.png For performance and security reasons, many standard D library functions are not available.


Interana built-in functions that operate on columns

Within your derived column, you can reference the following custom functions:

Column type Function / variable Notes
int, string long c(string field_name)

Given a column name (as a quoted string), returns its integer value for the current row.

Typically you can make a direct, unquoted reference to any column name to get its value, but in some cases the column name has a special character (like a .). Use this function in those cases.

string long match_string(long field_value, string pattern) Tests the value of a string column against a regular expression (returns 1 if match, else 0).
int_set, string_set long set_size(long field_value) Returns the number of elements in the int_set or string_set column (for this particular row).
int_set long set_contains(long field_value, long elem) Tests all values of an int_set column for the specified integer value. Returns 1 if match, else 0.
string_set long set_match_string(long field_value, string pattern) Tests all values of a string_set column against a regular expression. Returns 1 if match, else 0.

Referencing columns

Within your D function, you can reference Interana columns of type:

  • int
  • string
  • int_set
  • string_set

You can also reference columns containing "." characters. You must surround these columns with the c("<column_name>") function.

Referencing lookup columns

As of version 2.18, you can reference a lookup column when defining a derived column. You can use the columns in the lookup table as normal columns, or use the lc() function to get the value from that column. 

Use the syntax lc("column_name") to reference a lookup column. 

Unsupported references

Interana does not support the following references:

  • time columns (for example, of type milli_time)
  • named expressions, including cohorts, sessions, metrics, funnels
  • other derived columns

set_size and get_item functions

The set_size(set_column) function returns the number of elements in a set column (int_set, string_set). 

The get_item(set_column, index) function returns the element from the set at position index.

Tips for working with derived columns

  • Outputs for derived columns are restricted to integers and decimal values
  • You must escape "." in the column name or it will not compile. You can either rename the column or surround these columns with the c("<column_name>") function.
  • Derived columns reference the friendly column name. If you change that name after creating the derived column, the derived column will no longer work.
  • Column name limitations: the D language has some reserved characters. For example, Derived Columns cannot reference columns named "c", a reserved character in D. See the D language Lexical topic for more information. 

Available D Libraries

We do not currently support any D libraries.

Derived column examples

Basic syntax

The value for this column will always be 0.

long alwaysZero()
{
  return 0;
}

Filter for a specific value

The value for this column will be 1 if the email address is within the Interana domain, and 0 otherwise:

long isInteranaEmail()
{
  return match_string(email_address, ".*@interana\.com");
}

Compute the difference between columns

This function creates a derived column based on the difference between two other integer columns.

long rxctxf()
{
  long a = c("wifi.rxc");
  long b = c("wifi.txf");
  long x = a-b;
  return x;
}

Group values into buckets

This function puts funding amounts into four buckets for more convenient grouping:

long fundingRoundSize()
{
  if (raised_amount_usd > 100000000) return 3;
  else if (raised_amount_usd > 10000000) return 2;
  else if (raised_amount_usd > 1000000) return 1;
  else return 0;
}

This example partitions user operating systems into three large buckets (1=Apple, 2=Windows, 3=Linux):

long operatingSystem()
{
  if (match_string(userAgent, ".*Linux.*")) return 3;
  else if (match_string(userAgent, ".*Windows.*")) return 2;
  else if (match_string(userAgent, ".*Mac.*") || match_string(userAgent, ".*iOS.*")) return 1;
  else return 0;
}

Determine user types

This function determines which type of user we are dealing with based on their email domain:

long getUserType()
{
  if ( 1 == match_string(user, ".*@mycompany\\.com") ) 
  { return 0; } 
  else if ( 1 == match_string(user,".*@.*\\.edu") )
  { return 1; }
  else if ( 1 == match_string(user,".*@.*\\.gov") )
  { return 2; }
  else
  { return 99; }
}

Time conversion

This example performs a time conversion (note that we are referencing the `timestamp` column as an integer).

long hour_of_day()
{ 
  return (timestamp % 86400000) / 3600000; 
}

Similarly, this returns the minute of an hour: 

long minute_of_hour()
{ 
  return (timestamp % 3600000) / 60000; 
}

This function converts seconds to minutes, and rounds the value to the closest minute. 

long minutes()
{
  long secs = c("Usage.PlaySeconds");
  long minutes = secs / 60;
  if (secs % 60 > 29) return minutes +1;
  else return minutes;
}

This example converts seconds to minutes for a duration column that is stored as a decimal. 


note_icon.png Interana automatically multiplies decimals by one million (1,000,000). When doing math operations on decimals, divide by one million to return an answer in a more readable format.


long duration_mins() {
  return(clicks_viewing_duration_seconds / 60 / 1000000);
}

Find elapsed time

Use a derived column to find the time elapsed between the current time and the time that a user signed up for your product ('dateJoined').

long time_diff() 
{ 
  return timeStamp - dateJoined;
}

Create a derived column from a column syntax

Derive a column from one that contains a "." character (dot)

long item_is_free()
{
  // The value for this column will be 1 if the item is free, and 0 otherwise
  return c("item.cost") == 0;
}

Here's an example of what this would look like in the UI. Notice that the company names are stored in the data server as integers (for example, Apptimize is stored as 2707 and Vamo is  stored as 3059).

Work with negative numbers

Although you can't use negative values with our filters, you can work around this by creating a derived column that converts those values. In this example, the function converts latitude values, changing any negative latitudes to non-negative numbers.

long make_pos()
{
  return lat + 180000000;
}

note_icon.png Interana multiples all stored numeric values by 1,000,000; this function accounts for that by adding 180 million to the value, instead of 180.


View SEO information

This example allows you to create a column with all events that come from Google referrers.

long is_seo()
{
  return match_string(referrer_domain, ".*google.*");
}

View specific events

The example matches on an event type to create a column with all of the error events. 

long is_error_event() 
{
  return match_string(event, ".*error.*") || match_string(event, ".*Error.*");
}

Similarly, this example matches on the event type "signup" to create a column containing all signup events.

long is_signup_event() 
{
  return match_string(event, ".*signup.*") || match_string(event, ".*Signup.*");
}

Create a derived column

From the Settings > Manage Data tab, click to Edit an existing table. Click the Add Derived Column button in the lower-left corner.

In the Edit Derived Column dialog, type a D programming language function that returns a long datatype, then click Save Derived Column.

Now you can refer to your derived column normally in the user interface. In this example, we are grouping by the new funding_round_size derived column which allows us to quickly see the number of funding events that fall into four size buckets (<1M, 1-10M, 10-100M and >100M).

  • Was this article helpful?