Skip to main content

 

Interana Docs

Understand scope

Understanding the concept of query scope in Interana is important to understanding exactly how Interana derives query results. This article describes:

  • what we mean by query scope,
  • how changes in query scope affect how results are calculated, and
  • how to apply knowledge of query scope to construct and interpret queries in Interana.

We begin with a definition of scope. We then inspect each of the scopes (event, actor, and flow) in Interana. Finally, we discuss how a query that references multiple types of properties (event, actor, or flow) interprets those properties so that properties of a different type fit into the query scope.

This article provides example queries, accompanied by explanations of how the queries are calculated. The goal of these explanations is not to describe the exact calculations Interana performs to arrive at the example result, but to provide a mental model for Interana users that wish to understand the role scope plays in the query calculation. This article does not cover topics like merging results from the distributed data tier, scaling sampled queries, or performance optimizations. In some cases the exact order of operations might be slightly different or asynchronous, but the results will be the same as if the query followed the described order of operations exactly.

Query scope: A general definition

In general, every Interana query or measure involves some sort of aggregation. Here are some aggregations (with example context) that you can specify in Interana:

  • count unique users that viewed the support page
  • average session durations for Android devices
  • sum play time per music service

The scope of a query refers to the type of entity an aggregation in Interana iterates over to produce a result. An aggregation can iterate over events, actors, or flows. It follows that Interana's three query scopes are called event scope, actor scope, and flow scope. 

An example of a query in each scope is as follows:

Scope Example Query
Event count(events) for events where event_name = purchase
Actor average(purchases per actor) for actors with at least two purchase events
Flow average(session duration) for Android device sessions

As you can see, counting each individual event is an event scope query, averaging a per-actor value is an actor scope query, and averaging a per-session value is a flow scope query. When computing these query results, Interana iterates over event, actors, and flows, respectively.

Interana automatically determines the scope of a query by inspecting query arguments. If a query is averaging something that is calculated on a per-actor basis, it is highly likely that the scope of that query is actor scope.

With that, let's study each individual scope.

The scopes of Interana queries

In this section, we investigate each query scope independently.

To do so, consider queries on a simple 10 event dataset. Here are the exact events that would have been ingested into Interana for the following examples:

{"user":"John","ts":"2020-01-25T12:00:00Z","action":"login","latency":43}
{"user":"John","ts":"2020-01-25T12:01:00Z","action":"read","latency":11}
{"user":"John","ts":"2020-01-25T12:02:00Z","action":"post","latency":25}
{"user":"Anca","ts":"2020-01-25T13:00:00Z","action":"read","latency":12}
{"user":"Anca","ts":"2020-01-25T13:01:00Z","action":"post","latency":27}
{"user":"Anca","ts":"2020-01-25T13:02:00Z","action":"post","latency":29}
{"user":"Manasa","ts":"2020-01-25T14:00:00Z","action":"read","latency":9}
{"user":"Manasa","ts":"2020-01-25T14:01:00Z","action":"read","latency":121}
{"user":"Manasa","ts":"2020-01-25T15:00:00Z","action":"read","latency":9}{"user":"Manasa","ts":"2020-01-25T15:01:32Z","action":"post","latency":28}

Event scope

An Interana query in event scope iterates directly over the events imported to an Interana table (also known as a dataset).

You can visualize the data aggregated over in an event scope query as a spreadsheet or relational database table, with one row per event, and each column corresponding to either a raw event property imported directly from the source data, or a manual event property created in Interana. For instance, you can visualize event scope for a query in our sample dataset as (where is_high_latency_read is a user-created manual event property that returns true if action == read and latency >= 100):

user ts action latency is_high_latency_read
John 2020-01-25T12:00:00Z login 43 false
John 2020-01-25T12:01:00Z read 11 false
John 2020-01-25T12:02:00Z post 25 false
Anca 2020-01-25T13:00:00Z read 12 false
Anca 2020-01-25T13:01:00Z post 27 false
Anca 2020-01-25T13:02:00Z post 29 false
Manasa 2020-01-25T14:00:00Z read 9 false
Manasa 2020-01-25T14:01:00Z read 121 true
Manasa 2020-01-25T15:00:00Z read 9 false
Manasa 2020-01-25T15:01:32Z post 28 false

In an actual query, Interana loads only the columns that are used in the query to perform the calculation. The table above represents the data iterated over only if the event scope query accesses all of the columns and manual event property. For example, in a simple average(latency) query, Interana loads only the latency column (still one row per event) to produce the result.

With this table in mind, we can now think about aggregating over the rows (events) in the table in order to produce a query result.

Let's do so now, answering the question how many unique users did a read action on 2020-01-25? using an event scope approach.

First, since this query relies on the user and action event properties, we can visualize the data we iterate over to produce the result as a table with two columns (representing the user and action event property values) and 10 rows (one row for each event because this query is in event scope):

user action
John login
John read
John post
Anca read
Anca post
Anca post
Manasa read
Manasa read
Manasa read
Manasa post

Also, since we applied a filter for our events, we can filter to just the events where action == read:

user action
John read
Anca read
Manasa read
Manasa read
Manasa read

Finally, we can apply our count_unique(user) aggregation over these events to produce our result:

count_unique("John","Anca","Manasa","Manasa","Manasa") = 3

The key takeaway here is that our count unique aggregation iterated over events to produce the result. Iterating over events is what makes a query event scope.

Here is what the event scope query in our example looks like in the Interana UI:

clipboard_e30f5b1ec94e1a7522d7a3cd349dcb390.png

Actor scope

An Interana query in actor scope iterates over the actors contained within the Interana table, or dataset. Unlike in event scope, the actor-based tables that an actor scope query iterates over are not stored on disk or generated during the Interana import process. Instead, Interana produces an actor-based table from a subquery over the event data, and then produces results from the actor-based table.

We can answer the question posed in the event scope section how many unique users did a read action on 2020-01-25? by issuing an actor scope query instead of an event scope query. Let's see what that looks like.

First, we need to instruct Interana to produce data in the actor scope that we can iterate over. We do this by creating an actor property. To answer the question, we create an actor property that calculates the number of reads per user. In the Interana UI that property looks something like this:

clipboard_ed010228ca1cf054f5b598cd15c6878e0.png

When we use an actor property like this in a query, behind the scenes, Interana issues a subquery to calculate the per-actor result. The result of this subquery is a (temporary) table in actor scope. We can visualize this similarly to how we visualized the event scope data in the previous section, with one key difference: in actor scope, the data that our aggregation iterates over has one row per actor instead of one row per event.

The subquery that Interana runs in the case of our reads_per_user actor property is quite simple:

count(events where action = read) split by user

This subquery produces our actor scope data, which in our example looks like this:

user reads_per_user
John 1
Anca 1
Manasa 3

From this table, it is easy to produce a query result for our query how many unique users did a read action on 2020-01-25? We can apply a filter to filter to the actor table above on reads_per_user > 0, and then do our unique user aggregation:

count_unique("John","Anca","Manasa") = 3

Or, because we are in actor scope where we have one row per actor, we could even answer this with a simple count aggregation as an optimization:

count(per-actor rows where reads_per_user > 0) = 3

The key takeaway here is that our final aggregation iterates over actors to produce the result. Iterating over actors is what makes a query actor scope.

Here is what the actor scope query in our example looks like in the Interana UI:

clipboard_e45e90caf14b542ba7a8aa73072c8e30d.png

In this particular case, we were able to get to the same result using both actor and event scope queries. The example query we have been investigating was selected intentionally so that we could compare and contrast event and actor scope queries, but leaves much to be desired when it comes to why these different scopes are useful. To help understand this, here are two queries that we can answer using actor scope queries but that are less straightforward (yet not impossible, as we'll see in later sections) to answer in event scope:

How many unique users did at least 2 read actions on 2020-01-25?

user reads_per_user
Manasa 3

clipboard_ef1e16f9ee4743bab7eabf01e63fdbe30.png

How many unique users did 0 login actions on 2020-01-25 (but did some other kind of event)?

user logins_per_user
John 1
Anca 0
Manasa 0

clipboard_ef1aa884a1883d7a353b527be09071629.png

Flow scope

As you have likely now deduced, an Interana query in flow scope iterates over the flows contained within the Interana table / dataset. Similarly to actor scope queries, the flow-based tables that a flow scope query iterates over are not stored on disk or generated during the Interana import process. Instead, Interana produces a flow-based table from the events imported to Interana using the logic defined in the flow definition, and then produces results from the flow-based table.

For our flow scope example, we'll use this simple flow that starts and transitions on the first three events of a session as defined by 5 minutes of inactivity:

clipboard_e471508ab60e1922f6fa412685d54d011.png

The count flow instances query that produces the results that the Sankey view displays is flow scope, but for the sake of simplicity we'll run a more straightforward query to illustrate flow scope in action:

clipboard_ecf89100f85b0c54354e47df8eb8b6aae.png

For this query, Interana produces flow scope data that looks something like this from the 10 raw events ingested into Interana:

flow_id user total_time_in_flow
1 John 120 s
2 Anca 120 s
3 Manasa 60 s
4 Manasa 92 s

And then to produce our final result, we simply average(120 s, 120 s, 60 s, 92 s) to get an average time of 98 s.

The key takeaway here is that our final aggregation iterates over flows to produce the result. Iterating over flows is what makes a query flow scope.

 

Resolving scope mismatch

After reading the previous section, hopefully you are gaining confidence in your understanding of the event, actor, and flow scopes of Interana. With this understanding, we are now equipped to investigate the behavior of Interana queries when event, actor, and flow properties are combined in the same query.

Recall our model of the actor scope data in the previous section example:

user reads_per_user
John 1
Anca 1
Manasa 3

Now imagine adding a filter such as latency(event property) >= 10 to this actor scope query. The event property has a value that varies per event, and the data being iterated over has only a single row for each actor. A single actor can have many values of latency for a given time range! Interana software cannot automatically combine these two properties. Instead, we need to somehow bring the event property into the actor scope for this query to be valid. 

Each time an actor or flow property is referenced in an event scope query, the actor or flow property must be brought into the event scope for the query to make sense. The same goes for an event or flow property in an actor scope query, and an event or actor property in a flow scope query. Interana often does this automatically, but to fully understand queries that mix these different types of properties, it is important to understand exactly how Interana resolves these differences in scope.

The following sections detail the ways to resolve these scope mismatches, as well as the Interana defaults for doing so. For jump links to the appropriate section, see the following matrix:

  Event property Actor property Flow property
Event scope query No action needed Bringing actor properties
into event scope
 
Bringing flow properties
into event scope
Actor scope query Bringing event properties
into actor scope
No action needed Bringing flow properties
into actor scope
Flow scope query Bringing event properties
into flow scope
Bringing actor properties
into flow scope
No action needed
Bringing actor properties into event scope

In Interana, you can access actor properties in event scope queries by left joining actor scope data (produced by an actor property subquery) to the event scope data on actor. For instance, the following query is an event scope query that references an actor property:

clipboard_e2dfefd70bda106bf262c4c3e793d22ea.png

In this query, the user has requested the average(latency) of the events associated with actors who have had more than two read events. Let's apply what we have learned to understand how Interana calculates this query result.

First, actor scope data is produced by an actor property subquery, as defined in the actor property definition:

user reads_per_user
John 1
Anca 1
Manasa 3

Next, the actor scope data is left joined to the event scope data on user:

user latency reads_per_user
John 43 1
John 11 1
John 25 1
Anca 12 1
Anca 27 1
Anca 29 1
Manasa 9 3
Manasa 121 3
Manasa 9 3
Manasa 28 3

Interana filters the events to the events where reads_per_user > 2:

user latency reads_per_user
Manasa 9 3
Manasa 121 3
Manasa 9 3
Manasa 28 3

And finally Interana performs the average(latency) aggregation:

average(9, 121, 9, 28) = 41.75

Bringing flow properties into event scope

Similarly to actor properties, flow properties can be accessed in event scope queries by left joining flow scope data (produced by a flow property subquery) to the event scope data on an internal flow ID. For instance, the following query is an event scope query that references a flow property:

clipboard_e95b7664c224310ec5ae5029d7eca2356.png

This query produces the average(latency) of events involved in flows that lasted less than 90 seconds. Let's apply what we have learned to understand how this query result is calculated.

First, Interana produces flow scope data by iterating over events and calculating flows according to the flow definition:

flow_id user total_time_in_flow
1 John 120000
2 Anca 120000
3 Manasa 60000
4 Manasa 92000

Next, Interana joins that flow scope data to event scope data in preparation for our event scope aggregation:

user flow_id total_time_in_flow latency
John 1 120000 43
John 1 120000 11
John 1 120000 25
Anca 2 120000 12
Anca 2 120000 27
Anca 2 120000 29
Manasa 3 60000 9
Manasa 3 60000 121
Manasa 4 92000 9
Manasa 4 92000 28

Interana then applies the filter to our events:

user flow_id total_time_in_flow latency
Manasa 3 60000 9
Manasa 3 60000 121

Finally, Interana performs an event scope aggregation to product the desired result:

average(9, 121) = 65

For instructions on using a flow property in an event scope query, see Query on stages in a flow.

Bringing event properties into actor scope

A common use case for Interana queries is to reference event properties in actor scope queries. Bringing an event property into actor scope is a bit more complicated than the other way around, because there are several valid ways to bring an event property into actor scope (instead of a simple left join). The correct way to go about doing this depends on the question you are trying to answer.

The ways to do this fall into two categories: an event property can be brought into actor scope by creating an actor property from that event property, or existing actor properties in the actor scope can be modified to include the event property in their definitions.

For example, imagine we have the following actor scope query:

clipboard_e83619cac76f67dcd47637e87a632a79c.png

Often Interana analysts want to apply an event property filter to a query like this, such as latency >= 50. At this point, there are multiple reasonable queries that could result from adding a latency >= 50 filter to such a query. For instance:

  1. what is the average per-user read count for users with at least one high latency event on 2020-01-25?
  2. what is the average per-user read count for users with at least one high latency event in that user's entire lifetime?
  3. what is the average per-user read count for users whose median latency over all events on 2020-01-25 >= 50?
  4. what is the average per-user high latency read count on 2020-01-25?

All of the above queries are valid interpretations of adding a latency >= 50 filter to an average(reads_per_user) actor scope query. All of these queries can be answered in Interana by either creating an actor property from the event property (Queries 1, 2, and 3) or modifying the existing actor property (Query 4).

You can do this by creating or altering the properties manually. Let's do so now for the four queries above.

Query 1: What is the average per-user read count for users with at least one high latency event on 2020-01-25?

clipboard_ecd4081899b47c832df1ec0e26adbe4e0.png

First, we produce actor scope data with a subquery that produces reads_per_user and count_high_latency_events columns:

user reads_per_user count_high_latency_events
John 1 0
Anca 1 0
Manasa 3 1

Next, we filter to actors that satisfy the condition set in the filter:

user reads_per_user count_high_latency_events
Manasa 3 1

Finally, we average(reads_per_user):

average(3) = 3.

Query 2: What is the average per-user read count for users with at least one high latency event in that user's entire lifetime?

clipboard_e049d457f0bf3c6aa48159a80d1723e28.png

This query executes similarly to the previous query, except the actor properties must be calculated in separate subqueries, since they have different time specs. We still end up with actor scope data that we iterate over to produce the final result. In this case the result is equivalent to the result of the query above, since all events in the dataset are located on 2020-01-25.

Query 3: What is the average per-user read count for users whose median latency over all events on 2020-01-25 >= 50?

clipboard_ec0db90fb075f6c2f46cca72d70083105.png

Same general process as Query 1: produce actor scope data from count(reads), median(latency) group by user subquery:

user reads_per_user median_event_latency
John 1 25
Anca 1 27
Manasa 3 28

Since none of our actors match the median_event_latency filter, average(reads_per_user) produces an empty result.

Query 4: What is the average per-user high latency read count on 2020-01-25?

This approach is different, since we modify the existing reads_per_user actor property instead of creating a new actor property.

clipboard_e4c82a3964e325dda82564aaee3e893b4.png

For this approach, we simply modify the definition of reads_per_user to include a filter for latency >= 50. From here, the query is calculated as you would expect an actor scope query to be calculated - we produce actor scope data from a subquery defined by our actor property, and aggregate over that actor scope data.

It is possible to add an event property to an actor scope query in Interana without manually resolving the scope difference with the methods listed above. The behavior of Interana in this case is similar to the approach in Query 4 above, in that the actor properties in the query are modified to incorporate the event property. Instead of adding to the filter of the actor property, however, the actor property is actually calculated on a per (user, event property) basis. This allows a consistent behavior for adding event properties to actor scope queries in both filter and split-by, but can produce surprising results if not ready for them. We are currently working on providing a way for users to easily select the behavior that they want when adding an event property to an actor scope query.

 

 

 

Bringing flow properties into actor scope

Bringing a flow property into actor scope is similar to bringing an event property into actor scope in that it is possible for an actor to have multiple values of a flow property for a given time range (if they start multiple flow instances within the time range, for example). It follows that the methods of bringing flow properties into actor scope are similar to the methods used for bringing event properties into actor scope. We can either construct a new actor property that performs an aggregation over flow scope data, or modify an existing actor property to incorporate events associated with a particular flow property.

For instance, we can produce the average read count for users whose total session duration > 2 minutes by creating an actor property that references flow total time:

clipboard_ed8b370133d1b12eefe4e4e588caf6eae.png

Produce actor scope data from actor property subqueries

user reads_per_user total_time_in_session
John 1 120000
Anca 1 120000
Manasa 3 152000

Filter

user reads_per_user total_time_in_session
Manasa 3 152000

Average(3) = 3

Or we could add flow filter to the definition of our reads_per_user actor property:

clipboard_eca009cc3c3643c7ff093d1abfc202a01.png

Here our actor property subquery is event scope, so for this filter set to work Interana will bring the required flow properties into event scope for the actor property subquery. The top level query will be executed over the data:

user reads_in_completed_flow_per_user
John 1
Anca 1
Manasa 0

Average(1, 1, 0) = .67

Bringing event properties into flow scope

Bringing an event property into flow scope is similar to bringing an event property into actor scope. The event property can be integrated by either using the event property in a flow property, or altering the definition of the flow itself.

For instance, to filter to flow instances with at least one login event, create a flow property that counts login events per flow instance. This is what the query looks like in the Interana UI:

clipboard_e3d8e7d3a12ab92635611bc47e0166a64.png

First Interana produces flow scope data using the flow definition and collect flow scope logins_per_session property:

flow_id user logins_per_session
1 John 1
2 Anca 0
3 Manasa 0
4 Manasa 0

Filter

flow_id user logins_per_session
1 John 1

Count(rows) = 1

We could also incorporate event properties directly into the flow definition (note the action matches read filter in the flow definition):

clipboard_e9c5f34ff8d0b8c3b5f3edcb0ff89fa9b.png

clipboard_e4eb8ec6e0d68d193cdde658f294ebf70.png

Produce flow scope data using new flow definition that filters to read events only.

flow_id user total time in read session
1 John 0s
2 Anca 0s
3 Manasa 60s
4 Manasa 0s

A flow that has only one event has a total time of 0 seconds, since total time = timestamp(last event in flow) - timestamp(first event in flow)

Average(0,0,60,0) = 15s.

Bringing actor properties into flow scope

Bringing actor properties into flow scope works similarly to bringing actor properties into event scope. Since actor data is present in both flow and actor scopes, we can simply left join actor scope data to flow scope.

Here is an Interana query where this happens:

clipboard_e1743a6f29dd5ff16bad303fac58241ec.png

First, we produce flow scope and actor scope data independently:

flow_id user
1 John
2 Anca
3 Manasa
4 Manasa
user reads_per_user
John 1
Anca 1
Manasa 3

Next, left join actor scope data to flow scope data:

flow_id user reads_per_user
1 John 1
2 Anca 1
3 Manasa 3
4 Manasa 3

 Filter:

flow_id user reads_per_user
3 Manasa 3
4 Manasa 3

And finally, aggregate:

Count(Rows) = 2

  • Was this article helpful?