
External API: query

This applies to v2.22

Interana’s API gives you a way to extract summarized and aggregated data for use in downstream processes, data warehouses, dashboards, or reports. For some customers, Interana is one of several analytics processes they run, and consolidating these analyses in a single report or dashboard is important. Other customers need aggregated data from Interana to feed into separate processes. With the API, users can run queries outside of the Interana front end.

The Interana external API is a REST API that allows integration with Interana outside of the standard interface. The API is deployed automatically as part of an Interana cluster installation. The first version of the API provides basic functionality for single measurement and time series queries.

Single measurement and time series queries

Single measurement queries are queries that return a single result set. An example of a single measurement query is a Table view query that extracts a site’s top users based on the count of user events. Only one table of results is returned for the given time range. 

Time series queries are queries that return a result set for every data point in the query’s time range. In Interana Explorer, time series queries are rendered using the Time view.

Endpoints

The API currently supports one endpoint: query. The query endpoint allows the user to make queries against Interana. The URL path to the query endpoint is:

https://<cluster hostname>/api/v1/query

The query endpoint must be accessed with the HTTP GET method.

Authentication and authorization

All requests to the API must be made over SSL (HTTPS protocol).

The API uses a token-based authentication model. Tokens can be created or revoked by Interana support. Tokens must be passed in the Authorization header of each request to the API in the following format:

Authorization: Token <token>

Every user account is authorized to make requests to the API, as long as they use a valid request token.

Requests

Requests to the query endpoint must be sent with the HTTP GET method. The required query parameter defines the query to be executed on Interana. Its value is a JSON object that is URL-encoded and passed as a parameter of the request.
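As a rough illustration of the request format described above, the following Python sketch builds (but does not send) a request using only the standard library. It assumes the parameter is literally named query, as the text suggests. The hostname and token in the usage note are placeholders, and build_query_request is a hypothetical helper name, not part of the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(hostname, token, query):
    """Build a GET request for the query endpoint without sending it.

    Assumption: the URL-encoded JSON object is passed in a parameter
    named "query".
    """
    # Serialize the query object to JSON, then URL-encode it.
    params = urlencode({"query": json.dumps(query)})
    url = f"https://{hostname}/api/v1/query?{params}"
    # The token goes in the Authorization header in the documented format.
    headers = {"Authorization": f"Token {token}"}
    return Request(url, headers=headers, method="GET")


# Placeholder hostname and token, for illustration only.
request = build_query_request("cluster.example.com", "my-token", {"dataset": "Music"})
print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a cluster.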

Object format

The query object has the following format:

query

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| dataset | string | yes | The name of the dataset that you want to use. |
| start | int | yes | The start time for the query. Represented as milliseconds since UNIX Epoch time. |
| end | int | yes | The end time for the query. Represented as milliseconds since UNIX Epoch time. |
| timezone_offset | int | no | Offset from UTC in milliseconds, used for day alignment. This defaults to the configuration of your Interana instance, or Pacific Daylight Time if the instance is not configured (PDT = -7 h × 60 min/h × 60 s/min × 1000 ms/s = -25200000 ms). |
| queries | array | yes | A list of objects containing details about the query. For most basic queries, this list will only contain one element (see queries). |
| group_by | array | no | An array of strings listing the columns to group by. Applies to all query objects. |
| max_groups | int | no | The number of groups to return if group_by is specified. Defaults to 10. |
| sampled | boolean | no | Whether to run a sampled query. The default is true. |
| compute_all_others | boolean | no | When group_by is specified, whether to compute the "All others" group. Defaults to false. |

Note: "Type" refers to the JSON type of the property. See http://www.json.org for more information.
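Because start, end, and timezone_offset are all expressed in milliseconds, a small conversion helper can reduce arithmetic mistakes. The sketch below is one way to do it in Python. The to_epoch_ms name and its defaults are illustrative assumptions, and it treats the offset as fixed (no daylight saving transitions).

```python
from datetime import datetime, timedelta, timezone

# -7 h * 60 min/h * 60 s/min * 1000 ms/s, the PDT offset quoted in the table.
PDT_OFFSET_MS = -7 * 60 * 60 * 1000


def to_epoch_ms(year, month, day, hour=0, offset_ms=PDT_OFFSET_MS):
    """Convert a wall-clock time at a fixed UTC offset to epoch milliseconds."""
    tz = timezone(timedelta(milliseconds=offset_ms))
    local = datetime(year, month, day, hour, tzinfo=tz)
    return int(local.timestamp() * 1000)


print(PDT_OFFSET_MS)               # -25200000
print(to_epoch_ms(2016, 4, 25))    # 1461567600000
```

The second value matches the start time used in the single measurement example later in this article.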


queries

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | no | The type of query to run. Select single_measurement (the default value) or time_series. |
| measure | object | yes | An object defining an aggregation to measure (see measure). |
| filter | string | no | Filters to apply to the query. This uses the Advanced filter syntax. |

measure

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| aggregator | string | yes | The aggregation to measure. One of: "count_star", "unique_count", "sum", "avg", "min", "max", "P1", "P5", "P10", "P25", "P50", "P75", "P90", "P95", "P99" |
| column | string | yes | The column that you want to measure. |
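The three object definitions above nest as query, then queries, then measure. The following Python sketch assembles them into one dictionary and rejects obviously invalid input before a request is made. The build_query helper and its argument names are hypothetical. Treating column as optional is an assumption based on the count_star example later in this article, which omits it.

```python
AGGREGATORS = {
    "count_star", "unique_count", "sum", "avg", "min", "max",
    "P1", "P5", "P10", "P25", "P50", "P75", "P90", "P95", "P99",
}


def build_query(dataset, start, end, aggregator, column=None,
                query_type="single_measurement", filter_expr=None, **options):
    """Assemble a query object; options holds group_by, max_groups, and so on."""
    if aggregator not in AGGREGATORS:
        raise ValueError(f"unknown aggregator: {aggregator}")
    if end <= start:
        raise ValueError("end time must be after start time")
    measure = {"aggregator": aggregator}
    if column is not None:
        measure["column"] = column
    inner = {"type": query_type, "measure": measure}
    if filter_expr is not None:
        inner["filter"] = filter_expr
    query = {"dataset": dataset, "start": start, "end": end, "queries": [inner]}
    query.update(options)  # optional top-level fields are passed through as-is
    return query


query = build_query("Music", 1461567600000, 1461999600000, "unique_count",
                    column="userId", group_by=["artist"], max_groups=5)
print(query["queries"][0]["measure"])
```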

Responses

The API will return the http status code 200 for successful requests to the query endpoint, along with a JSON object containing the results of the query.

Object format

The query result object format is:

results

| Name | Type | Description |
| --- | --- | --- |
| columns | array | An array of column objects |
| rows | array | An array of row objects |

columns

| Name | Type | Description |
| --- | --- | --- |
| label | array or string | A description of the column |
| type | string | The type of the data in the column: "array", "number", or "time_series". See Data types for more information. |

rows

| Name | Type | Description |
| --- | --- | --- |
| values | array | The data corresponding to the defined columns. The order of elements in this array corresponds to the order of elements in the columns array. The length of this array will always equal the length of the columns array. The type of data will match the type defined in the corresponding column object (see Data types). |
| properties | object | A map of properties for the result. For time_series queries, this includes information about the time bucketing used to calculate the result. If no properties are applicable, this field will be omitted. See row properties, below. |

row properties

| Name | Type | Description |
| --- | --- | --- |
| rate | string | The rate unit, such as "day", "week", or "month" |
| resolution | int | The time between data points, in milliseconds |
| window | int | The length of the time window, in milliseconds. The window must be greater than or equal to the resolution setting. |

Data types

The type property of column objects describes the type of data that will appear in each row. The following table describes the JSON format of the possible type values.

| Name | Type | Description |
| --- | --- | --- |
| number | number | |
| array | array | |
| time_series | array | An array of time_series objects |

time_series

| Name | Type | Description |
| --- | --- | --- |
| timestamp | int | The timestamp of the data point in milliseconds since UNIX Epoch time |
| value | number | The value of the data point |
| properties | object | See time_series properties |

time_series properties

| Name | Type | Description |
| --- | --- | --- |
| event_count | int | The number of events used to compute the value. This is the number of events scanned in the time window (window). For unsampled queries, this should equal the number of events that exist in that particular time window. |
| object_count | int | The number of unique objects used to compute the value. |

For unsampled queries, object_count will equal value. For example, if you run an unsampled Count Unique of user_id that scans 500 events in the time window (the event_count) and counts 20 unique users, the object_count and value will both be 20.

If you run a sampled Count Unique of user_id that scans 24 events in the time window (event_count) and counts 10 unique user_ids (object_count), the value is scaled up to account for sampling, and 120 is returned as the estimated unique user count for that time window.

Examples: single measurement and time series

Single measurement queries

In this single measurement example, the query is looking for the number of unique userids, grouped by artist, from April 25, 2016 to April 30, 2016.

The same query can be executed in the API with the following request and response calls. Note that start and end times are specified as milliseconds since epoch and timezone_offset is relative to GMT.

Request: single measurement

{
  "dataset": "Music",
  "start": 1461567600000,
  "end": 1461999600000,
  "timezone_offset": -25200000,
  "queries": [ {
    "type": "single_measurement",
    "measure": {
      "aggregator": "unique_count",
      "column": "userId"
    },
    "filter": "(`artist` != \"*null*\")"
  } ],
  "sampled": true,
  "group_by": ["artist"],
  "max_groups": 5,
  "compute_all_others": false
}

Response: single measurement

{ 
  "rows": [
    {"values": [ [ "3 Doors Down"], 31456] },
    {"values": [ [ "Justin Bieber"], 31336] },
    {"values": [ [ "OneRepublic"], 31772] },
    {"values": [ [ "Taylor Swift"], 30136] },
    {"values": [ [ "The White Stripes"], 27036] }
   ],
  "columns": [
    {"type": "array", "label": ["artist"] },
    {"type": "number", "label": "measure_value"}
   ]
}
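To consume a response like the one above, the columns array can be used to locate the group and measure positions in each row. The Python sketch below does this for a trimmed copy of the sample response. The groups_to_values name is a hypothetical helper, and the code assumes exactly one "array" column and one "number" column, as in this example.

```python
def groups_to_values(response):
    """Map each group (a tuple of group_by values) to its measured number."""
    types = [column["type"] for column in response["columns"]]
    group_idx = types.index("array")    # position of the group_by values
    value_idx = types.index("number")   # position of the measure value
    return {
        tuple(row["values"][group_idx]): row["values"][value_idx]
        for row in response["rows"]
    }


# Two rows copied from the sample response above.
sample_response = {
    "rows": [
        {"values": [["3 Doors Down"], 31456]},
        {"values": [["Taylor Swift"], 30136]},
    ],
    "columns": [
        {"type": "array", "label": ["artist"]},
        {"type": "number", "label": "measure_value"},
    ],
}

for group, value in groups_to_values(sample_response).items():
    print(", ".join(group), value)
```

Groups are kept as tuples because a query with several group_by columns returns several values per group.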

Time series queries

If the single measurement query example used above is issued as a time series query, each x-axis point in the query time range returns a count of each user’s events for that given time window. In other words, the query returns a separate result set for each point in the x-axis.

The time series query in the example below looks for the number of events between April 29, 2016 12:00 am and April 29, 2016 12:00 pm.

The same query can be executed using the API with the request call below. You can also view the corresponding response. Note that the response has been abbreviated given the large number of results.

In the request call, the query start and end times are specified in milliseconds since epoch, and timezone_offset, also specified in milliseconds, is relative to GMT. Finally, the sampled flag indicates whether to use sampling when running the query.

Request: time series

{
  "dataset": "Music",
  "start": 1461913200000,
  "end": 1461956400000,
  "timezone_offset": -25200000,
  "queries": [ {
    "type": "time_series",
    "measure": {"aggregator": "count_star"}
  } ],
  "sampled": true
}

Response: time series

{ 
  "rows": [
    {"values": [ [ "All"], [
      {"timestamp": 1461913200000, "properties": {"object_count": 25243, "event_count": 25243},
       "value": 8414.333333333332},
      ...
      {"timestamp": 1461934800000, "properties": {"object_count": 20977, "event_count": 20977},
       "value": 6992.333333333333},
      ...
      {"timestamp": 1461956400000, "properties": {"object_count": 27581, "event_count": 27581},
       "value": 9193.666666666666}
      ...
      ...
     ] ],
     "properties": {"window": 43200000, "resolution": 21600000, "rate": "minute"}
  } ],
  "columns": [
    {"type": "array", "label": ["result"] },
    {"type": "time_series", "label": "measure_value"}
   ]
}
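A time series response nests the data points inside each row, so it is often convenient to flatten them. The Python sketch below does so for a copy of the sample response, trimmed to the three data points shown above. The time_series_points name is a hypothetical helper, and the code assumes a single "time_series" column.

```python
def time_series_points(response):
    """Flatten a time_series response into (timestamp, value, event_count) tuples."""
    types = [column["type"] for column in response["columns"]]
    ts_idx = types.index("time_series")
    points = []
    for row in response["rows"]:
        for point in row["values"][ts_idx]:
            properties = point.get("properties", {})
            points.append(
                (point["timestamp"], point["value"], properties.get("event_count"))
            )
    return points


# Trimmed copy of the sample response: only the three points shown above.
sample_response = {
    "rows": [{
        "values": [["All"], [
            {"timestamp": 1461913200000, "value": 8414.333333333332,
             "properties": {"object_count": 25243, "event_count": 25243}},
            {"timestamp": 1461934800000, "value": 6992.333333333333,
             "properties": {"object_count": 20977, "event_count": 20977}},
            {"timestamp": 1461956400000, "value": 9193.666666666666,
             "properties": {"object_count": 27581, "event_count": 27581}},
        ]],
        "properties": {"window": 43200000, "resolution": 21600000, "rate": "minute"},
    }],
    "columns": [
        {"type": "array", "label": ["result"]},
        {"type": "time_series", "label": "measure_value"},
    ],
}

for timestamp, value, event_count in time_series_points(sample_response):
    print(timestamp, value, event_count)
```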

Errors and retry

The API returns appropriate HTTP status codes for error cases and JSON objects containing error information. 

Status Codes

The possible error status codes for the query endpoint are:

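| Code | Error type |
| --- | --- |
| 400 | Malformed query parameter |
| 401 | Invalid authentication token |
| 429 | Request limit exceeded (see Request limits and throttling) |
| 500 | Unexpected server error |
| 504 | The request timed out before the query could complete. The server-side timeout is 180 seconds. |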

Object format

The JSON error object format is:

| Name | Type | Description |
| --- | --- | --- |
| error | string | The class of the error: "Invalid parameter", "Invalid token", "Request limit exceeded", "Server error", or "Request timed out" |
| message | string | A description of the error |

Examples

400 status

{
  "error": "Invalid parameter",
  "message": "End time must be after start time"
}

429 status

{
  "error": "Request limit exceeded",
  "message": "The request limit of 1000 queries has been reached. This token can be used for requests on 2016-03-18 00:00:00"
}

Retry

Some queries that time out in the API may be cached on the server. Retrying the API request can sometimes result in retrieving results successfully. We recommend limiting retry policies to a small number of retries to avoid excessive load on the server.
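One way to follow this recommendation is a bounded retry loop that retries only on a timeout. The Python sketch below assumes a caller-supplied send function that performs the request and returns a (status code, parsed body) pair. The run_with_retries name, the default of two retries, and the one-second delay are illustrative choices, not values prescribed by Interana.

```python
import time


def run_with_retries(send, max_retries=2, delay_seconds=1.0, sleep=time.sleep):
    """Call send() and retry only on HTTP 504, at most max_retries extra times."""
    attempts = 0
    while True:
        status, body = send()
        # Any status other than 504 is returned immediately, including errors.
        if status != 504 or attempts >= max_retries:
            return status, body
        attempts += 1
        sleep(delay_seconds)


# Stand-in for a real request: times out once, then succeeds.
responses = iter([(504, {"error": "Request timed out"}), (200, {"rows": []})])
status, body = run_with_retries(lambda: next(responses), sleep=lambda seconds: None)
print(status)  # 200
```

Errors such as 400 or 401 are not retried, because repeating the same request cannot succeed.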

Request limits and throttling

By default, each token is authorized to make 1 query per second and 1000 requests per day. Once the daily limit has been exceeded, requests using that token are rejected with HTTP status 429 until the next day. Contact Interana customer support to request a limit increase.
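A client can stay under these defaults by pacing its own requests. The Python sketch below is a minimal client-side guard, not something the API provides. The QueryThrottle class is hypothetical, and it does not reset the daily count automatically, since the exact reset time is not specified here.

```python
import time


class QueryThrottle:
    """Client-side guard for the default limits of 1 query/second and 1000/day."""

    def __init__(self, min_interval=1.0, daily_limit=1000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.daily_limit = daily_limit
        self._clock = clock
        self._sleep = sleep
        self._last = None   # time of the previous request, if any
        self._used = 0      # requests counted against the daily budget

    def acquire(self):
        """Block until a request may be sent, or raise if the budget is spent."""
        if self._used >= self.daily_limit:
            raise RuntimeError("daily request budget exhausted")
        now = self._clock()
        if self._last is not None:
            wait = self.min_interval - (now - self._last)
            if wait > 0:
                self._sleep(wait)
                now = self._clock()
        self._last = now
        self._used += 1


throttle = QueryThrottle()
throttle.acquire()  # call before each request; the first call does not wait
```

Creating a new QueryThrottle once the limit resets is the simplest way to start a fresh daily budget in this sketch.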

Versioning

This is version 1 of the External API, indicated by the string “v1” in the URL path. This API may be expanded in future releases.
