Skip to main content
Interania

Work with nulls in Interana

Explains what nulls are and how to use them in Interana
0votes
3updates
113views

What are they?

Null is a special value that means "absence of data". For example if I have a dataset that is a list of purchase events, I might have a field called "price". If this field is missing from some events I would say that the value of price in those events is "null". Note that it's not exactly the same as being 0, it turns out there are some subtle differences in how it is handled. In this example a price of 0 might mean "free", while a null might mean "I don't know" or "N/A".

To extend the example, imagine I have some events that are purchases and some that are store visits without purchases. If I'm computing the "average purchase price" I probably want to average in the purchases with price 0, but I probably don't want to average in the "N/A"s from the events that weren't purchases, and if there were some purchases for which I didn't know the price I would probably also like to exclude them. The idea of "null" as a distinct idea from "zero" is that way that Interana (and most database systems) let you express this.

It's also worth noting that Interana works really well with data that has lots of null. It is common to use Interana with datasets that have thousands of fields, most of which are null in most events. This shows up a lot when data is loaded from many sources into the same table in Interana. For example a purchase event might have a price, while a photo upload event has a photo name, and a sensor reading has a temperature in degrees C. These will all be null in the other kinds of events, but Interana is cool with that.

Where do they come from?

Interana assigns the value null to any field that is missing in the source data. In json data this could be the field not listed at all in the json dictionary, or listed with value null, NaN, or empty string. For example, the rows below will all be equivalent in Interana, with message = "hi" and more = null. (note that literal NaN is not strictly valid json, but shows up in practice pretty often)

{"message": "hi"}

{"message": "hi", "more": ""}

{"message": "hi", "more": null}

{"message": "hi", "more": NaN}

If a new field shows up in data after many events have already been ingested, all previous events will implicitly have value null for the new field.

What do they do?

You can access null values in Interana using the special token *null*. When looking at charts or samples, this is how you will see them labeled. In the filters you can compare directly to *null*, for example "price is one of *null*" will be true when price is null.

  • Equals comparisons (other than to *null*) will be false on null, for example "price = 42" will always be false when price is null.
  • Not Equals is the opposite, "price != 42" is true when price is null.
  • All numeric comparisons (>, <, <=, >=) are false when the field is null.

Aggregation functions such as sum, average, median, and percentiles will ignore rows who have the value of null in the field being averaged or percentiled. Unique counts also ignore *null*, for example if the values 0, *null*, 1, and 2 appear in a field, the unique count of values for that field will be 3, not 4. Again this is because *null* isn't really a value, it's the absence of a value.

  • Was this article helpful?