What you should know about structuring your data

Data structure is the format and organization in which you store data so that it can be accessed and modified efficiently. The structure of your data matters because it affects how long the data takes to ingest into Interana and become available for queries. It can also affect query performance, that is, the time it takes to view the results of your queries.

This document outlines the topics your Interana Customer Success Manager will discuss with you prior to uploading your data into your Interana cluster. Become familiar with these topics as they apply to your data so you can optimize performance with Interana.

Data format and organization

The format and organization of your data can affect how long the data takes to process (transform), ingest into Interana, and become available to query.

Data formats
  • JSON is Interana's preferred file format, for ease of transformation and ingest. Apache Parquet and other formats are also supported. Discuss the format of your data with your Interana Customer Success Manager.
  • ASCII format is preferred, and table names and column names must be in ASCII.
  • Unicode is encoded as UTF-8 strings on import.
  • File size is also a consideration for performance. Large files take longer to load and transform, so smaller files become available in Interana sooner.
  • The number of files also affects how long it takes for the data to be available in Interana. A large number of files takes longer to transform and ingest.
  • Interana recommends that you format your timestamp data according to the ISO-8601 standard. For example, 2015-10-05T14:48:00.000Z, which has a format string of %Y-%m-%dT%H:%M:%S.%fZ. If your timestamps do not follow the ISO-8601 standard or you cannot reformat your timestamps to follow the standard, Interana also supports Unix time plus a variety of common strptime() time format strings.
  • Review Best practices for logging data and Data types reference for more information on data formats.
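The timestamp recommendation above can be illustrated with a minimal Python sketch. It uses the format string quoted in the list and only the standard library; the helper names `to_iso8601` and `parse_iso8601` are hypothetical, and treating every timestamp as UTC is an assumption based on the trailing `Z` in the example.

```python
from datetime import datetime, timezone

# Format string quoted in the list above.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def to_iso8601(unix_seconds: float) -> str:
    """Render Unix time as an ISO-8601 UTC string with millisecond precision."""
    dt = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    # %f produces six digits (microseconds); drop the last three to match
    # the millisecond precision of the example 2015-10-05T14:48:00.000Z.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def parse_iso8601(text: str) -> datetime:
    """Parse a string in ISO_FORMAT and mark it as UTC (assumed from the 'Z')."""
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    parsed = parse_iso8601("2015-10-05T14:48:00.000Z")
    print(parsed.timestamp())                # Unix time for the same instant
    print(to_iso8601(parsed.timestamp()))    # back to the ISO-8601 string
```

Converting timestamps to a single format before upload keeps every file consistent, whichever of the supported formats you settle on.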
Data organization
  • A logical organizational hierarchy improves the ingest time for your data.
  • Proofread your data for accuracy. Typos and other errors in the data can cause problems and impede performance.
  • When you have new data you want uploaded into Interana, where will it be stored? Will you use Amazon S3 or Microsoft Azure?
  • Your data should be auditable. It should be easy to tell which file an event came from.
  • Interana strongly recommends that you use the following directory structure format: mydata/{year}/{month}/{day}/{hour}/
  • Review Best practices for logging data for more recommendations on data organization.
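As a rough sketch of the recommended directory structure and the auditability point above, the following Python builds an hourly path from an event time and records the originating file on each event. Zero-padded path components, the use of UTC, and the `source_file` field name are assumptions made for illustration, not documented Interana requirements.

```python
from datetime import datetime, timezone


def hourly_prefix(root: str, event_time: datetime) -> str:
    """Build root/{year}/{month}/{day}/{hour}/ for a timezone-aware datetime."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    utc = event_time.astimezone(timezone.utc)  # assumption: paths are in UTC
    # Assumption: month, day, and hour are zero-padded to two digits.
    return f"{root}/{utc:%Y}/{utc:%m}/{utc:%d}/{utc:%H}/"


def tag_with_source(event: dict, file_path: str) -> dict:
    """Return a copy of the event that records which file it came from."""
    tagged = dict(event)
    tagged["source_file"] = file_path  # hypothetical field name
    return tagged


if __name__ == "__main__":
    when = datetime(2015, 10, 5, 14, 48, tzinfo=timezone.utc)
    path = hourly_prefix("mydata", when) + "events-0001.json"
    print(path)
    print(tag_with_source({"user": "u1", "action": "login"}, path))
```

Recording the file path on each event is one way to make it easy to tell which file an event came from.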

Tables, shard keys, and columns

The number and types of tables in which you store your data on Interana, and the number of shard keys you use, affect both performance and the storage capacity you need. The number of columns and the column names also play a role in performance.

  • You should consider starting with one table. If you realize the data is too disorganized, then create another table.
  • Each shard key requires a copy of all of your data. Think carefully about how many shard keys are necessary for your data, as it directly affects the storage you'll need.
  • Think about the number of columns you need. If a column isn't necessary, then it should be omitted. The more columns there are, the slower performance will be.
  • Auto-generated column names that are based on data can cause an explosion of columns in the system (and these columns are typically not easy to work with in a dashboard). 
  • Think about what data should be stored in event tables and what data should be stored in lookup tables.
  • The fewer string columns in your data, the faster the performance will be.
  • In certain cases, string columns can be hexed and stored in the data tier. This applies only when the column's values consist of the hexadecimal digits 0-9 and a-f; any spaces or punctuation are stripped out. Storing strings as hex also eliminates the ability to run text-matching queries on them.
  • Review Best practices for formatting lookup table data for more recommendations for tables, shard keys, and columns.
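To check ahead of time whether a string column might qualify for hex storage, a minimal Python sketch along the following lines could help. It mirrors the rule described above (spaces and punctuation are stripped, and only the digits 0-9a-f may remain); the function names are hypothetical, lowercasing the input first is an assumption, and Interana's actual conversion may differ.

```python
import string
from typing import Iterable, Optional

HEX_DIGITS = set("0123456789abcdef")


def normalize_hex(value: str) -> Optional[str]:
    """Strip spaces and punctuation; return the hex digits, or None if
    anything other than 0-9a-f remains."""
    lowered = value.lower()  # assumption: uppercase A-F is treated as a-f
    stripped = "".join(
        ch for ch in lowered
        if ch not in string.punctuation and not ch.isspace()
    )
    if stripped and all(ch in HEX_DIGITS for ch in stripped):
        return stripped
    return None


def column_is_hexable(values: Iterable[str]) -> bool:
    """True only if every value in the column normalizes to hex digits."""
    return all(normalize_hex(v) is not None for v in values)


if __name__ == "__main__":
    print(normalize_hex("de:ad be-ef"))
    print(column_is_hexable(["00ff", "a1-b2", "session-xyz"]))
```

A column that passes this check loses its separators once hexed, which is consistent with the note that text-matching queries no longer work on such columns.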

Your expectations

We at Interana want all of our customers to be happy with our product. An important factor in ensuring your happiness is managing your expectations. Please discuss the following topics with your Interana Customer Success Manager:

  • How much latency do you expect? Do you want new data uploaded every minute, three minutes, ten minutes, or longer?
  • How much data retention does your company require? Is three months sufficient, or do you require thirteen months? The less data you retain, the faster the performance will be.
  • Do you expect to use Interana as a primary data storage tool, or a primary data visualization tool?