Skip to main content
Interania

Apply a hex transform to a shard key column

4votes
9updates
355views

It is not uncommon for organizations to store their user id's as hexadecimal strings. Due to the high cardinality nature of these strings, in Interana it is a best practice to actually store them as integers on the data tier, saving the string tier for more valuable descriptive string information.

Interana has a hex transformation that allows us to do exactly that. You can apply this transform to any fixed-length hex column that has a character count divisible by 8. In this walkthrough I'll show you how to apply this transform to a 40-character hex shard key, whether you're creating a new table or working with an existing table.

If you wish to apply the hex transform to a non-shard-key column, you can change the column type without directly editing the configuration database using change_column_type.py - the rest of the procedure remains the same.

In this howto I will show you how to apply a hex transform to your shard key. I will use a very small example data (hex.json) set throughout the example:

{"timestamp":"1472746778394","event":"login","user":"9ea33394f43a2b9784c07ec205428b9ff1a80c37"}
{"timestamp":"1472753625636","event":"screenview","user":"fc781804067ff62b4e3441936454d851156fbe86"}
{"timestamp":"1472719762790","event":"sign up","user":"ff2d615ea9df4f796795a072dd30066d0850ab4c"}
{"timestamp":"1472730854707","event":"like","user":"f71cb2d589d0cb3d33ff9e4a4cfcf12b520db704"}
{"timestamp":"1472739684533","event":"like","user":"63df9f2f733ca15b2faae1791e356602906e731c"}

As you can see, we have the 3 parts of event data in the events above: an actor ("user") that generates the event, the time ("timestamp") that they generate the event, and an attribute "event" that describes the event. As you can see, our "user" shard key is 40 hex characters long. If we were to store a billion of these on the string tier, we would use a lot of resources! For this reason we will apply the hex transform on this column for our production table.

Before You Begin: Enable Hex Transform in Purifier Settings

Before you can use the hex transform, you must enable it in the configuration database. The configuration for this setting is in tablemap.purifier_settings. If you have not enabled this setting previously, insert the setting using the following command:

insert into tablemap.purifier_settings VALUES (-1, 1, "expand_hexN_columns", "1");

You may now proceed! If you're setting up a new table, read on! If not, skip down to "Old Table, First Step".

New Table, First Step: Create Your Table Using simple_setup.py

If you haven't yet created your table, you can use simple_setup.py to create your table with the shard key hex-transformed right off the bat. Using the example file above, issue the command:

/opt/interana/backend/import_server/tools/simple_setup.py --table_name hextable --time_column timestamp --shard_column user --time_format milliseconds --shard_type hex40

This will create your table. Next, import your file with:

/opt/interana/backend/import_server/tools/add_to_fast_import.py -t hextable ~/hex.json

You should be done! Go ahead and skip to the section entitled "Verify Hex Transform Efficacy".

Old Table, First Step: Delete and Configure

If you want to apply the hex transform to the shard key of an existing table, you've got a bit more work ahead of you. Your shard key is currently stored as a string, as shown below:

We'll need to re-configure your shard key column and perform a complete re-ingest in order to apply the hex transform.

First, wipe your table using delete_table_data.py (the -i parameter will delete your import records, only use -i if you're deleting the whole table):

/opt/interana/backend/import_server/delete_table_data.py -t hextable -i

Next, find the column id of your hex shard key:

select column_id from tablemap.table_columns where name="user" and table_id=(select table_id from tablemap.tables where table_name="hextable");

And finally update your column with the following changes (if your hex id is 32 characters, no need to set the conversion_function_params):

update tablemap.table_columns set conversion_function="to_hash_lower", conversion_function_params='{"digits": 40}', column_type="int" where column_id=12;

Now that your column is properly configured, it's time to bring in the data again!

Old Table, Second Step: Import Data with Hex Transform in Place

Import a file using add_to_fast_import.py, so we can check to make our changes were effective:

/opt/interana/backend/import_server/tools/add_to_fast_import.py -t hextable ~/hex.json

Verify Hex Transform Efficacy

Now, take a look at your import logs. Files added through add_to_fast_import.py are in /var/log/interana/import.log. You should look for several new columns detected: In our case:

You'll notice 5 new int columns are created to store the 40 character hex column - an int column for every 8 hex characters.

Finally, have a look at the column from the UI:

The UI should reconstruct your hex value from it's constituent user.i* columns. If everything's looking good, you're all done- start importing more data!

 

  • Was this article helpful?