Interania

How to minimize the number of re-ingests

Having to re-ingest data that has already been imported costs valuable time and access to your data. This document explains what you can do to avoid inconvenient re-ingests.

Before you begin ingesting your data

Planning ahead and preparing properly for the first ingest of your data can prevent re-ingests later and save an enormous amount of time. This section outlines what to consider when preparing to ingest data for the first time.

  • Have a clear idea of what you want your data to look like before sending it to be ingested. 
  • Communicate your decisions fully with your Customer Success Manager. They will need to know the following:
    • Name of the table
    • How many events the table will hold
    • How many columns the table will have
    • Format of the time column
    • How many shard keys and what each represents
  • Validate your data. If you find errors in your data after it has been ingested, a partial re-ingest is required to replace the bad data.
  • Verify that all of your columns are correct. Changing a column name or type after data has been ingested requires a re-ingest.
  • Make sure all stakeholders approve the initial data structure plan, to prevent differences of opinion that result in time-intensive re-ingests at a later date.
  • The more discussion you have with your Customer Success Manager about your table and data structure prior to ingest, the fewer re-ingests will be necessary later on. 
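
The checklist above can be turned into a small pre-ingest check that you run on a sample of your data before sending it. The following is a minimal sketch and not part of any product tooling: `TablePlan`, `validate_events`, and all field names are hypothetical, and it assumes events are available as Python dictionaries whose time column is a string.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TablePlan:
    """Hypothetical record of the decisions agreed on before the first ingest."""
    name: str
    expected_events: int
    columns: dict       # column name -> expected Python type
    time_column: str
    time_format: str    # strptime format of the time column
    shard_keys: tuple   # names of the columns used as shard keys


def validate_events(plan, events):
    """Return (event index, problem) pairs; an empty list means no problems found."""
    problems = []
    for i, event in enumerate(events):
        # Column names must match the plan exactly.
        for col in sorted(set(plan.columns) - set(event)):
            problems.append((i, f"missing column: {col}"))
        for col in sorted(set(event) - set(plan.columns)):
            problems.append((i, f"unexpected column: {col}"))
        # Column types must match the plan.
        for col, expected in plan.columns.items():
            if col in event and not isinstance(event[col], expected):
                problems.append((i, f"wrong type for {col}"))
        # The time column must parse with the agreed format.
        raw_time = event.get(plan.time_column)
        if isinstance(raw_time, str):
            try:
                datetime.strptime(raw_time, plan.time_format)
            except ValueError:
                problems.append((i, f"bad time format: {raw_time}"))
        # Shard key columns must not be empty.
        for key in plan.shard_keys:
            if event.get(key) in (None, ""):
                problems.append((i, f"empty shard key: {key}"))
    return problems


# Illustrative plan and sample events (all values are made up).
plan = TablePlan(
    name="purchases",
    expected_events=1_000_000,
    columns={"ts": str, "user_id": str, "amount": float},
    time_column="ts",
    time_format="%Y-%m-%dT%H:%M:%S",
    shard_keys=("user_id",),
)
good = {"ts": "2024-01-15T10:30:00", "user_id": "u1", "amount": 9.99}
bad = {"ts": "15/01/2024", "user_id": "", "amount": "9.99"}

for index, problem in validate_events(plan, [good, bad]):
    print(index, problem)
```

Here the first event passes and the second is reported three times: for the type of `amount`, the time format, and the empty shard key. Catching these before ingest is what avoids the partial re-ingest described above.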

Planning for changes in your data

As your business and data consumption grow, changes to the structure of your data are inevitable. These changes can go smoothly, with little, if any, disruption, if you plan accordingly. This section outlines the recommended guidelines to follow.

  1. Communicate your planned changes fully with your Customer Success Manager. They will devise a plan that minimizes any disruption to accessing your data.
  2. Include all stakeholders in the decision-making process for the planned changes, to avoid differences of opinion that result in re-ingests later.
  3. Make all your decisions before you send your data to be ingested, or plan for small-scale iterations until the changes are finalized.

The effects of changes made to imported data

It's important to understand the effects that changes to your data can have before you make them. As mentioned in the previous section, planning for changes ahead of time and communicating fully with your Customer Success Manager can avert unnecessary system downtime.

This section touches on some of the common changes to data that result in a re-ingest or backfill.

  • A backfill deletes data and then performs a one-time import into a specified region (one that is not the continuous import region). A backfill is also performed if data wasn't received for a period of time and the data for that period needs to be ingested.
  • If you add a new shard key to data that has already been ingested, the data will need to be re-ingested.
  • If you change the format of a time column, the data must be re-ingested.
  • Changing a column while data is being ingested results in a re-ingest that could have been avoided.
  • If you change the format of raw data that has already been imported, a re-ingest is necessary.
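
To see ahead of time whether a planned change falls into one of the categories above, you can compare the current table plan with the proposed one. The following is a rough sketch under stated assumptions: `reingest_reasons` and the dictionary layout (`shard_keys`, `time_format`, `columns`) are hypothetical and only cover the triggers named in this document. It is not an official check, so confirm the result with your Customer Success Manager.

```python
def reingest_reasons(current, proposed):
    """List the reasons a proposed plan would require re-ingesting existing data.

    Both arguments are plain dictionaries with the hypothetical keys
    "shard_keys" (list of names), "time_format" (string), and
    "columns" (mapping of column name to type name).
    """
    reasons = []
    # A shard key added to already-ingested data requires a re-ingest.
    for key in sorted(set(proposed["shard_keys"]) - set(current["shard_keys"])):
        reasons.append(f"new shard key: {key}")
    # A changed time column format requires a re-ingest.
    if proposed["time_format"] != current["time_format"]:
        reasons.append("time column format changed")
    # A changed column name or type requires a re-ingest. A rename shows up
    # here as an existing column that is missing from the proposed plan.
    for col, col_type in current["columns"].items():
        if col not in proposed["columns"]:
            reasons.append(f"column missing from proposed plan (possible rename): {col}")
        elif proposed["columns"][col] != col_type:
            reasons.append(f"column type changed: {col}")
    return reasons


# Illustrative plans (all names and formats are made up).
current = {
    "shard_keys": ["user_id"],
    "time_format": "%Y-%m-%d",
    "columns": {"user_id": "string", "amount": "int"},
}
proposed = {
    "shard_keys": ["user_id", "device_id"],
    "time_format": "%Y-%m-%dT%H:%M:%S",
    "columns": {"user_id": "string", "amount": "decimal", "device_id": "string"},
}

for reason in reingest_reasons(current, proposed):
    print(reason)
```

In this example the proposed plan is flagged for the new `device_id` shard key, the changed time format, and the changed type of `amount`. An empty result only means that none of the triggers listed here were detected.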