Admin Guide: Streaming ingest

Streaming ingest enables you to ingest a live stream of data into Interana from a web or cloud source with an HTTP API. To avoid confusion, let's clarify the difference between streaming ingest and other methods of ingesting data:

  • streaming ingest—a dynamic flow of live data, a high volume of data received via HTTP from a web or cloud source. 
  • continuous ingest—a cron job that pushes content into the cluster at a steady (data-in-motion) rate.
  • import—a file or batch of files (static data, data-at-rest) transferred at a prescribed time.

See Create a pipeline for an external Kafka cluster for information on how to configure a pipeline that connects to a particular Kafka broker (identified by its master ZooKeeper node) and consumes events on a particular Kafka topic.

Streaming ingest configuration

Streaming ingest provides the capability for Interana to accept a dynamic flow of events (data-in-motion) via an HTTP API, instead of accepting events in the form of files. The ability to process live data is accomplished with an Interana Listener node that has the following capabilities:

  • A uWSGI server that receives HTTP posts at high volume
  • An internal messaging layer
  • A service that coordinates and maintains data shared by a group of nodes

You can add a Listener node to a single-node cluster, or stack it with another node (such as the import node) in a multi-node cluster. If desired, you can make the Listener a separate node in the cluster. For more information, see the Admin Guide: Install multi-node Interana.

Both single-node and multi-node clusters should include the following nodes:

  • Listener node—Streams live data from the Web or cloud.
  • Config node—Node from which you administer the cluster. The MySQL database (DB) is installed only on this node, for storage of Interana metadata. Configure this node first.
  • API node—Serves the Interana application, merges results of queries from the data and string nodes, and then presents those results. Nginx is installed only on the API node. 
  • Import node—Polls data repositories (S3, Azure, local file system), downloads new files, processes the data, and then sends it to the data and string tiers, as appropriate.
  • Data node—Data storage; must have enough space to accommodate all events and to stream simultaneous query results.
  • String node—String storage for the active strings in the dataset, stored in compressed format. Requires sufficient memory to hold the working set of strings accessed during queries.

Streaming ingest options

You can set up streaming ingest for Interana in the following ways: 

add_events API

The add_events API enables you to set up streaming ingest with the following characteristics:

  • HTTPS is required.
  • Both GET and POST requests are supported.
  • The maximum message size is 1K. This API is intended for events, or small batches of events, and is not intended for uploading large files.
  • You can pass the following content-encodings: None, deflate, gzip
  • You can pass the following content-types: application/text
  • You can send JSON, gzipped JSON, CSV, or any other format that Interana can transform.
  • No authentication is required; you can add events to the queue without authenticating to the cluster.

For more information, see the add_events API reference.
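
For example, because the maximum message size is 1K, you can check the byte size of an event before sending it. The following check uses standard shell commands; the JSON shown is only a placeholder.

# Count the bytes in an event payload to confirm it stays under the 1K limit.
echo -n '{"ts":1474921202000, "sk":"a truly awesome message", "wow":1}' | wc -c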

Segment 

You can use Segment as an event source that sends the data to Interana. For more information, see the Segment integration cookbook.

Setting up streaming ingest

Setting up streaming ingest is a two-step process:

  1. Create a table and streaming ingest pipeline.
  2. Send events into a topic.

You will need Interana admin permissions to perform these tasks.

Before you begin, gather the following information for the tasks:

  • Ingest node IP address
  • Listener node IP address
  • Table name
  • Topic name
  • Time column name
  • Time format—for Unix (epoch) time, the options are seconds, milliseconds, or microseconds
  • Shard column name
  • The message, base64 encoded (for example, with BASH: | base64)

1. Creating a table and streaming ingest pipeline  

This task demonstrates how to use the Interana CLI to create a table and pipeline that reads from a specified topic.

To create a table and a streaming ingest pipeline, do the following:
  1. Go to whichever machine has the Interana CLI installed and configured.
  2. Enter the following command, substituting the appropriate values for the <variables> and [<optional_variables>].

ia table create <table_name> <time_column_name> <time_format_type> <shard_key> [<shard_key_2>]

ia pipeline create <pipeline_name> <table_name> kafka -p kafka_topic <your_kafka_topic>

ia job create <pipeline_name> continuous yesterday today

The following example creates a mycooltopic table with a ts time column in milliseconds format and an sk shard column, then creates a mycoolpipeline pipeline that reads from the cooltopic topic and starts a continuous job.

ia table create mycooltopic ts milliseconds sk

ia pipeline create mycoolpipeline mycooltopic kafka -p kafka_topic cooltopic

ia job create mycoolpipeline continuous yesterday today
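
If you set up more than one table and pipeline, it can be convenient to parameterize the same sequence with shell variables. The following BASH sketch simply wraps the commands above; every value shown is a placeholder taken from this example.

# Sketch: the same sequence as above, parameterized with shell variables.
# All values are placeholders from the example in this article.
TABLE_NAME="mycooltopic"
PIPELINE_NAME="mycoolpipeline"
TOPIC_NAME="cooltopic"
TIME_COLUMN="ts"
TIME_FORMAT="milliseconds"   # seconds, milliseconds, or microseconds
SHARD_COLUMN="sk"

ia table create "$TABLE_NAME" "$TIME_COLUMN" "$TIME_FORMAT" "$SHARD_COLUMN"
ia pipeline create "$PIPELINE_NAME" "$TABLE_NAME" kafka -p kafka_topic "$TOPIC_NAME"
ia job create "$PIPELINE_NAME" continuous yesterday today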

2. Sending events into a topic

You use an add_events HTTP request to send events into a topic. The HTTP request consists of headers (the quoted strings passed with -H), the Listener node IP address, and an ia_data parameter that specifies the time column, the shard column, and the base64-encoded message being sent. For more information, see the add_events API reference.

curl is a tool used in command lines or scripts to transfer data.

To send events into a topic, do the following:
  1. Log in to the Ingest node.
  2. Enter the following command to curl an HTTP request, substituting the appropriate values for the <variables>.

curl -H "Content-Type:<type>" -H "topic:<topic_name>" -X GET "https://<listener_IP>/add_events?ia_data=(echo '{"<time_column_name>":<value>, "<shard_column_name>":"<value>", "<your message>":1}' | bas64)" -k

The following example sends a message of type application/text (set in the request headers) to the cooltopic topic. It is a GET request to the Listener node (IP address), followed by add_events with an ia_data parameter. The ts time column with a value in milliseconds, the sk shard column, and the base64-encoded message make up the ia_data parameter.

curl -H "Content-Type: application/text" -H "topic: cooltopic" -X GET "https://127.0.0.1/add_events?ia_data=$(echo '{"ts":1474921202000, "sk" : "a truly awesome message" , "wow": 1}' | base64)" -k

Managing streaming ingest on the Listener node

This section explains how to manage Listener node processes and data retention.

Starting, stopping, and monitoring listener processes

You can start, stop, and monitor the listener processes from the command line.

To manage Listener node processes, do the following:

  1. Log in to the Listener node.
  2. To view the status of Listener node processes, enter the following command.
sudo initctl status uwsgi
  3. To start Listener node processes, enter the following command.
sudo initctl start uwsgi
  4. To stop Listener node processes, enter the following command.
sudo initctl stop uwsgi

Human-readable logs are written to /var/log/interana/uwsgi.log
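
For example, to follow the log in real time while you send test events, you can tail it (you may need sudo, depending on the file's permissions):

tail -f /var/log/interana/uwsgi.log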

Managing listener node data retention

Listener nodes have a default one-day retention period for raw data. The data is stored in /data/interana/kafka. Kafka is an open-source toolset for building real-time data pipelines and streaming applications.

If you require a larger retention window, do one of the following:

  • Provision a bigger disk for the default /data partition. 
  • Configure the kafka_root to point to a separate drive.
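
Before changing either option, it can help to check how much space the raw data currently uses and how much room remains on the partition that holds it. These are standard Linux commands, using the default path described above:

# Check how much space the Listener's raw Kafka data is using,
# and how much room is left on the filesystem that holds it.
du -sh /data/interana/kafka
df -h /data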

For more information, see the Apache Kafka documentation or contact help@Interana.com.

add_events API reference

The Interana external API is a REST API that enables integration with Interana outside of the standard interface. The API is deployed automatically as part of an Interana cluster installation. The API provides an HTTP endpoint that receives a live stream of events and transfers them into an ingest pipeline.

add_events

The add_events API is a single endpoint for sending events into Interana for ingest.

URL

/add_events

Method

Requests can be sent with the GET or POST HTTP method.

URL Params

If you're making a GET request, the following parameter is required:

ia_data=$(echo '<your message>' | base64)

Data Params

If you're making a POST request, include your data in the request body.

Success Response

  • HTTP 202 Accepted

Error Responses

  • 500 Internal Server Error
  • 400 Bad Request

Sample Calls

The GET version of the add_events API expects your base64-encoded data in a URL parameter called ia_data, as shown in the following example BASH command:

curl -H "Content-Type: application/text" -H "topic: <your topic name>" -X GET "https://<your interana listener hostname>/add_events?ia_data=$(echo '<your json data>' | base64)"
 

The POST version of the add_events API is shown in the following example BASH command:

curl -H "Content-Type: application/text" -H "topic: <your topic name>" -X POST -d '<your json data>' "https://<your interana listener hostname>/add_events"
