Documentation for Papertrail

Amazon Redshift

Papertrail log archives can be loaded into Amazon Redshift for ad-hoc relational queries against years of logs, such as:

SELECT * FROM events WHERE message LIKE '%error%';

To use Amazon Elastic MapReduce (EMR) instead, or for a step-by-step walkthrough of setting up analytics services on AWS, see Log analytics with Hadoop and Hive.

Start cluster

Start by launching a Redshift cluster and following the getting started guide to bring it online. A single-node cluster with default settings works fine. Use psql to connect to your cluster.
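
For example, a connection might look like the following, where the endpoint, user, and database name are placeholders from your own cluster's configuration (Redshift listens on port 5439 by default):

psql -h <cluster_endpoint> -p 5439 -U <master_user> -d <database>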

Create table

We recommend using the id column as the sort key and distributing rows evenly across nodes with DISTSTYLE even. To create the table, paste the following into psql:

CREATE TABLE events (
  id BIGINT SORTKEY NOT NULL,
  received_at_raw VARCHAR(25) NOT NULL,
  generated_at_raw VARCHAR(25) NOT NULL,
  source_id INT NOT NULL,
  source_name VARCHAR(128) ENCODE Text32k,
  source_ip VARCHAR(15) NOT NULL ENCODE Text32k,
  facility VARCHAR(8) NOT NULL ENCODE Text255,
  severity VARCHAR(9) NOT NULL ENCODE Text255,
  program VARCHAR(64) ENCODE Text32k,
  message VARCHAR(8192) DEFAULT NULL,
  PRIMARY KEY(id))
DISTSTYLE even;

Redshift does not recognize ISO 8601 timestamps at import, so timestamps are initially imported as VARCHAR and then converted to TIMESTAMP columns (below).

Load data

Load data from log archives:

copy events from 's3://<bucket_name>/<object_prefix>'
  credentials 'aws_access_key_id=<my access key id>;aws_secret_access_key=<my secret access key>'
  delimiter '\t'
  gzip
  emptyasnull
  blanksasnull
  maxerror as 100000;

For example:

copy events from 's3://mylogbucket/papertrail/logs'
  credentials 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  delimiter '\t'
  gzip
  emptyasnull
  blanksasnull
  maxerror as 100000;

Replace /papertrail/logs with the path shown on the Archives page.

Data from only specific days can be loaded by running the above command with the paths to those days instead of the key root. For example:

copy events from 's3://mylogbucket/papertrail/logs/dt=2013-02-10/2013-02-10.tsv.gz'

Load errors

S3 Redirect

Redshift assumes that the S3 bucket is in the same region as the cluster. To load from a bucket in a different region, use the REGION option on COPY.
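
For example, if the archive bucket were in us-west-2 (the region name here is illustrative), the COPY above could include a region line:

copy events from 's3://mylogbucket/papertrail/logs'
  credentials 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  region 'us-west-2'
  delimiter '\t'
  gzip
  emptyasnull
  blanksasnull
  maxerror as 100000;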

Invalid or unsupported characters

You may receive load errors due to UTF-8 characters, such as:

String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: e9 (error 2)

Use the ACCEPTINVCHARS option to tell Redshift how to handle these un-importable characters. With ACCEPTINVCHARS, Redshift replaces unrecognized UTF-8 characters with ?. To replace them with an underscore instead, add this to the COPY:

ACCEPTINVCHARS AS '_'

Alternatively, consider escaping them with:

ESCAPE ACCEPTINVCHARS
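
Put together, a COPY that replaces invalid characters with an underscore might look like this (using the same placeholder bucket and credentials as above):

-- same load as before, with invalid UTF-8 bytes replaced by '_'
copy events from 's3://mylogbucket/papertrail/logs'
  credentials 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  delimiter '\t'
  gzip
  emptyasnull
  blanksasnull
  acceptinvchars as '_'
  maxerror as 100000;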

Confirm load

To confirm that events were loaded, run:

SELECT COUNT(*) FROM events;

To see load errors, run:

SELECT * FROM stl_load_errors;
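
Rows in stl_load_errors are wide, so a narrower query such as this one (selecting a few of the table's standard columns) may be easier to read when checking recent failures:

-- most recent load errors first, with the offending file, line, and reason
SELECT starttime, filename, line_number, colname, err_reason
  FROM stl_load_errors
  ORDER BY starttime DESC
  LIMIT 10;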

Optimizations

Change timestamps

The generated_at and received_at timestamps were imported as VARCHAR because Redshift cannot recognize ISO 8601 timestamps. After import, add new TIMESTAMP columns and use the CAST() function to populate them:

ALTER TABLE events ADD COLUMN received_at TIMESTAMP DEFAULT NULL;
UPDATE events SET received_at = CAST(received_at_raw as timestamp);
ALTER TABLE events ADD COLUMN generated_at TIMESTAMP DEFAULT NULL;
UPDATE events SET generated_at = CAST(generated_at_raw as timestamp);
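
Once populated, the new columns can be queried like any other TIMESTAMP column; for example, to count events received on a particular (illustrative) day:

-- events received during a single day
SELECT COUNT(*)
  FROM events
  WHERE received_at >= '2013-02-10 00:00:00'
    AND received_at < '2013-02-11 00:00:00';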

Finally, if you foresee no more imports to this table, the raw VARCHAR timestamp columns may be removed. If you foresee importing more events from S3, keep these columns. To remove the columns, run:

ALTER TABLE events DROP COLUMN received_at_raw;
ALTER TABLE events DROP COLUMN generated_at_raw;

Multi-node clusters

On multi-node clusters, consider choosing distribution keys and sort keys that suit your queries and data retrieval, such as the sender name or program name.
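
As a rough sketch, a variant of the table above that distributes rows by program and sorts by program and id could be declared like this (the table name is hypothetical; pick keys that match your own queries):

-- same columns as events, distributed by program and sorted by (program, id)
CREATE TABLE events_by_program (
  id BIGINT NOT NULL,
  received_at_raw VARCHAR(25) NOT NULL,
  generated_at_raw VARCHAR(25) NOT NULL,
  source_id INT NOT NULL,
  source_name VARCHAR(128) ENCODE Text32k,
  source_ip VARCHAR(15) NOT NULL ENCODE Text32k,
  facility VARCHAR(8) NOT NULL ENCODE Text255,
  severity VARCHAR(9) NOT NULL ENCODE Text255,
  program VARCHAR(64) ENCODE Text32k DISTKEY,
  message VARCHAR(8192) DEFAULT NULL,
  PRIMARY KEY(id))
SORTKEY(program, id);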

Delete data

To reset the events table back to its initial empty state (and delete all loaded data), use either DROP TABLE events or TRUNCATE events.

Questions

Trying to accomplish something? Please ask us.

The scripts are not supported under any SolarWinds support program or service. The scripts are provided AS IS without warranty of any kind. SolarWinds further disclaims all warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The risk arising out of the use or performance of the scripts and documentation stays with you. In no event shall SolarWinds or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the scripts or documentation.