How to Set Up Production-Ready Snowplow Analytics in 15 Minutes

Setting up a Snowplow Analytics data pipeline can be complex for newcomers. This how-to takes you from zero to hero in less than 15 minutes.

By Joao Correia
Dec 4, 2018 in Snowplow Analytics

You want to set up a data pipeline to track events, but setup and management can quickly become overwhelming if you don't have the right technical knowledge. In this how-to, I'll show you how to set up a production-ready Snowplow Analytics event data pipeline in less than 15 minutes.

Motivation for the 15-minute setup

You don't have the resources to set up a full Snowplow pipeline
You want your set-up to be mostly hands-free
You want to run Snowplow Analytics in a production environment
You don't want to break the bank
You want to learn how to work with clickstream data

Snowplow Analytics Technology

Below is a brief explanation of the different components of a Snowplow Analytics data pipeline.

Trackers are libraries written in JavaScript, Python, Unity, Objective-C, and others that allow you to send events to Snowplow with one line of code. Think of firing a Google Analytics pageview or Segment event; Snowplow is just as simple.
Collector receives the event-level data from the trackers and stores it for processing (or sends it to Enrich if the pipeline is real-time).
Enrich processes the data stored by the collector at regular time intervals (or in real time). Snowcat fully supports enrichments.
Storage loads the enriched data into storage: Redshift, S3, etc.
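To make the flow between the four components concrete, here is a toy in-memory sketch. None of it is Snowplow code: the function names, the event fields, and the "platform" enrichment are hypothetical and only illustrate how an event moves from tracker to collector to enrich to storage.

```javascript
// Toy model of the four stages; not actual Snowplow code.

// Collector: receives raw events and buffers them for processing.
function createCollector() {
  const buffer = [];
  return {
    receive(rawEvent) { buffer.push(rawEvent); },
    // Hand over everything buffered so far and empty the buffer.
    drain() { return buffer.splice(0, buffer.length); },
  };
}

// Tracker: builds an event payload and sends it to the collector.
function track(collector, appId, eventName) {
  collector.receive({ appId, eventName, sentAt: new Date().toISOString() });
}

// Enrich: adds derived fields to each raw event (a made-up "platform" field here).
function enrich(rawEvents) {
  return rawEvents.map((e) => ({ ...e, platform: 'web', enriched: true }));
}

// Storage: loads enriched events into a destination
// (an array standing in for a warehouse table).
function load(warehouseTable, enrichedEvents) {
  warehouseTable.push(...enrichedEvents);
  return warehouseTable.length;
}

const collector = createCollector();
const warehouseTable = [];
track(collector, 'igloo-web', 'page_view');
track(collector, 'igloo-web', 'add_to_cart');
load(warehouseTable, enrich(collector.drain()));
console.log(warehouseTable.length); // 2
```

In a batch pipeline the drain-and-enrich step would run on a schedule; in a real-time pipeline each event would be enriched as it arrives.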

Setting up an entire Snowplow Analytics data pipeline requires you to configure multiple services in either Amazon Web Services or in Google Cloud, which requires extensive data engineering and AWS/GCP knowledge, not to mention the actual management.

Meet Snowcat Cloud, a fully managed Snowplow data pipeline that helps analysts get the data they need.

Snowcat Cloud

Snowcat's value proposition is simple: "We maintain a high-availability Snowplow collector, enricher, and loader, so you don't have to."
With Snowcat Cloud you get all the benefits of Snowplow, without owning the infrastructure.

Snowcat Cloud provides a Snowplow collector that lets you collect Snowplow events on Snowcat infrastructure and then move them to your data warehouse.

Snowcat Cloud Snowplow Analytics diagram

Benefits of Snowcat Cloud

Supports Snowplow Analytics Custom Contexts
Affordable
Can deliver event-level data to multiple databases, not just AWS/GCP
Near real-time data replication frequency
It is fast and easy to set up
No management

How to Set Up Snowplow Analytics

Step 1: Create a Data Warehouse

If you have read this far, you likely already have a data warehouse. If not, open an account on AWS and create a Redshift cluster. A single dc2.large node is enough to start.

After the cluster is running, go to the Redshift security group and add an inbound rule to allow Snowcat Cloud to connect to your data warehouse (50.19.2.6/32).

Allow Snowcat Cloud to connect to your Redshift cluster
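If you prefer to script the inbound rule rather than click through the console, the sketch below builds the parameter object for EC2's AuthorizeSecurityGroupIngress action. It assumes the cluster sits behind a VPC security group and listens on Redshift's default port 5439; the security group ID and the rule description are placeholders. The resulting object can be passed to the AWS SDK for JavaScript v3 (`AuthorizeSecurityGroupIngressCommand` from `@aws-sdk/client-ec2`), which is not loaded here.

```javascript
// Snowcat Cloud address from the step above; 5439 is Redshift's default port.
const SNOWCAT_CIDR = '50.19.2.6/32';
const REDSHIFT_DEFAULT_PORT = 5439;

// Build the parameters for EC2's AuthorizeSecurityGroupIngress action.
function buildSnowcatIngressRule(securityGroupId, port = REDSHIFT_DEFAULT_PORT) {
  return {
    GroupId: securityGroupId,
    IpPermissions: [
      {
        IpProtocol: 'tcp',
        FromPort: port,
        ToPort: port,
        // Allow only the single Snowcat Cloud address, nothing broader.
        IpRanges: [{ CidrIp: SNOWCAT_CIDR, Description: 'Snowcat Cloud loader' }],
      },
    ],
  };
}

// 'sg-0123456789abcdef0' is a placeholder; use your cluster's security group ID.
const rule = buildSnowcatIngressRule('sg-0123456789abcdef0');
console.log(JSON.stringify(rule, null, 2));
```

Keeping the CIDR at /32 restricts access to that one address, which is the point of the rule.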

Create a user for Snowcat and a database owned by that user.

CREATE USER storageloader WITH PASSWORD 'xxxxxxxxxxxx';
CREATE DATABASE snowplow WITH OWNER storageloader;

If you plan to run this data warehouse as more than a test, don't forget to buy a Redshift reserved node with at least a partial upfront payment; you will save 36% annually.

Step 2: Set Up Snowcat Cloud

Go to snowcatcloud.com and open an account.
Get access to your domain's DNS settings and follow the instructions to configure your domain for the Snowplow collector, e.g. sp.yourdomain.com.

Select the domain for your Snowplow collector

Create the DNS records to configure your Snowplow collector and validate them.

Create the corresponding DNS records for the Snowplow collector
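Before validating in the Snowcat interface, you can check from Node.js that the record has propagated. This is a minimal sketch that assumes the collector record is a CNAME; the record type and target Snowcat Cloud actually asks for may differ, and both hostnames used with it are placeholders. The resolver is injectable so the function can be exercised without a network.

```javascript
// Node.js built-in DNS module (promise-based API).
const dns = require('node:dns').promises;

// Strip a trailing dot and lowercase, so "Target.Example.com." matches "target.example.com".
function normalizeHost(host) {
  return host.replace(/\.$/, '').toLowerCase();
}

// Resolve the CNAME for collectorHost and compare it with the expected target.
async function checkCollectorCname(
  collectorHost,
  expectedTarget,
  resolveCname = (h) => dns.resolveCname(h)
) {
  try {
    const targets = await resolveCname(collectorHost);
    const ok = targets.some((t) => normalizeHost(t) === normalizeHost(expectedTarget));
    return { ok, targets };
  } catch (err) {
    // For example ENOTFOUND or ENODATA when the record does not exist yet.
    return { ok: false, targets: [], error: err.code || err.message };
  }
}
```

A call such as `checkCollectorCname('sp.yourdomain.com', 'collector.example.net').then(console.log)` prints whether the record points where you expect; DNS changes can take a while to propagate, so a failed check right after creating the record is not necessarily a misconfiguration.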

Step 3: Install the Snowplow Tracker in Your Website/App

In the "Snippet Generator", choose a name for your application ID, click "Generate", and copy the JavaScript snippet into your tag management system or website.

<script type='text/javascript'>

;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,'script','//xxxxxxxxxxxx.cloudfront.net/x.x.x/xxxxxxxx.js','snowplow'));

// Initialize tracker
window.snowplow('newTracker', 'cf', '<< COLLECTOR URL >>', { 
  appId: '<< APPID >>',
  cookieDomain: '<< YOURDOMAIN.COM >>'
});

window.snowplow('trackPageView');

</script>

Where:

<< COLLECTOR URL >> is your Snowplow collector URL e.g. sp.yourdomain.com
<< APPID >> is your unique ID for this app e.g. igloo-web
<< YOURDOMAIN.COM >> is your root domain e.g. iglooanalytics.com
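Once the page view is flowing, you will probably want to send your own events with a custom context attached. The sketch below assumes the v2-style JavaScript tracker API that the snippet above uses (`trackStructEvent` with category, action, label, property, value, and contexts). The schema URI, the product fields, and the helper names are hypothetical; a real context needs a schema that exists in your own Iglu registry.

```javascript
// Build a self-describing JSON to attach as a custom context.
// The schema URI is hypothetical; replace it with one from your own registry.
function buildProductContext(sku, price) {
  return {
    schema: 'iglu:com.example/product/jsonschema/1-0-0',
    data: { sku, price },
  };
}

// Send one structured event with the product context attached.
// `snowplow` is the global function created by the tracker snippet.
function trackProductView(snowplow, sku, price) {
  const contexts = [buildProductContext(sku, price)];
  // Arguments: category, action, label, property, value, contexts.
  snowplow('trackStructEvent', 'product', 'view', sku, null, price, contexts);
}

// Only fire in a browser where the tracker snippet has already run.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  trackProductView(window.snowplow, 'SKU-123', 19.99);
}
```

Passing the `snowplow` function in as an argument keeps the helper easy to test outside the browser and avoids errors on pages where the tracker has not loaded.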

Conclusion

If you want to run Snowplow Analytics and don't have enough resources to build and manage your own pipeline, Snowcat Cloud provides you with an easy, affordable and reliable way to get started, with data freshness as low as 5 minutes and support for custom contexts.

In the future, if your resources or needs change, you can always run your own Snowplow collector and migrate away from Snowcat Cloud. No vendor lock-in.

Excuses not to own your data pipeline are getting scarcer and scarcer.

Need help getting up and running with Snowplow? Contact us.
