Joao Correia
Driving Growth & Innovation With Data

You want to set up a data pipeline to track events, but setup and management can quickly become overwhelming if you don't have the right technical knowledge. In this how-to, I'll show you how to set up a production-ready Snowplow Analytics event data pipeline in less than 15 minutes.

Motivation for the 15-minute setup

Snowplow Analytics Technology

Below is a brief explanation of the different components of a Snowplow Analytics data pipeline.

Setting up an entire Snowplow Analytics data pipeline requires you to configure multiple services in either Amazon Web Services or Google Cloud. That takes extensive data engineering and AWS/GCP knowledge, not to mention the ongoing management.

Meet Snowcat Cloud, a fully managed Snowplow data pipeline that helps analysts get the data they need.

Snowcat Cloud

Snowcat's value proposition is simple: "We maintain a high-availability Snowplow collector, enricher, and loader, so you don't have to."
With Snowcat Cloud you get all the benefits of Snowplow, without owning the infrastructure.

Snowcat Cloud provides a Snowplow collector that lets you collect Snowplow events on Snowcat infrastructure and then move them to your data warehouse.

Snowcat Cloud Snowplow Analytics diagram

Benefits of Snowcat Cloud

How to Set Up Snowplow Analytics

Step 1: Create a Data Warehouse

If you have read this far, you likely already have a data warehouse. If not, open an AWS account and create a Redshift cluster. A single dc2.large node is enough to start.

Create Redshift cluster

After the cluster is running, go to the Redshift security group and add an inbound rule that allows Snowcat Cloud (50.19.2.6/32) to connect to your data warehouse.

Allow Snowcat Cloud to connect to your Redshift cluster
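If you prefer to script the inbound rule instead of clicking through the console, the sketch below builds the request parameters in the shape that EC2's AuthorizeSecurityGroupIngress API expects. The function name and the security group id are placeholders of mine, and I am assuming your cluster listens on Redshift's default port 5439; adjust the port if yours differs.

```javascript
// Build the input for EC2's AuthorizeSecurityGroupIngress API call.
// Assumption: the cluster uses Redshift's default port, 5439.
function buildSnowcatIngressRule(securityGroupId, port = 5439) {
  return {
    GroupId: securityGroupId,
    IpPermissions: [
      {
        IpProtocol: 'tcp',
        FromPort: port,
        ToPort: port,
        // The Snowcat Cloud address given in this article.
        IpRanges: [{ CidrIp: '50.19.2.6/32', Description: 'Snowcat Cloud' }],
      },
    ],
  };
}

// 'sg-0123456789abcdef0' is a hypothetical group id; use your own.
const ingressParams = buildSnowcatIngressRule('sg-0123456789abcdef0');
console.log(JSON.stringify(ingressParams, null, 2));
```

The resulting object can be passed to the AWS SDK for JavaScript (for example AuthorizeSecurityGroupIngressCommand in @aws-sdk/client-ec2), or you can read the same values off it and type them into the console form.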

Create a user for Snowcat and a database owned by that user.

CREATE USER snowcat WITH PASSWORD 'xxxxxxxxxxxx';
CREATE DATABASE snowplow WITH OWNER snowcat;

If you plan to run this data warehouse as more than a test, don't forget to buy a Redshift reserved node with at least a partial upfront payment; you will save 36% annually.
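To see what that 36% means in dollars, here is a small back-of-the-envelope calculation. The function name is mine, and the hourly rate in the usage line is an assumed, illustrative figure; check current AWS pricing for your region before relying on the result.

```javascript
// Estimate yearly savings from a reserved node, given an on-demand hourly rate.
// The default 36% discount is the figure quoted in this article.
function reservedNodeSavings(onDemandHourlyUsd, nodes = 1, discount = 0.36) {
  const hoursPerYear = 24 * 365;
  const onDemandYearly = onDemandHourlyUsd * hoursPerYear * nodes;
  const savings = onDemandYearly * discount;
  // Round to cents so the numbers read like prices.
  return {
    onDemandYearly: Number(onDemandYearly.toFixed(2)),
    reservedYearly: Number((onDemandYearly - savings).toFixed(2)),
    savings: Number(savings.toFixed(2)),
  };
}

// Hypothetical on-demand rate of $0.25 per node-hour, for illustration only.
console.log(reservedNodeSavings(0.25));
```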

Step 2: Set Up Snowcat Cloud

Go to snowcatcloud.com and open an account.
Get access to your domain's DNS and follow the instructions to configure a subdomain for the Snowplow collector, e.g. sp.yourdomain.com.

Step 3: Install Snowplow Tracker in your Website/App

In the "Snippet Generator", choose a name for your application id, click "Generate", and copy the JavaScript snippet into your tag manager or website.

<script type='text/javascript'>

;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,'script','//xxxxxxxxxxxx.cloudfront.net/x.x.x/xxxxxxxx.js','snowplow'));

// Initialize tracker
window.snowplow('newTracker', 'cf', '<< COLLECTOR URL >>', { 
  appId: '<< APPID >>',
  cookieDomain: '<< YOURDOMAIN.COM >>'
});

window.snowplow('trackPageView');

</script>

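Where << COLLECTOR URL >> is the collector domain you configured in Step 2 (e.g. sp.yourdomain.com), << APPID >> is the application id you chose in the Snippet Generator, and << YOURDOMAIN.COM >> is your website's domain.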
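The minified first line of the snippet is hard to read, so here is a readable sketch of what it does: it installs a stub function that queues every call until the real tracker script has loaded, then injects that script asynchronously. The function and parameter names are mine, not Snowplow's, and this is only an illustration; keep using the generated snippet on your site.

```javascript
// Readable equivalent of the minified loader line in the snippet.
function installTrackerStub(win, doc, tagName, scriptUrl, name) {
  if (win[name]) return; // a tracker (or stub) is already installed

  // Register the global function name so the library can find it later.
  win.GlobalSnowplowNamespace = win.GlobalSnowplowNamespace || [];
  win.GlobalSnowplowNamespace.push(name);

  // Stub: store each call's arguments in a queue (.q) until the library loads.
  win[name] = function () {
    (win[name].q = win[name].q || []).push(arguments);
  };
  win[name].q = win[name].q || [];

  // Inject the tracker script before the first existing <script> tag.
  const script = doc.createElement(tagName);
  const firstScript = doc.getElementsByTagName(tagName)[0];
  script.async = 1;
  script.src = scriptUrl;
  firstScript.parentNode.insertBefore(script, firstScript);
}
```

This is why calling window.snowplow('newTracker', ...) and window.snowplow('trackPageView') immediately after the loader is safe: the calls sit in the queue and are replayed once the tracker script arrives.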

Conclusion

If you want to run Snowplow Analytics and don't have enough resources to build and manage your own pipeline, Snowcat Cloud provides you with an easy, affordable and reliable way to get started, with data freshness as low as 5 minutes and support for custom contexts.

In the future, if your resources or needs change, you can always run your own Snowplow collector and migrate away from Snowcat Cloud. No vendor lock-in.

Excuses not to own your data pipeline are getting scarcer and scarcer.

Need help getting up and running with Snowplow? Contact us.
