Joao Correia
Driving Growth & Innovation With Data

You want to set up a data pipeline to track events, but setup and management can quickly become overwhelming if you don't have the right technical knowledge. In this how-to, I'll show you how to set up a production-ready Snowplow Analytics event data pipeline in less than 15 minutes.

Motivation for the 15-minute setup

Snowplow Analytics Technology

Below is a brief explanation of the different components of a Snowplow Analytics data pipeline.

Setting up an entire Snowplow Analytics data pipeline means configuring multiple services in either Amazon Web Services or Google Cloud. That takes extensive data engineering and AWS/GCP knowledge, not to mention the ongoing management.

Meet Fivetran, a fully managed service that helps analysts run data pipelines quickly and easily.

Fivetran

Fivetran's value proposition is simple: "Effortlessly replicate your business data into the cloud warehouse of your choice." Fivetran is like a patch cable for data engineers, making it easy to move data from point A to point B.

Fivetran provides a Snowplow collector that lets you collect Snowplow events in Fivetran's infrastructure and then move them to your data warehouse.

Fivetran Snowplow Analytics connector diagram

Benefits of Fivetran

How to Set Up Snowplow Analytics

Step 1: Create a Data Warehouse

If you've made it this far, you likely already have a data warehouse. If not, open an account on AWS and create a Redshift cluster. A single dc2.large node is enough to start.

Create Redshift cluster

If you are unsure which data warehouse to choose, I recommend watching George Fraser, CEO of Fivetran, on Redshift vs Snowflake vs BigQuery.

After the cluster is running, go to the Redshift security group and add an inbound rule that allows Fivetran to connect to your data warehouse (52.0.2.4/32).

Allow Fivetran to connect to your Redshift cluster

Create a user for Fivetran and a database owned by that user.

CREATE USER fivetran WITH PASSWORD 'xxxxxxxxxxxx';
CREATE DATABASE fivetran WITH OWNER fivetran;

If you plan to run this data warehouse as more than a test, don't forget to buy a Redshift reserved node with at least a partial upfront payment; you will save 36% annually.
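To get a feel for what that 36% means in practice, here is a rough back-of-the-envelope sketch. The hourly rate of 0.25 USD is my assumption for illustration only, not a quoted price, so check current AWS pricing for your region before relying on the numbers.

```javascript
// Assumption: illustrative on-demand rate in USD per node-hour (not a quoted AWS price).
const onDemandHourlyRate = 0.25;
// The 36% reserved-node saving mentioned in the article.
const reservedDiscount = 0.36;
const hoursPerYear = 24 * 365;

// Yearly cost of running `nodes` nodes around the clock at a given hourly rate.
function annualCost(hourlyRate, nodes) {
  return hourlyRate * hoursPerYear * nodes;
}

const onDemandAnnual = annualCost(onDemandHourlyRate, 1);
const reservedAnnual = onDemandAnnual * (1 - reservedDiscount);
const annualSavings = onDemandAnnual - reservedAnnual;

console.log('On demand per year: $' + onDemandAnnual.toFixed(2));
console.log('Reserved per year:  $' + reservedAnnual.toFixed(2));
console.log('Saved per year:     $' + annualSavings.toFixed(2));
```

Swap in your own rate and node count to see whether the commitment is worth it for your cluster.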

Step 2: Set Up Fivetran

Go to Fivetran.com and open an account. Add a new warehouse (Redshift in our example) and enter the credentials. You will find detailed configuration instructions right next to the form.

Setting up a data warehouse in Fivetran

Step 3: Install the Snowplow Tracker in Your Website/App

In the "Connect Sources" step, select "Snowplow" as the connector and Fivetran will provide you with a JavaScript snippet that contains the Snowplow collector URL. You will need to customize the snippet with your appId and cookieDomain.

Setting up the Snowplow Analytics collector in Fivetran
<script type='text/javascript'>

;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,'script','//d1fc8wv8zag5ca.cloudfront.net/2.5.3/sp.js','snowplow'));

// Initialize tracker
window.snowplow('newTracker', 'cf', '<< COLLECTOR URL >>', { 
  appId: '<< APPID >>',
  cookieDomain: '<< YOURDOMAIN.COM >>'
});

window.snowplow('trackPageView');

</script>
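For illustration, here is a minimal sketch of what the initialization part of the snippet might look like once the placeholders are filled in. Every value here is made up: `collector.example.com`, `my-website`, and `.example.com` are stand-ins, and `initTracker` is a hypothetical helper I added so the calls are easy to read. Use the collector URL that Fivetran shows you, not this one.

```javascript
// Hypothetical values: replace them with the ones from your Fivetran setup page.
const COLLECTOR_URL = 'collector.example.com'; // stand-in, not a real Fivetran endpoint
const trackerOptions = {
  appId: 'my-website',         // any label that identifies this site or app in your data
  cookieDomain: '.example.com' // leading dot so the cookie is shared across subdomains
};

// Hypothetical helper: issues the same two calls as the snippet above.
// `snowplow` is the global function created by the tracker loader.
function initTracker(snowplow) {
  snowplow('newTracker', 'cf', COLLECTOR_URL, trackerOptions);
  snowplow('trackPageView');
}

// Only runs in a browser where the tracker loader has already defined window.snowplow.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  initTracker(window.snowplow);
}
```

If you run several sites through the same collector, giving each one its own appId makes it easy to tell them apart in the warehouse later.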


Step 4: Watch Your Events Flow In

And like magic, your events will start flowing into your data warehouse. If necessary, adjust your data replication frequency to receive data from Snowplow every five minutes!
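If you would rather not wait for real traffic, one way to check the flow is to fire a recognizable test event from a page that already has the tracker installed. The sketch below uses the tracker's `trackStructEvent` call; the category, action, and label values are made up so they are easy to search for, and `sendTestEvent` is a hypothetical helper name.

```javascript
// Sends one structured event with easily searchable, made-up values.
// `snowplow` is the global function created by the tracker snippet.
function sendTestEvent(snowplow) {
  // trackStructEvent arguments used here: category, action, label.
  snowplow('trackStructEvent', 'pipeline-test', 'smoke-check', 'fivetran-setup');
}

// Only runs in a browser where the tracker snippet has already loaded.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  sendTestEvent(window.snowplow);
}
```

After the next sync completes, look in your warehouse for a row whose structured-event category is pipeline-test. The exact schema, table, and column names depend on how Fivetran lays out your Snowplow data, so browse the schema it created rather than assuming a name.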

Adjust Fivetran Snowplow connector

Conclusion

If you want to run Snowplow Analytics and don't have enough resources to build and manage your own pipeline, Fivetran provides you with an easy and reliable way to get started, with data freshness as low as 5 minutes and support for custom contexts.

In the future, if your resources or needs change, you can always run your own Snowplow collector and migrate away from Fivetran. No vendor lock-in.

Excuses not to own your data pipeline are getting scarcer and scarcer.

Need help getting up and running with Snowplow? Contact us.
