You want to set up a data pipeline to track events, but setup and management can quickly become overwhelming if you don't have the right technical knowledge. In this how-to, I'll show you how to set up a production-ready Snowplow Analytics event data pipeline in less than 15 minutes.
Motivation for the 15-minute setup
- You don't have the resources to set up/manage a full Snowplow pipeline
- You want your setup to be mostly hands-free
- You want to run Snowplow Analytics in a production environment
- You want to own your data pipeline
Snowplow Analytics Technology
Below is a brief explanation of the different components of a Snowplow Analytics data pipeline.
- Collector: receives the event-level data from the trackers and stores it for processing (or sends it straight to Enrich if the pipeline is real-time).
- Enrich: processes the data stored by the collector at regular intervals (or in real time). Note that Fivetran, unlike the native Snowplow pipeline, doesn't enrich data at this stage, but you can always enrich it in your data warehouse.
- Storage: loads the enriched data into storage such as Redshift or S3.
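To make the three roles concrete, here is a toy sketch of the flow in Python. It is not Snowplow's or Fivetran's code: the function names, the in-memory lists, and the single derived field are assumptions chosen only to show how an event passes from collector to enrich to storage.

```python
from urllib.parse import urlparse

raw_store = []   # stands in for the collector's raw event storage
warehouse = []   # stands in for a warehouse table (e.g. in Redshift)


def collect(event):
    """Collector: accept an event from a tracker and store it unprocessed."""
    raw_store.append(dict(event))


def enrich(event):
    """Enrich: derive extra fields; here, only the host of the page URL."""
    enriched = dict(event)
    enriched["page_urlhost"] = urlparse(event["page_url"]).hostname
    return enriched


def load(events):
    """Storage: append the enriched events to the warehouse table."""
    warehouse.extend(events)


# One hypothetical page view travelling through the three stages.
collect({
    "app_id": "igloo-web",
    "event": "page_view",
    "page_url": "https://iglooanalytics.com/blog",
})
load([enrich(event) for event in raw_store])
print(warehouse[0])
```

With Fivetran, the middle step is the one you would run yourself inside the warehouse, typically as SQL over the loaded events.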
Setting up an entire Snowplow Analytics data pipeline requires you to configure multiple services in either Amazon Web Services or Google Cloud. That takes extensive data engineering and AWS/GCP knowledge, not to mention the ongoing management.
Meet Fivetran, a fully managed data pipeline that helps analysts manage data pipelines quickly and easily.
Fivetran's value proposition is simple: "Effortlessly replicate your business data into the cloud warehouse of your choice." Fivetran is like a patch cable for data engineers, making it easy to move data from point A to point B.
Fivetran provides a Snowplow collector that lets you collect Snowplow events in Fivetran's infrastructure and then move them to your data warehouse.
Benefits of Fivetran
- Supports Snowplow Analytics Custom Contexts
- Can deliver event-level data to multiple databases, not just AWS/GCP
- 5-minute data replication frequency
- It is fast and easy to set up
- No management
How to Set Up Snowplow Analytics
Step 1: Create a Data Warehouse
If you have read this far, you likely already have a data warehouse. If not, open an account on AWS and create a Redshift cluster. A single dc2.large node is enough to start.
If you are unsure which data warehouse to choose, I recommend watching George Fraser, CEO of Fivetran, on Redshift vs Snowflake vs BigQuery.
After the cluster is running, go to the Redshift security group and add an inbound rule that allows Fivetran to connect to your data warehouse (184.108.40.206/32).
Create a user for Fivetran and a database owned by that user.
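As a rough sketch of that last step, the helper below builds the two SQL statements you would run against the cluster as an admin user. `CREATE USER ... PASSWORD` and `CREATE DATABASE ... OWNER` are standard Redshift statements; the helper name, the `fivetran` user name, the `snowplow_events` database name, and the password placeholder are my own assumptions, so swap in your own values.

```python
def fivetran_setup_sql(user, database, password="<PASSWORD>"):
    """Return the SQL that creates a user and a database owned by that user.

    The names are interpolated directly into the SQL, so they are restricted
    to plain identifiers. The password is a placeholder you replace by hand.
    """
    if not (user.isidentifier() and database.isidentifier()):
        raise ValueError("user and database must be plain identifiers")
    return [
        f"CREATE USER {user} PASSWORD '{password}';",
        f"CREATE DATABASE {database} OWNER {user};",
    ]


# Hypothetical names; paste the output into your SQL client.
statements = fivetran_setup_sql("fivetran", "snowplow_events")
for statement in statements:
    print(statement)
```

Run the statements in this order, because the database owner has to exist before the database is created.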
If you plan to run this data warehouse as more than a test, don't forget to buy a Redshift reserved node with at least a partial upfront payment; you will save 36% annually.
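To see what that 36% means in dollars, here is a back-of-the-envelope calculation. The hourly on-demand rate is a placeholder assumption, not a quoted AWS price, so check the current pricing for your region before relying on the numbers.

```python
HOURS_PER_YEAR = 24 * 365

# Assumption: illustrative on-demand rate for one dc2.large node, in USD/hour.
ON_DEMAND_HOURLY_USD = 0.25
# Saving quoted above for a reserved node with at least partial upfront.
RESERVED_DISCOUNT = 0.36

on_demand_annual = ON_DEMAND_HOURLY_USD * HOURS_PER_YEAR
savings = on_demand_annual * RESERVED_DISCOUNT
reserved_annual = on_demand_annual - savings

print(f"On-demand per year: ${on_demand_annual:,.2f}")
print(f"Reserved per year:  ${reserved_annual:,.2f}")
print(f"Saved per year:     ${savings:,.2f}")
```

With the placeholder rate, a node that runs all year costs about $2,190 on demand and about $1,402 reserved.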
Step 2: Set Up Fivetran
Go to Fivetran.com and open an account. Add a new Warehouse (Redshift in our example) and type in all the credentials. Detailed configuration instructions appear right next to the form.
Step 3: Install the Snowplow Tracker in Your Website/App
Add the Snowplow JavaScript tracker snippet to your website or app, replacing the following placeholders:
- << COLLECTOR URL >> is the collector URL provided by Fivetran e.g. events.fivetran.com/snowplow/xxxxxxxxx
- << APPID >> is your unique ID for this app e.g. igloo-web
- << YOURDOMAIN.COM >> is your top-level domain e.g. iglooanalytics.com
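As an illustration of where those three values end up, the sketch below renders the tracker's initialization calls from them. The `newTracker` and `trackPageView` calls, with the `appId` and `cookieDomain` options, follow the Snowplow JavaScript tracker v2 API as I understand it. The tracker namespace `sp` and the helper itself are assumptions, and the `sp.js` loader tag that defines `window.snowplow` is deliberately left out, so take that part from the official tracker documentation.

```python
import json


def render_tracker_init(collector_url, app_id, cookie_domain):
    """Return the JavaScript calls that initialize a Snowplow tracker.

    Assumes the sp.js loader snippet has already defined window.snowplow.
    """
    # json.dumps produces quoted JavaScript string and object literals.
    options = json.dumps({"appId": app_id, "cookieDomain": cookie_domain})
    endpoint = json.dumps(collector_url)
    lines = [
        f"window.snowplow('newTracker', 'sp', {endpoint}, {options});",
        "window.snowplow('trackPageView');",
    ]
    return "\n".join(lines)


# Example values taken from the placeholder list above.
tracker_js = render_tracker_init(
    "events.fivetran.com/snowplow/xxxxxxxxx",
    "igloo-web",
    "iglooanalytics.com",
)
print(tracker_js)
```

Place the printed calls inside a script tag, after the loader, on every page you want to track.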
Step 4: Watch Your Events Flow In
And like magic, your events will start flowing into your data warehouse. If necessary, adjust your data replication frequency to receive data from Snowplow every five minutes!
If you want to run Snowplow Analytics and don't have enough resources to build and manage your own pipeline, Fivetran provides you with an easy and reliable way to get started, with data freshness as low as 5 minutes and support for custom contexts.
In the future, if your resources or needs change, you can always run your own Snowplow collector and migrate away from Fivetran. No vendor lock-in.
Excuses not to own your data pipeline are getting scarcer and scarcer.
Need help getting up and running with Snowplow? Contact us.