You want to set up a data pipeline to track events, but setup and management can quickly become overwhelming without the right technical knowledge. In this how-to, I'll show you how to set up a production-ready Snowplow Analytics event data pipeline in less than 15 minutes.
Motivation for the 15-minute setup
- You don't have the resources to set up a full Snowplow pipeline
- You want your set-up to be mostly hands-free
- You want to run Snowplow Analytics in a production environment
- You don't want to break the bank
- You want to learn how to work with clickstream data
Snowplow Analytics Technology
Below is a brief explanation of the different components of a Snowplow Analytics data pipeline.
- Collector receives the event-level data from the trackers and stores it for processing (or sends it to Enrich if the pipeline is real-time).
- Enrich processes the data stored by the collector at regular intervals (or in real time). Snowcat fully supports enrichments.
- Storage loads the enriched data into a storage target such as Redshift or S3.
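To make the flow between the three components concrete, here is a minimal Python sketch of a batch run. It is a toy model, not Snowplow's actual code: the class names, the `run_batch` helper, and the single `page_host` enrichment are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Collector:
    """Receives raw events from trackers and buffers them for processing."""
    buffer: list = field(default_factory=list)

    def receive(self, raw_event: dict) -> None:
        self.buffer.append(dict(raw_event))

    def drain(self) -> list:
        """Hand over everything collected so far and empty the buffer."""
        events, self.buffer = self.buffer, []
        return events


def enrich(event: dict) -> dict:
    """Toy enrichment: add the host name derived from the page URL."""
    enriched = dict(event)
    enriched["page_host"] = urlsplit(event.get("url", "")).hostname
    return enriched


@dataclass
class Storage:
    """Stands in for the warehouse (Redshift, S3, etc.)."""
    rows: list = field(default_factory=list)

    def load(self, events: list) -> None:
        self.rows.extend(events)


def run_batch(collector: Collector, storage: Storage) -> int:
    """One batch run: drain the collector, enrich each event, load the result."""
    enriched = [enrich(event) for event in collector.drain()]
    storage.load(enriched)
    return len(enriched)
```

In a real-time pipeline the collector would hand each event to Enrich as it arrives instead of waiting for the next batch, but the order of the stages stays the same.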
Setting up an entire Snowplow Analytics data pipeline means configuring multiple services in either Amazon Web Services or Google Cloud. That takes extensive data engineering and AWS/GCP knowledge, not to mention the ongoing management.
Meet Snowcat Cloud, a fully managed Snowplow data pipeline that helps analysts get the data they need.
Snowcat's value proposition is simple: "We maintain a high-availability Snowplow collector, enricher, and loader, so you don't have to."
With Snowcat Cloud you get all the benefits of Snowplow, without owning the infrastructure.
Snowcat Cloud provides a Snowplow collector that lets you collect Snowplow events on Snowcat infrastructure and then move them to your data warehouse.
Benefits of Snowcat Cloud
- Supports Snowplow Analytics Custom Contexts
- Can deliver event-level data to multiple databases, not just those on AWS/GCP
- Near real-time data replication frequency
- It is fast and easy to set up
- No management
How to Set Up Snowplow Analytics
Step 1: Create a Data Warehouse
If you've read this far, you likely already have a data warehouse. If not, open an account on AWS and create a Redshift cluster. A single dc2.large node is enough to start.
After the cluster is running, go to the Redshift security group and add an inbound rule to allow Snowcat Cloud to connect to your data warehouse (126.96.36.199/32).
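If you would rather script the inbound rule than click through the console, the following Python sketch shows one way to do it. It assumes your cluster runs in a VPC with an EC2 security group, that it listens on Redshift's default port 5439, and that the third-party `boto3` package is installed with credentials configured. The helper names and the group ID in the usage note are hypothetical, and you should confirm the CIDR against the instructions in your Snowcat Cloud account.

```python
import ipaddress

REDSHIFT_DEFAULT_PORT = 5439  # adjust if your cluster uses a different port


def build_ingress_rule(cidr: str,
                       port: int = REDSHIFT_DEFAULT_PORT,
                       description: str = "Snowcat Cloud loader") -> dict:
    """Build one IpPermissions entry that allows TCP traffic from `cidr` to `port`."""
    # ip_network raises ValueError for a malformed CIDR, so typos fail early.
    network = ipaddress.ip_network(cidr, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network), "Description": description}],
    }


def allow_snowcat(group_id: str, cidr: str) -> None:
    """Add the inbound rule to the given security group (makes a real AWS call)."""
    import boto3  # imported here so build_ingress_rule works without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[build_ingress_rule(cidr)],
    )
```

Calling `allow_snowcat("sg-0123456789abcdef0", "126.96.36.199/32")` with your own group ID would add the rule. Limiting the range to a /32 keeps the cluster closed to everything except that single address.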
Create a user for Snowcat and a database owned by that user.
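As a rough sketch, the two SQL statements for this step can be generated with a small Python helper. The user name `snowcat`, the database name `snowplow`, and the helper itself are assumptions for illustration; pick whatever names suit your warehouse and use a strong password of your own.

```python
import re

# Conservative pattern for unquoted identifiers: lowercase letters, digits, underscores.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def snowcat_setup_sql(password: str,
                      user: str = "snowcat",
                      database: str = "snowplow") -> list:
    """Return the statements that create the user and a database owned by that user."""
    for name in (user, database):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    escaped = password.replace("'", "''")  # escape single quotes inside the SQL literal
    return [
        f"CREATE USER {user} PASSWORD '{escaped}';",
        f"CREATE DATABASE {database} OWNER {user};",
    ]


if __name__ == "__main__":
    # Example password only; never reuse it.
    for statement in snowcat_setup_sql("Example-Passw0rd"):
        print(statement)
```

Run the printed statements as a superuser in your SQL client of choice. If your client wraps statements in a transaction, you will likely need autocommit for `CREATE DATABASE`.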
If you plan to run this data warehouse as more than a test, don't forget to buy a Redshift reserved node with at least a partial upfront payment; you will save about 36% per year compared with on-demand pricing.
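To get a feel for what that discount means in money, here is a small worked calculation in Python. The hourly rate of 0.25 USD is a hypothetical figure used only for illustration; check current AWS pricing for your region before relying on it.

```python
HOURS_PER_YEAR = 24 * 365


def annual_reserved_savings(on_demand_hourly: float, discount: float = 0.36) -> float:
    """Yearly saving for one node that runs all year, given an hourly on-demand rate."""
    return on_demand_hourly * HOURS_PER_YEAR * discount


if __name__ == "__main__":
    # Hypothetical on-demand rate of 0.25 USD per hour.
    print(f"{annual_reserved_savings(0.25):.2f} USD saved per year")
```

At that assumed rate, a node costs 2,190 USD per year on demand, so a 36% discount works out to roughly 788 USD saved.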
Step 2: Set Up Snowcat Cloud
Go to snowcatcloud.com and open an account.
Get access to your domain's DNS settings and follow the instructions to configure a subdomain for the Snowplow collector, e.g. sp.yourdomain.com.
Create the DNS records to configure your Snowplow collector and validate them.
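Before you move on, you can check from your own machine that the new collector hostname resolves. The Python sketch below uses only the standard library; the function name is my own, and `sp.yourdomain.com` stands in for whatever hostname you configured. It only confirms that DNS returns an address, not that Snowcat Cloud has validated the records on its side.

```python
import socket


def resolve_collector(hostname: str) -> list:
    """Return the sorted IP addresses the hostname resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    addresses = resolve_collector("sp.yourdomain.com")
    print(addresses if addresses else "Not resolving yet; DNS changes can take a while to propagate.")
```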
Step 3: Install the Snowplow Tracker on Your Website/App
In the Snowplow tracker snippet you add to your pages, replace the following placeholders:
- << COLLECTOR URL >> is your Snowplow collector URL e.g. sp.yourdomain.com
- << APPID >> is your unique ID for this app e.g. igloo-web
- << YOURDOMAIN.COM >> is your top level domain e.g. iglooanalytics.com
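To see where the collector URL and app ID end up, here is a Python sketch that builds the kind of page view request a tracker sends to the collector. It is based on my understanding of the Snowplow tracker protocol (a GET request to `/i` with parameters such as `e`, `aid`, and `p`) and is meant as a smoke test, not a replacement for the tracker. The function name and the `tv` value are made up for this example.

```python
import time
import uuid
from urllib.parse import urlencode


def build_page_view_url(collector_host: str, app_id: str, page_url: str) -> str:
    """Build a GET URL for a single page view event addressed to the collector."""
    params = {
        "e": "pv",                            # event type: page view
        "aid": app_id,                        # your << APPID >>
        "p": "web",                           # platform
        "tv": "py-sketch-0.1",                # tracker version label (hypothetical)
        "url": page_url,                      # page being viewed
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in milliseconds
    }
    return f"https://{collector_host}/i?{urlencode(params)}"


if __name__ == "__main__":
    print(build_page_view_url("sp.yourdomain.com", "igloo-web", "https://iglooanalytics.com/"))
```

Opening the printed URL in a browser should send one test event through your collector, which you can then look for in your warehouse once the next load has run.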
If you want to run Snowplow Analytics and don't have enough resources to build and manage your own pipeline, Snowcat Cloud provides you with an easy, affordable and reliable way to get started, with data freshness as low as 5 minutes and support for custom contexts.
In the future, if your resources or needs change, you can always run your own Snowplow collector and migrate away from Snowcat Cloud. No vendor lock-in.
Excuses not to own your data pipeline are getting scarcer and scarcer.
Need help getting up and running with Snowplow? Contact us.