What if you could deploy Snowplow Analytics on a single machine to test your implementation?
That is what you can do with Snowplow Mini. In this post, you'll learn how to set up Snowplow Mini to test your custom contexts and implementation.
Table of Contents
- What is Snowplow Mini
- How to Setup Snowplow Mini
- Installing Snowplow Mini on Amazon Web Services
- Using Snowplow Mini to test Custom Contexts
- Step 1: Let's validate the custom context
- Step 2: Upload Custom Context to Snowplow Mini
- Step 3: Sending an event with custom context to Snowplow Mini
- How do I update my custom context?
What is Snowplow Mini?
Snowplow Mini is a Snowplow real-time pipeline running on a single instance. With Elasticsearch and Kibana to visualize events in real time, and an Iglu schema repository for custom contexts, Snowplow Mini helps developers roll out, demo, and debug Snowplow.
Kibana, part of the ELK stack (Elasticsearch, Logstash, and Kibana), is widely used in operational intelligence, where real-time data is critical.
How to Setup Snowplow Mini
Snowplow wrote great documentation on how to set up Snowplow Mini, which you can read on GitHub, but I'll do a step-by-step here, as things don't always work as specified in the documentation.
Installing Snowplow Mini on Amazon Web Services
Depending on the region where you prefer to deploy your Snowplow Mini, you'll need to select the corresponding AWS AMI, a ready-to-use Snowplow Mini image.
Important: AMIs are a regional resource. When searching for a shared AMI, you must search for it from within the region from which it is being shared.
Select your preferred AWS region and copy the AMI reference to the clipboard (use a t2.large):
On the AWS EC2 console select AMIs from the side menu, and Public Images from the search filter. Paste the AMI reference you copied from above.
Next, you need to adjust the security group for Snowplow Mini. The security group is an AWS feature that works like a firewall, managing the traffic to and from your Snowplow Mini.
Back on the EC2 instance detail page, find the Public DNS (IPv4). Copy and paste this URL into your browser, add /home, and press [ENTER].
You will be prompted for a username and password.
Below are the default username and password, but you can (and should) change them in the Control Plane (/home/#/control-plane).
Now that we've got Snowplow Mini set up, let's test our custom contexts.
Using Snowplow Mini to test Custom Contexts
We've seen in a previous blog post how to create and validate your custom contexts; now it's time to test them in an environment as close to production as possible.
I mentioned these steps in the blog post above, but I'll repeat them here.
Step 1: Let's validate the custom context
Let's validate the example_event schema from the Snowplow iglu-example-schema-registry.
Oops, our example schema is not quite ready. Let's make some changes.
Documenting our schemas is always a good idea, so add a description to each of the fields. Notice also that exampleTimestampField is optional, and an optional field can be null. Let's allow for this by changing its type to ["string","null"].
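To make the nullable type more concrete, here is a minimal Python sketch of the changed property and a hand-rolled check against it. The description wording and the matches_type helper are assumptions made for illustration; they are not part of the example registry or of igluctl, which does the real validation.

```python
# Property definition after the change: a documented, nullable string.
# The description wording is an assumption made for this sketch.
example_timestamp_field = {
    "type": ["string", "null"],
    "description": "Optional timestamp of the example event",
}

# Hypothetical helper table: JSON Schema type names mapped to Python types.
JSON_TYPES = {"string": str, "null": type(None)}


def matches_type(value, prop):
    """Return True if value matches any type listed in the property."""
    allowed = prop["type"]
    if isinstance(allowed, str):  # a single type name is also valid
        allowed = [allowed]
    return any(isinstance(value, JSON_TYPES[name]) for name in allowed)


print(matches_type("2016-01-01T00:00:00Z", example_timestamp_field))
print(matches_type(None, example_timestamp_field))
print(matches_type(42, example_timestamp_field))
```

With the type union in place, both a string and null pass the check, while an integer does not.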
We run lint and our schema is now valid.
Step 2: Upload Custom Context to Snowplow Mini
In the AMI I used to write this blog post, the Iglu Server was not running, so I had to upload a new configuration file. Check if yours is running by visiting: http://YOUR EC2 PUBLIC HOSTNAME/iglu-server/.
If it is not running, download this Iglu server configuration file and change only line 25 (baseURL). Do not remove /iglu-server.
Go to your Snowplow Mini Control Plane (/home/#/control-plane), upload the configuration file, and click restart all services.
Now you should be able to see the API documentation for Iglu server on http://YOUR EC2 PUBLIC HOSTNAME/iglu-server/.
In order to validate the custom context you'll need to upload the schema to Snowplow Mini through the Iglu server API.
Go to uuidgenerator.net and generate a UUID. This will be your API key. Copy the key to your clipboard and write it down somewhere for easy reference (don't share it, of course).
Go to Snowplow Mini /home/#/control-plane and add the UUID. If you forget, you can always add another UUID.
Now open a Terminal, go to the example schema registry path and set two environment variables with your Snowplow Mini IP address and your API key.
Now you need to generate the JSONPaths and SQL table definitions. I prefer to set the database owner right away, so I won't forget to change the table later.
What are JSONPath files?
JSONPaths files are collections of JSONPath expressions used to parse data out of the Snowplow custom context JSON. The parsed values are then loaded into the corresponding fields in a database by the Snowplow Relational Database Loader (RDB Loader).
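As a rough sketch of the idea, the Python below resolves a handful of paths against a simplified context and produces one value per path, in column order. The file contents, the shape of the context, and the resolve helper are assumptions for illustration only; a generated JSONPaths file is longer, and the actual loading is done by the database, not by code like this.

```python
import json

# Assumed shape of a JSONPaths file: an object holding a "jsonpaths" list.
# These paths are illustrative and shorter than a generated file.
jsonpaths_file = json.loads("""
{
  "jsonpaths": [
    "$.schema.vendor",
    "$.schema.name",
    "$.data.exampleStringField",
    "$.data.exampleTimestampField"
  ]
}
""")

# A simplified context as it might reach the loader (assumed shape).
context = {
    "schema": {"vendor": "com.example_company", "name": "example_event"},
    "data": {"exampleStringField": "hello", "exampleTimestampField": None},
}


def resolve(path, document):
    """Resolve a simple dotted path such as '$.data.field'."""
    node = document
    for key in path.lstrip("$").strip(".").split("."):
        node = node.get(key) if isinstance(node, dict) else None
    return node


# One value per path, in the same order as the table columns.
row = [resolve(path, context) for path in jsonpaths_file["jsonpaths"]]
print(row)
```

The order of the paths matters: each expression feeds the column at the same position in the table definition.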
What are SQL Table definitions?
The SQL table definitions are the SQL equivalent of custom contexts, the CREATE TABLE statements, with the corresponding fields where the custom context events data will be loaded.
Pushing the schemas to Snowplow Mini uploads them to the Iglu schema repository through the API.
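If you are curious what such an upload looks like on the wire, here is a hedged Python sketch that builds (but does not send) the HTTP request. The endpoint layout, the apikey header name, the com.example_company vendor, and the placeholder hostname and key are all assumptions; check them against the API documentation page of your own Iglu server before relying on them.

```python
import json
import urllib.request


def build_push_request(base_url, api_key, vendor, name, version, schema):
    """Build (but do not send) a request that uploads one schema."""
    # Assumed endpoint layout: /api/schemas/<vendor>/<name>/jsonschema/<version>
    url = f"{base_url}/api/schemas/{vendor}/{name}/jsonschema/{version}"
    return urllib.request.Request(
        url,
        data=json.dumps(schema).encode("utf-8"),
        # Assumed header name for the API key added in the control plane.
        headers={"apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Minimal stand-in for the real schema file (illustrative content only).
schema = {
    "self": {
        "vendor": "com.example_company",
        "name": "example_event",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {"exampleTimestampField": {"type": ["string", "null"]}},
}

request = build_push_request(
    "http://YOUR-EC2-PUBLIC-HOSTNAME/iglu-server",  # placeholder hostname
    "00000000-0000-0000-0000-000000000000",  # placeholder API key
    "com.example_company",
    "example_event",
    "1-0-0",
    schema,
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

In practice, igluctl does this for every schema in the registry folder, so the sketch is only meant to show what happens under the hood.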
Step 3: Sending an event with custom context to Snowplow Mini
We are now ready to send events with our newly created custom context to Snowplow Mini. Here comes the fun part!
If you followed the article, just type your collector hostname (without the protocol) below and click one of the buttons to send an event with the custom context example_company/example_event/jsonschema/1-0-0 to your collector.
Snowplow Event Generator With Custom Context
Don't include the protocol; use only the hostname.
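If you would rather send the event from a script, the sketch below builds a page view request with the custom context attached and then decodes it again to show what the collector receives. The /i endpoint, the com. vendor prefix, the field names other than exampleTimestampField, the tracker version string, and the collector hostname are assumptions for illustration; adjust them to match your own schema and setup.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_event_url(collector_host, context_schema, context_data):
    """Build a GET URL for a page view that carries one custom context."""
    contexts = {
        "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
        "data": [{"schema": context_schema, "data": context_data}],
    }
    raw = json.dumps(contexts).encode("utf-8")
    params = {
        "e": "pv",  # event type: page view
        "p": "web",  # platform
        "tv": "sketch-0.1",  # hypothetical tracker version label
        "url": "http://example.com/",
        "cx": base64.urlsafe_b64encode(raw).decode("ascii"),
    }
    return f"http://{collector_host}/i?{urlencode(params)}"


url = build_event_url(
    "collector.example.com",  # hostname only, no protocol
    "iglu:com.example_company/example_event/jsonschema/1-0-0",
    {
        # Field names other than exampleTimestampField are assumptions.
        "exampleStringField": "hello",
        "exampleIntegerField": 1,
        "exampleTimestampField": None,
    },
)

# Round trip: decode the cx parameter the way a receiver would.
query = parse_qs(urlparse(url).query)
decoded = json.loads(base64.urlsafe_b64decode(query["cx"][0]))
print(decoded["data"][0]["schema"])
# To actually send it: urllib.request.urlopen(url)
```

Once the event is sent, it should show up in Kibana within a few seconds, either as a good event or, if the context does not match the schema, as a bad one.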
How to update Snowplow Custom Contexts
You deployed your schema to production, and now, a month later, you want to add a new field.
You probably noticed the schemas use a strange file name, 1-0-0. This is SchemaVer, Snowplow's take on semantic versioning for schemas.
In a nutshell, this versioning lets you and Snowplow know what type of change you are making to your custom context, where each digit is a different type of change: 1-0-0 (MODEL-REVISION-ADDITION).
Given a version number MODEL-REVISION-ADDITION, increment the:
MODEL when you make a breaking schema change which will prevent interaction with any historical data,
REVISION when you make a schema change which may prevent interaction with some historical data, and
ADDITION when you make a schema change that is compatible with all historical data.
Read the Snowplow documentation on SchemaVer
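The numbering rule can be sketched in a few lines of Python. The parse_schemaver and bump helpers are hypothetical names made up for this post, and resetting the lower components to zero on a bigger change is my reading of the convention, not something igluctl exposes.

```python
def parse_schemaver(version):
    """Split a version such as '1-0-0' into (model, revision, addition)."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return model, revision, addition


def bump(version, change):
    """Return the next version for a MODEL, REVISION, or ADDITION change."""
    model, revision, addition = parse_schemaver(version)
    if change == "MODEL":
        return f"{model + 1}-0-0"
    if change == "REVISION":
        return f"{model}-{revision + 1}-0"
    if change == "ADDITION":
        return f"{model}-{revision}-{addition + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump("1-0-0", "ADDITION"))
print(bump("1-0-1", "MODEL"))
```

Adding a new optional field to 1-0-0 is an ADDITION, so the next file is 1-0-1.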
If you are adding a field to the schema, it is an addition; you change your semantic version to 1-0-1. Please copy the following to a file named 1-0-1 next to 1-0-0
Important: You do not want to just edit 1-0-0 and turn it into 1-0-1, because this would effectively erase that version, making all 1-0-0 events potentially invalid. You want to keep all versions; if you make changes, you will still be able to join the data using SQL.
Run igluctl to generate the JSONPaths and SQL.
Notice that igluctl created a change that you need to run against your existing table so it can load the 1-0-1 context. It didn't create a new table because the change was small (ADDITION). The 1-0-1 events will be loaded into the same table.
If you change the MODEL instead (for example, to 2-0-0), you will notice igluctl creates a new table.
Custom contexts are like objects, metadata objects that can be extended or contracted as you desire and are not subject to arbitrary limits imposed by technology or product.
With Snowplow Mini, you can test your custom contexts with ease and be confident they will work on your production Snowplow.