Joao Correia
Driving Growth & Innovation With Data

What if you could deploy Snowplow Analytics on a single machine to test your implementation?

That is what you can do with Snowplow Mini. In this post, you'll learn how to set up Snowplow Mini to test your custom contexts and implementation.

What is Snowplow Mini?

Snowplow Mini is a Snowplow real-time pipeline running on a single instance. With Elasticsearch and Kibana to visualize events in real time, and an Iglu schema repository for custom contexts, Snowplow Mini helps developers roll out, demo, and debug Snowplow.

Kibana, part of the ELK stack (Elasticsearch, Logstash and Kibana), is widely used in operational intelligence, where real-time data is critical.

Don't let the raw looks fool you: Kibana is very powerful.

[Image: Kibana showing Snowplow events]

How to Set Up Snowplow Mini

Snowplow wrote great documentation on how to set up Snowplow Mini, which you can read on GitHub, but I'll do a step-by-step here, as things don't always work as specified in the documentation.

Installing Snowplow Mini on Amazon Web Services

Depending on the AWS region where you prefer to deploy Snowplow Mini, you'll need to select the corresponding AWS AMI, a ready-to-use Snowplow Mini image.

Important: AMIs are a regional resource. When searching for a shared AMI, you must search for it from within the region from which it is being shared.

Select your preferred AWS region and copy the AMI reference to the clipboard (use a t2.large):
ami-b890b6c0

On the AWS EC2 console, select AMIs from the side menu and Public Images from the search filter. Paste the AMI reference you copied above.

Once you find the AMI, select it and go through the launch wizard. For the instance type, select t2.large.

[Image: Select AWS Snowplow Mini AMI for your region]

Next, you need to adjust the security group for Snowplow Mini. A security group is an AWS feature that works like a firewall, managing the traffic to and from your Snowplow Mini instance.

Select the Snowplow Mini instance, then click the security group associated with it.

[Image: Instance details]

Allow HTTP and HTTPS traffic from anywhere, and SSH from your IP address only, as shown below.

[Image: Configure Snowplow Mini Security Group]

Back on the EC2 instance detail page, find the Public DNS (IPv4). Paste this hostname into your browser's address bar, append /home, and press Enter.

You will be prompted for a username and password.

Below are the default username and password. You can (and should) change them in the Control Panel (/home/#/control-plane).

username: USERNAME_PLACEHOLDER
password: PASSWORD_PLACEHOLDER

[Image: Welcome to Snowplow Mini]

Now that we've got Snowplow Mini set up, let's test our custom contexts.

Using Snowplow Mini to test Custom Contexts

We've seen in a previous blog post how to create and validate your custom contexts; now it's time to test them in an environment as close to production as possible.

I covered these steps in the blog post mentioned above, but I'll repeat them here.

  1. Download and install Igluctl
  2. Download the sample schema-registry template

Step 1: Let's validate the custom context

Let's validate the example_event schema from the Snowplow iglu-example-schema-registry:

joaocorreia$ igluctl lint schemas/

FAILURE: Schema [iglu-example-schema-registry-master/schemas/com.example_company/example_event/jsonschema/1-0-0] contains following errors: 

1. Optional field doesn't allow null type
2. Schema doesn't contain description property
3. Schema doesn't contain description property
4. Schema doesn't contain description property
5. Schema doesn't contain description property

TOTAL: 0 Schemas were successfully validated
TOTAL: 1 invalid Schemas were encountered
TOTAL: 5 errors were encountered

joaocorreia$

Oops, our example schema is not quite ready. Let's make some changes.

Documenting our schemas is always a good idea, so add a description to each of the fields. Also notice that exampleTimestampField is optional, and an optional field should be allowed to be null. Let's add this as a possibility by changing its type to ["string","null"].

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an example event",
  "self": {
    "vendor": "com.example_company",
    "name": "example_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },

  "type": "object",
  "properties": {
    "exampleStringField": {
      "description": "Example string field",
      "type": "string",
      "maxLength": 255
    },
    "exampleIntegerField": {
      "description": "Example integer field",   
      "type": "integer",
      "minimum": 0,
      "maximum": 100000
    },
    "exampleNumericField": {
      "description": "Example numeric field",       
      "type": ["number","null"],
      "multipleOf": 0.0001,
      "minimum": -1000000,
      "maximum":  1000000
    },
    "exampleTimestampField": {
      "description": "Example timestamp field",       
      "type": ["string","null"],
      "format": "date-time"
    }
  },
  "minProperties":1,
  "required": ["exampleStringField", "exampleIntegerField"],
  "additionalProperties": false
}
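To make the two lint messages above more concrete, here is a minimal Python sketch of what such checks could look like. It is not igluctl's implementation; the function name lint_properties and the two small sample schemas are made up for illustration, and only the two rules reported above are covered.

```python
def lint_properties(schema):
    """Return (field, problem) pairs for the two rules discussed above.

    Illustrative only: this is NOT how igluctl is implemented.
    """
    findings = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        # Rule 1: every property should be documented.
        if "description" not in spec:
            findings.append((name, "missing description"))
        # Rule 2: a field that is not required should accept null.
        declared = spec.get("type", [])
        types = [declared] if isinstance(declared, str) else list(declared)
        if name not in required and "null" not in types:
            findings.append((name, "optional field doesn't allow null"))
    return findings


# Hypothetical, trimmed-down schemas: before and after the fix.
draft = {
    "required": ["exampleStringField"],
    "properties": {
        "exampleStringField": {"type": "string"},
        "exampleTimestampField": {"type": "string", "format": "date-time"},
    },
}
fixed = {
    "required": ["exampleStringField"],
    "properties": {
        "exampleStringField": {
            "description": "Example string field",
            "type": "string",
        },
        "exampleTimestampField": {
            "description": "Example timestamp field",
            "type": ["string", "null"],
            "format": "date-time",
        },
    },
}

for field, problem in lint_properties(draft):
    print(f"{field}: {problem}")
print("fixed schema findings:", lint_properties(fixed))
```

For real validation, keep using igluctl lint; the sketch only shows why the descriptions and the "null" type matter.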

We run lint and our schema is now valid.

joaocorreia$ igluctl lint schemas/

SUCCESS: Schema [iglu-example-schema-registry-master/schemas/com.example_company/example_event/jsonschema/1-0-0] is successfully validated
TOTAL: 1 Schemas were successfully validated
TOTAL: 0 invalid Schemas were encountered
TOTAL: 0 errors were encountered

joaocorreia$ 

Step 2: Upload Custom Context to Snowplow Mini

In the AMI I used to write this blog post, the Iglu Server was not running, so I had to upload a new configuration file. Check whether yours is running by visiting http://YOUR EC2 PUBLIC HOSTNAME/iglu-server/.
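If you prefer to check from a script instead of the browser, something like the following sketch may help. The helper names iglu_server_url and is_reachable are made up, the hostname is a placeholder, and it assumes the page answers without authentication; if your instance protects it with the username and password, a request like this will report it as unreachable.

```python
import urllib.error
import urllib.request


def iglu_server_url(hostname):
    """Build the Iglu Server URL used in this post for a given hostname."""
    return f"http://{hostname}/iglu-server/"


def is_reachable(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures and timeouts.
        return False


# Placeholder hostname: replace it with your EC2 public hostname,
# then call is_reachable(url) to perform the actual check.
url = iglu_server_url("ec2-203-0-113-10.compute.example.com")
print(url)
```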

If it is not running, download this Iglu server configuration file and change only line 25 (baseURL). Do not remove /iglu-server.

Go to your Snowplow Mini Control Panel (/home/#/control-plane), upload the configuration file, and click restart all services.

[Image: Upload Iglu Server configuration file]

Now you should be able to see the API documentation for Iglu server on http://YOUR EC2 PUBLIC HOSTNAME/iglu-server/.

In order to validate the custom context you'll need to upload the schema to Snowplow Mini through the Iglu server API.

Go to uuidgenerator.net and generate a UUID; this will be your API key. Copy the key to your clipboard and write it down somewhere for easy reference (don't share it, of course).

ab230573-8099-4878-a24c-e41f41e975d0
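If you would rather not use a website for this, a UUID can also be generated locally. A minimal sketch using Python's standard uuid module (the function name new_api_key is just for illustration):

```python
import uuid


def new_api_key():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


key = new_api_key()
print(key)
```

Either way, the value is just a random UUID; what makes it an API key is adding it to the Snowplow Mini Control Panel in the next step.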

Go to Snowplow Mini /home/#/control-plane and add the UUID. If you forget, you can always add another UUID.

[Image: Add API key]

Now open a Terminal, go to the example schema registry path and set two environment variables with your Snowplow Mini IP address and your API key.

SNOWPLOW_MINI_IP=35.165.21.41
IGLU_REGISTRY_MASTER_KEY=ab230573-8099-4878-a24c-e41f41e975d0

Now you need to generate the JSONPaths and SQL table definitions. I prefer to set the database owner right away, so I won't forget to change the table later.

What are JSONPath files?

The JSONPaths files are collections of JSONPath expressions used to extract values from the Snowplow custom context JSON. The Snowplow Relational Database Loader (RDB Loader) then loads those values into the corresponding columns in a database.

What are SQL Table definitions?

The SQL table definitions are the SQL equivalent of custom contexts: CREATE TABLE statements with one column per field, describing the table into which the custom context data will be loaded.

joaocorreia$ igluctl static generate --with-json-paths ./schemas --set-owner storageloader

File [./sql/com.example_company/example_event_1.sql] was overridden successfully (no change)!
File [./jsonpaths/com.example_company/example_event_1.json] was overridden successfully (no change)!
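One detail worth knowing when you read the generated SQL: the camelCase field names from the schema show up as snake_case column names (later in this post, IglooExampleField becomes igloo_example_field). The sketch below imitates that mapping for simple names; the function column_name is hypothetical, and igluctl's real rules may differ for edge cases such as acronyms.

```python
import re


def column_name(field):
    """Convert a camelCase or PascalCase field name to snake_case.

    Simplified illustration; not igluctl's actual implementation.
    """
    # Put an underscore at every lower-to-upper boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", field).lower()


for field in ("exampleStringField", "exampleIntegerField", "IglooExampleField"):
    print(field, "->", column_name(field))
```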

Push the schemas to Snowplow Mini. This uploads them to the Iglu schema repository through the API.

igluctl static push ./schemas $SNOWPLOW_MINI_IP/iglu-server/ $IGLU_REGISTRY_MASTER_KEY --public

joaocorreia$ igluctl static push ./schemas $SNOWPLOW_MINI_IP/iglu-server/ $IGLU_REGISTRY_MASTER_KEY --public

SUCCESS: Schema successfully updated at /api/schemas/com.example_company/example_event/jsonschema/1-0-0  (200)
Read key bc768ea6-1fb5-496c-bb9b-5e73d5c9f28c deleted
Write key 878e5f99-0ec8-4afe-9057-1dcade9bc6e3 deleted
TOTAL: 1 Schemas successfully uploaded (0 created; 1 updated)
TOTAL: 0 failed Schema uploads

joaocorreia$

Step 3: Sending an event with custom context to Snowplow Mini

We are now ready to send events with our newly created custom context to Snowplow Mini. Here comes the fun part!

If you followed the article, just type your collector hostname below and click one of the buttons to send an event with the custom context com.example_company/example_event/jsonschema/1-0-0 to your collector.

Snowplow Event Generator With Custom Context


Don't include the protocol; use only the hostname.
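If you are curious what such an event looks like on the wire, here is a rough Python sketch that builds (but does not send) a page view request with the custom context attached. It assumes the collector accepts GET requests on /i at your Snowplow Mini hostname; the hostname, the page URL and the tracker version string are placeholders, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode

CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"
EXAMPLE_SCHEMA = "iglu:com.example_company/example_event/jsonschema/1-0-0"


def self_describing(schema, data):
    """Wrap data in a self-describing JSON envelope."""
    return {"schema": schema, "data": data}


def build_contexts(context_data):
    """Wrap one custom context in the contexts envelope (an array of contexts)."""
    return self_describing(
        CONTEXTS_SCHEMA, [self_describing(EXAMPLE_SCHEMA, context_data)]
    )


def page_view_url(collector_host, context_data):
    """Build a GET URL for a page view carrying the custom context."""
    params = {
        "e": "pv",                    # event type: page view
        "p": "web",                   # platform
        "tv": "py-sketch-0.0.1",      # tracker version (placeholder)
        "url": "http://example.com/",  # page URL (placeholder)
        "co": json.dumps(build_contexts(context_data), separators=(",", ":")),
    }
    return f"http://{collector_host}/i?{urlencode(params)}"


sample = {"exampleStringField": "hello", "exampleIntegerField": 42}
print(page_view_url("mini.example.com", sample))
```

Opening the printed URL in a browser (with your own hostname) should produce one event; if the data does not match the schema, expect it among the bad events instead.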
Snowplow Chrome Debugger

Install our free Snowplow Chrome Debugger to view Snowplow hits in detail, or read our blog post about it.

You should see the events in your Kibana, available at:
http://YOUR SNOWPLOW MINI HOSTNAME/app/kibana#/discover

[Image: Events with custom context in Snowplow Mini]

How to update Snowplow Custom Contexts

You deployed your schema to production, and now, a month later, you want to add a new field.

You probably noticed that the schema files use a strange name, 1-0-0. This is SchemaVer, Snowplow's schema versioning scheme, which is modeled on semantic versioning.

In a nutshell, SchemaVer lets you and Snowplow know what type of change you are making to your custom context. Each digit stands for a different type of change: 1-0-0 (MODEL-REVISION-ADDITION).

SchemaVer

Given a version number MODEL-REVISION-ADDITION, increment the:

MODEL when you make a breaking schema change which will prevent interaction with any historical data,
REVISION when you introduce a schema change which may prevent interaction with some historical data, and
ADDITION when you make a schema change that is compatible with all historical data.

Read the Snowplow documentation on SchemaVer
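As a small illustration of how the three parts behave, here is a Python sketch that bumps a MODEL-REVISION-ADDITION version. The function bump and its change labels are hypothetical helpers, not part of igluctl; the sketch assumes that the parts to the right of the bumped one reset to zero.

```python
def bump(version, change):
    """Return the next version for a 'model', 'revision' or 'addition' change."""
    model, revision, addition = (int(part) for part in version.split("-"))
    if change == "model":
        return f"{model + 1}-0-0"
    if change == "revision":
        return f"{model}-{revision + 1}-0"
    if change == "addition":
        return f"{model}-{revision}-{addition + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1-0-0", "addition"))
print(bump("1-0-1", "revision"))
print(bump("1-0-1", "model"))
```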

Adding an optional field to the schema is an ADDITION, so the version becomes 1-0-1. Copy the following to a file named 1-0-1 next to 1-0-0.

Important: Do not just edit 1-0-0 and rename it to 1-0-1, because this would effectively erase the 1-0-0 version, making all 1-0-0 events potentially invalid. Keep every version; if you make changes, you will still be able to join the data using SQL.

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an example event",
  "self": {
    "vendor": "com.example_company",
    "name": "example_event",
    "format": "jsonschema",
    "version": "1-0-1"
  },

  "type": "object",
  "properties": {
    "exampleStringField": {
      "description": "Example string field",
      "type": "string",
      "maxLength": 255
    },
    "exampleIntegerField": {
      "description": "Example integer field",   
      "type": "integer",
      "minimum": 0,
      "maximum": 100000
    },
    "exampleNumericField": {
      "description": "Example numeric field",       
      "type": ["number","null"],
      "multipleOf": 0.0001,
      "minimum": -1000000,
      "maximum":  1000000
    },
    "exampleTimestampField": {
      "description": "Example timestamp field",       
      "type": ["string","null"],
      "format": "date-time"
    },
    "IglooExampleField": {
      "description": "Example field from Igloo",       
      "type": ["string","null"],
      "maxLength": 255
    }    
  },
  "minProperties":1,
  "required": ["exampleStringField", "exampleIntegerField"],
  "additionalProperties": false
}

Run igluctl to generate the JSONPaths and SQL:

joaocorreia$ igluctl static generate --with-json-paths ./schemas --set-owner storageloader

File [./sql/com.example_company/example_event_1.sql] was overridden successfully (no change)!
File [./jsonpaths/com.example_company/example_event_1.json] was overridden successfully (no change)!

Notice that igluctl generated a migration that you need to run against your existing table so it can load the 1-0-1 context. It didn't create a new table, because the change was small (an ADDITION). The 1-0-1 events will be loaded into the same table.

BEGIN TRANSACTION;

  ALTER TABLE atomic.com_example_company_example_event_1
    ADD COLUMN "igloo_example_field" VARCHAR(255) ENCODE ZSTD;

  COMMENT ON TABLE atomic.com_example_company_example_event_1 IS 'iglu:com.example_company/example_event/jsonschema/1-0-1';

END TRANSACTION;

If you change the version to 2-0-0, you will notice that igluctl creates a new table.
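The reason is visible in the table name itself: it ends with the MODEL number, so every version with the same MODEL shares a table. The sketch below derives the name from a schema URI in the same shape as the table used in the migration above. The function table_name is hypothetical and simplified (it assumes the vendor and name are already lowercase, as in this example), so treat it as an illustration rather than igluctl's exact logic.

```python
def table_name(iglu_uri, db_schema="atomic"):
    """Derive the table name from an Iglu schema URI (simplified sketch)."""
    if iglu_uri.startswith("iglu:"):
        iglu_uri = iglu_uri[len("iglu:"):]
    vendor, name, _format, version = iglu_uri.split("/")
    model = version.split("-")[0]  # only the MODEL part ends up in the name
    return f"{db_schema}.{vendor.replace('.', '_')}_{name}_{model}"


print(table_name("iglu:com.example_company/example_event/jsonschema/1-0-0"))
print(table_name("iglu:com.example_company/example_event/jsonschema/1-0-1"))
print(table_name("iglu:com.example_company/example_event/jsonschema/2-0-0"))
```

The first two URIs map to the same table, while the third gets a table of its own.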

Conclusion

Custom contexts are like metadata objects: they can be extended or contracted as you desire and are not subject to arbitrary limits imposed by a technology or product.

With Snowplow Mini, you can test your custom contexts with ease, confident that they will work on your production Snowplow pipeline.
