Data Modeling for PLG in Data Warehouses
Read our recommended best practices on how to model data in your data warehouse so that your GTM teams can use it to connect with the best customers
Why does your data model matter in PLG?
Cloud Data Warehouses are gold mines of product and customer data. However, data that is blindly extracted from sources and loaded into Cloud Data Warehouses must be transformed before it has any value to the business teams who need to leverage it. Many products today sit on top of data warehouses to provide capabilities like product analytics, marketing automation, customer lifecycle automation, reporting, or visibility (PLG CRMs). These products cannot be leveraged effectively if the right views are not built in CDWs.
Best Practices you should follow when modeling your data
When working with customers looking to build out a PLG GTM motion, we recommend the following best practices when designing a data schema:
Create one table that includes ALL your accounts. This will act as your “source of truth” list of all accounts.
Create one table that includes ALL your users. This will act as your “source of truth” list of all users. Include account IDs in this table as well, so that you can use this table to map users to accounts.
As much as possible, store information about your accounts in the accounts table and information about your users in the users table. You can create multiple tables that describe accounts and users, but if downstream applications consume data from multiple tables, there is no guarantee that those tables will be read at the same time or in the right order.
Include timestamps for when the row in the cloud data warehouse was updated, not when the data was generated in the upstream data source. This enables incremental processing for downstream applications.
Carefully consider how you define NULLs in your columns. If NULLs are equivalent to 0, COALESCE these values.
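The practices above can be sketched end to end. This is a minimal illustration using SQLite as a stand-in for a cloud data warehouse; all table and column names are hypothetical examples, not a required schema:

```python
import sqlite3

# One accounts table and one users table (with account_id for mapping),
# warehouse-side updated_at timestamps, and COALESCE for NULL handling.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    account_id TEXT,
    name       TEXT,
    seat_count INTEGER,   -- may arrive as NULL from the upstream source
    updated_at TEXT       -- when the warehouse row was updated
);
CREATE TABLE users (
    user_id    TEXT,
    account_id TEXT,      -- maps each user to an account
    email      TEXT,
    updated_at TEXT
);
""")
conn.execute("INSERT INTO accounts VALUES ('a1', 'Acme', NULL, '2023-01-02')")
conn.execute("INSERT INTO users VALUES ('u1', 'a1', 'jo@acme.com', '2023-01-02')")

# If NULL is equivalent to 0, COALESCE it at query time.
seats = conn.execute(
    "SELECT COALESCE(seat_count, 0) FROM accounts WHERE account_id = 'a1'"
).fetchone()[0]

# Incremental processing: a downstream application reads only rows
# changed since its last sync, using the warehouse updated_at column.
last_sync = "2023-01-01"
changed = conn.execute(
    "SELECT user_id FROM users WHERE updated_at > ?", (last_sync,)
).fetchall()
print(seats, changed)
```

The same pattern translates directly to warehouse SQL in Snowflake, BigQuery, or Redshift.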
Here’s an example of how your Users table might be structured. The Users table should have only one distinct row per (updated_at, user_id) pair. Below are some example columns you might want to include in your Users table.
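A minimal sketch of such a Users table, again using SQLite as a stand-in for a warehouse; the column choices here are illustrative, and the UNIQUE constraint enforces one distinct row per user per updated_at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    user_id      TEXT NOT NULL,   -- unique user identifier
    account_id   TEXT,            -- maps the user to an account
    email        TEXT,
    first_name   TEXT,
    last_name    TEXT,
    role         TEXT,            -- e.g. 'admin', 'member'
    signed_up_at TEXT,            -- when the user signed up (source system)
    updated_at   TEXT NOT NULL,   -- when the warehouse row was updated
    UNIQUE (updated_at, user_id)  -- one distinct row per pair
)
""")
conn.execute(
    "INSERT INTO users VALUES ('u1','a1','jo@acme.com','Jo','Ng','admin','2023-01-01','2023-01-02')"
)

# A second row with the same (updated_at, user_id) is rejected.
duplicate_rejected = False
try:
    conn.execute(
        "INSERT INTO users VALUES ('u1','a1','jo@acme.com','Jo','Ng','admin','2023-01-01','2023-01-02')"
    )
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```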
Here’s an example of how your Accounts table might be structured. The Accounts table should have only one distinct row per (updated_at, account_id) pair. Below are some example columns you might want to include in your Accounts table.
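And a matching sketch of an Accounts table; as above, SQLite stands in for the warehouse and the columns are hypothetical examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE accounts (
    account_id TEXT NOT NULL,       -- unique account identifier
    name       TEXT,                -- company / account name
    plan       TEXT,                -- e.g. 'free', 'pro', 'enterprise'
    mrr        REAL,                -- monthly recurring revenue
    created_at TEXT,                -- when the account was created upstream
    updated_at TEXT NOT NULL,       -- when the warehouse row was updated
    UNIQUE (updated_at, account_id) -- one distinct row per pair
)
""")
conn.execute(
    "INSERT INTO accounts VALUES ('a1','Acme','pro',499.0,'2022-11-01','2023-01-02')"
)
row = conn.execute(
    "SELECT name, plan FROM accounts WHERE account_id = 'a1'"
).fetchone()
print(row)
```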
Modeling and Transforming Product Usage Events
One of the most powerful benefits associated with using a CDW is that you can combine data from multiple sources, transform that data, and build views that are useful to business teams. There are a variety of ways to represent Product Usage Events, which we will outline below.
These metrics are “point-in-time” metrics, so each unique (updated_at, account_id) or (updated_at, user_id) row will have one and only one metric value.
This is useful if you want to save space, speed up queries, and reduce noise when querying metric data. Storing metrics this way also lets you track change over time, given that you are storing the full history at a specified granularity level. We typically see users opt for daily rollups.
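A daily rollup can be sketched as an aggregation from a raw events table into one point-in-time metric value per (updated_at, account_id, metric). SQLite stands in for the warehouse here, and the event names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (user_id TEXT, account_id TEXT, event_name TEXT, event_at TEXT)"
)
conn.executemany("INSERT INTO raw_events VALUES (?,?,?,?)", [
    ("u1", "a1", "Created Report", "2023-01-01T09:00:00"),
    ("u1", "a1", "Created Report", "2023-01-01T17:30:00"),
    ("u2", "a1", "Created Report", "2023-01-02T08:00:00"),
])

# Daily rollup: collapse raw events into one metric value per day per account.
rollup = conn.execute("""
    SELECT date(event_at) AS updated_at,
           account_id,
           event_name,
           COUNT(*) AS metric_value
    FROM raw_events
    GROUP BY date(event_at), account_id, event_name
    ORDER BY updated_at
""").fetchall()
print(rollup)
# [('2023-01-01', 'a1', 'Created Report', 2), ('2023-01-02', 'a1', 'Created Report', 1)]
```

Two events on January 1st collapse into a single row with a metric value of 2, which is what keeps the rolled-up table small and fast to query.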
You’ll notice that you can add additional columns to describe the user who is performing the event. This is useful to filter out events based on who performed the event. We typically do not recommend that you add additional columns to describe the event itself. For example, let’s say that you have an event called “Added Integration”.
You could choose to add a column for “Integration Type”. However, we recommend that you instead create distinct events for “Added Integration Type A” and “Added Integration Type B”. This is because nested event properties are hard to define consistently and harder to query over time.
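The flattening step can be sketched as a transformation that folds the property into the event name, so each integration type becomes its own distinct event. SQLite stands in for the warehouse, and the naming convention (`event - type`) is one illustrative choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_events (user_id TEXT, event_name TEXT, integration_type TEXT)"
)
conn.executemany("INSERT INTO raw_events VALUES (?,?,?)", [
    ("u1", "Added Integration", "Slack"),
    ("u2", "Added Integration", "Salesforce"),
])

# Fold the property into the event name to produce distinct events.
flat = conn.execute("""
    SELECT user_id,
           event_name || ' - ' || integration_type AS event_name
    FROM raw_events
    ORDER BY user_id
""").fetchall()
print(flat)
# [('u1', 'Added Integration - Slack'), ('u2', 'Added Integration - Salesforce')]
```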
The schemas above should be taken only as recommendations. Each business is different and comes with complex data models. However, these schemas are in line with the types of schemas that tools like Correlated support. By sticking to these guidelines, you’ll be more likely to fit the requirements downstream applications place on the views and data models they accept.
Advanced Data Modeling for PLG / bottoms-up business models
Many PLG companies struggle with the challenge of successfully grouping users into accounts. The challenge, particularly in bottoms-up, self-serve businesses, is that any user can create an organization on their own. Common synonyms for an organization include teams, workspaces, and projects; we’ll refer to organizations as groups going forward. That same user could be a member of multiple groups. On top of that, you have the challenge of figuring out which Company those teams, workspaces, and projects roll up to! This makes data modeling genuinely difficult.
First, before getting any deeper into the data model that we suggest, it’s important to identify the group that you want your GTM teams to take action on. For example, are they selling top-down into the Company? Are they selling individually into teams, because teams in the Company have different use cases and don’t talk to each other? This will help you figure out which unique ID to choose as your canonical “Account ID”, which will often be used by SaaS tools to display Account Lists or other reports and dashboards.
So what schema do we recommend within a Data Warehouse? Our suggestion is to maintain the schema described earlier in this document (one Account and one User table) and add an additional mapping table for Groups.
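A mapping table for groups can be sketched as follows. SQLite stands in for the warehouse; the group IDs and the assumption that each group rolls up to a canonical account are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_groups (       -- mapping table: users <-> groups
    user_id    TEXT NOT NULL,
    group_id   TEXT NOT NULL,    -- team / workspace / project ID
    account_id TEXT,             -- canonical "Account ID" the group rolls up to
    updated_at TEXT NOT NULL     -- when the warehouse row was updated
)
""")
conn.executemany("INSERT INTO user_groups VALUES (?,?,?,?)", [
    ("u1", "ws_design", "a1", "2023-01-02"),
    ("u1", "ws_eng",    "a1", "2023-01-02"),  # same user, multiple groups
    ("u2", "ws_eng",    "a1", "2023-01-02"),
])

# Roll groups up to the canonical account for GTM-facing views:
# distinct users per account, regardless of how many groups they belong to.
members = conn.execute("""
    SELECT account_id, COUNT(DISTINCT user_id)
    FROM user_groups
    GROUP BY account_id
""").fetchall()
print(members)
# [('a1', 2)]
```

Because the mapping lives in its own table, the one-row-per-user and one-row-per-account guarantees of the Users and Accounts tables are preserved even when a user belongs to many groups.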
The data model described above is what we’ve found works best across numerous PLG companies. However, it’s important to note that every business is different. The most important thing is to figure out how you want to uniquely identify users and the various groupings of users, as well as which group of users you want your GTM teams to action off of.