Data Modeling for PLG  in Data Warehouses

Read our recommended best practices on how to model data in your data warehouse so that your GTM teams can use it to connect with the best customers

Why does your data model matter in PLG?

Cloud Data Warehouses are absolute gold mines that contain product and customer data. However, data that is blindly extracted from sources and loaded into Cloud Data Warehouses must be transformed in order to have any value to the business teams who need to leverage this data.Many products today sit on top of data warehouses to provide things like product analytics, marketing automation, customer lifecycle automation, reporting, or visibility (PLG CRMs). These products cannot be leveraged effectively if the right views are not built in CDWs.

Best Practices you should follow when modeling your data

When working with customers looking to build out a PLG GTM motion, we recommend the following best practices when designing a data schema:

  • Create one table that includes ALL your accounts. This will act as your “source of truth” list of all accounts.

  • Create one table that includes ALL your users. This will act as your “source of truth” list of all users. Include account IDs in this table as well, so that you can use this table to map users to accounts.

  • Use positive usage signals in conversations with stakeholders

  • As much as possible, store information about your accounts in the accounts table, and users in the users table. You can create multiple tables that describe accounts and users, but if downstream applications consume data from multiple tables, there is no guarantee that both tables will be read at the same time or in the right order.

  • Include timestamps for when the row in the cloud data warehouse was updated, not when the data was generated in the upstream data source. This enables incremental processing for downstream applications.

  • Carefully consider how you define NULLs in your columns. If NULLs are equivalent to 0, COALESCE these values.

Here’s an example of how your Users table might be structured. The Users table should only have one distinct row per (updated_at, account_id) value. Included are some examples of column values you might want to include in your Users table.

User table for PLG

Here’s an example of how your Accounts table might be structured. The Accounts table should only have one distinct row per (updated_at, account_id) value. Included are some examples of column values you might want to include in your Accounts table.

Accounts Table for PLG

Modeling and Transforming Product Usage Events

One of the most powerful benefits associated with using a CDW is that you can combine data from multiple sources, transform that data, and build views that are useful to business teams. There are a variety of ways to represent Product Usage Events, which we will outline below.

Dimensional Metrics Table for PLG

Option 1

The first option, and a common way to represent metrics, is to treat them as singular values in a column, within a table that is similar to the Accounts and User Tables described above.

These metrics are “point-in-time” metrics, so each unique (updated_at, accountId) or (updated_at,userId) row will have one and only one metric value.
Metric Rollups for PLG

Option 2

The second option is to roll up your metrics by some fixed time period, and treat these rollups as time-series.

This is useful if you want to save space, speed up queries, and reduce noise when querying metric data. It’s also useful to store metrics in this way so that you can track change over time, given that you are storing the full history and a specified granularity level. We typically see users opt to do daily rollups.
Raw Events Table for PLG

Option 3

The third option is to store your metrics as raw events. This can be more expensive to query, but ultimately offers the most flexibility over time. You can choose to store all your raw events and transform them into rollups as views, which are then consumed by downstream integrations.

You’ll notice that you can add additional columns to describe the user who is performing the event. This is useful to filter out events based on who performed the event. We typically do not recommend that you add additional columns to describe the event itself. For example, let’s say that you have an event called “Added Integration”.

You could choose to add a column for “Integration Type”. However, we recommend that you instead create distinct events for “Added Integration Type A” and “Added Integration Type B”. This is because nesting of events is hard to define and also harder to query over time.

The above recommended schemas should be taken only as recommendations. Each business is different and comes with complex data models. However, the recommended schemas are in-line with the types of schemas that tools like Correlated support. By sticking to these guidelines, you’ll be more likely to fit into the various requirements downstream applications have on the types of views and data models they accept.

Advanced Data Modeling for PLG / bottoms-up business models

Many PLG companies struggle with the challenge of successfully grouping users into Accounts. The challenge here, particularly in bottoms-up, self-serve businesses, is that any user could create an organization on their own. Common synonyms for an organization include: teams, workspaces, projects, etc. We’ll refer to organizations as groups going forward. That same user could be a member of multiple teams, workspaces, projects, etc. On top of that, you have the challenge of trying to figure out which Company groups together teams, workspaces, and projects! This is a difficult challenge when it comes to modeling data.

First, before getting any deeper into the data model that we suggest, it’s important to identify the group that you want your GTM teams to take action on. For example, are they selling top-down into the Company? Are they selling individually into teams, because teams in the Company have different use cases and don’t talk to each other? This will help you figure out which unique ID to choose as your canonical “Account ID”, which will often be used by SaaS tools to display Account Lists or other reports and dashboards.

PLG Data Schema Diagram

Next, you’ll want to map how you group your Accounts and Users to the above diagram. Notice that Users can be members of multiple groups, so there is a many:many relationship. Many groups can be members of one Account, so there is a many:1 relationship. By traversing the relationship tree, Users are members of only one Account, in a many:1 relationship. We recommend that companies try their best to map to this schema, as going with a more complex schema is unlikely to be supported by any SaaS tools in the market today. If you do have another conceptual grouping, for example, perhaps you sell into companies by “region” so underneath the Account level, you group “groups” by the region they are in, we recommend treating that as an attribute of the group. For example, rather than creating another object for “Region”, you include the region a group is in as an attribute of that group.

So what schema do we recommend within a Data Warehouse? Our suggestion is to maintain the schema described earlier in this document (one Account and one User table) and add an additional mapping table for Groups.

Groups table for PLG

Now typically, if you don’t already have a way to join groups to Accounts, the most common way we see this done is to use domains to match groups to Accounts. Alternatively, if your Sales reps are updating Salesforce with Account ⇒ Contact relationships, you can also use that mapping to tag groups with the correct Account based on the Account a user in the group is linked to in Salesforce. This will require data modeling internally to get right. Finally, you'll want to create a Users table as shown before, with the additional "group_id" column so that you can map users to groups.

The data model we describe above is what we've discovered to work best based on working with numerous PLG companies. However, it's important to note that every business is different. The most important thing is to figure out how you want to uniquely identify users and various groupings of users, as well as which group of users you want your GTM teams to action off of.

Excited to try things out?

Get a product tour