Skip to main content
Loading...
Skip to article
  • Qualtrics Platform
    Qualtrics Platform
  • Customer Journey Optimizer
    Customer Journey Optimizer
  • XM Discover
    XM Discover
  • Qualtrics Social Connect
    Qualtrics Social Connect

Joins (CX)


Was this helpful?


This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

The feedback you submit here is used only to help improve this page.

That’s great! Thank you for your feedback!

Thank you for your feedback!


Qtip: The data modeler and related functionality is not yet available to all customers. If you’re interested in this feature, please reach out to your XM Success Representative. Qualtrics may, in its sole discretion and without liability, change the timing of any product feature rollout, change the functionality for any in preview or in development product feature, or choose not to release a product feature or functionality for any reason or for no reason.

About Joins

Joins allow you to combine rows from 2 or more data sources based on a related column of data that they share. By using a join, you can gather and analyze the combined data more efficiently and effectively, creating more insights.

Example: Join the support tickets resolved for a customer with their survey data, directory contact data, and more, giving you a full profile of a customer’s interactions with your company.

The CX data modeler only supports left outer joins, inner joins, and outer joins. Only left outer joins update continuously.

Qtip: Confused about what datasets are vs. data sources? Not sure how to tell the data mapper and modeler apart? Learn more about these and more key terms.

Understanding Left Outer Joins

To understand how a left outer join works, let’s look at an example.

Look at the image of the dataset below. The first data source on top is our “left” data source, and the second data source on the bottom is our “right” data source.

Each data source is a block. The one on top is an imported data project named store locations. The block on bottom is a survey named customer feedback

Let’s say these tables represent the data you can find in each survey:

Store Locations (Left Source)

Location ID Location name
555 Provo
777 Dublin
999 Seattle
1000 Tokyo

Customer Feedback (Right Source)

Customer ID Satisfaction (1-5) Location ID
101 2 555
102 4 777
103 5 999
104 5 222

You choose to join the data based on Location ID. This field is called a “join key.”

This is the data unique to the second dataset. This would be removed:

Customer ID Satisfaction (1-5) Location ID
104 5 222

This is the final output, or all of the data that would be included in your results:

Location ID Location name Customer ID Satisfaction (1-5)
555 Provo 101 2
777 Dublin 102 4
999 Seattle 103 5
1000 Tokyo Null Null

Notice how the Provo, Dublin, and Seattle results contain columns of data from both the Right and Left sources, because these rows shared common location IDs.

There was no data for Tokyo in the Right source, and so the Tokyo row has null values under the new Customer ID and Satisfaction columns.

Importance of Unique Join Keys

Because join keys help to identify rows that need to be combined from the left and right sources, we recommend making sure the join key you use acts like a unique identifier. Otherwise, if there are multiple records in the right source that match the join key of the left source, only one of them will be pulled randomly.

Example: Let’s look at the example we discussed above. We have the same left source. But in the right source, we have these rows:

Customer ID Satisfaction (1-5) Location ID
101 2 555
107 4 555

In the resulting join, only one of the 555 rows will be saved, but not both.

If the join key isn’t unique for each record on a right source, and you want to include all records from both the right and left source, a union should be used instead. Unions pull in each of the records separately instead of combining the rows of information.

Understanding Inner Joins

Attention: Inner joins update periodically with new data instead of continuously. See linked page for details.

With an inner join, the resulting, merged dataset only includes matching rows found in both datasets. Because inner joins exclude records from either source that are missing data, the resulting datasets tend to have fewer empty columns, and the order of sources (which is left or right) matters less than with other join types.

Let’s say these tables represent the data you can find in each survey:

Customer Reward Tier (Left Source)

Customer ID Name Reward Tier
101 Phil Stein Emerald
102 Amir Dar Gold
103 Beth Green Silver
104 Lucia Vasquez Emerald

Store Experience Feedback (Right Source)

Customer ID Satisfaction (1-5) Store
101 1 Provo
104 3 Provo
113 5 Scranton

You choose to join the data based on Customer ID. This field is called a “join key.”

This is all of the data from each source that would be excluded from the final dataset.

Customer ID Name Reward Tier Satisfaction (1-5) Store
102 Amir Dar Gold Null Null
103 Beth Green Silver Null Null
113 Null Null 5 Scranton

 

This is the final output, or all of the data that would be included in your results:

Customer ID Satisfaction (1-5) Name Store Reward Tier
101 1 Phil Stein Provo Emerald
104 3 Lucia Vasquez Provo Emerald

Even though these sources are being joined by customer ID, the customers with IDs 102, 103, and 113 are excluded from the final dataset because they are missing too much information. Only Phil (101) and Lucia (104) have full data available from both sources.

Understanding Full Outer Joins

Attention: Full outer joins update periodically with new data instead of continuously. See linked page for details.

With a full outer join, the resulting, merged dataset includes all rows from both datasets. While records will be matched and merged by join key, even records that are missing data for the join key will be included in the final dataset.

Let’s say these tables represent the data you can find in each survey:

Customer Reward Tier (Left Source)

Customer ID Name Reward Tier
101 Phil Stein Emerald
102 Amir Dar Gold
104 Lucia Vasquez Emerald
Null Beth Green Silver

Store Experience Feedback (Right Source)

Customer ID Satisfaction (1-5) Store
101 1 Provo
104 3 Provo
113 5 Scranton

You choose to join the data based on Customer ID. This field is called a “join key.”

This is the final output, or all of the data that would be included in your results.

Customer ID Satisfaction (1-5) Name Store Reward Tier
101 1 Phil Stein Provo Emerald
102 Null Amir Dar Null Gold
104 3 Lucia Vasquez Provo Emerald
113 5 Null Scranton Null
Null Null Beth Green Null Silver

Notice how no data is excluded. Even Beth Green, who is missing data for Customer ID, is included in the results. If there were multiple null Customer ID rows, each of those rows would be included and remain separate, unique records.

Creating Joins

Qtip: A source can only be used in a join 8 times, regardless of dataset. (That same source can still be used in additional unions, so long as there are no more than 8 joins total.)
Qtip: There’s a maximum of 6 joins per dataset. (You can still add unions to that dataset once the join limit is reached, so long as here are no more than 6 joins total.)
  1. Create a data model.
  2. Add at least 2 sources to your data model.
    Plus sign next to source block expands a menu where you can select join

    Qtip: Make sure you include all the fields you need in your data sources, including the common field you’ll use to join your data (e.g., a unique ID).
  3. Click the plus sign ( + ) next to the data source you want to act as your Left data source.
  4. Select Join.
  5. Name the output. This is helpful if you plan to add multiple joins to your dataset.
    Menu opens along bottom of data modeler where you can configure these settings
  6. Under Input, select the Right data source.
  7. Create a join condition. Match the field that each dataset has in common.
    Example: Here we are mapping our Unique ID field from each data source to each other.
    Qtip: Almost all field types can be used in join conditions, except Date fields and Multi-Answer Text Set fields. We highly recommend using a unique identifier that matches across both data sources.
    Qtip: Recodes on join keys can’t be used.
  8. You can create separate joins in the same dataset. In this screenshot, you’d join the data from 2022 Survey and Actionability, but not the top 2 data sources.
    Scrolling down the data modeler, there are 2 more source blocks disconnected from the join happening above it. You can click the plus signs next to these blocks to get started building joins
  9. Finish creating your data model with an output dataset.
    Plus sign next to join block expands a menu where you can select output dataset

Using Auxiliary Joins

Auxiliary joins let you join multiple sources with the same join condition. Thus, they are useful when you want to create multiple, overlapping joins using the same data sources.

Earlier, we talked about how left outer joins work with 2 sources: a right source and a left source. Auxiliary joins let you set multiple left sources for the same right source in a join.

Example of an Auxiliary Join

Let’s say you have a database of store locations, with names linked to IDs. You have 2 years of surveys where you collect feedback on your stores. In the example below, we’ll find the satisfaction rating and customer ID for 2020 and 2021, and link them to a location name.

Store Locations (Left Source)

Location ID Location name
555 Provo
777 Dublin
999 Seattle
1000 Tokyo

Customer Feedback 2020 (Right Source)

Customer ID Satisfaction (1-5) Location ID
101 2 555
102 4 777
103 5 999
104 5 222

Customer Feedback 2021 (Auxiliary Right Source)

Customer ID Satisfaction (1-5) Location ID
656 5 1000
838 4 222
979 3 999
343 5 777

You join the data based on Location ID.

This is the final output, or all of the data that would be included in your results:

Location ID Location Name 2020 Customer ID 2020 Satisfaction 2021 Customer ID 2021 Satisfaction
777 Dublin 102 4 343 5
1000 Tokyo N/A N/A 656 5
999 Seattle 103 5 979 3
555 Provo 101 2 N/A N/A

Notice how 2020 and 2021 data have become separate columns in the same output dataset.

Because 2020 did not have data for Tokyo, but 2021 did, the 2020 columns are empty (N/A) for Tokyo. Likewise, 2021 had no Provo data.

Records with location ID “222” from either year were excluded from the final data source, since the store locations file had no corresponding location for that ID. See Understanding Left Outer Joins for an explanation of how data is excluded.

Creating an Auxiliary Join

Attention: You can only add up to 4 auxiliary joins per each individual join. This counts towards your overall limit of 4 joins per dataset.
  1. Click the join.
    Clicking a join makes an editor appear along the bottom of the dataset
  2. Go to Join settings.
  3. Click the plus sign ( + ).
  4. Select the Right Input.
    Adding an auxiliary join at the bottom of the join editing pane

    Example: This is the second source you want to join to your left source. In our example above, this would be the 2021 customer feedback survey.
  5. Set the left column’s join condition.
  6. Set the right column’s join condition.

Repeat steps as needed to add more auxiliary joins.

Troubleshooting Joins

You can have multiple joins in a data model. However, it’s important to be careful how these joins interact (or don’t interact). Data models currently do not support chained joins.

A chained join is different from an auxiliary join. In a chained join, you create a join with another join inside it.

The below example is an auxiliary join and is supported. Note how the join is defined inside the same single node, with overlapping sources:

image of auxiliary join with editor along the bottom for a single join node showing multiple join conditions inside the same node. the larger flow chart of the data model only has 1 box labeled join

The below example is a chained join and is not supported. Note how there are separate joins inside other joins.

the flow chart editor of the data model shows multiple joins being combined into another later join

Qtip: In general, we recommend creating your union nodes first, then joins. If you need to make joins first, this will results in slower data, but is ok so long as your join nodes come together into a union node, not another join node.

FAQs