Qualtrics Platform
Customer Journey Optimizer
XM Discover
Qualtrics Social Connect

Joins (CX)

About Joins

Joins allow you to combine rows from 2 or more data sources based on a related column of data that they share. By using a join, you can gather and analyze the combined data more efficiently and effectively, creating more insights.

Example: Join the support tickets resolved for a customer with their survey data, directory contact data, and more, giving you a full profile of a customer’s interactions with your company.

The CX data modeler only supports left outer joins, inner joins, and outer joins. Only left outer joins update continuously.

Qtip: Confused about what datasets are vs. data sources? Not sure how to tell the data mapper and modeler apart? Learn more about these and more key terms.

Join Keys

A join key is a field that helps match multiple datasets, identifying what records should be combined. For example, unique identifiers you map to your dashboard data, such as IDs and usernames. Join keys are important to making sure a join is set up correctly.

Example: The same username identifies the same customer across multiple surveys, helping you link all of their feedback in one place.

Example: A ticket created from a survey response saves the corresponding survey response ID. This response ID can be used to link the ticket information and the survey response so a customer support interaction and the customer’s feedback can be viewed as whole.

Example: Location IDs are used to identify what store location customers are reviewing. You use the location ID to combine records across stores so you can compare store performance.

Warning: You cannot use fields that have been recoded as join keys. If you recode a field after it’s been used as a field key, you won’t be able to publish dataset changes.

Understanding Left Outer Joins

To understand how a left outer join works, let’s look at an example.

Look at the image of the dataset below. The first data source on top is our “left” data source, and the second data source on the bottom is our “right” data source.

Let’s say these tables represent the data you can find in each survey:

Store Locations (Left Source)

Location ID	Location name
555	Provo
777	Dublin
999	Seattle
1000	Tokyo

Customer Feedback (Right Source)

Customer ID	Satisfaction (1-5)	Location ID
101	2	555
102	4	777
103	5	999
104	5	222

You choose to join the data based on Location ID. This field is called a “join key.”

This is the data unique to the second dataset. This would be removed:

Customer ID	Satisfaction (1-5)	Location ID
104	5	222

This is the final output, or all of the data that would be included in your results:

Location ID	Location name	Customer ID	Satisfaction (1-5)
555	Provo	101	2
777	Dublin	102	4
999	Seattle	103	5
1000	Tokyo	Null	Null

Notice how the Provo, Dublin, and Seattle results contain columns of data from both the Right and Left sources, because these rows shared common location IDs.

There was no data for Tokyo in the Right source, and so the Tokyo row has null values under the new Customer ID and Satisfaction columns.

Importance of Unique Join Keys

Because join keys help to identify rows that need to be combined from the left and right sources, we recommend making sure the join key you use acts like a unique identifier. Otherwise, if there are multiple records in the right source that match the join key of the left source, only one of them will be pulled randomly.

Example: Let’s look at the example we discussed above. We have the same left source. But in the right source, we have these rows:

Customer ID	Satisfaction (1-5)	Location ID
101	2	555
107	4	555

In the resulting join, only one of the 555 rows will be saved, but not both.

If the join key isn’t unique for each record on a right source, and you want to include all records from both the right and left source, a union should be used instead. Unions pull in each of the records separately instead of combining the rows of information.

Understanding Inner Joins

Attention: Inner joins update periodically with new data instead of continuously. See linked page for details.

With an inner join, the resulting, merged dataset only includes matching rows found in both datasets. Because inner joins exclude records from either source that are missing data, the resulting datasets tend to have fewer empty columns, and the order of sources (which is left or right) matters less than with other join types.

Let’s say these tables represent the data you can find in each survey:

Customer Reward Tier (Left Source)

Customer ID	Name	Reward Tier
101	Phil Stein	Emerald
102	Amir Dar	Gold
103	Beth Green	Silver
104	Lucia Vasquez	Emerald

Store Experience Feedback (Right Source)

Customer ID	Satisfaction (1-5)	Store
101	1	Provo
104	3	Provo
113	5	Scranton

You choose to join the data based on Customer ID. This field is called a “join key.”

This is all of the data from each source that would be excluded from the final dataset.

Customer ID	Name	Reward Tier	Satisfaction (1-5)	Store
102	Amir Dar	Gold	Null	Null
103	Beth Green	Silver	Null	Null
113	Null	Null	5	Scranton

This is the final output, or all of the data that would be included in your results:

Customer ID	Satisfaction (1-5)	Name	Store	Reward Tier
101	1	Phil Stein	Provo	Emerald
104	3	Lucia Vasquez	Provo	Emerald

Even though these sources are being joined by customer ID, the customers with IDs 102, 103, and 113 are excluded from the final dataset because they are missing too much information. Only Phil (101) and Lucia (104) have full data available from both sources.

Understanding Full Outer Joins

Attention: Full outer joins update periodically with new data instead of continuously. See linked page for details.

With a full outer join, the resulting, merged dataset includes all rows from both datasets. While records will be matched and merged by join key, even records that are missing data for the join key will be included in the final dataset.

Let’s say these tables represent the data you can find in each survey:

Customer Reward Tier (Left Source)

Customer ID	Name	Reward Tier
101	Phil Stein	Emerald
102	Amir Dar	Gold
104	Lucia Vasquez	Emerald
Null	Beth Green	Silver

Store Experience Feedback (Right Source)

Customer ID	Satisfaction (1-5)	Store
101	1	Provo
104	3	Provo
113	5	Scranton

You choose to join the data based on Customer ID. This field is called a “join key.”

This is the final output, or all of the data that would be included in your results.

Customer ID	Satisfaction (1-5)	Name	Store	Reward Tier
101	1	Phil Stein	Provo	Emerald
102	Null	Amir Dar	Null	Gold
104	3	Lucia Vasquez	Provo	Emerald
113	5	Null	Scranton	Null
Null	Null	Beth Green	Null	Silver

Notice how no data is excluded. Even Beth Green, who is missing data for Customer ID, is included in the results. If there were multiple null Customer ID rows, each of those rows would be included and remain separate, unique records.

Qtip: To include every row from both datasets, use a Union. A full outer join combines data from both sets while matching records based on the join key, which will result in fewer rows than the combined total from both datasets. It will include all joined records, as well as any records that failed to join from both datasets.

Creating Joins

Qtip: A source can only be used in a join 8 times, regardless of dataset. (That same source can still be used in additional unions, so long as there are no more than 8 joins total.)

Qtip: There’s a maximum of 6 joins per dataset. (You can still add unions to that dataset once the join limit is reached, so long as here are no more than 6 joins total.)

Create a data model.
Add at least 2 sources to your data model.

Qtip: Make sure you include all the fields you need in your data sources, including the common field you’ll use to join your data (e.g., a unique ID).
Click the plus sign ( + ) next to the data source you want to act as your Left data source.
Select Join.
Name the output. This is helpful if you plan to add multiple joins to your dataset.
Under Input, select the Right data source.
Create a join condition. Match the field that each dataset has in common.
Example: Here we are mapping our Unique ID field from each data source to each other.

Qtip: Almost all field types can be used in join conditions, except Date fields and Multi-Answer Text Set fields. We highly recommend using a unique identifier that matches across both data sources.
You can create separate joins in the same dataset. In this screenshot, you’d join the data from 2022 Survey and Actionability, but not the top 2 data sources.
Finish creating your data model with an output dataset.

Using Auxiliary Joins

Auxiliary joins let you join multiple sources with the same join condition. Thus, they are useful when you want to create multiple, overlapping joins using the same data sources.

Earlier, we talked about how left outer joins work with 2 sources: a right source and a left source. Auxiliary joins let you set multiple left sources for the same right source in a join.

Example of an Auxiliary Join

Let’s say you have a database of store locations, with names linked to IDs. You have 2 years of surveys where you collect feedback on your stores. In the example below, we’ll find the satisfaction rating and customer ID for 2020 and 2021, and link them to a location name.

Store Locations (Left Source)

Location ID	Location name
555	Provo
777	Dublin
999	Seattle
1000	Tokyo

Customer Feedback 2020 (Right Source)

Customer ID	Satisfaction (1-5)	Location ID
101	2	555
102	4	777
103	5	999
104	5	222

Customer Feedback 2021 (Auxiliary Right Source)

Customer ID	Satisfaction (1-5)	Location ID
656	5	1000
838	4	222
979	3	999
343	5	777

You join the data based on Location ID.

This is the final output, or all of the data that would be included in your results:

Location ID	Location Name	2020 Customer ID	2020 Satisfaction	2021 Customer ID	2021 Satisfaction
777	Dublin	102	4	343	5
1000	Tokyo	N/A	N/A	656	5
999	Seattle	103	5	979	3
555	Provo	101	2	N/A	N/A

Notice how 2020 and 2021 data have become separate columns in the same output dataset.

Because 2020 did not have data for Tokyo, but 2021 did, the 2020 columns are empty (N/A) for Tokyo. Likewise, 2021 had no Provo data.

Records with location ID “222” from either year were excluded from the final data source, since the store locations file had no corresponding location for that ID. See Understanding Left Outer Joins for an explanation of how data is excluded.

Creating an Auxiliary Join

Attention: You can add up to 6 auxiliary joins per each individual join. This counts towards your overall limit of 6 joins per dataset.

Click the join.
Go to Join settings.
Click the plus sign ( + ).
Select the Right Input.

Example: This is the second source you want to join to your left source. In our example above, this would be the 2021 customer feedback survey.
Set the left column’s join condition.
Set the right column’s join condition.

Repeat steps as needed to add more auxiliary joins.

Troubleshooting Joins

You can have multiple joins in a data model. However, it’s important to be careful how these joins interact (or don’t interact). Data models currently do not support chained joins.

A chained join is different from an auxiliary join. In a chained join, you create a join with another join inside it.

The below example is an auxiliary join and is supported. Note how the join is defined inside the same single node, with overlapping sources:

The below example is a chained join and is not supported. Note how there are separate joins inside other joins.

Qtip: In general, we recommend creating your union nodes first, then joins. If you need to make joins first, this will results in slower data, but is ok so long as your join nodes come together into a union node, not another join node.