Skip to main content
Loading...
Skip to article
  • Qualtrics Platform
    Qualtrics Platform
  • Customer Journey Optimizer
    Customer Journey Optimizer
  • XM Discover
    XM Discover
  • Qualtrics Social Connect
    Qualtrics Social Connect

Joins (CX)


Was this helpful?


This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

The feedback you submit here is used only to help improve this page.

That’s great! Thank you for your feedback!

Thank you for your feedback!


Qtip: The data modeler and related functionality is not yet available to all customers. If you’re interested in this feature, please reach out to your XM Success Representative. Qualtrics may, in its sole discretion and without liability, change the timing of any product feature rollout, change the functionality for any in preview or in development product feature, or choose not to release a product feature or functionality for any reason or for no reason.

About Joins

Joins allow you to combine rows from 2 or more data sources based on a related column of data that they share. By using a join, you can gather and analyze the combined data more efficiently and effectively, creating more insights.

Example: Join the support tickets resolved for a customer with their survey data, directory contact data, and more, giving you a full profile of a customer’s interactions with your company.

The data modeler only supports left outer joins.

Qtip: Confused about what datasets are vs. data sources? Not sure how to tell the data mapper and modeler apart? Learn more about these and more key terms.

Understanding Left Outer Joins

To understand how a left outer join works, let’s look at an example.

Look at the image of the dataset below. The first data source on top is our “left” data source, and the second data source on the bottom is our “right” data source.

Each data source is a block. The one on top is an imported data project named store locations. The block on bottom is a survey named customer feedback

Let’s say these tables represent the data you can find in each survey:

Store Locations (Left Source)

Location ID Location name
555 Provo
777 Dublin
999 Seattle
1000 Tokyo

Customer Feedback (Right Source)

Customer ID Satisfaction (1-5) Location ID
101 2 555
102 4 777
103 5 999
104 5 222

You choose to join the data based on Location ID. This field is called a “join key.”

This is the data unique to the second dataset. This would be removed:

Customer ID Satisfaction (1-5) Location ID
104 5 222

This is the final output, or all of the data that would be included in your results:

Location ID Location name Customer ID Satisfaction (1-5)
555 Provo 101 2
777 Dublin 102 4
999 Seattle 103 5
1000 Tokyo Null Null

Notice how the Provo, Dublin, and Seattle results contain columns of data from both the Right and Left sources, because these rows shared common location IDs.

There was no data for Tokyo in the Right source, and so the Tokyo row has null values under the new Customer ID and Satisfaction columns.

Importance of Unique Join Keys

Because join keys help to identify rows that need to be combined form the left and right sources, we recommend making sure the join key you use acts like a unique identifier. Otherwise, if there are multiple records in the right source that match the join key of the left source, only one of them will be pulled randomly.

Example: Let’s look at the example we discussed above. We have the same left source. But in the right source, we have these rows:

Customer ID Satisfaction (1-5) Location ID
101 2 555
107 4 555

In the resulting join, only one of the 555 rows will be saved, but not both.

If the join key isn’t unique for each record on a right source, and you want to include all records from both the right and left source, a union should be used instead. Unions pull in each of the records separately instead of combining the rows of information.

Creating Joins

Qtip: A source can only be used in a join 12 times, regardless of dataset.
Qtip: There’s a maximum of 6 joins per dataset.
  1. Create a data model.
  2. Add at least 2 sources to your data model.
    Plus sign next to source block expands a menu where you can select join

    Qtip: Make sure you include all the fields you need in your data sources, including the common field you’ll use to join your data (e.g., a unique ID).
  3. Click the plus sign ( + ) next to the data source you want to act as your Left data source.
  4. Select Join.
  5. Name the output. This is helpful if you plan to add multiple joins to your dataset.
    Menu opens along bottom of data modeler where you can configure these settings
  6. Under Input, select the Right data source.
  7. Create a join condition. Match the field that each dataset has in common.
    Example: Here we are mapping our Unique ID field from each data source to each other.
    Qtip: Fields of any type can be used in join conditions. We highly recommend using a unique identifier that matches across both data sources.
    Qtip: Recodes on join keys can’t be used.
  8. If you have at least 1 other, separate data source below the sources you joined, you can create another join using the existing join.
    Plus sign next to source block expands a menu where you can select join
  9. You can create separate joins in the same dataset. In this screenshot, you’d join the data from 2022 Survey and Actionability, but not the top 2 data sources.
    Scrolling down the data modeler, there are 2 more source blocks disconnected from the join happening above it. You can click the plus signs next to these blocks to get started building joins
  10. Finish creating your data model with an output dataset.
    Plus sign next to join block expands a menu where you can select output dataset

Using Auxiliary Joins

Auxiliary joins let you join multiple sources with the same join condition. Thus, they are useful when you want to create multiple, overlapping joins using the same data sources.

Earlier, we talked about how left outer joins work with 2 sources: a right source and a left source. Auxiliary joins let you set multiple left sources for the same right source in a join.

Example of an Auxiliary Join

Let’s say you have a database of store locations, with names linked to IDs. You have 2 years of surveys where you collect feedback on your stores. In the example below, we’ll find the satisfaction rating and customer ID for 2020 and 2021, and link them to a location name.

Store Locations (Left Source)

Location ID Location name
555 Provo
777 Dublin
999 Seattle
1000 Tokyo

Customer Feedback 2020 (Right Source)

Customer ID Satisfaction (1-5) Location ID
101 2 555
102 4 777
103 5 999
104 5 222

Customer Feedback 2021 (Auxiliary Right Source)

Customer ID Satisfaction (1-5) Location ID
656 5 1000
838 4 222
979 3 999
343 5 777

You join the data based on Location ID.

This is the final output, or all of the data that would be included in your results:

Location ID Location Name 2020 Customer ID 2020 Satisfaction 2021 Customer ID 2021 Satisfaction
777 Dublin 102 4 343 5
1000 Tokyo N/A N/A 656 5
999 Seattle 103 5 979 3
555 Provo 101 2 N/A N/A

Notice how 2020 and 2021 data have become separate columns in the same output dataset.

Because 2020 did not have data for Tokyo, but 2021 did, the 2020 columns are empty (N/A) for Tokyo. Likewise, 2021 had no Provo data.

Records with location ID “222” from either year were excluded from the final data source, since the store locations file had no corresponding location for that ID. See Understanding Left Outer Joins for an explanation of how data is excluded.

Creating an Auxiliary Join

Attention: You can only add up to 4 auxiliary joins per each individual join. This counts towards your overall limit of 4 joins per dataset.
  1. Click the join.
    Clicking a join makes an editor appear along the bottom of the dataset
  2. Go to Join settings.
  3. Click the plus sign ( + ).
  4. Select the Right Input.
    Adding an auxiliary join at the bottom of the join editing pane

    Example: This is the second source you want to join to your left source. In our example above, this would be the 2021 customer feedback survey.
  5. Set the left column’s join condition.
  6. Set the right column’s join condition.

Repeat steps as needed to add more auxiliary joins.

Auxiliary Joins vs. Subsequent Joins

In addition to auxiliary joins, you can click the plus sign ( + ) next to your completed join and select Join. We’ll call this method a “subsequent join.”

Adding a standard join to the data modeler

Generally, auxiliary joins are useful when you want to create multiple overlapping joins using the same data sources. Subsequent joins are best if you want to create a join with 2 additional data sources that were not included in your first join.

Qtip: Subsequent joins are also useful if you want to make multiple joins where a second join key needs to be pulled from another source. For example, let’s say you have a survey with Store ID.

  • You use the Store ID to join this data to a separate source with more location data.
  • The location data source also has a field called “Store Manager.”
  • In your third source, you have company hierarchy information that you’re joining based on the “Store Manager” field.

Using this method, survey data, location data, and the company hierarchy can all be combined in one dataset.

However, there are times that auxiliary joins and subsequent joins can have the same output.

Example: Expanding on our example above, you would get the same results if you created a subsequent join with Location ID from Store Locations on the left and 2021’s location ID on the right.

Image of subsequent left join condition

However, when you create the first join, it doesn’t result in just one Location ID field. For this example, you end up with a column for both the Store Locations and 2020 Customer Feedback versions of the Location ID. If you try to join your data using the Location ID from the 2020 Customer Feedback, the results would be different than they’d be with the auxiliary join.

Image of subsequent right join condition

Here’s what that table of results would look like. Because 2020 has no data for Tokyo, the existing 2021 Tokyo data gets excluded fro the final results.

Location ID Location Name 2020 Customer ID 2020 Satisfaction 2021 Customer ID 2021 Satisfaction
777 Dublin 102 4 343 5
1000 Tokyo N/A N/A N/A N/A
999 Seattle 103 5 979 3
555 Provo 101 2 N/A N/A

FAQs