Lexicons
About Lexicons
A lexicon is a dictionary used to improve the accuracy of sentiment analysis and reporting. Each dictionary contains entries that define specific terms along with their associated linguistic metadata.
XM Discover includes many standard dictionaries that are essential for correct natural language processing. You can customize the lexicon dictionaries associated with your projects to account for specialization in your industry and data type. Custom lexicon dictionaries let you define single-word or multi-word phrases that XM Discover should understand as single entities or concepts. Custom lexicons work in addition to the standard dictionaries provided out of the box. When an entry appears in both, the custom lexicon takes precedence over the standard one.
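To make the precedence rule concrete, the Python sketch below models each dictionary as a plain mapping from a phrase to its metadata and applies custom entries last, so they replace any overlapping standard entry. The entry fields (`entity`, `sentiment`), the sample values, and the lowercase key normalization are hypothetical; this is not XM Discover's internal representation.

```python
from typing import Dict

# Hypothetical entry metadata, e.g. {"entity": ..., "sentiment": ...}.
Entry = Dict[str, object]


def effective_lexicon(standard: Dict[str, Entry], custom: Dict[str, Entry]) -> Dict[str, Entry]:
    """Combine standard and custom dictionaries; custom entries win on overlap."""
    merged: Dict[str, Entry] = {}
    # Start from the out-of-the-box (standard) entries.
    for phrase, entry in standard.items():
        merged[phrase.lower()] = entry
    # Apply custom entries last so they replace any overlapping standard entry.
    for phrase, entry in custom.items():
        merged[phrase.lower()] = entry
    return merged


# Hypothetical sample data for illustration only.
standard = {"train wreck": {"sentiment": -1}, "calendar": {"entity": "calendar"}}
custom = {"Train Wreck": {"sentiment": -3}, "compact car": {"entity": "compact car"}}

lexicon = effective_lexicon(standard, custom)
print(lexicon["train wreck"])  # {'sentiment': -3}
print(sorted(lexicon))         # ['calendar', 'compact car', 'train wreck']
```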
Use Cases
Custom lexicons help you tune sentiment for specific projects and clarify certain reporting features. Identifying the right lexicon candidates improves categorization development, sentiment accuracy, and root cause analysis when you extend your model.
Here are some use cases where lexicons can be useful:
- Business and industry-specific terms: Every industry has specific terms or phrases that represent key concepts. These concepts are often multi-word and may not be identified automatically in XM Discover. Adding them to a lexicon encapsulates each one in a single entity, which makes your reports more useful by surfacing the most meaningful terms. For example, a company in the automotive industry might add a term like “compact car,” an industry-specific phrase that represents a single entity.
- Idiomatic expressions: Idiomatic expressions can also be added to a lexicon when they consist of several words but represent a single concept. For example, “train wreck” is a good lexicon candidate because it is a word pair that should be treated as one entity. Creating lexicon entries for idiomatic expressions like “train wreck” or “top notch” also lets you set a specific sentiment value for the whole phrase.
- Translating acronyms: Lexicons can also link acronyms back to their unabbreviated forms. For example, you could add a lexicon entry to map the acronym “FBI” to “Federal Bureau of Investigation.”
- Capturing common misspellings: Lexicons can also help account for common misspellings by linking the misspelling back to its correct or standard form. For example, the word “calendar” is commonly misspelled as “calender.” Adding the misspelling to the lexicon allows it to be mapped back to the correct spelling.
- Capturing common redaction patterns: Lexicons can capture common redaction patterns and map them to the related field. For example, you could map the redaction string XXXXXXXXX to SSN.
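The use cases above share one idea: a run of one or more words is recognized and replaced by a single entity or a standard form. The Python sketch below illustrates that idea with greedy longest-match lookup over whitespace-separated tokens. The `LEXICON` contents, the tokenization, and the greedy matching strategy are all assumptions for illustration; they do not describe XM Discover's actual NLP pipeline or the lexicon file syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: phrase (tuple of lowercase tokens) -> normalized entity.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("compact", "car"): "compact car",            # industry term
    ("train", "wreck"): "train wreck",            # idiomatic expression
    ("fbi",): "Federal Bureau of Investigation",  # acronym
    ("calender",): "calendar",                    # common misspelling
    ("xxxxxxxxx",): "SSN",                        # redaction pattern
}
MAX_LEN = max(len(key) for key in LEXICON)


def apply_lexicon(text: str) -> List[str]:
    """Greedily replace the longest matching phrase at each position with its entity."""
    tokens = text.lower().split()
    result: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones.
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in LEXICON:
                result.append(LEXICON[key])
                i += size
                break
        else:
            # No lexicon entry starts at this token; keep it unchanged.
            result.append(tokens[i])
            i += 1
    return result


print(apply_lexicon("The compact car was a train wreck"))
# ['the', 'compact car', 'was', 'a', 'train wreck']
print(apply_lexicon("My calender shows XXXXXXXXX"))
# ['my', 'calendar', 'shows', 'SSN']
```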
Identifying Lexicon Candidates
Not all word pairs are good lexicon candidates. Most pairs should remain independent so that their individual words can form linguistic connections. By contrast, “Best Buy” is a great lexicon candidate because it defines a single entity: the company.
When evaluating a new lexicon candidate, ask yourself two questions:
- Is [first word] a type of [second word]?
- If yes, are there other types of [second word] that I would want to distinguish in my reports? Or, could someone refer unambiguously to all of the variations of [second word] in aggregate by just using [plural of second word]?
If you answered “no” to either question, you have found a potential lexicon candidate.
Associated Words
If your organization maintains an internal list of product and brand names, competitors, common acronyms, or company nicknames, that list can be a good source of industry-specific terms.
You can also run an associated word report in XM Discover to see the top linguistic relationships in your project:
- In Designer, go to the Report tab.
- Use the Report On dropdown to select Associated Words.
- Choose the Model that contains the data you’d like to analyze.
- Click Run Now.
- After finding a term you’re interested in, click the Preview button to view specific instances of that word appearing in your data.
- Review individual responses to identify terms to add to a lexicon dictionary.
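If you can export response text from your project, a rough offline complement to the Associated Words report is to count which adjacent word pairs occur most often and review the frequent ones by hand. The Python sketch below does this with `collections.Counter`. The `responses` list, the simple regex tokenizer, and the stop-word set are placeholders you would replace, and this is not how XM Discover computes associated words.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Placeholder stop words; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "was", "is", "my", "at", "to", "and", "i"}


def top_word_pairs(responses: Iterable[str], n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Count adjacent word pairs across responses and return the n most common."""
    counts: Counter = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        for first, second in zip(words, words[1:]):
            # Skip pairs that involve a stop word; they rarely form useful entities.
            if first in STOP_WORDS or second in STOP_WORDS:
                continue
            counts[(first, second)] += 1
    return counts.most_common(n)


# Hypothetical sample responses.
responses = [
    "Customer service at Best Buy was great",
    "I bought a compact car at Best Buy",
    "Best Buy customer service was slow",
]
for pair, count in top_word_pairs(responses, n=3):
    print(" ".join(pair), count)
```

Frequent pairs are only candidates; run each one through the two questions in Identifying Lexicon Candidates before adding it.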
Types of Lexicons
Five lexicon dictionary types are available:
- Product: Contains any lexicon entries that are specific products. The first line of this dictionary should always read NamedEntity:Product.
- Brand: In this dictionary, you should add any lexicon entries that are specific brands. The first line of this dictionary should always read NamedEntity:Brand.
- Company: The company dictionary is one of four Intelligent Entities dictionaries. In this dictionary, you should add any lexicon entries that are specific companies. The first line of this dictionary should always read NamedEntity:Company.
- Person: The person dictionary is one of four Intelligent Entities dictionaries. In this dictionary, you should add the names or monikers of your employees or other people of interest. The first line of this dictionary should always read NamedEntity:Person.
- Custom Lexicon (general dictionary): The custom dictionary is the space for everything else that does not classify as a product, brand, company, or person. Most frequently, this dictionary will include common industry terminology. The first line of this dictionary should always read Custom:CustomLexicon.
The Product, Brand, Company, and Person dictionaries contain Intelligent Entities. They help you manage lexicons by grouping related terms into separate lists, making it easier to keep track of the content in each list. For companies with many products and brands or many competitors, this grouping makes lexicon management simpler.
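Because each dictionary type is identified by a fixed first line, a quick local check can catch a file whose header is missing or belongs to a different type before you upload it. The Python sketch below maps each type to the header listed above. The function name and the idea of checking locally are illustrative assumptions; the check inspects only the first line, not the entries (see Lexicon File Format for their syntax).

```python
from typing import Dict, Optional

# Required first line for each lexicon dictionary type, as listed above.
HEADERS: Dict[str, str] = {
    "Product": "NamedEntity:Product",
    "Brand": "NamedEntity:Brand",
    "Company": "NamedEntity:Company",
    "Person": "NamedEntity:Person",
    "Custom Lexicon": "Custom:CustomLexicon",
}


def dictionary_type(file_text: str) -> Optional[str]:
    """Return the dictionary type named by the file's first line, or None if unrecognized."""
    lines = file_text.splitlines()
    if not lines:
        return None
    # Surrounding whitespace is ignored here; that leniency is an assumption.
    first = lines[0].strip()
    for kind, header in HEADERS.items():
        if first == header:
            return kind
    return None


print(dictionary_type("NamedEntity:Company\n"))  # Company
print(dictionary_type("Company\n"))              # None
```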
Editing Custom Lexicons
This section covers how to edit custom lexicons. Before you can edit your lexicons, you must build a file in the correct format. See Lexicon File Format for instructions before continuing.
Custom lexicons take effect immediately for any data loaded from that point forward. To apply custom lexicons to historical data as well, that data must be fully reprocessed. Contact your Qualtrics representative if you need to do this.
The changes you make to any of your custom lexicon files remain available to you in XM Discover. You can view the current file by downloading the dictionary using the steps below. For future adjustments, add the new entries to the bottom of the existing file.
To edit your lexicons:
- In Designer, go to the Admin tab.
- Select the Accounts section.
- Click Edit for the account whose lexicons you want to modify.
- Go to the Dictionaries section.
- Click Custom Lexicons.
- Choose the type of lexicon you’d like to update.
- Click Download to download the current lexicon file to your computer.
- Open this file in a text editor and add your lexicon terms. See Lexicon File Format for more information. Make sure the file is saved as a DCT file.
- In the same window in XM Discover, click Upload.
- Click Choose File and select the DCT file saved on your computer.
- Click Upload. If there are any formatting issues, the window will tell you where the issue is so you can fix it.
- Click Finish.
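The editing step in the procedure above (adding new entries to the bottom of the downloaded file and keeping it as a DCT file) can be scripted. The Python sketch below is a hypothetical helper, not part of XM Discover: it treats each entry as one opaque line that you have already written according to Lexicon File Format, assumes UTF-8 encoding, and skips entries already present so the header on the first line is never touched.

```python
from pathlib import Path
from typing import Iterable, List


def append_entries(path: Path, new_entries: Iterable[str]) -> int:
    """Append entries to the bottom of a downloaded .dct file and return how many were added.

    Each entry is treated as one opaque line; write it according to Lexicon File Format.
    """
    if path.suffix.lower() != ".dct":
        raise ValueError(f"{path.name} is not a .dct file")
    text = path.read_text(encoding="utf-8")  # UTF-8 is an assumption about the file encoding
    existing = {line.strip() for line in text.splitlines()}
    to_add: List[str] = []
    for entry in new_entries:
        entry = entry.strip()
        # Skip blank lines and entries already in the file (or already queued).
        if entry and entry not in existing and entry not in to_add:
            to_add.append(entry)
    if to_add:
        # Start on a fresh line if the file does not already end with a newline.
        prefix = "\n" if text and not text.endswith("\n") else ""
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(to_add) + "\n")
    return len(to_add)
```

Skipping duplicates is a design choice so that rerunning the helper is harmless; the upload dialog remains the authority on whether the file is correctly formatted.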