Excel Unplugged

Turning Product Descriptions into list of attributes with Power Query

One task data analysts often have to deal with is matching data from one table to data in another table. Let’s look at how we get our tables matched up in a new and powerful way: using Power Query Join and a custom filter Text.Contains function. First, let’s look at simple Table Joins in Power Query and build up to the third demo in this post where we will create a whole new way of matching two tables.

Basic Join

Let’s look at a simple example. Data is in the following tables:

Simulating Fuzzy Matching with Text.Contains and Filter
Two tables we wish to consolidate

We want to match records by both name and location. To achieve that, we take the following steps.

Load both tables to Query by clicking anywhere within the table and then selecting Data > From Table/Range.

Simulating Fuzzy Matching with Text.Contains and Filter

With both tables loaded in Query, select Merge Queries.

Simulating Fuzzy Matching with Text.Contains and Filter

Set the right table Demo1_right as the second table and select Left Outer as the Join method.

Simulating Fuzzy Matching with Text.Contains and Filter

Select the name and location column in both tables.

Simulating Fuzzy Matching with Text.Contains and Filter

Confirm with OK, resulting in the following.

Simulating Fuzzy Matching with Text.Contains and Filter

By clicking on any cell in the Demo_Right Column (not on the word Table, but anywhere in the blank space next to it), we can preview the matched records below.

Simulating Fuzzy Matching with Text.Contains and Filter

Select the Expand button and confirm by clicking OK.

Simulating Fuzzy Matching with Text.Contains and Filter

In the resulting table, we can see all the matched-up records from both tables.

Simulating Fuzzy Matching with Text.Contains and Filter

Fuzzy Join

Next up, we have a real-life version of the same data. It’s the same two tables but with data at its »ugly« stage – with different cases, added last name letters, etc.

Simulating Fuzzy Matching with Text.Contains and Filter

Our goal remains the same. Match up records from both tables and find a way to ignore the difference in the case of words/letters following the first name. We proceed with the following steps.

We again load both tables to query by clicking anywhere within the table and then selecting Data > From Table/Range.

Simulating Fuzzy Matching with Text.Contains and Filter

With both tables loaded in Query, select Merge Queries.

Simulating Fuzzy Matching with Text.Contains and Filter

This time set the Demo2_right as the left table (yes, I know it’s confusing 🙂 ) and Demo1_left as the right table. Again, select Left Outer as the Join method.

Simulating Fuzzy Matching with Text.Contains and Filter

Consequently, you will notice there are only a few matches.

Tick the box to Use fuzzy matching to perform the merge. More options are now available—tick both the Ignore case and Match by combining text parts.

Simulating Fuzzy Matching with Text.Contains and Filter

You will notice the number of matches now increases.

We click the Expand button again and this time this is the result.

Simulating Fuzzy Matching with Text.Contains and Filter

Notice how »bob« beautifully matches up with »Bob«!

Next Level Matchup

And last but not least, we have a match where we match text that is not the same and are in search of partial matches but realizing that even those may be different. We start with data in the form of paragraphs.

For example, descriptions of products, as shown in the image below.

Simulating Fuzzy Matching with Text.Contains and Filter

As a result, we want to get a table of all the attributes. The result we are striving for can be seen in the image below.

Simulating Fuzzy Matching with Text.Contains and Filter

The problem is attributes are described differently in every row. For example, »5 doors« appear as »5 doors« in one entry and as »five doors« in another. This is why we start by defining a Translation Table where each row contains all the possible appearances belonging to one attribute.

Simulating Fuzzy Matching with Text.Contains and Filter

For instance, we want »5 Doors «, »five doors, « and »5 doors« to be synonyms. We follow the steps below.

Again, load both tables to Query by clicking anywhere within the table and then selecting Data > From Table/Range.

We named our queries Demo3_BaseTable and Demo3_TranslationTable. Select Demo3_TranslationTable and select Column1. Now use one of my favorite commands in Power query: right-click> Unpivot Other Columns.

Simulating Fuzzy Matching with Text.Contains and Filter

Delete the Attribute column and name this last step Specs (by right-clicking the Step Name and Choosing Rename).

Simulating Fuzzy Matching with Text.Contains and Filter

Type in the following formula in the formula entry bar.

 = Table.AddColumn(Demo3_BaseTable, "Custom", each Table.SelectRows(Specs, (sp)=> Text.Contains([Description], sp[Value])))

Arguments are as follows:

  • Demo3_BaseTable is our BaseTable
  • “Custom” is the name of the new column
  • Specs are the previous step
  • (sp) is the abbreviation for Specs step
  • Description is the column of our Demo3_BaseTable
  • sp[Value] is the Value column of Specs or sp table

The formula adds a column to our BaseTable. It goes through each Specs[Value] value and checks if it’s contained in Demo3_BaseTable[Description]. It then returns all the contained values for each row.

At this point, we have the result below.

Simulating Fuzzy Matching with Text.Contains and Filter

We select to expand Column1.

Simulating Fuzzy Matching with Text.Contains and Filter

As a result, we end up with the table below.

Simulating Fuzzy Matching with Text.Contains and Filter

Next you select Add Column > Add Custom Column and type in “x”.

Simulating Fuzzy Matching with Text.Contains and Filter

Here is the table we get.

Simulating Fuzzy Matching with Text.Contains and Filter

We click on Column1 and select Pivot. Now we select Custom column as values.

Simulating Fuzzy Matching with Text.Contains and Filter

Consequently, we get the final table.

Simulating Fuzzy Matching with Text.Contains and Filter

Finally, we load the result to Excel.

Simulating Fuzzy Matching with Text.Contains and Filter

Just brilliant!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.