Excel Unplugged

Table.Join – The Power Tool of Power Query

The basic explanation of Table.Join function in Power Query is to use Merge Queries on two steps of one single query.
This will be a step by step tutorial on how to use Power Query to get a list of duplicate files that could be interesting for System Administrators and the main step of the process will include the Table.Join step. So let’s just dig in…

Power Query Table Join

We start with a folder called Finding duplicate Files that contains 12 files and three subfolders. The files that are duplicated within those folders are highlighted in the picture above. Some appear only in one folder, and some in several. Now let’s create a new Workbook and start writing a new query to find these duplicates. The path is different depending on your version of Excel.
In 2016 the path is Data/Get Data/From File/From Folder

Power Query Table Join

In 2013 and 2010 the Path is Power Query/From File/From Folder

Power Query Table Join

The first step is to pick a folder and we select the Finding Duplicate Files folder.

Power Query Table Join

At this point we get a result preview that lists all the files in the selected folder including many file properties that could come in handy in other scenarios.

Power Query Table Join

But this list of files is not the desired result, so as in 99.9% of Queries, we choose the Edit button

Power Query Table Join

This opens the Query Editor window where we will apply the desired transformations

Power Query Table Join

First, we get rid of the unnecessary columns. Now instead of selecting all the unwanted columns and choosing Remove Columns command, we will rather apply the Remove Other Columns command. And we use another feature of Power Query here, we select the Folder Path column first and the Name column second. Then we right click and choose Remove Other Columns.

Power Query Table Join

And there it is, notice the order of the columns… Quite ingenious right? Next up we select the Name column and apply the Transform/Format/Uppercase command. This is not necessary but it’s good practice since we are searching for duplicates and Power Query is case sensitive, this will make them “look the same” in the eyes of Power Query.

Power Query Table Join

Now an important step! We must make sure that the resulting table after we apply the last transformation will be something that we can call upon later, so for clarity purposes and practical reasons, we rename the last step to Table1. For this we right click the last step in the Applied Steps sidebar and choose the Rename command.

Power Query Table Join

Now that we have the filenames and Paths of all the files, we must get a list of duplicate files. For this, we will use the Transform/Group By command on the Name Column.

Power Query Table Join

In the Group By dialog box, we choose to group by Name, the operation is Count Rows and we type the NumberOfAppearances for the name of the resulting new column where the result of the Count Rows operation will go.

Power Query Table Join

And here is the result.

Power Query Table Join

Since we are only interested in the files that appear multiple times, we use the filter get rid of those that only appear once.

Power Query Table Join

And there it is. The list of files that appear more than once. This is an OK result since we now know which files are duplicated, but we don’t know in which folders they are. That information is still saved in the Table1 step. Since we want to merge the two together (the use of the word Merge is not incidental 😊 ), we should rename this last step

Power Query Table Join

Same as before, right click on the last step and choosing Rename

Power Query Table Join

And let’s rename it to Table2

Power Query Table Join

Now the stage is set for the use of M function Table.Join. We can do this two ways, either in the Advanced editor or by simply inserting a new step via the Formula Bar. If the Formula Bar is not visible in your Query Editor, you can turn it on via the View tab as shown in a picture

Power Query Table Join

Once the Formula Bar is visible, you simply click the fx icon to insert a new step and write the following formula.
=Table.Join(Table1,”Name”,Table2,”Name”)
The syntax would be Table.Join(
Followed by the name of the first table (in our case we renamed that step to Table1 but generaly that would be
StepResult1,
This is followed by the column name by which you wish to merge the two (there could be more than one column here)
“NameOfTheColumnInStepResult1”,
After that you repeat the process for the second step (we renamed that into Table2
StepResult2,”NameOfTheColumnInStepResult2″)
This is all you need for the function to work, but there are a few additional parameters that the function can take. The most interesting is Join.Kind where you can choose from these types of joins
• JoinKind.Inner (the default type)
• JoinKind.LeftOuter
• JoinKind.RightOuter
• JoinKind.FullOuter
• JoinKind.LeftAnti
• JoinKind.RightAnti
And if these remind you of the Query Merge command, you are right, they are exactly those! If this optional is omitted, then the default value of Inner join is passed to the function. This is how we wrote the function, but imagine the scenarios with Right Anti Join.

Power Query Table Join

After we write the function, this is the result…

Power Query Table Join

Now all we have to do is to sort the file list by File Names…

Power Query Table Join

And for each file that has duplicates, we get the list of all folders where this file is.

Power Query Table Join

A bit of rearranging of columns

Power Query Table Join

And we can load the result into Excel by using the Close & Load command

Power Query Table Join

Next week I will write about using Conditional Formatting to do a sort of Anti Fill and turning the above table into this:

Power Query Table Join

If you want more information on the Table.Join command, this is a good place to start.

Comments 2

  1. Frank Tonsen says:

    I prefer to group by a second column that doesn’t aggregate (“All rows”) and then to expand the path.
    Kind of intrinsic join.

    1. Sure Thing Frank. As is always the case in Excel, there are many things to get to the desired result, but I really wanted to make a case for Table.Join as I believe it should have it’s own (Join Steps) command in the Query editor, since it’s so useful. But thanks for contributing your solution!

Leave a Reply

Your email address will not be published. Required fields are marked *

%d bloggers like this: