The basic explanation of Table.Join function in Power Query is to use Merge Queries on two steps of one single query.
This will be a step by step tutorial on how to use Power Query to get a list of duplicate files that could be interesting for System Administrators and the main step of the process will include the Table.Join step.
So let’s just dig in…
We start with a folder called Finding duplicate Files that contains 12 files and three subfolders. The files that are duplicated within those folders are highlighted in the picture above. Some appear only in one folder, and some in several. Now let’s create a new Workbook and start writing a new query to find these duplicates.
The path is different depending on your version of Excel.
In 2016 the path is Data/Get Data/From File/From Folder
In 2013 and 2010 the Path is Power Query/From File/From Folder
The first step is to pick a folder and we select the Finding Duplicate Files folder.
At this point we get a result preview that lists all the files in the selected folder including many file properties that could come in handy in other scenarios.
But this list of files is not the desired result, so as in 99.9% of Queries, we choose the Edit button
This opens the Query Editor window where we will apply the desired transformations
First, we get rid of the unnecessary columns. Now instead of selecting all the unwanted columns and choosing Remove Columns command, we will rather apply the Remove Other Columns command. And we use another feature of Power Query here, we select the Folder Path column first and the Name column second. Then we right click and choose Remove Other Columns.
And there it is, notice the order of the columns… Quite ingenious right? Next up we select the Name column and apply the Transform/Format/Uppercase command. This is not necessary but it’s good practice since we are searching for duplicates and Power Query is case sensitive, this will make them “look the same” in the eyes of Power Query.
Now an important step! We must make sure that the resulting table after we apply the last transformation will be something that we can call upon later, so for clarity purposes and practical reasons, we rename the last step to Table1. For this we right click the last step in the Applied Steps sidebar and choose the Rename command.
Now that we have the filenames and Paths of all the files, we must get a list of duplicate files. For this, we will use the Transform/Group By command on the Name Column.
In the Group By dialog box, we choose to group by Name, the operation is Count Rows and we type the NumberOfAppearances for the name of the resulting new column where the result of the Count Rows operation will go.
And here is the result.
Since we are only interested in the files that appear multiple times, we use the filter get rid of those that only appear once.
And there it is. The list of files that appear more than once. This is an OK result since we now know which files are duplicated, but we don’t know in which folders they are. That information is still saved in the Table1 step. Since we want to merge the two together (the use of the word Merge is not incidental ???? ), we should rename this last step
Same as before, right click on the last step and choosing Rename
And let’s rename it to Table2
Now the stage is set for the use of M function Table.Join. We can do this two ways, either in the Advanced editor or by simply inserting a new step via the Formula Bar. If the Formula Bar is not visible in your Query Editor, you can turn it on via the View tab as shown in a picture
Once the Formula Bar is visible, you simply click the fx icon to insert a new step and write the following formula.
=Table.Join(Table1,”Name”,Table2,”Name”)
The syntax would be Table.Join(
Followed by the name of the first table (in our case we renamed that step to Table1 but generaly that would be
StepResult1,
This is followed by the column name by which you wish to merge the two (there could be more than one column here)
“NameOfTheColumnInStepResult1”,
After that you repeat the process for the second step (we renamed that into Table2
StepResult2,”NameOfTheColumnInStepResult2″)
This is all you need for the function to work, but there are a few additional parameters that the function can take. The most interesting is Join.Kind where you can choose from these types of joins
• JoinKind.Inner (the default type)
• JoinKind.LeftOuter
• JoinKind.RightOuter
• JoinKind.FullOuter
• JoinKind.LeftAnti
• JoinKind.RightAnti
And if these remind you of the Query Merge command, you are right, they are exactly those! If this optional is omitted, then the default value of Inner join is passed to the function. This is how we wrote the function, but imagine the scenarios with Right Anti Join.
After we write the function, this is the result…
Now all we have to do is to sort the file list by File Names…
And for each file that has duplicates, we get the list of all folders where this file is.
A bit of rearranging of columns
And we can load the result into Excel by using the Close & Load command
Next week I will write about using Conditional Formatting to do a sort of Anti Fill and turning the above table into this:
If you want more information on the Table.Join command, this is a good place to start.
Check out our YouTube channel and subscribe for more amazing Excel tricks!
Follow us on LinkedIn.
Check out our brand new R Academy!