It seems that lately Excel users can be separated into two groups. One group that gets the new Excel (2010 or 2013) but still uses the application as if it’s Excel 2003 and those are users that may have heard of the whole Power BI thing but think that the whole Power BI story can only benefit certain people. That is not true. Power BI is all in all a set of brilliant commands. This post will show you only one of them and that is a Get Data From Folder command in Power Query. But it will also show you how Get Data From Folder command in Power Query can make your life easier. Take the time and read this post and think about how to implement this in your work. Trust me, you will not regret it! Here’s what this post will talk about.
Following a wide demand, I also wrote a post about getting data from Excel files in a folder. You can read it here and I recently followed up with a post on how to make the process bulletproof and you can read that here.
If you get new data periodically in any form (txt, csv, xlsx…) then you know that what that means is that every time you have to go through the same process to get that data into Excel. Now this will be reduced to copying that file to a certain folder and clicking refresh in Excel. It doesn’t get much better than that. But first, let’s explain what is Power Query.
Power Query is an Excel Add-In that was created to help you get data into Excel or Power Pivot. It’s part of Microsoft Power BI and is even incorporated into the new Microsoft Power BI Designer. Everything you read about in this post refers to Excel 2010 and Excel 2013 for which Power Query is available. And now for the Get data from folder command.
When I first found this command, my enthusiasm about Excel doubled over night (and that is saying a lot). Here’s where it can really help you. Working with Excel will sooner or later get you in a position where you will be repeating one and same actions periodically. In the beginning that is great, but in time you would much rather be creating great dashboards and charts and not repeating the same imports or copying over and over again. Before Power Query was “born”, the only tool at your disposal for easing of the process was VBA. You could write a macro that did the import and accounted for all specifics in your data. But now, things got even better. What follows is a nice example of that.
Let’s say that every month, we get a new txt file that is just an export of data that we want in Excel so we can analyze it. This is how those txt files look like.
So every month we have to remove the top six rows, correctly import the columns and add them to previously imported data and analysis. Here’s how you do it with Get data from folder command in Power Query.
On the Power Query tab select From File and then From Folder. This will open a Choose a folder dialog box, where you can choose any folder you like (even folders on SharePoint).
Once you choose the folder, you get what will soon become your favorite window in Excel. It’s the Query Editor Window. And here the fun begins.
The above picture also shows two buttons we will use magnified, since one hides the dropdown menu of most used commands in Power Query. And we will use many of them during this process. But the other is even more important in this first stage and that is the double down arrow in the Content column.
The reason we need this button is that at this point, we are seeing the list of files in the folder but not the content of those files. But pressing this button will show exactly that.
Here’s what we get after pressing it.
And now the fun begins. We will do some data transforming by mostly using the dropdown menu button we talked about before.
First off, we will delete the top six rows. So from the dropdown menu we choose Remove Top Rows…
As the three dots at the end of the command suggest, you get a dialog box, and all that it needs from you is the number of top rows, you wish to remove (in our case that is 6).
After that we will choose a simple Use First Row As Headers command.
Now if you can imagine, this took care of the first six rows from the first txt file. But the same six rows from other txt files in that folder are still there, appended bellow. So the next step must be removing those. We will be using the filter in the name column to do that. First a little trick. At the beginning, the filter only shows you a limited amount of records and you must use the Load more command shown below.
Now we remove the blanks (since all rows contain the name, than these must be the six rows at the beginning of all txt files)…
…and then remove the rows containing Name (these are the header rows of other txt files)
Now we have the data we need, we just have to tell Excel what that data is. So we set the data types of the columns. So we select the column and go to Home/Data Type and select the appropriate data type.
With that done, we attend to the specifics in our data. In the City column, a question mark has replaced the apostrophe so we should revert those changes. We will use the Transform/Replace Values command.
The dialog box will remind you of a classic Find and Replace Window.
And this is what you get
Now we will use the Close & Load command.
Ok, so now we did all that we do every time we get a txt file. More or less a same amount of work, so the question is, what we have gained or what is so special about this.
First and foremost, at this point we have imported four txt files at once and now have 4000 rows. But now we just got the new txt file with new data.
All we need to do is to copy that file to the same folder as the others.
And just right click the data in Excel and choose Refresh
And right away we get 1000 new rows from the new txt.
So Power Query will take all the files in that folder and repeat all the steps that we have defined before. Brilliant. But there’s more. What if we get a txt file that needs a new rule or a change in the old set of rules? All we need to do is to right click the Query on the right and choose Edit and we are back in the Query editor and on the right you can delete or modify any of the steps that we defined in the original query.
Pure Brilliance. But if the new txt file would be totally different than the previous ones, you could define a new query for that one and then use the Append command to add the new data to our old Query.
Now do you believe me, that this is truly a game changer. Just imagine. You only do the hard work once and then it’s just a matter of refreshing your data. And that is why its called POWER BI and POWER QUERY 🙂
Check out our YouTube channel and subscribe for more amazing Excel tricks!
Follow us on LinkedIn.
Check out our brand new R Academy!
I recently posted on how to make a process of getting data from a ...
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Thank you Gasper.. Great article.. I am just starting the PowerBI tool journey – figuring how to incorporate with client work. awesome. This will get me started.
HI Gašper, I have followed your article and everything works almost perfectly. I’m finding that when I add in another months data and refresh not all of the dates get reformatted but display as the general number format. It means I end up having.
Any ideas on this. I’m using my own data for this.
There has to be something in that column, that is not a date (or an equivalent serial number)… could just be that the headers are still there, could be a strange number that Excel doesn’t recognize, but some piece of data has to prevent Power Query from applying the Date format. If the data is not sensitive, you could send it to me and I will have a look.
That will be great.
What email address do I need to send it to.
Hi,
This is great!
I need to have the filename on every row, is therenany way to accomplish this?
@Larra
Sure there is… Here is a Ken Puls version of this.
http://www.excelguru.ca/blog/2014/11/12/merge-multiple-files-with-properties
If you need any additional explanation just ask.
Is there a way to get a folder that is actually on sharepoint?
Hi,
the Get Data from a Folder will not work with SharePoint. The Get data from web will, but the question is can you use get data from web, input the URL of your SharePoint library and then input your credentials and get the file list? Last time I tried that and it didn’t work… Now Power Query does have a SharePoint list as a source in the From Other Sources section but that doesn’t give you anything even remotely resembling the Get Data From Folder command functionality.
Generally, it seems that Power Query is not support SharePoint folder as a data source, but you can use following code in Advanced Editor.
—————————————
let
Source = SharePoint.Files(“https://xxxxx.sharepoint.com/teams/yyyyyyy/”, [ApiVersion = 15])
in
Source
—————————————
That worked like a charm… Brilliant. Might just do a Post on that and will give you all the glory you deserve 🙂
This is not working for me. I get “unexpected error: Specified value has invalid CRLF characters” Any ideas on this? Thanks!
I’m guessing you get this error connecting to a SharePoint site using your organizational account? This is aaknown issue and the Power BI team is working on it…
This is great. Love it. Thank you… and replaces a macro template I built for clients importing CSV files. And while its great for us, trying to get them to trust the process…hmm…I don’t think we are quite there yet.
Hi,
Is there any chance to pull the values from a specific range of equally looking Excel files (all forms) and list them?
Example: The cell C9 of all forms contains one product characteristic. This characteristic shall be now listed as “Characteristic A”.
Any help is appreciated.
Tom
The answer is quite long but since this is not the first request for this, I’m just gonna write a post. I’ll try to write it today or tomorrow at the latest …
Read about it here
Does Power Query have the ability to pull monthly files from a folder, to maintain a 12 month rolling report? If so, how would you update the query to add a month/drop a previous month?
Thank you
The easiest way to do that would be to simply delete the old files, when new arrivals come.
The harder way would be to go for the (Year*12 + Month) indexing of your files (you can do the calculation on the fly in Power Query, but you need it to uniquely index your months) and then filtering out those that are more that 12 less than the Maximum index…
What if you didn’t want to delete the old files? Is it best to copy the files into a new folder to better manage the 12 month rolling report?
Thank you for your response.
Great post, thank you.
But i am looking for a way to include a new column with the folder name.
For example , I have:
– Year 2016
– – – January
– – – February
… and so on
unfortunate the data doesn’t contain the Dates, but I need to include it in the Pivot I am making from the imported data for the final report.
Is there a way to add a column with folder names (like January)
Read this article I wrote
https://excelunplugged.com/2016/05/28/use-power-querys-get-data-from-folder-to-get-data-from-multiple-excel-files/
it takes a slightly different route but in the end you will see that you can get the whole path, and if you can do that, you can also split by delimiter (the right-most) “/” and get the folder name and then you are one M formula away from eternal happiness and that is the Date.FromText… Read more about it here
https://msdn.microsoft.com/en-us/library/mt260703.aspx
Hi Gasper!
How much data can be imported like this into data model?
I have a number of unstructured messy files, totalling 350MB.
as soon as number of rows in data model reaches just over 1m, i get an error about memory 32-64.
If you have ANY tip, please share, would be most grateful.
This one’s easy, you need(!) 64bit Excel.
You were right! Worked like magic 😉 but still, how much data can go into 64bit data model?
From this point on it’s no longer an Excel limitation, it’s as much as your computer has free…
There is a function get the file modification date?
When I open the files by paste, I can see the date, but when I open file data, I can not see the dates.
You should try this method if you want to retain file properties
https://excelunplugged.com/2016/05/28/use-power-querys-get-data-from-folder-to-get-data-from-multiple-excel-files/
Very helpful, easy to follow guide. Thank you for writing it.
I have two xls files in the folder. But when refreshing excel spreadsheet only one appears (first one on the alphabetic order). Why is the second one not automaticly added?
Hi,
if you follow this post
https://excelunplugged.com/2016/05/28/use-power-querys-get-data-from-folder-to-get-data-from-multiple-excel-files/
it will work. Sorry I can’t be more specific as to why in your case it doesn’t work but there are just to many variables to pinpoint the problem with any degree of certainty.
Gasper, thanks for the post.
I’m wondering if you could help with something I’m trying:
I’m using Excel 2016 and Power Query to get a list of files in a folder. From there, I’d the like to be able to add columns to be able to track meta data for the files listed such as if the document has been signed by a client, or used in an invoice.
I’ve found that you can add columns to the query result, but any data you input manually doesn’t seem to be associated with the record in any way. If the query is refreshed, the manually entered data stays in the row it was entered in, even if the file record has shifted to a new row.
I know I can solve this with some VBA, but would prefer not to.
Thanks for any help!.
You could write a tedious If function if you wanted the value to stay with it’s original file, but you would be far better of creating a secondary table, and adding those values in Excel (to the original list imported as a new column) and then importing this new table by using power query.
Great Post!
I have used this concept in a 2-tier approach to bring list of files in a user interface on Excel sheet and then selectively import data from files based on user selection.
Thank you!
Hi,
Thanks so much for your article. Doesn’t seem to work with me. I use Excel 2016 and started a new query, from folder and when I press on the two arrows on top of the binary I get DataFormat.Error: External table is not in the expected format.
Any ideas?
I believe you are trying to do this with Excel files which will work in the upcoming update, but is not supported right now. This article was written to support txt, csv type files. For excel, please read the following article
https://excelunplugged.com/2016/05/28/use-power-querys-get-data-from-folder-to-get-data-from-multiple-excel-files/
Hi Gasper,
Thank you for the great post. I am using the above steps for importing data from large CSV files (~30 to 50MB file each month). Each month I add a CVS file to my folder, the query does the magic and adds new data to my powerpivot data model.
Only problem is that I want to pull data once and get rid of the CVS file. If I remove source CSV file, it lose historic data. How you’d deal with this problem. Thanks
Then Power Query is not the tool you need. You should be thinking about either Access database with some VBA or SQL code or a SQL server where you can achieve the desired result using SSIS (SQL Server Integration Services)
I have my data in word document with heading 1, heading 2 format. Column header in heading 1 format and data in heading 2 format like H1 Indroduction and below, data for this Introduction in Heading 2, and again H1 Purpose, and Data for purpose in Heading 2, Again How in H1 then Data for How in Heading 2 format and similarly for 6 categorical approach sequentially. I have multiple files. For such pattern how can I import like CSV pattern to import from import from folder in power query excel?
That’s a good one. As far as I know, you can’t import from Word files. If you change them to TXT I don’t know how that will affect your headings, but either way, you would have to somehow get Power Query to recognize Titles and subtitles and place them in seperate columns…
I have a report that I run daily. The report is emailed to me and the attachment is automatically extracted to
a folder. I then refresh and my data is updated. My problem is when I refresh I only want the files refreshed that haven’t already been added or the latest file. After refreshing the file I will manually edit certain cells and I don’t want the new refresh to change the data back to the way it was when originally added. What am I missing?
Thanks
You shouldn’t change the table that is a result of a query since that table will change on the next refresh. I would suggest using a dynamic “filtering” system for which data should be pulled in and which should be static. The filter should be a named cell in the resulting Excel, and should reflect the naming of those files… If the files go File1, FIle2 and so on, you should have an Index filter…
I have just started using Power Query and have a question. If I have loaded data from multiple csv files (say 100 files), how can I tell which file a particular data row came from? This would be most useful when narrowing down data issues from files sent by outside parties.
If you create a function, based on your original query, for importing data and then use Import from Folder, you can keep the filename column even after you expand the data tables from each file…
You can see the steps here
https://excelunplugged.com/2016/05/28/use-power-querys-get-data-from-folder-to-get-data-from-multiple-excel-files/
It’s an Excel example so even less work with the CSV files since the SheetName parameter is nonexistent.
Hello,
Great article and very helpful. I have one file that is 12 months worth of data and over the next 6 or so months will be adding one file per month (1 month of data), so this works well. However, is there a way with your solution to only load the newest data to the Data Model in PowerPivot so that Excel doesn’t have to process the 12 months worth of data every time? (for example, one of these 12 month files is 7 Million-plus rows; I’d like to avoid my team having to reprocess that every month (I don’t think their machines are powerful enough).
Thanks,
Walter
The way you would do that is either by indexing MonthYear periods and then filtering by them or by filtering the files within a folder.
I cannot find the Power Query in the excel 2016. It is only Data -> Get Query. It does not allow to combine files in the same folder.
It should be on the Data tab in the Get&Transform group.
Hi
Im looking to be able to have a variable source folder,
I need my users to be able to press a VBA Button to select the source folder and CSV file(s) into the workbook .the file format doesn’t change but source does.
Is there any way to do that?
cheers
@Jon
Yes, and it can also be done solely by Power Query which would be my first choice. The way you would do this with Power Query is to input a folder path into a predefined named cell… If this is a viable alternative, let me know and I will go on, if not, let me know which version of Excel you’re running and I’ll give you the VBA code you need.
Cheers that’s pretty much how i got it working, but ive added to it a little now.
I have a VB button that opens the folder address to a single cell table.
Then i have a drop down box that is linked to a second table which contains different query/formatting depending on the source data.
The second Button loads the chosen query and outputs it to a fresh sheet.
then i also added a refresh data button to refresh the data and a delete button to kill it off.
Cheers anyway
It’s great that you’re still replying to queries.
I also have a question about combining multiple Excel sheets. What if I have data corresponding to different sets of Dates in each Excel sheet. For example, quarterly data in sheet 1 starts from March 2004 and ends at Sep 2017 whereas sheet 2 starts from March 2007.
Would greatly appreciate your help.
Sure you can but you would need to be more specific if you want a step by step tutorial… You can go to the contact form on my page, and we can discuss your sample further there…
Hi Gasper,
I build a power query retrieving standard data from excel files. When I refresh the query it opens the first file in the folder and as a result of that the power query goes into error. Since the power query is loaded into the datamodel (powerpivot) I’m losing all relationships with my dimensions and other datasets in the model.
Are you aware of this issue? And if yes is there a way to avoid this?
I’m on excel 2013 / 32-bit and the power query version is: 2.12.3660.142 (add-on)
Hope you have a solution.
Thanks and kind regards,
Paul
Hi Paul,
If you have a proper data model with multiple tables and relationships and your Power Query loads data that contains errors, there is a big possibility that your relationships will get lost. This is due to blanks appearing in dimensions or something even more sinister.
What you should do, is either have try…otherwise statement in your Power Query or just a basic Remove Rows/Remove Errors to handle that one file in your import. I’m planning on writing a post on Tuesday on exactly this so if the answer was not clear, feel welcome to read the entire post.
Hi Gasper,
thanks for the prompt respone, much appreaciated.
I’ve included the remove error step and it solves part of the problem (not losing the relationships anymore in powerpivot). Only thing that still happens is when I open the consolidation file with the power query to retrieve data from the folder is that it opens 2 files (sometimes 3) from that folder. When I close those files the query is working fine. The question is why is it opening those files (what is the trigger) and is it possible to prevent this from happening?
Looking forward to your new/next post.
Kind regards,
Paul
That is most certainly a bug and not how Power Query is supposed to act. I’m guessing it’s something to do with the links that you have to those files and refreshing those…