Excel 2007 is a versatile application having grown far beyond its early versions as simply a spreadsheet solution. Employed as a record keeper, address book, forecasting tool, and much more, many people even use Excel in ways its programmer never intended
If you use Excel to keep track of many addresses, products SKUs, receipts, expenses, or other important things in your home and home office, you know that sometimes your Excel files can become unwieldy by the sheer number of records of which you are trying to keep track.
Luckily, Microsoft has built a function into Excel 2007 to help you find and remove duplicate records. Unfortunately, there are a few caveats to using the Remove Duplicates function so be careful or you may unknowingly delete records you didn’t intend to remove.
Removing Duplicate Records from Excel
As your Excel records grow, it is more likely that you have duplicate records not only bloating the size of the file but also creating inefficiencies and errors, especially if you use the file to store addresses, contacts, and product listings.
Suppose that you use Excel to keep track of addresses and you suspect that you have duplicate records. Look at the example Excel worksheet below:
Notice that the “Jones” record appears twice. To remove such duplicate records, click on the Data tab on the Ribbon and locate the Remove Duplicates functions under the Data Tools section. Click on Remove Duplicates and a new window opens.
Here you have to make a decision based on whether you use heading labels at the tops of your columns. If you do, select the option labeled My Data Has Headers. If you don’t use heading labels, you will use Excel’s standard column designation such as Column A, Column B, etc..
For this example, we will choose column A only and click the OK button. The option window closes and Excel removes the second “Jones” record.
Of, course this was just a simple example. Any address records you keep using Excel are likely to be much more complicated. Suppose, for example, you have an address file that looks like this.
Notice that although there are three “Jones” records, only two are identical. If we used the procedures above to remove duplicate records, only one “Jones” entry would remain. In this case, we need to extend our decision criteria to include both first and last names found in Columns A and B respectively.
To do this, once again click on the Data tab on the Ribbon and then click on Remove Duplicates. This time, when the options window pops up, choose columns A and B. Click the OK button and notice that this time Excel removed only one of the “Mary Jones” records.
This is because we told Excel to remove duplicates by matching records based on Columns A and B rather than just Column A. The more columns you choose, the more criteria need to be met before Excel will consider a record to be a duplicate.
A Few Caveats with Removing Duplicate Records
There are, of course, a few problems with letting Excel automatically remove duplicate records for you. First, you have to be careful of choosing too few or too many columns for Excel to use as the criteria of identifying duplicate records.
Too few and you may inadvertently delete records you need. Too many and the restrictions are too high for Excel to determine whether a record is a duplicate or not.
Second, Excel always assumes that the first unique record it comes across is the master record. Any subsequent records are assumed to be duplicates. This is a problem if, for example, you failed to amend an address of one of the people in your file but instead created a new record.
If the new (correct) address record appears after the old (out-of-date) record, Excel will assume that the first (out-of-date) record to be the master and delete any subsequent records it finds. This is why you have to be careful how liberally or conservatively you let Excel decide what is or is not a duplicate record.
Finally, Excel doesn’t ask you to verify whether you really want to delete a record. Using the parameters you choose (columns), the process is completely automated. This can be a dangerous thing when you have a huge number of records and you trust that the decisions you made were correct and allow Excel to automatically remove the duplicate records for you.
Also, be sure to check out our previous article on deleting blank lines in Excel. Enjoy!