Deduping in Salesforce: The complete how-to

Deduping in Salesforce: The complete how-to

Don’t be duped by duplicate records. Deduping seems like one of those routine system maintenance issues, but it’s actually very tricky and there's no undo.

Microsoft's free fuzzy matching plug-in

If you’re on a PC and you’re using conventional Office, have we got a deal for you:  a tool that helps you quickly find good matches using fuzzy logic, with parameters and options galore.  This guide will get you started, using Lead importing and dupe-detection as an example…but this tool is good for any of the objects in SFDC.

The Fuzzy Lookup add-on is available for download here.

Step 1: Setup

Once you've downloaded the Fuzzy Lookup add-on, open Excel. If there is a ribbon for Fuzzy Lookup, continue to step 2. If not, activate the add-on by navigating to Options and Add-ons. (Note: different versions of Excel have slight variations in the UI/navigation sequence.  If your version doesn’t match what you see here, Google your version for instructions on how to get these steps done.)

Fuzzy Lookup options David Taber/IDG

Once in options, navigate to Add-ins (1),

Check (2) to make sure the fuzzy lookup plugin is both installed and inactive, and

Use the dropdown in (3) to manage COM Add-ins.

Microsoft Fuzzy Lookup add ins David Taber/IDG

Hit Okay and check to make sure the Fuzzy Lookup ribbon has appeared.

Fuzzy Lookup ribbon David Taber/IDG

Step 2: Data input

Create four sheets in a new workbook. Name the first Master Data, the second Subject Data, the third Analysis and the fourth Duplicates.

The Master Data sheet will eventually hold your deduplicated data, the data that will be uploaded into Salesforce. Begin by taking a single data set that has been internally deduplicated and insert it into the Master Data spreadsheet. Add in two columns, one for Row Number and one for Divider. Leave the row number blank for now, include some visual aid in the divider column.

Data input David Taber/IDG

Repeat the process for Subject Data in a second worksheet. Next, normalize the columns of the Master Data with that of the Subject Data (if you’re lucky, they’re identical…but never assume that). In the example, the Subject data has the following headers:

taber subject data David Taber/IDG

And the Master has the following:

taber master David Taber/IDG

Notice the “Email” in the Master Data is actually a “Website” column, and in the Subject Data the “Website” column has phone numbers, and the “States” column has city names. Make sure the contents of the Master and Subject columns agree, as in this example:

taber email David Taber/IDG

Notice the headers are also in the same columns. If necessary, copy and paste columns in either data set to make sure the two worksheets agree. This will aid in the merge of the data sets later.

taber names David Taber

Step 3: Table Creation and Analysis

For both Master and Subject worksheets, select the entire data set (but none of the empty rows and including the Divider row) and input Ctrl-L to create a table. Select “My Table has Headers,” and hit OK.

taber table creation David Taber/IDG

Next, select the first row of the Row Number column, and input “=ROW(”. You can then select Column A Row 2, which should input the text “[@Names]” or something similar. Close the parentheses on the “=ROW(” formula and hit enter. Row numbers should appear in that column for the whole table. This will be useful for reference later.

taber row David Taber/IDG
taber fuzzy lookup configuration David Taber/IDG

Navigate to the Analysis tab and place your cursor on A1. Open the Fuzzy Lookup add-on from the ribbon.

Hit the x to clear the default configuration.

Set up a new configuration, matching the columns with the same name, minus the phone numbers and any data columns that were added to the Master sheet (i.e., are blank in Master), and click the configuration button to create the configuration. Do the same with the phone number columns. Select default configuration for the first and PhoneNumber configuration for the second, as in the screenshot below. There are further configurations for Zip Codes, which can be used if applicable.

taber phone number David Taber/IDG

Click GO to run the fuzzy matcher.

Step 4: Analyze data and dedupe

Select the Similarity header and navigate to Data > Filter. Select the chevron next to Similarity and filter out the 0 values.

taber filter David Taber/IDG
taber similarity 0 David Taber/IDG

After hitting Ok, the data with a similarity of 0 will be filtered. What will be left are the records that the system thinks are duplicated based on the parameters set.

The Similarity column shows the confidence the system has of those records being duplicates.

The left of the Divider row is the Master Dataset and the right is the Subject Dataset.

Merge the datasets from the right of the divider to the left of the divider. Once done, use the Row Number reference on the left of the divider to find the correct row in the Master data set and copy the merged data into the Master dataset in that location.

taber find row David Taber/IDG

Then use the Row Number on the right of the divider to find the same row in the Subject dataset, and remove that row, shifting the other cells upward to close the gap and pasting the duplicate row into the Duplicates sheet. In the Subject Data sheet, delete the now-empty row to clear up the blank space.

taber empty row David Taber/IDG

Finally, copy the deduplicated data from Subject into the merged data of Master (notice the row numbers aren’t being copied).

taber subject data2 David Taber/IDG

You will now have a larger Master data set that has been both deduped and merged.

taber large master data David Taber/IDG

Clean up the Subject datasheet, deleting all the headers and data so it’s ready for a new table.

Repeat steps 2-4 for all the Subject Datasets you have, leaving the merged and deduped Master Data in place. When you run out of datasets, you will have deduplicated data in the Master Sheet that you can then insert into Salesforce using its built-in data Import Wizard.

Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.

Join the newsletter!


Sign up to gain exclusive access to email subscriptions, event invitations, competitions, giveaways, and much more.

Membership is free, and your security and privacy remain protected. View our privacy policy before signing up.

Error: Please check your email address.

More about ABBradstreetClickCreationDuPontEclipseExcelFord MotorGoogleIDGISOMarketoMicrosoftNBCParallelsSalesforce.comSimpleTestTyco

Show Comments