Data cleansing and deduplication are common needs in any data-driven industry. Customer Relationship Management (CRM) data such as Salesforce data, customer mailing lists, vendor and supplier databases, employee databases, and voter registration databases are just a few examples where deduplication can improve the accuracy and integrity of your data. A complete deduplication and standardization of your data can reduce duplicate payments, prevent multiple mailings from going out the door, and help you aggregate corporate reports accurately despite multiple identifiers or multiple name spellings.
Our fuzzy-matching algorithms are designed to identify discrepancies in individual names, company names, addresses, and more. In addition to the Jaro-Winkler similarity and Generalized Edit Distance (GED) functions, our analysts have developed a suite of user-defined name- and address-matching functions.
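For readers unfamiliar with these measures, the Python sketch below implements the textbook Jaro-Winkler similarity and a weighted edit distance. It is an illustration only, not Automated Auditor's matching functions. The edit-distance function is ordinary Levenshtein distance with adjustable insert, delete, and substitute costs; treat it as a simplified stand-in for GED, since GED implementations (SAS's COMPGED function, for example) assign separate costs to many more operation types.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 when no characters match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions = half the number of matched characters that appear out of order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix (up to 4 characters)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scale * (1 - sim)


def edit_distance(s1: str, s2: str, ins_cost: float = 1,
                  del_cost: float = 1, sub_cost: float = 1) -> float:
    """Minimum total cost to turn s1 into s2 (Levenshtein distance when all costs are 1)."""
    prev = [j * ins_cost for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        cur = [i * del_cost] + [0] * len(s2)
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else sub_cost
            cur[j] = min(prev[j] + del_cost,      # delete s1[i-1]
                         cur[j - 1] + ins_cost,   # insert s2[j-1]
                         prev[j - 1] + cost)      # keep or substitute
        prev = cur
    return prev[-1]


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(edit_distance("kitten", "sitting"))           # 3
```

Both functions are case-sensitive, so in practice names would be normalized (case, punctuation, spacing) before scoring.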
Listed below are some of our clients who have hired us for data cleansing projects:
- U.S. Coast Guard Exchange
- The World Bank
- Middlesex County, NJ
- National Football League (NFL)
- Large hospital in Southern California
Fuzzy matching is one of Automated Auditor’s core strengths. Fuzzy matching describes the ability to join text phrases that either look or sound alike but are not spelled the same. For example, “Elizabeth Banks” and “Banks, Liz E.” are close enough to the human eye and ear that they should be counted as similar. How is fuzzy matching performed, and why is it important?
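One way to see why "Elizabeth Banks" and "Banks, Liz E." should count as the same person is to reduce both to a common key before comparing. The Python sketch below is a simplified illustration, not Automated Auditor's method: it handles the "Last, First" ordering, ignores middle initials, and resolves nicknames through a small, hypothetical `NICKNAMES` table. A production matcher would typically also score the resulting keys with a similarity measure such as Jaro-Winkler to tolerate typos.

```python
import re

# Hypothetical nickname table for illustration; a real system would rely on a
# much larger, curated list of name variants.
NICKNAMES = {"liz": "elizabeth", "beth": "elizabeth", "betty": "elizabeth",
             "bob": "robert", "bobby": "robert", "rob": "robert"}


def name_key(raw: str) -> tuple[str, str]:
    """Reduce a person's name to a canonical (first, last) key.

    Accepts "First [Middle] Last" and "Last, First [Middle]" layouts;
    middle names and initials are ignored.
    """
    raw = raw.strip().lower()
    if "," in raw:
        last, _, rest = raw.partition(",")
        given = rest.split()
    else:
        parts = raw.split()
        if not parts:
            return ("", "")
        last, given = parts[-1], parts[:-1]
    # Keep only letters so "E." and stray punctuation do not affect the key.
    first = re.sub(r"[^a-z]", "", given[0]) if given else ""
    last = re.sub(r"[^a-z]", "", last)
    return (NICKNAMES.get(first, first), last)


def same_person(a: str, b: str) -> bool:
    """True when two name strings reduce to the same canonical key."""
    return name_key(a) == name_key(b)


print(same_person("Elizabeth Banks", "Banks, Liz E."))  # True
```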
Benefits of Fuzzy Matching
Fuzzy matching is the art and science of linking disparate words and phrases with one another. It has many uses, but one of the most common is merging data files that share no common key. For example, if you have student loan data and you need to match it with student demographic data, but do not have the student’s Social Security number or student ID, you have to apply fuzzy matching to the student’s name, address, or other identifying pieces of information. If you match only on exact first and last names, your match rate will suffer because of name variations (e.g., Robert O’Malley, Bob O Malley, Bobby O Malley). By using fuzzy matching, you can dramatically increase your match rate.
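The student-loan example can be sketched as a small record-linkage routine. Everything here is an illustrative assumption: each file is a dict of record ID to full name, nicknames are resolved through a tiny hypothetical `NICKNAMES` table, similarity is scored with Python's standard-library `difflib.SequenceMatcher` (swapped in for simplicity rather than Jaro-Winkler or GED), and the 0.85 threshold is arbitrary. The record IDs and names in the usage example are made up.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table; real projects use a far larger list.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, resolve the first name's nickname, and join
    the remaining tokens so "O’Malley" and "O Malley" both become "omalley"."""
    tokens = re.sub(r"[^a-z ]", "", name.lower()).split()
    if not tokens:
        return ""
    first = NICKNAMES.get(tokens[0], tokens[0])
    return first + " " + "".join(tokens[1:])


def link(left: dict[str, str], right: dict[str, str],
         threshold: float = 0.85) -> dict[str, str]:
    """Map each left record ID to its most similar right record ID,
    keeping only pairs whose similarity meets the threshold."""
    norm_right = {rid: normalize(n) for rid, n in right.items()}
    matches: dict[str, str] = {}
    for lid, lname in left.items():
        key = normalize(lname)
        best_id, best_score = None, 0.0
        for rid, rkey in norm_right.items():
            score = SequenceMatcher(None, key, rkey).ratio()
            if score > best_score:
                best_id, best_score = rid, score
        if best_id is not None and best_score >= threshold:
            matches[lid] = best_id
    return matches


# Hypothetical records: no name in one file exactly equals a name in the other.
loans = {"L1": "Bob O Malley", "L2": "Jane Smyth"}
students = {"S9": "Robert O’Malley", "S4": "Jane Smith", "S7": "Ann Lee"}
print(link(loans, students))  # {'L1': 'S9', 'L2': 'S4'}
```

The nested loop compares every pair of records, which is fine for a demonstration but slow for large files; real projects usually compare only records that share a blocking key (for example, ZIP code or date of birth) and combine name scores with address and other fields.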
Another benefit of fuzzy matching is cleansing duplicate data. Cleansing and standardizing messy data is essential for maintaining a vendor file, for example. If you have a vendor or supplier file with duplicate records, your Accounts Payable department is more susceptible to making duplicate payments. By applying fuzzy matching to the vendor’s name, address, phone number, email, and other identifying pieces of data, duplicate vendors can be identified and consolidated into one master vendor record, thereby preventing erroneous payments.
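A vendor-file deduplication pass along these lines might look like the Python sketch below. The matching rule is a hypothetical example (same phone number or email, or near-identical names at the same address), names are compared with the standard-library `difflib.SequenceMatcher`, and records linked by any chain of matches are grouped with a union-find structure, so if A matches B and B matches C, all three land in one cluster. The `master_record` rule (first non-empty value per field) and the sample vendors are likewise assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def clean(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def digits(text: str) -> str:
    return re.sub(r"\D", "", text)


def is_duplicate(a: dict, b: dict, name_threshold: float = 0.9) -> bool:
    """Hypothetical rule: shared phone or email, or very similar names at the same address."""
    pa = digits(a["phone"])
    if pa and pa == digits(b["phone"]):
        return True
    if a["email"] and a["email"].lower() == b["email"].lower():
        return True
    name_sim = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
    return name_sim >= name_threshold and clean(a["address"]) == clean(b["address"])


def dedupe(vendors: list[dict]) -> list[list[int]]:
    """Group vendor indices into clusters of likely duplicates using union-find."""
    parent = list(range(len(vendors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(vendors)):
        for j in range(i + 1, len(vendors)):
            if is_duplicate(vendors[i], vendors[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(vendors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def master_record(group: list[dict]) -> dict:
    """Hypothetical survivorship rule: take the first non-empty value of each field."""
    return {field: next((v[field] for v in group if v[field]), "") for field in group[0]}


vendors = [  # hypothetical sample records
    {"name": "Acme Supply Co.", "address": "12 Main St", "phone": "(555) 010-2000", "email": "ap@acme.example"},
    {"name": "ACME Supply Co", "address": "12 Main St.", "phone": "", "email": ""},
    {"name": "Acme Supply", "address": "PO Box 9", "phone": "555-010-2000", "email": ""},
    {"name": "Beta Tools", "address": "3 Oak Ave", "phone": "555-010-3000", "email": "billing@beta.example"},
]
for cluster in dedupe(vendors):
    print(cluster, master_record([vendors[i] for i in cluster])["name"])
```

In practice an analyst would usually review each cluster before records are merged, and large files would be split into blocks (for example, by ZIP code) to avoid comparing every pair.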
For example, suppose you have several different Hewlett Packard vendors, with the following spellings:
Using fuzzy matching, we consolidate these names into one standardized name so that an accurate aggregation can be made for this corporation.
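Consolidation of this kind can be sketched in two steps: standardize each name (case, punctuation, legal suffixes), then snap near-identical results onto one canonical spelling before totaling. In the Python sketch below, the `SUFFIXES` list, the 0.9 cutoff, and the payment records are hypothetical; the sample spellings are invented for illustration and are not the client's list. The fuzzy step uses the standard-library `difflib.get_close_matches`.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical legal-entity suffixes to strip; extend as needed.
SUFFIXES = {"inc", "incorporated", "co", "company", "corp", "corporation", "llc", "ltd"}


def standardize_company(name: str) -> str:
    """Lowercase, treat punctuation as spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def consolidate(names: list[str], cutoff: float = 0.9) -> dict[str, str]:
    """Map each raw name to a canonical name, reusing a close earlier match if one exists."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for raw in names:
        std = standardize_company(raw)
        close = get_close_matches(std, canonical, n=1, cutoff=cutoff)
        if close:
            mapping[raw] = close[0]
        else:
            canonical.append(std)
            mapping[raw] = std
    return mapping


def total_by_vendor(payments: list[tuple[str, float]]) -> dict[str, float]:
    """Sum payment amounts per consolidated vendor name."""
    mapping = consolidate([vendor for vendor, _ in payments])
    totals: dict[str, float] = defaultdict(float)
    for vendor, amount in payments:
        totals[mapping[vendor]] += amount
    return dict(totals)


payments = [  # hypothetical spellings and amounts
    ("Hewlett-Packard Co.", 1200.0),
    ("HEWLETT PACKARD COMPANY", 800.0),
    ("Hewlett Packard, Inc.", 500.0),
    ("Hewlet Packard Co", 100.0),  # misspelled
    ("Dell Inc", 300.0),
]
print(total_by_vendor(payments))  # {'hewlett packard': 2600.0, 'dell': 300.0}
```

Abbreviations such as "HP" share too few characters with the full name for string similarity to catch, so they typically need an alias table maintained by an analyst.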