The goal is to build a custom web application that will ingest specific valid data sources (XML, CSV, XLS, etc.) and then:
Deduplicate the imported data against our existing data and against other records within the same file.
Normalize the data's formatting against a defined schema so that data from different sources all follows the same structure.
Standardize the output into a defined machine-readable format (JSON, XML, etc.) that our internal tools can consume.
Store the information in a database optimized for search.
The tool should also be able to fetch data from defined feeds and supplement each record with tags (such as the data source, date of processing, etc.). Initially, data may arrive by manual upload, but eventually the tool should be able to either poll a feed for updates or start processing a feed when a change is pushed.
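As a rough illustration of the tagging step, the sketch below attaches the data source and processing date to a record before it is serialized to JSON. The `_tags` key, the `tag_record` helper, and the `manual_upload` source name are assumptions made for this example, not part of a defined schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def tag_record(record: dict, source: str, processed_at: Optional[datetime] = None) -> dict:
    """Return a copy of the record with provenance tags attached (hypothetical layout)."""
    stamp = processed_at or datetime.now(timezone.utc)
    tagged = dict(record)  # copy so the caller's record is not mutated
    tagged["_tags"] = {"source": source, "processed_at": stamp.isoformat()}
    return tagged


# A fixed timestamp keeps the example reproducible; real runs would use the current time.
fixed = datetime(2024, 1, 15, tzinfo=timezone.utc)
row = tag_record({"first_name": "Ada"}, source="manual_upload", processed_at=fixed)
print(json.dumps(row, sort_keys=True))
```

The same function could be called from a manual upload handler now and from a feed poller or push webhook later, since it only depends on the record and a source label.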
Each data source will need to map to specific fields in our own schema, so the incoming data stream will need to be parsed and matched appropriately. Imported documents will come from a limited number of sources, each conforming to a specific template that we will provide. The tool will require a basic interface and documented API endpoints. Ideally, it will leverage G Suite authentication so our employees will not need to create a new set of credentials for this service.
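One possible way to handle the per-source mapping is a lookup table from each template's column names to our schema fields. The sketch below shows this for a CSV source; the source name `linkedin_export`, the column headers, and the schema field names are all hypothetical, since the actual templates are provided separately.

```python
import csv
import io

# Hypothetical per-source mappings: source column name -> internal schema field.
FIELD_MAPS = {
    "linkedin_export": {
        "First Name": "first_name",
        "Last Name": "last_name",
        "Company": "company_name",
        "Profile URL": "linkedin_url",
    },
}


def map_row(row: dict, source: str) -> dict:
    """Rename one parsed row's columns to schema fields and trim whitespace."""
    mapping = FIELD_MAPS[source]
    return {target: (row.get(column) or "").strip() for column, target in mapping.items()}


def parse_csv(text: str, source: str) -> list:
    """Parse CSV text from a known source into schema-shaped records."""
    return [map_row(row, source) for row in csv.DictReader(io.StringIO(text))]


sample = (
    "First Name,Last Name,Company,Profile URL\n"
    " Ada ,Lovelace,Analytical Engines,https://www.linkedin.com/in/ada\n"
)
records = parse_csv(sample, "linkedin_export")
print(records)
```

Adding a new source would then mean adding one mapping entry plus a parser for its file type (XML, XLS, etc.), while the rest of the pipeline keeps working on schema-shaped records.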
Additionally, deduplication of data is critical. For example, with LinkedIn results, a record should be identified as a duplicate if it matches another record on any of the following:
First Name + Last Name + Company Name
LinkedIn ID
LinkedIn URL
As background, email is currently our strongest indicator of duplicate entries when we review leads manually.
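The LinkedIn matching rules above could be sketched as a set of exact-match keys per record, checked both against existing data and against earlier rows in the same file. The field names and the `is_duplicate` flag are assumptions for illustration; lowercasing and trimming a trailing slash from URLs are also assumed normalization choices.

```python
def match_keys(record: dict) -> set:
    """Build the exact-match keys for a record; blank values produce no key."""
    keys = set()
    name_parts = [
        (record.get(field) or "").strip().lower()
        for field in ("first_name", "last_name", "company_name")
    ]
    if all(name_parts):  # only use the name key when all three parts are present
        keys.add(("name_company", "|".join(name_parts)))
    for field in ("linkedin_id", "linkedin_url"):
        value = (record.get(field) or "").strip().lower().rstrip("/")
        if value:
            keys.add((field, value))
    return keys


def flag_duplicates(incoming: list, existing: list) -> list:
    """Flag incoming records that match existing data or an earlier incoming row."""
    seen = set()
    for record in existing:
        seen |= match_keys(record)
    flagged = []
    for record in incoming:
        keys = match_keys(record)
        flagged.append({**record, "is_duplicate": bool(keys & seen)})
        seen |= keys  # later rows in the same file are checked against this one
    return flagged


existing = [{"linkedin_id": "123"}]
incoming = [
    {"first_name": "Ada", "last_name": "Lovelace", "company_name": "AE", "linkedin_id": "123"},
    {"first_name": "Grace", "last_name": "Hopper", "company_name": "Navy"},
    {"first_name": "grace", "last_name": "HOPPER", "company_name": "navy"},
]
flags = [r["is_duplicate"] for r in flag_duplicates(incoming, existing)]
print(flags)
```

An email key could be added to `match_keys` in the same way if email is available in the source data.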
The tool should also assign a confidence score to each duplicate identification. The examples above would have a high confidence score, but factors such as partial name matches or different suffixes in a company name should lower it. This applies both to duplicate entries within the files being imported and to duplicates that conflict with existing entries in our CRM. Duplicates should be flagged so they can be recognized and handled in other workflow tools.
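A minimal sketch of how such a confidence score might be computed is shown below, using the standard library's `difflib.SequenceMatcher` for fuzzy name comparison. The suffix list, the 0.6/0.4 weighting, and the 0.9 penalty for differing company spellings are arbitrary assumptions chosen for illustration and would need tuning against real data.

```python
from difflib import SequenceMatcher

COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}  # hypothetical, not exhaustive


def strip_suffix(company: str) -> str:
    """Lowercase a company name and drop trailing legal suffixes."""
    words = company.lower().replace(",", " ").replace(".", " ").split()
    while words and words[-1] in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def duplicate_confidence(a: dict, b: dict) -> float:
    """Score how likely two records describe the same person (1.0 = certain)."""
    # An exact LinkedIn ID or URL match is treated as a certain duplicate.
    for field in ("linkedin_id", "linkedin_url"):
        if a.get(field) and a.get(field) == b.get(field):
            return 1.0
    name_a = f"{a.get('first_name', '')} {a.get('last_name', '')}".strip()
    name_b = f"{b.get('first_name', '')} {b.get('last_name', '')}".strip()
    if not name_a or not name_b:
        return 0.0  # nothing to compare on
    raw_a, raw_b = a.get("company_name", ""), b.get("company_name", "")
    name_score = similarity(name_a, name_b)
    company_score = similarity(strip_suffix(raw_a), strip_suffix(raw_b))
    score = 0.6 * name_score + 0.4 * company_score
    if raw_a.lower().strip() != raw_b.lower().strip():
        score *= 0.9  # assumed penalty when the company spellings differ at all
    return round(score, 2)


base = {"first_name": "Ada", "last_name": "Lovelace", "company_name": "Analytical Engines"}
with_suffix = {**base, "company_name": "Analytical Engines Inc."}
partial_name = {**base, "first_name": "Ad"}
print(duplicate_confidence(base, base))
print(duplicate_confidence(base, with_suffix))
print(duplicate_confidence(base, partial_name))
```

The score could be stored alongside the duplicate flag so that downstream workflow tools can, for example, auto-merge high-confidence matches and route lower-confidence ones to manual review.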
About the recruiter: Vijayalakshmi K, from Florida, United States. Member since May 20, 2018.