
Software development and website development freelancer required: Developer for a tool to sanitize and structure datasets ingested from different sources

Posted at - Dec 2, 2020



The goal is to build a custom web application that will ingest specific valid data sources (XML, CSV, XLS, etc.) and then:
Dedupe the imported data against our existing data and against other records within the same file.
Normalize the data's formatting against a defined schema so that data from all sources follows the same structure.
Standardize the output into a defined machine-readable format (JSON, XML, etc.) that our internal tools can then use.
Store the information in a database optimized for searching.
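A minimal Python sketch of the normalize, standardize, and store steps follows. It assumes a hypothetical internal schema. The field names in SCHEMA_FIELDS are illustrative because the posting does not define the schema. SQLite stands in for whatever search-optimized database is eventually chosen. The helper names normalize_record, to_json, and store_records are also hypothetical.

```python
import json
import re
import sqlite3

# Hypothetical internal schema; the real one is defined by the client.
SCHEMA_FIELDS = ("first_name", "last_name", "company", "email",
                 "linkedin_id", "linkedin_url")


def normalize_record(raw: dict) -> dict:
    """Map a raw record onto SCHEMA_FIELDS with consistent formatting."""
    record = {}
    for field in SCHEMA_FIELDS:
        value = raw.get(field)
        if value is not None:
            # Collapse runs of whitespace and trim both ends.
            value = re.sub(r"\s+", " ", str(value)).strip()
            if field == "email":
                value = value.lower()
            elif field == "linkedin_url":
                # Assumption: profile URLs compare case-insensitively and
                # a trailing slash is not significant.
                value = value.lower().rstrip("/")
        # Empty strings become None so "missing" has one representation.
        record[field] = value or None
    return record


def to_json(records: list[dict]) -> str:
    """Serialize normalized records in a stable, machine-readable form."""
    return json.dumps(records, sort_keys=True)


def store_records(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Persist records and index the columns used for duplicate lookups."""
    columns = ", ".join(f"{f} TEXT" for f in SCHEMA_FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS leads ({columns})")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_leads_email ON leads(email)")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_leads_linkedin_id ON leads(linkedin_id)")
    placeholders = ", ".join("?" for _ in SCHEMA_FIELDS)
    conn.executemany(
        f"INSERT INTO leads VALUES ({placeholders})",
        [tuple(r[f] for f in SCHEMA_FIELDS) for r in records],
    )
    conn.commit()
```

The column names come from the fixed SCHEMA_FIELDS tuple and never from uploaded data. That is why building the table definition with an f-string is acceptable here. Record values always pass through ? placeholders.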

The tool should also be able to fetch data from defined feeds and supplement it with tags (such as the data source, date of processing, etc.). Initially, data may arrive by manual upload. Eventually, the tool should either poll a feed for updates or start processing a feed when a change is pushed.
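One way to sketch the tagging and feed-polling requirements is shown below. It assumes records arrive as dicts and that fetching the feed itself (HTTP, SFTP, etc.) happens elsewhere. The tag names _source and _processed_at, and the FeedWatcher class, are hypothetical. Hashing each payload lets a poller skip feeds that have not changed since the last run.

```python
import hashlib
from datetime import datetime, timezone


def tag_records(records, source_name, processed_at=None):
    """Return copies of records annotated with provenance tags.

    The tag names are hypothetical placeholders for the real ones.
    """
    processed_at = processed_at or datetime.now(timezone.utc)
    stamp = processed_at.isoformat()
    return [{**record, "_source": source_name, "_processed_at": stamp}
            for record in records]


class FeedWatcher:
    """Detect whether a polled feed's payload changed since the last poll.

    Assumption: the payload is fetched elsewhere and passed in as bytes.
    """

    def __init__(self):
        self._last_digest = {}  # feed name -> SHA-256 hex digest

    def has_changed(self, feed_name: str, payload: bytes) -> bool:
        digest = hashlib.sha256(payload).hexdigest()
        if self._last_digest.get(feed_name) == digest:
            return False  # identical content, nothing to reprocess
        self._last_digest[feed_name] = digest
        return True
```

A push-based integration could call tag_records directly and skip the watcher. The digest check only matters when the tool has to poll.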

Each data source will need to map to specific fields in our own schema, so the incoming data stream will need to be parsed and matched appropriately. The imported documents will come from a limited number of sources, and each source will conform to a specific template that we will provide. The tool will require a basic user interface as well as documented API endpoints. Ideally, it will use G Suite authentication so that our employees do not need to create a new set of credentials for this service.
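The per-source mapping could be driven by a table of column-to-field mappings, with one entry per template. This sketch assumes CSV input and invents a hypothetical linkedin_csv template. The real column names would come from the templates the client provides. Rejecting a file with missing columns up front keeps malformed uploads out of the pipeline.

```python
import csv
import io

# Hypothetical template: source column name -> internal schema field.
SOURCE_MAPPINGS = {
    "linkedin_csv": {
        "First Name": "first_name",
        "Last Name": "last_name",
        "Company": "company",
        "Profile URL": "linkedin_url",
    },
}


def parse_source(source_name: str, text: str) -> list[dict]:
    """Parse CSV text and rename its columns to internal schema fields."""
    mapping = SOURCE_MAPPINGS[source_name]  # KeyError for unknown sources
    reader = csv.DictReader(io.StringIO(text))
    missing = set(mapping) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"{source_name}: missing columns {sorted(missing)}")
    # Columns not named in the template are ignored.
    return [{target: row[column] for column, target in mapping.items()}
            for row in reader]
```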

Additionally, deduplication of data is critical. For example, with LinkedIn results, a record should be identified as a duplicate if it has the same:
First Name + Last Name + Company Name
LinkedIn ID
LinkedIn URL

For context, email address is currently our strongest indicator of a duplicate entry when we review leads manually.
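The exact-match rules above can be expressed as match keys, where two records count as duplicates if they share any key. This sketch also adds email as a key. That is an assumption: it treats the manual-review signal described above as an automatic rule too. The field names and the match_keys and find_duplicates helpers are hypothetical. Existing CRM records are indexed first, so each incoming record is checked against the CRM and against earlier rows of the same file.

```python
def match_keys(record: dict) -> list[tuple[str, str]]:
    """Exact-match keys, strongest first; any shared key means duplicate."""
    keys = []
    if record.get("email"):  # assumption: email treated as a hard rule
        keys.append(("email", record["email"].strip().lower()))
    if record.get("linkedin_id"):
        keys.append(("linkedin_id", record["linkedin_id"].strip()))
    if record.get("linkedin_url"):
        keys.append(("linkedin_url",
                     record["linkedin_url"].strip().lower().rstrip("/")))
    parts = [record.get(f) for f in ("first_name", "last_name", "company")]
    if all(parts):
        keys.append(("name_company", "|".join(p.strip().lower() for p in parts)))
    return keys


def find_duplicates(existing: list[dict],
                    incoming: list[dict]) -> list[tuple[int, str, int, str]]:
    """Return (incoming_index, "existing"|"incoming", matched_index, rule)."""
    index = {}
    for i, rec in enumerate(existing):
        for key in match_keys(rec):
            index.setdefault(key, ("existing", i))
    flags = []
    for i, rec in enumerate(incoming):
        keys = match_keys(rec)
        for key in keys:
            if key in index:
                where, j = index[key]
                flags.append((i, where, j, key[0]))
                break  # report only the strongest matching rule
        # Register this row so later rows in the same file match against it.
        for key in keys:
            index.setdefault(key, ("incoming", i))
    return flags
```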

The tool should also assign a confidence score to each duplicate identification. The examples above would have a high confidence score. Partial name matches, different suffixes in a company name, and similar discrepancies should lower the score. This applies both to duplicate entries within the files being imported and to duplicates that conflict with existing entries in our CRM. Duplicates should be flagged so they can be recognized and handled in other workflow tools.
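A confidence score could combine exact identifier matches with a fuzzy comparison of names and companies. This is only a sketch. It uses Python's difflib.SequenceMatcher as a stand-in similarity measure. The suffix list, the rule weights, and the 0.9 ceiling for name-based matches are hypothetical values that would need tuning against manually reviewed leads.

```python
import re
from difflib import SequenceMatcher

# Hypothetical values; tune against manually reviewed leads.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "gmbh"}
RULE_SCORES = {"email": 1.0, "linkedin_id": 1.0, "linkedin_url": 0.95}
NAME_MATCH_CEILING = 0.9


def _clean_company(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    while words and words[-1] in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def _similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def duplicate_confidence(a: dict, b: dict) -> float:
    """Score in [0, 1] that records a and b describe the same person."""
    best = 0.0
    for field, score in RULE_SCORES.items():
        if (a.get(field) and b.get(field)
                and a[field].strip().lower() == b[field].strip().lower()):
            best = max(best, score)
    fields = ("first_name", "last_name", "company")
    if all(r.get(f) for r in (a, b) for f in fields):
        name = (_similarity(a["first_name"], b["first_name"])
                + _similarity(a["last_name"], b["last_name"])) / 2
        if a["company"].strip().lower() == b["company"].strip().lower():
            company = 1.0
        elif _clean_company(a["company"]) == _clean_company(b["company"]):
            company = 0.9  # same name, different suffix: penalize slightly
        else:
            company = _similarity(_clean_company(a["company"]),
                                  _clean_company(b["company"]))
        best = max(best, NAME_MATCH_CEILING * name * company)
    return round(best, 3)
```

A caller could then flag pairs whose score exceeds a chosen threshold. It could store the score with the flag so that downstream workflow tools can sort or filter by it.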

About the recruiter: Vijayalakshmi K, from Florida, United States (member since May 20, 2018)

Skills & Expertise Required

Software development, Website development

Candidate shortlisted and hired. Hiring open till Jan 1, 2021.

Work from anywhere (remote job)
40 hrs / week
Type: Hourly
Cost: $12.50
