Remote Web Development Job In IT And Programming

Developer for tool to sanitize and structure ingested datasets from different sources


The goal is to build a custom web application that will ingest specific valid data sources (XML, CSV, XLS, etc.) and then:
Dedupe the imported data against our existing data and against other records within the same file.
Normalize the data's formatting against a defined schema so that data from different sources all follows the same structure.
Standardize the output into a defined machine-readable format (JSON, XML, etc.) that can then be used by our internal tools.
Store the information in a database optimized for search.
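
As a rough illustration of the ingest and standardize steps above, the sketch below reads CSV text and emits JSON using only the Python standard library. The function names ingest_csv and to_json are hypothetical, and the choice of Python is an assumption; the posting does not prescribe a language or library.

```python
import csv
import io
import json


def ingest_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into rows, trimming whitespace from headers and values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key is None:
                # Extra cells beyond the header row are ignored in this sketch.
                continue
            row[key.strip()] = (value or "").strip()
        rows.append(row)
    return rows


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize rows to a stable, machine-readable JSON string."""
    return json.dumps(rows, indent=2, sort_keys=True)
```

XML and XLS inputs would need their own readers that return the same list-of-dicts shape, so the later steps stay independent of the source format.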

The tool should also be able to fetch data from defined feeds and supplement it with tags (such as the data source, date of processing, etc.). Initially this may be done by manual upload, but eventually the tool should be able to either poll a feed for updates or begin running against a feed when a change is pushed.

Each data source will need to map to specific fields in our own schema, so the incoming data stream will need to be parsed and matched appropriately. The documents that will be imported will come from a limited number of sources with each conforming to a specific template that we will provide. This tool will require a basic interface and also documented API endpoints. Ideally, this will leverage G Suite authentication so our employees will not need to create a new set of credentials for this service.
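
The per-source mapping described above could be expressed as a lookup table, as in this sketch. The source names, column headers, and schema fields are all invented for illustration; the real templates and schema would be provided by the client.

```python
# Hypothetical target schema; the real field list is not given in the posting.
SCHEMA_FIELDS = ("first_name", "last_name", "company_name", "email", "linkedin_url")

# Hypothetical per-source templates: incoming column name -> schema field.
FIELD_MAPS = {
    "linkedin_export": {
        "First Name": "first_name",
        "Last Name": "last_name",
        "Company": "company_name",
        "Profile URL": "linkedin_url",
    },
    "event_signup": {
        "fname": "first_name",
        "lname": "last_name",
        "org": "company_name",
        "mail": "email",
    },
}


def map_to_schema(row: dict, source: str) -> dict:
    """Project one parsed row onto the shared schema for the given source."""
    mapping = FIELD_MAPS[source]  # raises KeyError for an unknown source
    out = {field: None for field in SCHEMA_FIELDS}
    for incoming, target in mapping.items():
        value = row.get(incoming)
        if value not in (None, ""):
            out[target] = value
    return out
```

Keeping the maps as data rather than code means a new source can be supported by adding one dictionary entry.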

Additionally, deduplication of data is critical. For example, with LinkedIn results, a record should be identified as a duplicate if it has the same:
First Name + Last Name + Company Name
LinkedIn ID
LinkedIn URL
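
The three rules above can be sketched as match keys, assuming that any single rule is enough to flag a duplicate and that records use the hypothetical field names first_name, last_name, company_name, linkedin_id, and linkedin_url. The URL normalization shown is an assumption about what counts as "the same" URL.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, 'www.', case, and trailing slash."""
    url = url.strip()
    if "//" not in url:
        url = "//" + url  # lets urlsplit recognize the host when no scheme is given
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/").lower()


def match_keys(record: dict) -> set[tuple]:
    """Build one comparable key per rule that the record has enough data for."""
    keys = set()
    names = [record.get(f) for f in ("first_name", "last_name", "company_name")]
    if all(names):
        keys.add(("name_company", *(n.strip().casefold() for n in names)))
    if record.get("linkedin_id"):
        keys.add(("linkedin_id", str(record["linkedin_id"]).strip()))
    if record.get("linkedin_url"):
        keys.add(("linkedin_url", normalize_url(record["linkedin_url"])))
    return keys


def find_duplicates(records: list[dict]) -> list[tuple[int, int]]:
    """Return (duplicate_index, first_seen_index) pairs within one batch."""
    seen: dict[tuple, int] = {}
    duplicates = []
    for index, record in enumerate(records):
        keys = match_keys(record)
        hits = sorted({seen[k] for k in keys if k in seen})
        if hits:
            duplicates.append((index, hits[0]))
        for key in keys:
            seen.setdefault(key, index)
    return duplicates
```

To check against existing data as well as the file itself, the seen dictionary could be pre-populated from stored records before the batch is processed.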

As background, email is currently our strongest indicator of duplicate entries when we manually review leads.

The tool should also assign a confidence score to each duplicate identification. The examples above would receive a high confidence score, but factors such as partial name matches or different suffixes in a company name should lower it. This applies both to duplicate entries within the files being imported and to duplicates that conflict with existing entries in our CRM. Duplicates should be flagged appropriately so they can be recognized and dealt with in other workflow tools.
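
One possible shape for the confidence score is sketched below, using difflib from the Python standard library for partial name matches. The weights (0.9 for a suffix-only company difference, 0.5 for a missing company), the suffix list, the field names, and the treatment of an exact email match as full confidence are all assumptions for illustration, not requirements from the posting.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing company names.
COMPANY_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}
EXACT_ID_FIELDS = ("email", "linkedin_id", "linkedin_url")


def clean(value) -> str:
    """Lowercase and collapse whitespace so formatting does not affect matching."""
    return " ".join((value or "").casefold().split())


def company_stem(name) -> str:
    """Drop punctuation and trailing legal suffixes such as 'Inc.' or 'LLC'."""
    tokens = re.sub(r"[.,]", " ", clean(name)).split()
    while tokens and tokens[-1] in COMPANY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def duplicate_confidence(a: dict, b: dict) -> float:
    """Score from 0.0 to 1.0 that two records describe the same person."""
    for field in EXACT_ID_FIELDS:
        if a.get(field) and clean(a.get(field)) == clean(b.get(field)):
            return 1.0  # an exact identifier match is treated as certain here
    name_a = clean(f"{a.get('first_name', '')} {a.get('last_name', '')}")
    name_b = clean(f"{b.get('first_name', '')} {b.get('last_name', '')}")
    name_score = SequenceMatcher(None, name_a, name_b).ratio()

    stem_a, stem_b = company_stem(a.get("company_name")), company_stem(b.get("company_name"))
    if not stem_a or not stem_b:
        company_score = 0.5  # company unknown on one side: neither confirms nor denies
    elif clean(a.get("company_name")) == clean(b.get("company_name")):
        company_score = 1.0
    elif stem_a == stem_b:
        company_score = 0.9  # same company, different suffix
    else:
        company_score = SequenceMatcher(None, stem_a, stem_b).ratio()
    return round(name_score * company_score, 3)
```

A flagging threshold (for example, treating scores above some cutoff as duplicates for review) would sit on top of this and would need tuning against real lead data.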

About the recruiter
Member since May 20, 2018
Vijayalakshmi K
from Florida, United States

Skills & Expertise Required

Software Development, Website Development

Candidate shortlisted and hired
Hiring open till - Mar 13, 2020

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$12.50

Cost

