Remote Data Mining And Management Job In Data Science And Analytics

Scrape of SDWIS

I need to scrape part of a national database of water quality violation information known as the SDWIS (2013-2018). The queryable database is here: (Removed by Toogit admin)
The database is split into nine types of report, and I need specific variables from four of them. For now I'm only requesting the information in the Violations report, but assuming satisfactory completion I will keep you on contract and request at least three additional reports, with pay negotiated for each report. Anyone with good web-scraping skills is welcome to apply. Priority will be given to applicants who offer a sample of the completed task so I can be confident in your ability to scrape the data.

The deliverable for this first contract will be one spreadsheet with all variables in the Violations report for all years and quarters. You will need to create variables for reporting year and quarter so I can identify the origin of each report. The spreadsheet will be quite large, so you may need to break it up into several spreadsheets before sending it over. I will share a workbook; its Sample Violations tab shows what I expect the report to look like. I would also like you to send me the code you use to pull the data, preferably in Python.
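As a rough illustration of the deliverable described above, here is a minimal Python sketch of the tagging, appending, and splitting steps. It is not the required approach. The function names, the `reporting_year` and `reporting_quarter` column names, and the `fetch_report` callable (a stand-in for whatever code actually pulls one Violations report for a given year and quarter) are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def tag_rows(rows: Iterable[Row], year: int, quarter: int) -> List[Row]:
    """Copy each row and add the reporting year and quarter it came from."""
    return [
        {**row, "reporting_year": year, "reporting_quarter": quarter}
        for row in rows
    ]


def combine_reports(
    fetch_report: Callable[[int, int], Iterable[Row]],  # hypothetical scraper
    years: Iterable[int] = range(2013, 2019),            # 2013-2018 inclusive
    quarters: Iterable[int] = range(1, 5),               # quarters 1-4
) -> List[Row]:
    """Pull every year/quarter report and append them into one list of rows."""
    combined: List[Row] = []
    for year, quarter in product(list(years), list(quarters)):
        combined.extend(tag_rows(fetch_report(year, quarter), year, quarter))
    return combined


def split_chunks(rows: List[Row], max_rows: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most max_rows rows, one per spreadsheet."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]
```

Each chunk could then be written to its own file, for example with `csv.DictWriter`, so that no single spreadsheet becomes too large to open.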

In this paragraph I'm going to explain the entire job so you have a good idea of whether you will be able to complete the follow-up contracts. In the workbook I show the variables I'll eventually request from each report, along with some sample spreadsheets showing what I'm looking for. The final deliverable will be four spreadsheets, one for each type of report, with all available years (2013-2018) and quarters (1-4) appended to the same spreadsheet. In the process of pulling the data for each year and quarter, you'll need to create variables for year and quarter so I can identify the origin of each report. For one of the reports (Water System Summary), you'll need to create two categorical variables based on sub-queries in the Further Water System Characteristics. In the spreadsheet I show these variables and the numeric indicators for each category.
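One possible way to build a categorical variable from sub-queries is sketched below in Python. It assumes each sub-query returns the set of water system IDs that fall into one category. The function name, the `missing_code` default, and the example IDs and codes are hypothetical; the actual categories and numeric indicators are the ones given in the workbook.

```python
from typing import Dict, Iterable, Set


def categorical_from_subqueries(
    system_ids: Iterable[str],
    subquery_results: Dict[int, Set[str]],  # numeric indicator -> matching IDs
    missing_code: int = 0,                  # assumed code for "no category"
) -> Dict[str, int]:
    """Assign each water system the indicator of the first sub-query it appears in."""
    codes: Dict[str, int] = {}
    for system_id in system_ids:
        codes[system_id] = missing_code
        for indicator, matching_ids in subquery_results.items():
            if system_id in matching_ids:
                codes[system_id] = indicator
                break
    return codes


# Hypothetical example: two sub-queries, coded 1 and 2.
example = categorical_from_subqueries(
    ["A", "B", "C"],
    {1: {"A"}, 2: {"B"}},
)
```

Running the same function once per variable, each time with its own set of sub-query results, would give the two categorical columns for the Water System Summary report.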

Thank you for your interest, and please let me know if you have any questions about the project.
About the recruiter
Member since May 20, 2018
Mr. Dinesh Das
from Nordrhein-Westfalen, Germany

Skills & Expertise Required

Data Science & Analytics, Data Mining & Management

Candidate shortlisted and hired

Hiring open till - Jan 30, 2022

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

$243.43

Cost
