Web Scraping + Automation + Excel
Looking for an experienced Web Scraping, Data Mining and Automation specialist. This is a well-defined, streamlined task that involves triggering multiple scripts from within Excel, with each script performing a web scraping operation and returning the data/findings to Excel.
The job involves scraping 4 websites for different pieces of data and integrating the results with Excel.
Each step below needs to be run from within Excel, possibly as an Excel macro.
Inputs:
Websites A, B, C, D
Dates: from (D1) and to (D2)
Step (1)
- Go to website A
- Bypass captcha and move to next/search page
- Perform a search between D1 and D2, and select an additional option from a multiple-choice item.
- Results will be displayed as a table.
- For each row, if one of the fields equals 'X', click the link in that row to open the details page
- Extract some information from the details page, say into columns C1, C2, C3 ... C10
- Download the PDF available on that same page and upload it to Google Drive (credentials will be provided).
- Populate the Excel sheet with C1 (with a hyperlink to the PDF), C2, C3, ... C10
Step (1) BONUS - If you are able to do this, I'm willing to pay a higher price.
- Extract two fields from the PDF by performing OCR and store them as C11 and C12
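The row selection and sheet layout in Step (1) could be sketched roughly as follows. This is only an illustration in Python: the field name passed to select_rows, the dictionary layout of a scraped row, and the helper names are all assumptions, and the scraping, captcha handling and Google Drive upload are left out. The only Excel-specific part is the HYPERLINK formula, where embedded double quotes must be doubled.

```python
from typing import Dict, List


def hyperlink_formula(url: str, label: str) -> str:
    """Return an Excel HYPERLINK formula; embedded double quotes are doubled."""
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}","{safe_label}")'


def select_rows(table: List[Dict[str, str]], field: str,
                wanted: str = "X") -> List[Dict[str, str]]:
    """Keep only the result rows whose `field` equals `wanted`."""
    return [row for row in table if row.get(field) == wanted]


def to_sheet_row(details: Dict[str, str], pdf_url: str) -> List[str]:
    """Lay out C1..C10, with C1 rendered as a link to the uploaded PDF."""
    # Missing columns become empty cells (an assumption, not a requirement).
    cells = [details.get(f"C{i}", "") for i in range(1, 11)]
    cells[0] = hyperlink_formula(pdf_url, cells[0])
    return cells
```

Each list returned by to_sheet_row would then be written to one worksheet row by whatever Excel bridge is chosen.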
Step (2)
- Go to website B
- For each row obtained in Step (1), use two columns - say C3 and C4 - to search for information on website B.
- Extract info found on page into some columns C11 ... C15
- Make C11 a hyperlink to the result page, if possible.
- If that is not possible, save the result page as a PDF, upload it to Google Drive, and make C11 a hyperlink to that file. (A printable version of the page is available on website B after performing the search.)
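As a rough sketch of the per-row lookup in Step (2), the snippet below builds a search URL from C3 and C4 and picks the link target for C11. The base URL and the query parameter names (q1, q2) are placeholders, since website B's real search form is not described here; the site may well require a form post or a browser session instead of a plain URL.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the real address of website B is not given.
SEARCH_URL = "https://website-b.example/search"


def build_search_url(row: Dict[str, str], base_url: str = SEARCH_URL) -> str:
    """Build a search URL from columns C3 and C4 of one Step (1) row."""
    params = {"q1": row["C3"], "q2": row["C4"]}  # parameter names are assumed
    return f"{base_url}?{urlencode(params)}"


def link_target(result_url: Optional[str], drive_url: Optional[str]) -> Optional[str]:
    """Prefer a direct link to the result page, else the uploaded PDF copy."""
    return result_url if result_url else drive_url
```

For example, build_search_url({"C3": "Acme Ltd", "C4": "2018/05"}) yields a URL ending in ?q1=Acme+Ltd&q2=2018%2F05.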
Step (3)
- Go to website C
- For each row obtained in Step (1), use two columns - say C12 and C13 - to search for information on website C.
- Extract info found on page into some columns C16 ... C20
- Make C16 a hyperlink to a public website, passing the C16 value as input.
Step (4)
- For each row obtained in Step (1), use column C20 as input to perform an API call to a publicly available service (simple API call).
- Store result in column C21
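Step (4) might look something like the sketch below. The API address and the "result" key in the response are invented for illustration, because the actual public service is not named. The fetch function is passed in as a parameter so the loop can be tried out without network access.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical URL template; the real service is not named in the brief.
API_URL = "https://api.example.com/lookup/{value}"


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fill_c21(rows: List[Dict[str, Any]],
             fetch: Callable[[str], Dict[str, Any]] = fetch_json) -> List[Dict[str, Any]]:
    """Call the API once per row with C20 as input and store the answer in C21."""
    for row in rows:
        url = API_URL.format(value=quote(str(row["C20"]), safe=""))
        try:
            row["C21"] = fetch(url).get("result")  # "result" key is assumed
        except OSError:
            # Network failures (URLError is an OSError) leave the cell empty.
            row["C21"] = None
    return rows
```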
Step (5)
- For each row, perform a simple mathematical operation between a couple of columns in that row and store as an additional column, C22
Step (6)
- Filter sheet based on C22 value.
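Steps (5) and (6) can be illustrated with a short sketch. The operation (a difference), the two input columns (C5 and C6) and the filter threshold are all placeholders, since the actual formula and filter criterion are not specified.

```python
from typing import Any, Dict, List


def add_c22(rows: List[Dict[str, Any]], left: str = "C5",
            right: str = "C6") -> List[Dict[str, Any]]:
    """Store left minus right as C22 (the real operation is not specified)."""
    for row in rows:
        row["C22"] = float(row[left]) - float(row[right])
    return rows


def filter_by_c22(rows: List[Dict[str, Any]],
                  minimum: float = 0.0) -> List[Dict[str, Any]]:
    """Keep rows whose C22 is at least `minimum` (threshold is assumed)."""
    return [row for row in rows if row["C22"] >= minimum]
```

In the workbook itself the same result could be had with a formula column and an AutoFilter; doing it in the script keeps all seven steps in one place.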
Step (7)
- Go to website D, enter credentials (provided).
- For each row, perform a search on the website, get the information and store it in the Excel sheet in columns C23 ... CN
Done
A total of 7 buttons/macros in the Excel sheet, one per step, each performing its function. You will need to sign an NDA.
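One possible way to wire seven buttons to seven steps is a single script with one sub-command per step, which each Excel macro launches with the matching step name. The sketch below assumes that arrangement; the step bodies are stubs, and the option names (--date-from, --date-to) are made up for illustration.

```python
import argparse
from typing import Callable, Dict, List, Optional


def make_step(number: int) -> Callable[[str, str], str]:
    """Return a stub standing in for the real work of one step."""
    def run(date_from: str, date_to: str) -> str:
        return f"step {number} ran for {date_from}..{date_to}"
    return run


# One entry per button/macro: step1 .. step7.
STEPS: Dict[str, Callable[[str, str], str]] = {
    f"step{n}": make_step(n) for n in range(1, 8)
}


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the step name and date range, then run the selected step."""
    parser = argparse.ArgumentParser(description="Run one scraping step.")
    parser.add_argument("step", choices=sorted(STEPS))
    parser.add_argument("--date-from", required=True)
    parser.add_argument("--date-to", required=True)
    args = parser.parse_args(argv)
    message = STEPS[args.step](args.date_from, args.date_to)
    print(message)
    return message
```

With the usual `if __name__ == "__main__": main()` line added at the bottom, a macro could shell out to something like `python pipeline.py step1 --date-from 2018-05-01 --date-to 2018-05-20`, where pipeline.py is a made-up file name.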
About the recruiter
Member since May 20, 2018
Sigit Sujarwo
from Morbihan, France