Remote Web Development Job In IT And Programming

Need help designing a web scraping solution to surface external calendar information


Hi, I need help solving an architectural/algorithmic problem around scraping calendars on the web so that the times we show in our app update in near real time.

The current design has a recurring job scrape a site for all of the free times available in a month for a service that is 30 minutes long (our service duration interval) and store those free times in our database. Then, when a user comes to our site and chooses their services, we pull the free times from our cache, work out which times can accommodate the aggregate service duration, and show those to the user.

The issue is that one of the external scheduling providers has a bug where they don't show all of the times they have available to book. So the optimization of scraping only the times for a 30-minute duration, and using those free times to calculate at runtime which ones will work for a longer duration, goes out the window. The only other option we can think of is scraping for each individual time interval, but that makes the scrape/caching job take far too long to be feasible.

The scraping script takes about 2 min per pass and we need to run it for at least 2 months (the current and the next). For 100 stylists using our current implementation, the job takes 2 min * 100 stylists * 2 months * 1 (30-minute service duration) = 400 min; parallelized on 8 machines, it would run in less than an hour. However, running the job for every possible aggregate service duration would take 2 min * 100 stylists * 2 months * 16 (8 hours in 30-minute intervals) = 6400 min = 106+ hours, and even parallelized on 8 machines it would still take 13+ hours, which is too long.

We're looking for a fresh pair of eyes that can see a solution we aren't seeing, one that lets our times sync with the external scheduling provider on a regular, relatively small interval.
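
For anyone picturing the runtime step described above, here is a minimal sketch of resolving cached 30-minute free slots into start times that fit a longer aggregate duration. The function name `starts_for_duration` and the sample times are hypothetical, not taken from our codebase, and the sketch assumes the cached slots are complete, which is exactly the assumption the provider bug breaks.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)  # base service interval from the description


def starts_for_duration(free_slots, duration):
    """Return start times where `duration` fits into back-to-back free slots.

    Hypothetical helper: `free_slots` is an iterable of datetimes, each
    marking the start of a free 30-minute slot pulled from the cache.
    """
    needed = -(-duration // SLOT)  # ceiling division: slots required
    free = set(free_slots)
    return sorted(
        start
        for start in free
        # every consecutive slot covering the duration must also be free
        if all(start + i * SLOT in free for i in range(needed))
    )


# Illustrative data only: free at 9:00, 9:30, 10:00 and 11:00.
day = datetime(2024, 4, 1)
cached = [day.replace(hour=9), day.replace(hour=9, minute=30),
          day.replace(hour=10), day.replace(hour=11)]
one_hour = starts_for_duration(cached, timedelta(minutes=60))
```

With the sample data, a 60-minute booking can start at 9:00 or 9:30 only; the isolated 11:00 slot is dropped because 11:30 is not free.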

***Some people have asked why the script takes so long. It has to automate choosing a service and walking through the booking flow of the other website to see the times available for each day.

The site is (removed by Toogit admin). If you click on one service, add some more, and click save, it will show you the calendar, where you have to click on each individual day to see its hours. ***
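
The job-length figures quoted in the description can be checked with a small calculation. This is only a sketch of that arithmetic; the function name `job_minutes` is hypothetical, and it assumes the work splits evenly across machines with no overhead.

```python
def job_minutes(minutes_per_pass, stylists, months, durations, machines=1):
    """Estimated wall-clock minutes for one full scrape job.

    Assumes perfectly even parallelism across `machines` (an idealization).
    """
    return minutes_per_pass * stylists * months * durations / machines


current = job_minutes(2, 100, 2, 1)                   # 400.0 min on one machine
current_parallel = job_minutes(2, 100, 2, 1, 8)       # 50.0 min on 8 machines
exhaustive = job_minutes(2, 100, 2, 16)               # 6400.0 min, about 106.7 hours
exhaustive_parallel = job_minutes(2, 100, 2, 16, 8)   # 800.0 min, about 13.3 hours
```
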
About the recruiter
Member since Mar 14, 2020
Achmad Shofiyul
from Kostanay, Kazakhstan

Skills & Expertise Required

Automation, Data Extraction, Scripting, Selenium, Web Scraping

Candidate shortlisted and hired
Hiring open till - Apr 30, 2024

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$17.25

Cost

Similar Projects

Web Scraper - Researcher - Data Analysis : Scraping And Data Extraction Wanted For Career Website

I AM CREATING A CAREER DATABASE BASED ON ABOUT 1000 DIFFERENT OCCUPATIONS.

OCCUPATIONS RANGING FROM ACCOUNTANT TO ZOOLOGIST

I NEED UNIQUE CONTENT TO ADD TO EACH ONE OF THESE INDIVIDUAL OCCUPATIONS.

EXAMPLE:

OCCUPA...read more

Python+NLTK Scraping Work

I have scraping work that needs to be completed in Python using the NLTK library.

I need someone who can start right away and complete fast.

Long-Term - Remote Data Extractor & Lead Generator - Fast-Growing Company

We're looking for a remote data extractor to find relevant leads using Google searches based on provided criteria and add their contact email addresses to a Google Sheets document that we've created.

We would like to assign 10 hours of work...read more

Scraping data from mobile application

- Data scraping from a mobile application
- Title and price
- Five times a day, it has to connect to my server (MySQL) to update the current price

Data Collection From Sources in Real Estate Industry in US

We are looking to extract data from 4 to 5 different sources for the listings that are available on the internet. We want a program that can update it daily with the data available from those 4 to 5 sources. There will...read more