Requirements:
1) Store scraped data in DynamoDB
2) Listen to notifications to trigger the news scrape
3) Be able to run cron jobs
4) Be able to unify all timestamps to one timezone
5) Be able to use a proxy to prevent IP blocking (proxy will be purchased and provided)
6) 3 Lambdas: *one for news,
*one for stats, table, fixtures, and the Google Sheet,
*another for team fixtures
*8 API endpoints are required,
*6 simple HTML page scraping scripts from scratch, plus one to be duplicated with a different URL
*one RSS feed listener, plus one to be duplicated with a different URL
*two API-response scraping scripts
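Requirement 4 (unifying all times to one timezone) could be handled by normalizing every scraped timestamp to UTC at insert time. A minimal sketch, assuming the target zone is UTC and using the standard-library zoneinfo module (the sample timestamp, format, and source zone are illustrative, not from the project spec):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; Windows may need the tzdata package

def to_utc(ts: str, fmt: str, source_tz: str) -> str:
    """Parse a naive local timestamp string and return it as an ISO-8601 UTC string."""
    local = datetime.strptime(ts, fmt).replace(tzinfo=ZoneInfo(source_tz))
    return local.astimezone(ZoneInfo("UTC")).isoformat()

# Example: a kickoff time published in UK local time (BST in August, UTC+1)
print(to_utc("2023-08-12 15:00", "%Y-%m-%d %H:%M", "Europe/London"))
# → 2023-08-12T14:00:00+00:00
```

Storing the UTC string in DynamoDB keeps sort keys comparable regardless of which site a timestamp was scraped from.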
The project is divided into these steps:
* news:
1) Scripts: 5 RSS feed scripts already exist; one more should be created, and another duplicated with a different URL. All these RSS feeds should run in one Lambda every 5 minutes, and if there are new article URLs, open the HTML content and scrape it.
2) 4 simple HTML page scrapers to run when new news items are posted in the RSS feeds
3) DB table is already done; only the isLocal column is missing
4) 2 APIs, one with paging
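The RSS step above boils down to: parse each feed, keep only item links not seen before, and hand those to the HTML scraper. A minimal dedup sketch with the standard library (the feed XML and URLs are made up; in the real Lambda the `seen` set would presumably be persisted, e.g. in DynamoDB, between 5-minute runs):

```python
import xml.etree.ElementTree as ET

def new_links(rss_xml: str, seen: set) -> list:
    """Return the links of RSS <item>s not seen before, updating `seen` in place."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh

# Hypothetical feed snippet: one already-seen item, one new item
sample = """<rss><channel>
  <item><title>A</title><link>https://example.com/a</link></item>
  <item><title>B</title><link>https://example.com/b</link></item>
</channel></rss>"""
seen = {"https://example.com/a"}
print(new_links(sample, seen))  # → ['https://example.com/b']
```

Only the links returned by `new_links` would trigger the HTML page scrapers, so unchanged feeds cost nothing beyond the fetch.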
*Standings: script and DB are created (should run every 10 minutes from 1 PM to 12 AM)
1) 1 API to create
2) Add a column to detect whether the team position increased (1), decreased (-1), or remained the same (0) relative to the previous scrape insert
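The movement column in step 2 can be computed by comparing each team's position in the current scrape against the previous insert. A sketch of that comparison as a pure function (team names are hypothetical; the convention assumed here is that a lower position number means higher in the table, and a team with no previous record gets 0):

```python
def position_delta(previous: dict, current: dict) -> dict:
    """Map team -> 1 if it moved up the table, -1 if it moved down, 0 if unchanged.

    `previous` and `current` map team name -> table position (1 = top).
    Teams absent from `previous` default to 0 (no movement to report).
    """
    deltas = {}
    for team, pos in current.items():
        prev = previous.get(team)
        if prev is None or prev == pos:
            deltas[team] = 0
        elif pos < prev:  # smaller number = higher placing, so this is a rise
            deltas[team] = 1
        else:
            deltas[team] = -1
    return deltas

prev = {"Team A": 1, "Team B": 2, "Team C": 3}
curr = {"Team A": 1, "Team B": 3, "Team C": 2}
print(position_delta(prev, curr))  # → {'Team A': 0, 'Team B': -1, 'Team C': 1}
```

The returned value per team is exactly what the new column stores alongside each standings insert.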
*Fixtures: the scraping API is done, the DB is done
Need to create one API endpoint
The stats HTML page, Google Sheet scrape, and team fixtures will be done from scratch.
Check the PDF file for more details.
Start your job offer with RD07.
Python is preferred for scraping, as most existing scripts are written in Python.
About the recruiter: Sanjay Kumar, member since Nov 11, 2022, from New Jersey, United States