Remote Data Mining And Management Job In Data Science And Analytics

Python web crawler (bot) that builds a bilingual corpus

We need to build a web crawler (bot) that will traverse a top-level domain such as .co.uk or .com.
Then search for bilingual websites.
Determine the languages of each site.
Scrape and align the text from the site.
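
As an illustration of the "determine the languages" step, here is a minimal sketch that reads language hints from a page's own markup. It assumes sites declare their languages through the `lang` attribute on `<html>` and through `<link rel="alternate" hreflang="...">` tags, which many sites do not; a real crawler would likely also need text-based language identification. The names `LanguageHintParser`, `language_hints` and `looks_bilingual` are hypothetical.

```python
from html.parser import HTMLParser


class LanguageHintParser(HTMLParser):
    """Collect language hints declared in a page's markup (assumed present)."""

    def __init__(self):
        super().__init__()
        self.page_lang = None   # value of <html lang="...">, lowercased
        self.alternates = {}    # hreflang code -> URL of the alternate page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.page_lang = attrs["lang"].lower()
        elif tag == "link" and attrs.get("rel") == "alternate":
            code, href = attrs.get("hreflang"), attrs.get("href")
            if code and href:
                self.alternates[code.lower()] = href


def language_hints(html_text):
    """Return (declared page language, {hreflang code: alternate URL})."""
    parser = LanguageHintParser()
    parser.feed(html_text)
    return parser.page_lang, parser.alternates


def looks_bilingual(html_text):
    """True if the markup declares at least two distinct languages."""
    page_lang, alternates = language_hints(html_text)
    # Compare primary subtags so "en-GB" and "en-US" count as one language.
    langs = {code.split("-")[0] for code in alternates if code != "x-default"}
    if page_lang:
        langs.add(page_lang.split("-")[0])
    return len(langs) >= 2
```

Calling `looks_bilingual(page_source)` on each fetched page gives a cheap first filter before any text is scraped; pages without these tags would be missed and need a different check.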

There are many Python libraries and research papers that cover this. I think bitextor, for example (which extracts and aligns two HTML pages), will take care of the alignment.
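
Before an aligner can work on two HTML pages, the crawler has to decide which pages are translations of each other. The sketch below is one simple heuristic, not bitextor's method: it assumes translated pages share a URL that differs only by a language-code path segment (for example `/en/about` and `/fr/about`). The function names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def language_neutral_key(url, lang_codes):
    """Strip the first path segment that is a known language code.

    Returns ((host, remaining path), language code or None).
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    found = None
    kept = []
    for seg in segments:
        if found is None and seg.lower() in lang_codes:
            found = seg.lower()
        else:
            kept.append(seg)
    return (parts.netloc, "/".join(kept)), found


def pair_pages(urls, lang_a, lang_b):
    """Pair URLs that are identical apart from the language segment."""
    by_key = {}
    for url in urls:
        key, lang = language_neutral_key(url, {lang_a, lang_b})
        if lang:
            by_key.setdefault(key, {})[lang] = url
    return sorted(
        (found[lang_a], found[lang_b])
        for found in by_key.values()
        if lang_a in found and lang_b in found
    )


crawled = [
    "https://example.co.uk/en/about",
    "https://example.co.uk/fr/about",
    "https://example.co.uk/en/contact",
    "https://example.co.uk/blog",
]
pairs = pair_pages(crawled, "en", "fr")
```

Each resulting pair could then be handed to the alignment tool. Sites that encode the language in a subdomain or query string would need a different key function.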

We will be waiting for a detailed proposal describing how the project will be carried out and the time frame.
About the recruiter
Member since Mar 14, 2020
Ganesan Muthusa
from New York, United States

Skills & Expertise Required

Data Scraping, Web Crawling, Python, Data Extraction

Open for hiring
Apply before - May 13, 2024

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

$479.16

Cost

Similar Projects

Developer needed to build custom Auto scraper

I need a custom scraper program that will run automatically every 30 seconds or slower. This scraper would pull data from the FMCSA public website. I actually have instructions on how to build it that I was given by another developer that...

Reverse engineer executable file

Ideally, I want to retrieve the source code from a 50 kB machine code file, if that is possible.

Odoo Server Admin / Dev

We have recently selected Odoo as our CRM of choice. We will have ongoing but only occasional needs.

At the moment, our Odoo is down. We believe the cause was a poorly created module that we activated. Odoo ran normally until the server restarted...

Online examination system in Python

Design software for student homework and examinations.
The objective is to create software for computer-based examinations. Its features include:
1. Randomly select questions and send the exam to the examinee
2. Automatically check the...

Developer needed: classification of the BCI 3 3a dataset using a CNN, with accuracy over 94%

I need subject-wise accuracy above 94%. It is cued motor imagery (multi-class) with 4 classes (left hand, right hand, foot, tongue) and three subjects (ranging from quite good to fair performance).
EEG, 60 channels, 60 trials per class. The goal is imp...