Remote Web Development Job In IT And Programming

Web scraping English conversations


I need someone to web-scrape some English conversations

I'm building a chatbot for learning English and want someone to scrape a set of English conversations from various websites to use as training material.
I'm looking for short, simple conversations.

There are two parts to the task:
- some googling to find basic relevant conversations
- scraping code for different sites
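
For the scraping part, a minimal sketch of pulling visible text lines out of a simple HTML page, using only the Python standard library. This is an illustration under assumptions, not a required approach: the class name `TextLines`, the helper `html_to_lines`, and the choice of which tags count as line breaks are all hypothetical, and real pages will likely need per-site adjustments. Fetching the page itself (for example with `urllib.request`) is left out.

```python
from html.parser import HTMLParser


class TextLines(HTMLParser):
    """Collect visible text, one entry per block-level chunk (hypothetical helper)."""

    # Assumption: these tags are treated as line boundaries on simple pages.
    BLOCK_TAGS = {"title", "p", "br", "div", "li", "tr", "h1", "h2", "h3"}
    # Text inside these tags is never shown to the reader, so it is skipped.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse runs of whitespace and keep only non-empty lines.
        text = " ".join("".join(self._buf).split())
        if text:
            self.lines.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def html_to_lines(html):
    """Return the visible text of an HTML string as a list of lines."""
    parser = TextLines()
    parser.feed(html)
    parser.close()
    return parser.lines
```

Calling `html_to_lines` on a page's HTML returns a list of text lines that can then be filtered for dialogue turns. Third-party parsers would also work; the standard library is used here only to keep the sketch self-contained.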


Most of these sites are pretty simple, plain-text services. Here are some example sites, but there are hundreds of resources.
I'd like the scraped results in TSV or CSV format:

convoId | line | url | topic | who | text

convoId - an ID for each conversation so we can sort things later
line - simple increment count for each line in that conversation
url - place it was from for attribution later
topic - please try to get a topic from the page; if this is a LOT more work, it may not be needed
who - the speaker label; the conversations usually use role-play labels such as "A: xxx", "B: replies"
text - scraped line of text
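
A minimal sketch of how rows in this format could be produced from already-extracted text lines, assuming speakers are marked with a short label and a colon as in "A: xxx". The names `SPEAKER_RE`, `conversation_rows`, and `write_tsv` are hypothetical, and the pattern is a guess that would need tuning per site (for instance, a line like "Note: ..." would be mistaken for a speaker turn).

```python
import csv
import io
import re

# Column order taken from the format described above.
FIELDS = ["convoId", "line", "url", "topic", "who", "text"]

# Assumption: a turn looks like "<short label>: <text>", e.g. "A: Hi" or "Tom: Hi".
SPEAKER_RE = re.compile(r"^\s*([A-Za-z][\w .'-]{0,20}?)\s*:\s*(.+)$")


def conversation_rows(convo_id, url, topic, lines):
    """Turn raw dialogue lines into dicts keyed by FIELDS."""
    rows = []
    for raw in lines:
        match = SPEAKER_RE.match(raw)
        if not match:
            continue  # skip lines without a speaker label (titles, notes, etc.)
        who, text = match.groups()
        rows.append({
            "convoId": convo_id,
            "line": len(rows) + 1,  # 1-based count within this conversation
            "url": url,
            "topic": topic,
            "who": who,
            "text": text.strip(),
        })
    return rows


def write_tsv(rows, out):
    """Write rows to a file-like object as tab-separated values with a header."""
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

To use it, build rows per page with `conversation_rows(...)` and pass them to `write_tsv` along with an open file (opened with `newline=""`) or an `io.StringIO` buffer. The `csv` module handles quoting if a scraped line happens to contain a tab or a quote character.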

You can use NodeJS or Python.

Let me know what experience you have in scraping, although this should not be a challenging scraping task - most of these are amateur sites with no logins or other blockers.

If you're trying to improve your English, this might also be an interesting project!

If you're into machine learning: I've also looked at the various online corpora for dialog training, but haven't found anything great yet.
These datasets don't work for basic language-learning conversations.

I'd like to start with a small sample task, then manage this as an ongoing project with some regular work each month as we refine the idea. There will be ongoing cleanup of the dataset for training, etc.

Respond with some info on the kinds of scraping tasks you've done before and how many sites you think you can cover for the initial budget I've proposed.

About the recruiter
Member since May 20, 2018
Sehej Bir Singh
from Victoria, Australia

Skills & Expertise Required

Web Scraping, Node.js, Scrapy, Beauty, Python

Candidate shortlisted and hired

Hiring open till - Sep 22, 2023

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$13.42

Cost

