I need to tag approximately 30,000 political forum pages with relevant topic tags, like 'macroeconomics', 'immigration', 'tsa', 'education', 'climate change' (there are roughly 100 tags total).
Ideally, each tag should have a relevance score between 0 and 1 for each page. Then it should be possible, for any given tag, to pull up the most relevant 100 pages (with the highest scores in that category), sort of like a tag-specific search engine.
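To make the "most relevant 100 pages" requirement concrete, here is a minimal sketch of the lookup, assuming the scores are already held as a dictionary mapping each URL to its per-tag scores. The function name `top_pages`, the example URLs, and the score values are all hypothetical.

```python
import heapq

def top_pages(scores, tag, k=100):
    """Return up to k (url, score) pairs with the highest score for the tag."""
    # Pages with no score recorded for the tag are treated as 0.0.
    ranked = ((url, tag_scores.get(tag, 0.0)) for url, tag_scores in scores.items())
    return heapq.nlargest(k, ranked, key=lambda pair: pair[1])

# Hypothetical scores for three pages.
scores = {
    "http://example.com/a": {"immigration": 0.9, "tsa": 0.1},
    "http://example.com/b": {"immigration": 0.2, "tsa": 0.8},
    "http://example.com/c": {"immigration": 0.6},
}
best = top_pages(scores, "immigration", k=2)
print(best)
```

With 30,000 pages and about 100 tags, this kind of in-memory lookup should be cheap enough that no search index is needed.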
I can clarify any tags that are confusing. Also, feel free to suggest additional tags that you think would be helpful. Often the topic keyword never appears verbatim in the text, so the solution should be able to recognize a topic from its *context* rather than from exact keyword matches.
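As a rough illustration of scoring by context rather than by the tag keyword itself, the sketch below scores a page by the fraction of a tag's related terms that appear in it, which naturally falls between 0 and 1. The related-term lists in `TAG_TERMS` are invented for the example, and this simple overlap measure is only a stand-in, not a recommendation of method.

```python
import re

# Hypothetical related terms per tag; a real list would be curated or learned.
TAG_TERMS = {
    "tsa": {"airport", "screening", "security", "checkpoint", "pat-down"},
    "climate change": {"emissions", "carbon", "warming", "greenhouse", "temperature"},
}

def tokenize(text):
    """Lowercase the text and return its set of words (hyphens kept)."""
    return set(re.findall(r"[a-z][a-z\-]*", text.lower()))

def score_page(text, tag_terms=TAG_TERMS):
    """Score each tag as the fraction of its related terms found in the text."""
    words = tokenize(text)
    return {tag: len(terms & words) / len(terms) for tag, terms in tag_terms.items()}

# The page never mentions "TSA", yet it still scores above zero for that tag.
page = "Long lines at the airport again: the screening checkpoint took an hour."
page_scores = score_page(page)
print(page_scores)
```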
I'll provide a CSV file with a list of URLs (30,000 rows), and I'm looking for a CSV file in return with the relevance score for each topic for each page (30,000 rows, 100 columns).
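A possible shape for the input and output files is sketched below, assuming the input has one URL per row in its first column with no header, and assuming the output carries an extra leading `url` column so each row can be matched back to its page. The helper names `read_urls` and `write_scores` and the placeholder scoring function are hypothetical.

```python
import csv
import io

def read_urls(handle):
    """Read one URL per row from the first column, skipping blank rows."""
    return [row[0] for row in csv.reader(handle) if row and row[0].strip()]

def write_scores(urls, tags, score_fn, out):
    """Write a header row, then one row per URL with one score per tag."""
    writer = csv.writer(out)
    writer.writerow(["url"] + list(tags))
    for url in urls:
        page_scores = score_fn(url)  # expected to return {tag: score}
        writer.writerow([url] + [f"{page_scores.get(tag, 0.0):.3f}" for tag in tags])

# In-memory demo; real use would pass file objects opened with newline="".
urls = read_urls(io.StringIO("http://example.com/a\nhttp://example.com/b\n"))
buffer = io.StringIO()
write_scores(urls, ["immigration", "tsa"], lambda url: {"tsa": 0.25}, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```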
It would be great if you can share your source code after the project is completed.
There are a number of ways this problem could be solved: neural nets, clustering, or some sort of graph-theoretic approach. Whatever solves the problem with the least amount of effort!
About the recruiter: Dave C., from Scotland, United Kingdom. Member since Dec 26, 2016.