Remote Data Mining And Management Job In Data Science And Analytics

R programmer needed to help create a function to process documents

I have assembled a large amount of text for a research project. The text is stored in .docx files (not ideal, I know, but the best option given the source), which are nested in a series of folders (~70). Each document starts with a section of metadata, followed by the main text, and concludes with another section of metadata.

I would like to create a function in R that will take as input a directory location, scan that directory for all .docx files (a package exists to do that part), separate the different documents, identify the main text of each document and separate out both sections of the metadata, parse the metadata for several important categories and populate columns with those values, paste the metadata itself into separate columns, then reassemble a data frame where the main text has been separated out from the metadata.
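As a rough illustration of the workflow described above, here is a minimal base-R sketch. Everything specific in it is an assumption rather than part of the project: the function names (`process_directory`, `split_document`, `extract_field`), the section markers (`Body`, `Classification`), the `Key: value` metadata layout, the example fields (`Date`, `Source`), and the idea that each .docx file holds exactly one document. The .docx reader is passed in as the hypothetical `read_paragraphs` argument so the sketch does not depend on any particular package.

```r
# Split one document (a character vector of paragraphs) into leading metadata,
# main text, and trailing metadata. The marker patterns are placeholders.
split_document <- function(paragraphs,
                           start_marker = "^Body$",
                           end_marker = "^Classification$") {
  start <- grep(start_marker, paragraphs)[1]          # first start marker
  end <- tail(grep(end_marker, paragraphs), 1)        # last end marker
  if (is.na(start) || length(end) == 0 || end <= start) {
    stop("could not locate both section markers")
  }
  list(
    header = paragraphs[seq_len(start - 1)],
    body   = if (end - start > 1) paragraphs[(start + 1):(end - 1)] else character(0),
    footer = paragraphs[seq(end + 1, length.out = length(paragraphs) - end)]
  )
}

# Pull the value of a "Key: value" line out of the metadata (NA if absent).
# The key is used as a regular expression, so it should contain no special characters.
extract_field <- function(meta_lines, key) {
  pattern <- paste0("^", key, ":\\s*")
  hit <- grep(pattern, meta_lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub(pattern, "", hit[1])
}

# Scan a directory tree for .docx files and build one data frame row per file.
# read_paragraphs is any function mapping a file path to a character vector.
process_directory <- function(dir, read_paragraphs,
                              fields = c("Date", "Source")) {
  files <- list.files(dir, pattern = "\\.docx$", recursive = TRUE,
                      full.names = TRUE, ignore.case = TRUE)
  rows <- lapply(files, function(f) {
    parts <- split_document(read_paragraphs(f))
    meta <- c(parts$header, parts$footer)
    values <- vapply(fields, function(k) extract_field(meta, k), character(1))
    data.frame(file = f,
               as.list(values),                        # one column per parsed field
               header_meta = paste(parts$header, collapse = "\n"),
               footer_meta = paste(parts$footer, collapse = "\n"),
               text = paste(parts$body, collapse = "\n"),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

In practice `read_paragraphs` could be a small wrapper around a .docx package, for example one that returns the `text` column of `officer::docx_summary(officer::read_docx(path))`; that choice, like the markers and field names, would need to be adapted to the actual files.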

The deliverable is therefore an R script for this function. A freelancer who is experienced and comfortable working in R, specifically with text-as-data and writing simple loops and functions, is required.

I use R for basic dataset construction, variable manipulation, and statistical modeling, but am inexperienced at writing loops and functions. I have written code that does what I want for a single document, which can be used as a guide. I am also happy to provide more information about the data itself to answer any questions.
About the recruiter
Member since May 20, 2018
Adam Maulana
from Lombardia, Italy

Skills & Expertise Required

R 

Open for hiring. Apply before May 15, 2024

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$26.83

Cost
