Remote Data Mining And Management Job In Data Science And Analytics

Data on Twitter Accounts for R&D ML Project for the DoD / Navy Research Center


I have 75k Twitter accounts.

I am looking for the following data on each of them.

Category Feature
1 - Profile Commonality between screen name and user names
1 - Profile Creation Date
1 - Profile Description / Bio
1 - Profile Display Name
1 - Profile Is Profile Picture Egg (Yes/No)
1 - Profile Is Profile Picture Human? (Yes/No)
1 - Profile Is Profile Picture Stock Image (Yes/No)
1 - Profile Number of Sources (mobile, computer, null)
1 - Profile Primary Language
1 - Profile Handle (@name)
1 - Profile Twitter User ID
2 - Bio / Description Does Description have a URL? (Yes/No)
2 - Bio / Description If so, does the description URL have a clone elsewhere?
2 - Bio / Description Average Word Length
2 - Bio / Description Contains URL
2 - Bio / Description Correlation with a NLP Program
2 - Bio / Description Length
2 - Bio / Description Number of Words
2 - Bio / Description Score - ARI (Automated Readability Index)
2 - Bio / Description Score - Coleman Liau index
2 - Bio / Description Score - Dale-Chall Score
2 - Bio / Description Score - Flesch Kincaid Grade level
2 - Bio / Description Score - Flesch Reading Ease
2 - Bio / Description Score - Linsear Write Formula
2 - Bio / Description Score - SMOG
3 - Activity URL Is Shortened? (Yes/No)
3 - Activity # of Posts
3 - Activity # of Retweets
3 - Activity # of Tweeting @s
3 - Activity % of Tweets Geo-enabled
3 - Activity Ave. # of Hashtags in Tweets
3 - Activity Ave. # of Links in Tweets
3 - Activity Ave. # of Special Characters in Tweets
3 - Activity Ave. # of User Mentions in Tweets
3 - Activity Average Duration between a tweet being posted and this user retweeting it, for all retweets (in minutes)
3 - Activity Average Duration between a tweet being posted and this user retweeting it, for the 10 fastest retweets (in minutes)
3 - Activity Average Duration between a tweet being posted and this user retweeting it, for the 3 fastest retweets (in minutes)
3 - Activity Average Tweets / Day Since Creation Date
3 - Activity Distribution of Tweets Per Hour
3 - Activity Longest No-Tweet Duration (In Days)
3 - Activity Most Compact Number of Tweets per Hour
3 - Activity Number of Languages
3 - Activity Percentage of tweets ending with punctuation, hashtag, or link
3 - Activity Number of Events / Hour Distribution - Standard Deviation
3 - Activity Number of Events / Hour Distribution - Skew
3 - Activity Number of Events / Hour Distribution - Kurtosis
3 - Activity Sentiment Score
3 - Activity Time from Last Tweet (In Days)
3 - Activity # of Followers
3 - Activity # of Following
3 - Activity # of Likes
3 - Activity Category of website Linked to
4 - Similarity Number of known bots followed by a user (a user following several known bots is more likely to be a bot)
4 - Similarity Number/Percentage of bots in the cluster that a user belongs to (if a clustering algorithm places the user in a cluster with many bots, the user is more likely to be a bot)
4 - Similarity PageRank and betweenness centrality of users in both retweet and mention networks
4 - Similarity Similarity of Profile to Known Bots
4 - Similarity Variables related to star and clique networks associated with users
5 - Outcome Is a Bot? (Yes / No)
5 - Outcome Bot Type (Spambots, Paybots, Influence Bots)
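
To make a few of the features above concrete, here is a minimal Python sketch of how some of them might be computed. It is an illustration under assumptions, not a required implementation: the function names `bio_features` and `hourly_distribution` are hypothetical, the word and sentence splitting is a simple regex approximation, and "Number of Events / Hour Distribution" is assumed to mean tweet counts binned by hour of day (0-23). The ARI formula used is the standard one: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.

```python
import math
import re
from collections import Counter
from datetime import datetime

URL_RE = re.compile(r"https?://\S+")


def bio_features(bio):
    """Category 2 sketch: length, word stats, URL flag, and ARI for one bio."""
    text = URL_RE.sub(" ", bio)  # score the prose, not the link
    words = re.findall(r"[A-Za-z0-9]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = sum(len(w) for w in words)
    features = {
        "length": len(bio),
        "num_words": n_words,
        "contains_url": bool(URL_RE.search(bio)),
        "avg_word_length": None,
        "ari": None,
    }
    if n_words:  # leave the scores empty for blank bios
        features["avg_word_length"] = n_chars / n_words
        features["ari"] = (4.71 * n_chars / n_words
                           + 0.5 * n_words / max(len(sentences), 1)
                           - 21.43)
    return features


def hourly_distribution(timestamps):
    """Category 3 sketch: tweets per hour of day plus std, skew, excess kurtosis."""
    by_hour = Counter(ts.hour for ts in timestamps)
    counts = [by_hour.get(h, 0) for h in range(24)]
    mean = sum(counts) / 24
    # Population central moments of the 24 hourly counts.
    m2, m3, m4 = (sum((c - mean) ** k for c in counts) / 24 for k in (2, 3, 4))
    stats = {"counts": counts, "std": math.sqrt(m2), "skew": None, "kurtosis": None}
    if m2 > 0:  # skew and kurtosis are undefined for a flat distribution
        stats["skew"] = m3 / m2 ** 1.5
        stats["kurtosis"] = m4 / m2 ** 2 - 3
    return stats


if __name__ == "__main__":
    print(bio_features("Data nerd. Loves cats! https://example.com"))
    print(hourly_distribution([datetime(2021, 1, 1, 3), datetime(2021, 1, 2, 3, 30)]))
```

Timestamps are assumed to be `datetime` objects already converted to a single time zone; the other readability scores would need syllable counts or word lists that this sketch does not attempt.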

This is important, but not time sensitive. The right data miner / analyst will be given a few weeks to work on the job.


The final output for this is 2-fold.

1) A Google spreadsheet output of this info for all ~75k accounts.
2) A web-based tool where I can upload a CSV file or paste in a list of Google IDs OR Usernames to get this data for the identified accounts.
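
As a rough sketch of the input side of such a tool, the Python snippet below takes either the text of an uploaded CSV or a pasted list and splits the entries into IDs and usernames. Everything here is an assumption for illustration: the function name `parse_accounts` is hypothetical, IDs are assumed to be all-digit strings, and usernames are assumed to follow the Twitter handle rule of 1-15 letters, digits, or underscores (so an all-digit handle would be misread as an ID).

```python
import csv
import io
import re

# Assumed handle rule: 1-15 letters, digits, or underscores.
HANDLE_RE = re.compile(r"^[A-Za-z0-9_]{1,15}$")


def parse_accounts(raw):
    """Split CSV text or a pasted list into IDs, usernames, and rejects."""
    ids, usernames, rejected = [], [], []
    seen = set()
    for row in csv.reader(io.StringIO(raw)):
        for cell in row:
            token = cell.strip().lstrip("@")
            if not token:
                continue
            key = token.lower()  # handles are treated as case-insensitive
            if key in seen:      # skip duplicates
                continue
            seen.add(key)
            if token.isascii() and token.isdigit():
                ids.append(token)
            elif HANDLE_RE.match(token):
                usernames.append(token)
            else:
                rejected.append(token)
    return {"ids": ids, "usernames": usernames, "rejected": rejected}


if __name__ == "__main__":
    print(parse_accounts("@jack, 12345\nnasa\nbad name!\n@Jack"))
```

Returning the rejected entries separately lets the tool report bad rows back to the user instead of silently dropping them.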

I have a strong opinion about what tools you use to build this solution.


About the recruiter
Member since Sep 14, 2017
Lance Hirahon
from Jalisco, Mexico

Skills & Expertise Required

Data Science & Analytics Data Mining & Management 

Candidate shortlisted and hired
Hiring open till - Apr 20, 2021

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$25.02

Cost

