Remote Data Mining And Management Job In Data Science And Analytics

Converting JSON or Avro files to Parquet


I need to convert JSON, Avro or other row-based files in S3 into the Parquet columnar storage format using an AWS service like EMR or Glue.

I already have Python code that converts JSON to Parquet, but the process is very manual: it accounts for NULL values in the JSON elements by checking every field/column and substituting a default value wherever there is a NULL.

I am looking for an easier, less manual way of doing this using something like Spark or other similar methods.
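
A minimal PySpark sketch of what such a less manual conversion might look like, under stated assumptions: `spark` is an existing `SparkSession` (as provided on EMR or in a Glue Spark job), the S3 paths are placeholders, and the helper names `default_fill_values` and `convert_json_to_parquet` are hypothetical, not part of any library. Spark infers the schema when it reads the JSON, and Parquet columns can hold NULLs, so filling defaults may not be needed at all; the optional fill step below only mirrors the per-column defaults described above by deriving them from the inferred column types instead of listing each field by hand.

```python
# Assumed defaults per Spark simple type name; adjust to taste.
TYPE_DEFAULTS = {
    "string": "",
    "tinyint": 0, "smallint": 0, "int": 0, "bigint": 0,
    "float": 0.0, "double": 0.0,
    "boolean": False,
}


def default_fill_values(dtypes, type_defaults=TYPE_DEFAULTS):
    """Build a {column: default} mapping from (name, type) pairs.

    `dtypes` has the shape of PySpark's `DataFrame.dtypes`. Columns whose
    type has no entry in `type_defaults` (structs, arrays, timestamps, ...)
    are skipped and therefore keep their NULLs.
    """
    return {name: type_defaults[t] for name, t in dtypes if t in type_defaults}


def convert_json_to_parquet(spark, src_path, dst_path, fill_nulls=False):
    """Read JSON with an inferred schema and write it back out as Parquet."""
    df = spark.read.json(src_path)  # schema is inferred from the data
    if fill_nulls:
        fill_map = default_fill_values(df.dtypes)
        if fill_map:
            df = df.na.fill(fill_map)  # only top-level simple columns
    df.write.mode("overwrite").parquet(dst_path)
    return df
```

With a session available, a call could look like `convert_json_to_parquet(spark, "s3://my-bucket/raw/", "s3://my-bucket/parquet/", fill_nulls=True)`, where both bucket paths are made up for illustration. Avro input would need a different reader (for example `spark.read.format("avro")`, which assumes the spark-avro package is available on the cluster).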

Since I am working exclusively on AWS, I am only looking for solutions that use AWS services such as EMR, Glue or similar.

I am therefore looking for someone with experience using AWS EMR, Glue, Python, PySpark, etc.

Please note: since this is going to be a learning experience for me, the work will be done in a live session on Skype, Zoom, Google Hangouts, etc., where you code while I watch and you answer any questions I have along the way.

I will therefore pay in one-hour increments. The initial contract will be for one hour; if we need more time, we can set up another one-hour contract, and so on.

Please only apply if you're OK with all of these conditions and have the required experience.

About the recruiter
Member since May 20, 2018
Dibyashakti Moh
from Karnataka, India

Skills & Expertise Required

Amazon S3, Amazon Web Services, Apache Spark, PySpark, Python

Candidate shortlisted and hired

Hiring open till - Feb 27, 2020

Work from Anywhere (Remote Job)

Hours: 40 hrs / week

Type: Hourly

Cost: $25.02

