Remote Data Mining And Management Job In Data Science And Analytics

Converting JSON or Avro files to Parquet

Find more Data Mining And Management remote jobs posted recently Worldwide

I need to convert JSON, Avro or other row-based format files in S3 into Parquet columnar store formats using an AWS service like EMR or Glue.

I already have code that converts JSON to parquet using Python but the process is very manual, accounting for NULL values in the JSON elements by looking at each and every field/column and putting in default values if theres a NULL.

I am looking for an easier, less manual way of doing this using something like Spark or other similar methods.

Since I am working exclusively on AWS, I am only looking for solutions using AWS services such as EMR, Glue or similar AWS service.

I am thus looking for someone with experience using AWS EMR, Glue, Python, Pyspark etc.

Please note: Since this is going to be a learning experience for me, this is going to be a live session on Skype, Zoom, Google Hangouts etc where you code and I watch and you answer any questions I have in the process.

Thus, I will pay in one-hour increments. The initial contract is going to be for one hour and if we need more time we can have another one hour contract and so on and so forth.

Please only apply if youre ok with all these conditions and have the required experience.
About the recuiter
Member since Mar 14, 2020
Ifb Agro Indust
from Bavaria, Germany

Skills & Expertise Required

Amazon S3 Amazon Web Services Apache Spark Pyspark Python 

Open for hiringApply before - May 21, 2024

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$34.54

Cost

Offer to work on this project closes in 2 days!
Are you interested in this Opportunity?

Looking for help? Checkout our video tutorial
How to search and apply for jobs

How to apply? Do you have more questions about the Job?
See frequently asked questions

Similar Projects

ERPNext and QuickBooks Integrator or Connector

We would like to have bidirectional syncronisation of ERPNext and QuickBooks accounting data. This integration can be mnual through the use of a trigger button or automated behind the scene for the follwoing:

1. Sales Invoices
2. Sales O...read more

Python code

Python code to automate xml to csv if transcripts

This will include the ability to translate xml to csv

Allow multiple xml files in a folder to be translated Into one csv

program must be flexible to handle large files and...read more

full stack developer

build mobile apps in java, iOS, and react native
understand mysql, postgres, and redis

Gstreamer Video stacking

I am looking for developer to write me program to stack 2 to 4 Video sources side by side or arrange them 2 on top and 2 on b bottom and Crete a single output with a combined video frames and all 8 audio streams.

Amazon Ranking and PPC Help

I need help in my Launch for amazon product and to manage my PPC, i have been spending way too much on amazon PPC. i need help with ranking on amazon and getting to the first page.