1. Project Background and Description:
Access real-time satellite data, surface measurements, and model data to map PM2.5 over the Americas.
2. Data Sets:
a. GOES-R data (every hour) from Google Cloud (NetCDF format)
b. Real-time surface PM2.5 measurements (openaq.org, API access)
c. NASA GEOS-5 data (every hour, NetCDF format)
3. Geographical Scope: Americas (North & South)
4. Tasks:
4.0 Develop Python code to download PM2.5 data from openaq.org every hour and save it to a PostgreSQL database. We will provide existing code that can be modified to perform this task.
4.1 Develop Python code to download, read, and extract GOES-R data. This code will run every hour to download new data, read the data over ground locations, and save selected parameters to the database. There will be multiple files to read and extract data from each hour.
4.2 Develop or modify Python code to download and read GEOS-5 data, which are in NetCDF format. This code will read several parameters for the same locations as in 4.1. We will provide existing code that can be modified to perform this task. The extracted data will be saved in the database alongside the data extracted from GOES-R.
4.3 The combined data set should be collocated in space and time, i.e. for every date/time stamp the database will report several parameters from the three different sources. The initial data will be processed on Microsoft Azure. The database will be created for the first six months of 2018 and divided into training, testing, and validation sets.
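As a rough illustration of the collocation and split described in 4.3, the sketch below pairs each surface measurement with the nearest gridded record from the same hour and then splits the result into three sets. Everything specific here is an assumption made for illustration and not part of the project specification: the record layout (dicts with `time`, `lat`, `lon`), the `aod` parameter name used in the usage note, the 10 km matching radius, and the 70/15/15 split.

```python
import math
import random
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def hour_key(ts):
    """Truncate a datetime to the start of its hour (the time-matching key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def collocate(stations, gridded, max_km=10.0):
    """Pair each station record with the nearest gridded record of the same hour.

    stations: dicts with keys time, lat, lon, pm25 (hypothetical layout)
    gridded:  dicts with keys time, lat, lon plus any extracted parameters
    max_km:   assumed matching radius; records farther away are dropped
    """
    by_hour = {}
    for g in gridded:
        by_hour.setdefault(hour_key(g["time"]), []).append(g)

    rows = []
    for s in stations:
        candidates = by_hour.get(hour_key(s["time"]), [])
        if not candidates:
            continue  # no gridded data for this hour
        dist, nearest = min(
            ((haversine_km(s["lat"], s["lon"], g["lat"], g["lon"]), g) for g in candidates),
            key=lambda pair: pair[0],
        )
        if dist > max_km:
            continue  # nearest grid point is too far from the station
        row = {k: v for k, v in nearest.items() if k not in ("time", "lat", "lon")}
        row.update(time=hour_key(s["time"]), lat=s["lat"], lon=s["lon"], pm25=s["pm25"])
        rows.append(row)
    return rows


def split_rows(rows, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle rows and split them into training, testing, and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )
```

For example, a station record at 12:20 would be matched to gridded records stamped 12:00 the same day, and `split_rows(rows)` would then return the three subsets. A random split is only one option; splitting by station or by date may be preferable if the model has to generalize to unseen locations or periods.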
4.4 Develop a machine learning/AI algorithm to estimate PM2.5 (openaq, 4.0) based on inputs from 4.1 and 4.2. This is one of the most important tasks and will require some research. It will use the database collected in task 4.3. The algorithm will be created using the AI tools available on Microsoft Azure. It must be well trained and optimized to perform well on independent data sets. 10-fold validation using the validation data sets must be performed, and the results presented graphically and in tables with error estimates. The AI model will be developed and evaluated separately for North America and South America. The AI development work must be satisfactory for the submission to be accepted.
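The 10-fold validation requested in 4.4 can be sketched as follows. This is a minimal illustration of the fold bookkeeping and two common error estimates (RMSE and mean bias), not the Azure-based model itself. The `fit_mean_baseline` function is a hypothetical placeholder that always predicts the training mean; the real model would take its place. All function names are assumptions.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, valid_idx) pairs; every sample is validated exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, folds[i]


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_bias(y_true, y_pred):
    """Mean of (predicted - observed); positive means overestimation."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean_baseline(y_train):
    """Placeholder model (assumption): always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda n: [mean] * n


def cross_validate(y, k=10, seed=0):
    """Return one dict of error estimates per fold."""
    scores = []
    for train, valid in k_fold_indices(len(y), k, seed):
        predict = fit_mean_baseline([y[i] for i in train])
        y_valid = [y[i] for i in valid]
        y_hat = predict(len(y_valid))
        scores.append({"rmse": rmse(y_valid, y_hat), "bias": mean_bias(y_valid, y_hat)})
    return scores
```

Because the model is to be developed and evaluated separately for North America and South America, `cross_validate` would be run once per region, and the per-fold scores would feed the tables and plots.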
4.5 Implement the developed AI model with inputs from the two data sets (GOES-R, GEOS-5) and create maps of estimated PM2.5 for the larger region. This code should run in real time every hour as a cron job, using the code from 4.0, 4.1, and 4.2. Also extract the data to be entered into the database.
4.6 Write a document describing the code, AI models, error estimates, and other relevant details. It should be detailed enough for someone else to reproduce the work.
Deliverables:
1. All source code, including the AI models.
2. All code must work in the production environment on the Microsoft cloud.
3. AI models as Python code and in their original formats.
4. Graphical results and an explanation of AI model performance during training, testing, and validation.
5. Integrated data sets in CSV format (openaq, GOES-R, GEOS-5) for six months.
6. Demonstration of all code and AI models running successfully on the Microsoft Azure cloud.
7. Report documenting all the details of the code and AI models.
Required skills: Python, Django, AI/machine learning on Microsoft Azure, Azure cloud, data analysis, and a good understanding of statistics.
The project will be divided into 3 milestones ($100, $300, $100). The milestones will be decided after discussion with the freelancer and may not be equal in amount.
About the recruiter: Sigit Sujarwo, from Morbihan, France. Member since May 20, 2018.