We are building a team to provide 24/7/365 monitoring of our application environment running in AWS. Our environment is built using CloudFormation and AutoScaling, with Lambdas that perform actions based on events and automated deployments via Jenkins.
We have defined thresholds for scaling up and down and need someone to help respond to events/errors in the application log (Tomcat). You don't need to fix the errors, just be prepared to respond to them by:
1) immediately notifying the application team
2) redeploying any applications if needed
3) identifying when the issue happened and documenting the event
4) responding within 30 minutes of the error, 24/7/365
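To make the expected response concrete, here is a minimal sketch of an incident record that tracks the four steps above. The `Incident` class and its field names are hypothetical, and it assumes the 30-minute window is measured from the time of the error to the first response.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# The 30-minute response window comes from the posting.
RESPONSE_WINDOW = timedelta(minutes=30)


@dataclass
class Incident:
    """Hypothetical record of one application error and the response to it."""
    occurred_at: datetime                    # when the issue happened (step 3)
    summary: str                             # short description for the write-up
    responded_at: Optional[datetime] = None  # when the app team was notified (step 1)
    redeployed: bool = False                 # whether a redeployment was needed (step 2)
    notes: List[str] = field(default_factory=list)

    def response_deadline(self) -> datetime:
        return self.occurred_at + RESPONSE_WINDOW

    def responded_in_time(self) -> bool:
        # Assumption: "respond" means the first notification was sent.
        return (
            self.responded_at is not None
            and self.responded_at <= self.response_deadline()
        )
```

A record like this could double as the documentation of the event, since it already holds the time of the error, the time of the response, and any notes.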
Note: we will also be using SumoLogic to collect logs, and we plan to take actions based on errors in those logs. We need someone who understands Java applications and can recommend the action to take for a given exception. This part is critical.
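As a rough illustration of what "actions based on the exception" could look like, the sketch below pulls a Java exception class name out of a log line and looks up a recommended action. The exception-to-action pairings, the `recommend_action` name, and the default of notifying the team are all assumptions for illustration; the real mapping is what we need the candidate to recommend.

```python
import re

# Hypothetical mapping from Java exception class to a recommended action.
# These pairings are illustrative assumptions, not rules we have defined.
ACTIONS = {
    "java.lang.OutOfMemoryError": "redeploy",
    "java.sql.SQLTransientConnectionException": "notify",
}

# Assumed fallback: anything unrecognized goes to the application team.
DEFAULT_ACTION = "notify"

# Matches a fully qualified Java class name ending in Exception or Error,
# e.g. java.lang.NullPointerException.
EXCEPTION_RE = re.compile(
    r"\b((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)


def recommend_action(log_line):
    """Return the action for the first exception in the line, or None."""
    match = EXCEPTION_RE.search(log_line)
    if match is None:
        return None  # no exception found, nothing to do
    return ACTIONS.get(match.group(1), DEFAULT_ACTION)
```

For example, a line containing `java.lang.OutOfMemoryError: Java heap space` would map to `redeploy` under this table, while a line with no exception returns `None`.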
We are looking to pay $600/mo. fixed cost for 12 months. The goal is to automatically identify issues and self-heal by automating responsive actions based on what is happening in the application logs.
About the recruiter
Karan Marwah, from Neamt, Romania
Member since Mar 14, 2020