Jul 18, 2023
Implementing Site Reliability Engineering into Serverless IoT Enterprise
Ongoing

$100,000+
4-6 months
Sweden
10+
Service categories
Service Lines
Cloud Consulting
DevOps
IoT Development
Domain focus
Other
Technology
Programming language
JavaScript
TypeScript
Frameworks
Node.js
React.js
React Native
Subcategories
Cloud Consulting
Public Cloud

Challenge

- Observability in a distributed, micro-frontend-based system: thousands of AWS Lambda functions running in production needed a well-established tracing and monitoring framework to help troubleshoot and debug issues.
- Business goals and objectives (SLOs) needed to be defined across a large organization through an efficient, easy-to-adopt process.
- Training and knowledge-sharing were needed so that teams could understand and implement Site Reliability principles and learn new tools.
- A centralized view of the system's health was also needed.
- The high cost-per-customer needed to be optimized to allow an effective and rapid scale-up of the product on the market.

Solution

1. Extend the business units and teams to help them process their backlog of initiatives and projects
2. Implement SLOs and SLIs across the entire organization
3. Centralize logging for serverless applications to simplify debugging and optimize costs
4. Optimize cloud costs to decrease cost-per-customer
5. Ensure company governance and security policies are applied to all teams and services (through CI/CD)
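
Centralized logging (item 3 above) usually depends on every function emitting logs in one machine-readable shape. The following is only a minimal sketch of that idea, not the project's actual implementation: the `LogEntry` fields, the `buildLogLine` helper, and the service and ID values are all hypothetical names chosen for illustration.

```typescript
// Hypothetical log entry shape; the field names are assumptions, not taken from the project.
interface LogEntry {
  timestamp: string;
  level: "debug" | "info" | "warn" | "error";
  service: string;
  correlationId: string; // lets one request be followed across many functions
  message: string;
  context?: Record<string, unknown>;
}

// Builds a single-line JSON log record that a central log store can index.
function buildLogLine(
  level: LogEntry["level"],
  service: string,
  correlationId: string,
  message: string,
  context?: Record<string, unknown>,
  now: Date = new Date(),
): string {
  const entry: LogEntry = {
    timestamp: now.toISOString(),
    level,
    service,
    correlationId,
    message,
    // Only include the context key when a context object was supplied.
    ...(context ? { context } : {}),
  };
  return JSON.stringify(entry);
}

// Example call with made-up values.
console.log(
  buildLogLine("info", "device-registry", "req-123", "device registered", {
    deviceId: "abc",
  }),
);
```

Passing the same `correlationId` through every function that handles a request is what makes cross-function troubleshooting practical once the logs land in one place.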

Results

- Implemented the SLODLC framework to accelerate the adoption of SLOs and SLIs within the organization
- Provided centralized observability dashboards and solutions for teams to troubleshoot and debug company-wide systems
- Automated and standardized the governance of 100+ AWS accounts and their resources
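
To make the SLO and SLI terminology concrete, here is a small sketch of how an availability SLI and its remaining error budget can be computed from event counts. It assumes a simple ratio-based SLI (good events divided by total events); the `SloWindow` type, the function names, and the numbers in the example are illustrative assumptions, not figures from this project.

```typescript
// Hypothetical input: event counts for one SLO window plus the target ratio.
interface SloWindow {
  goodEvents: number;
  totalEvents: number;
  target: number; // e.g. 0.999 for a 99.9% objective
}

// SLI as the fraction of good events; an empty window is treated as fully healthy.
function sli(w: SloWindow): number {
  return w.totalEvents === 0 ? 1 : w.goodEvents / w.totalEvents;
}

// Fraction of the error budget still unspent.
// 1 means untouched, 0 means exhausted, negative means the budget is overspent.
function errorBudgetRemaining(w: SloWindow): number {
  const allowedBad = (1 - w.target) * w.totalEvents;
  const actualBad = w.totalEvents - w.goodEvents;
  if (allowedBad === 0) {
    // No failures are allowed (100% target or empty window).
    return actualBad === 0 ? 1 : 0;
  }
  return 1 - actualBad / allowedBad;
}

// Made-up example: 1,000,000 requests, 500 failures, 99.9% target.
const exampleWindow: SloWindow = {
  goodEvents: 999_500,
  totalEvents: 1_000_000,
  target: 0.999,
};
console.log(sli(exampleWindow));
console.log(errorBudgetRemaining(exampleWindow));
```

In this made-up window the target allows about 1,000 failed requests and 500 occurred, so roughly half of the error budget remains.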