Jobs at Central Business Solutions, Inc


Linux SRE Engineer

Bangalore, Karnataka
The Enterprise Computing (EC) Core Infrastructure Services organization is looking for a Site Reliability Engineer to manage the operations, reliability and services for Morgan Stanley's suite of Software Distribution products that are part of the Artifact Curation and Distribution Control squad. This squad is responsible for providing lifecycle management and tooling for packaging, curation and distribution of runtime artifacts across the firm, and for building a container-based Software Deployment Pipeline to Hybrid Cloud (public and private).
 
The Site Reliability Engineering (SRE) team drives the reliability, recoverability and operational efficiency of this product portfolio. SRE is expected to drive the implementation of advanced observability, troubleshooting tools, automation and technical debt management, working closely with the user community, development, engineering and the global support team that provides first-line support.
 
The candidate will have the technical skills required to support these products on a Kubernetes platform. Hands-on experience in automation and in at least one pillar of the observability toolset is required, with expertise in defining system monitoring, not just reacting to alerts. Cloud experience is not necessary, but it would be an advantage.
 
Responsibilities include:
- Managing operations for the firm's Artifactory-based software distribution platform
- Maximizing the availability and performance of supported systems through optimized and automated plant management, ongoing problem management, and architecture reviews with engineering-side peers
- Reducing the cost of support through the elimination of toil and operational issues, optimization and automation of tasks, development of operational tools, and driving client self-service to minimize constraints
- Identifying and prioritizing technical debt that impacts client developer productivity, reliability or the efficiency of the ops team
- Complex troubleshooting in a Kubernetes and cloud environment
- Consulting with clients (the Firm's internal development community, IT service practitioners) to maximize their productivity, including troubleshooting the issues they have in using the Software Distribution products
- Minimizing the escalation rate to the dev-side product delivery team members to ensure the department has the greatest possible flow of feature delivery
- Being operationally responsive, including sharing the on-call rotation with the rest of the global team (with a time-off-in-lieu system)
 
Required Qualifications / Skills
- Strong Linux or Kubernetes experience
- JFrog Artifactory experience
- Task automation experience in any programming language
- Experience with an observability stack such as Prometheus and Grafana
- Effective communication and collaboration skills
- Working knowledge in at least ONE of the following areas:
  - SQL
  - REST services (API)
  - Load balancing and networking
  - Performance troubleshooting and resolution
 
Desired Skills
- Postgres experience
- Python development for task automation
- Experience with site reliability engineering practices, such as service level objectives (SLOs), error budgets, blameless postmortems and toil reduction
- Prior experience creating operational dashboards (Splunk, Grafana, etc.)

 
Central Business Solutions, Inc (A Certified Minority Owned Organization)
Check out our excellent assessment tool: http://www.skillexam.com/
Check out our job board: http://www.job-360.net/
=====================================================
Central Business Solutions, Inc
37600 Central Court Suite 214 Newark CA, 94560
Phone: (833)247-8800 Fax: (510)-740-3677
Web: http://www.cbsinfosys.com
=====================================================
