Platform Performance SME

Location: Sunnyvale, CA, United States
Date Posted: 07-13-2018
Position Role/Title: Platform Performance SME


We're looking for a performance specialist to determine why the Customer's Prod and DR clusters differ in performance. This will involve debugging jobs at greater depth than simply counting mappers and reducers. Additional details are below.
The customer would like the engagement to start as soon as possible, or within 1-2 weeks. The duration is 40 days onsite in Sunnyvale, CA.

1.         Cluster Diagnostics
a.         Project kick-off, including customer readiness
b.         Cluster Collection: Metrics
i.          Set up metric collection on the PHX and BD1 clusters in parallel, over an agreed period of time
1.         Mapper and Reducer counts and capacity
2.         Resource utilization per cluster
a.         Memory
b.         I/O
c.         Compute
d.         Application load
3.         Hardware Profiling
a.         Nodes and Type
b.         Memory
c.         Processor capacity
d.         Thread
e.         Disk configuration
f.          Network Configuration
g.         DataNode loss with disk failure
ii.          Assist in configuring and collecting peak-load metrics
iii.         Assist in collecting and documenting the network architecture, including node topology and OS configuration
1.         Generate a table with Node information, including OS version, disk information and JDK version
iv.        Assist in collecting and documenting findings on a weekly basis
v.         Run collectors and tune them to the required configuration and metrics
vi.        Run standard benchmark tests (TPC-DS, DFSIO, etc.) across BD1 and PHX at the same time and collect the findings
vii.        Review NameNode performance across BD1 and PHX
1.         RPC spikes
2.         I/O and RPC delays
3.         Small files and compaction
viii.       Generate a remediation list for the DBA team and run an additional iteration of metric collection
ix.        Drive the customer SME to configure Client recommendations in a timely manner
x.         Generate a PoC approach and definition for validation
1.         Identify and recommend selective applications for PoC validation
2.         Identify dataset requirements for PoC
3.         Recommend network topology for cluster architecture
4.         Generate an approach for PoC validation, including application validation steps for the Customer team
2.         Cluster PoC
a.         Review the findings from BD1 and PHX
b.         Build and configure a limited PoC cluster that is representative of the recommended architecture and topology
c.         Run standard benchmark tests and collect metrics
d.         Review network configuration and benchmarking results with the infrastructure team; generate recommendations for updates
e.         Tune the cluster for optimization
i.          Generate node-level optimization steps
ii.          Generate application-level optimization steps
iii.         Generate network recommendations, if any
f.          Execute tests and collect results; present findings to Customer teams
g.         Generate production-level recommendations and assist in deploying them in the environment

3.         Cluster Remediation
a.         Review production recommendations with Customer teams
b.         Assist in updating HDP component configurations
c.         Review network and OS configuration recommendations for the infrastructure team
d.         Customer and Client Support hand-off

Central Business Solutions, Inc,
37600 Central Ct.
Suite #214
Newark, CA 94560.
Central Business Solutions, Inc(A Certified Minority Owned Organization)
Phone: (510)-713-9900, 510-573-5500 Fax: (510)-740-3677