Beyond DevOps: How Netflix Bridges the Gap



Slide 0

Beyond DevOps: How Netflix Bridges the Gap
Josh Evans, Director of Operations Engineering
November 16, 2015


Slide 1

Technical Debt (Fall 2013): Java 6, Perforce, Single Master Jenkins, Ant, CentOS, Asgard/Mimir


Slide 2

How do we drive broad-based change?


Slide 3

The Paved Road: Java 7, Stash, Jenkins Shards, Gradle, Ubuntu


Slide 4

Some said:
  You’re overloading us
  Too many projects
  Poor targeting
Others said:
  What took you so long?
  We’ve moved on
  Now we need to migrate
  That’s great but…
  We’re paying a high tax


Slide 5

Organizational Debt
Expectations gap: division of labor, timing of solutions, leadership
Affects: reputation, relationships, lost opportunities


Slide 6

How do we bridge the gap?


Slide 7

“Remember that TIME is money…”


Slide 8

Time is a form of currency


Slide 9

Our time today…
Product Engineering
Operations Engineering
Challenges & Strategies



Slide 11

Product Innovation: winning moments of truth


Slide 14

Continuous Innovation
Every facet of the product
1400 A/B tests in the last year & accelerating


Slide 15

But wait, there’s more…


Slide 16

You build it, you run it
Build It: design, code, build, bake, test, deploy
Run It: configure, monitor, triage, fix
…at scale, globally


Slide 17

Internet
1000s of starts per second
100,000s of requests per second
100,000,000 hours of content / day
3 AWS Regions, 3 AZs per region


Slide 18

Relentless product innovation
Building & running micro-services at scale, globally


Slide 19

Our time today…
Product Engineering
Operations Engineering
Challenges & Strategies


Slide 20

The Gap
“DevOps is a software development method that emphasizes the roles of both software developers and other information-technology (IT) professionals with an emphasis on IT Operations.” - Wikipedia


Slide 21

Why? How?


Slide 22

Quality, Velocity, Operational Excellence


Slide 23

Operational Excellence is the continuous improvement of the management, design, and function of operational environments to achieve greater quality, velocity, and competitive advantage.


Slide 24

Operations Engineering is the application of software engineering practices to achieve and sustain operational excellence.
Engineering Tools, Insight & Real-time Analytics, Performance & Reliability


Slide 25

Operations Engineering
Service provider
Operational excellence driver
Cross-cutting solutions
Undifferentiated heavy lifting


Slide 26

Our time today…
Product Engineering
Operations Engineering
Challenges & Strategies


Slide 27

Remember that feedback?
“You’re overloading us”
“What took you so long?”
We made assumptions:
  Requirements – what & when
  Time for non-product work


Slide 28

How do we…
Move from assumptions to knowledge?
Effect change without imposing a tax?
Achieve and sustain operational excellence?


Slide 29

Time is a form of currency


Slide 30

5 strategies for success in time-based economies
software & organizational engineering


Slide 31

1. Reach out


Slide 32

Talk to your engineering customers
What are your biggest operational pain points?
How can we help?
How well are we meeting your needs today?
What would you like to see from us in the future?
Listen
Shower, rinse, repeat


Slide 33

Grease the Squeaky Wheels
Low tolerance for tax
More vocal than most


Slide 34

What they wanted
High impact solutions
Clarity on deliverables
Lower operational tax
Leadership, innovation, and partnership


Slide 35

Our commitments
Deliver on solutions
Better road map definition & communication
A more aggressive stance on automation
Deeper investment into leadership, innovation, planning


Slide 36

2. Make an impact
Apply what you’ve learned
Deliver what matters


Slide 37

Global cloud console
End-to-end delivery automation platform
Velocity with confidence


Slide 39

Pipelines - Automated Global Delivery


Slide 41

3. Make it easy to do the right thing


Slide 42

Supply & Demand
Engineering time is scarce
We must do more heavy lifting


Slide 43

Provide on-ramps
Spinnaker manual step
Automated migrations – Mimir


Slide 44

Automate proven practices


Slide 45

Production Ready?
Alerting and Monitoring
Apache & Tomcat Hardening
Automated Canary Analysis
Autoscaling
Chaos Participation
Consistent Naming
ELB Configuration
Healthcheck Configured
Red-Black Pipeline
Squeeze Testing
Timeout & Fallback Tuning
Workload Reliability
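One way such a checklist could be automated is sketched below. This is only an illustration: the `Service` type, the `passed` field, and the scoring rule are assumptions made for the example, not a description of the tooling Netflix built. The check names are taken from the slide.

```python
from dataclasses import dataclass, field

# Check names copied from the "Production Ready?" slide.
CHECKS = [
    "Alerting and Monitoring",
    "Apache & Tomcat Hardening",
    "Automated Canary Analysis",
    "Autoscaling",
    "Chaos Participation",
    "Consistent Naming",
    "ELB Configuration",
    "Healthcheck Configured",
    "Red-Black Pipeline",
    "Squeeze Testing",
    "Timeout & Fallback Tuning",
    "Workload Reliability",
]


@dataclass
class Service:
    """Hypothetical record of which checks a service currently satisfies."""
    name: str
    passed: set = field(default_factory=set)


def readiness_report(service):
    """Return (fraction of checks passed, list of checks still missing)."""
    missing = [check for check in CHECKS if check not in service.passed]
    score = (len(CHECKS) - len(missing)) / len(CHECKS)
    return score, missing


# Hypothetical service that passes every check except the last three.
example = Service(name="example-service", passed=set(CHECKS[:9]))
score, missing = readiness_report(example)
print(f"{example.name}: {score:.0%} ready, missing: {missing}")
```

Reporting the missing checks alongside the score keeps the result actionable: a team sees what to do next rather than only a number.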



Slide 47

Canaries
Customers → Load Balancer
Old Version (v1.0): 100 servers, 95% of traffic
New Version (v1.1): 5 servers, 5% of traffic
Metrics


Slide 48

Canaries
Customers → Load Balancer
Old Version (v1.0): 0 servers
New Version (v1.1): 100 servers, 100% of traffic
Metrics
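The percentages on the two canary slides follow from the server counts if the load balancer spreads requests evenly across all instances. That even spread is an assumption made here for illustration; the slides do not say how traffic is weighted. A minimal sketch of the arithmetic:

```python
def traffic_share(old_servers, new_servers):
    """Return (old_fraction, new_fraction) of traffic, assuming every
    server behind the load balancer receives an equal share."""
    total = old_servers + new_servers
    if total == 0:
        raise ValueError("at least one server is required")
    return old_servers / total, new_servers / total


# Canary phase from the slides: 100 old servers, 5 new servers.
old, new = traffic_share(100, 5)
print(f"old {old:.1%}, new {new:.1%}")

# After full rollout: 0 old servers, 100 new servers.
old, new = traffic_share(0, 100)
print(f"old {old:.1%}, new {new:.1%}")
```

Under this assumption the canary phase works out to roughly 95.2% and 4.8%, which the slide rounds to 95% and 5%.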


Slide 49

Automated Canary Analysis
Define: metrics, a threshold
Every n minutes: classify metrics, compute score, make a decision
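The loop on this slide (classify metrics, compute a score, decide against a threshold) can be sketched as follows. The slide does not describe the actual classifier or scoring formula, so the ones here are assumptions: a metric passes if the canary is no more than 10% worse than the baseline, with lower values treated as better, and the score is the percentage of metrics that pass. All metric names and values are hypothetical.

```python
def classify(baseline, canary, tolerance=0.10):
    """Label one metric 'pass' or 'fail'. Assumes lower values are better
    and allows the canary to be up to `tolerance` worse than baseline."""
    if baseline == 0:
        return "pass" if canary == 0 else "fail"
    return "pass" if (canary - baseline) / baseline <= tolerance else "fail"


def canary_score(baseline_metrics, canary_metrics, tolerance=0.10):
    """Return (score from 0 to 100, per-metric labels)."""
    labels = {
        name: classify(baseline_metrics[name], canary_metrics[name], tolerance)
        for name in baseline_metrics
    }
    passed = sum(1 for label in labels.values() if label == "pass")
    return 100.0 * passed / len(labels), labels


def decide(score, threshold=80.0):
    """Continue the rollout only if the score meets the threshold."""
    return "continue" if score >= threshold else "rollback"


# Hypothetical metrics for one analysis interval.
baseline = {"error_rate": 0.01, "latency_ms": 100, "cpu": 50, "gc_pause_ms": 20}
canary = {"error_rate": 0.01, "latency_ms": 105, "cpu": 70, "gc_pause_ms": 20}

score, labels = canary_score(baseline, canary)
print(score, labels, decide(score))
```

In practice a scheduler would run this once per interval (the "every n minutes" on the slide) and act on the first "rollback" decision.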


Slide 50

Make it easy to do the right thing
Canary Analysis, Performance, Integration Tests, Chaos, Conformity, Static, Unit Tests, Static & Functional Testing


Slide 51

4. Reduce the cost of change


Slide 52

Continuous, Broad-based Change
Ongoing migrations
Library propagation
100s of micro-services
Complex dependencies


Slide 53

Change Engineering: Locate, Communicate, Facilitate


Slide 54

Automated forensics
Who last touched x?
What team?
Who was their manager?
Who owns this artifact, repository, service?


Slide 55

Whitepages
Workday wrapper
App & REST API
Organization hierarchy
Metadata
Change log
(###) ###-####


Slide 56

Krieger
REST-based service
Sources: Whitepages, Stash, Edda, Jenkins, Spinnaker, etc.

{
  "content": {},
  "_links": {
    "employees": { "href": "/api/employees/" },
    "projects": { "href": "/api/projects/" },
    "teams": { "href": "/api/teams/" },
    "applications": { "href": "/api/applications/" },
    "jobs": { "href": "/api/build/jobs" },
    "masters": { "href": "/api/build/masters" },
    "projectDistribution": { "href": "/api/teams/projectDistribution" }
  }
}
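The root document above advertises its resources through a `_links` map, so a client can discover endpoints by relation name instead of hard-coding paths. A minimal sketch of that lookup, using only the JSON shown on the slide; the helper name `link_for` is hypothetical and no real Krieger client is implied:

```python
import json

# Root document as shown on the slide.
ROOT_JSON = """
{
  "content": {},
  "_links": {
    "employees": { "href": "/api/employees/" },
    "projects": { "href": "/api/projects/" },
    "teams": { "href": "/api/teams/" },
    "applications": { "href": "/api/applications/" },
    "jobs": { "href": "/api/build/jobs" },
    "masters": { "href": "/api/build/masters" },
    "projectDistribution": { "href": "/api/teams/projectDistribution" }
  }
}
"""


def link_for(document, relation):
    """Return the href advertised for `relation`, or None if absent."""
    link = document.get("_links", {}).get(relation)
    return link["href"] if link else None


root = json.loads(ROOT_JSON)
print(link_for(root, "teams"))
print(sorted(root["_links"]))
```

Resolving paths through relation names means a client keeps working if the service later moves a resource to a different URL.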


Slide 57

/api/employees?q=jevans

{
  "employees": [
    {
      "id": "241",
      "firstName": "Josh",
      "lastName": "Evans",
      "username": "jevans",
      "email": "[email protected]",
      "jobTitle": "Director of Operations Engineering",
      "isManager": true,
      "isCurrent": true,
      "title": "Josh Evans (jevans) - Operations Engineering",
      "_links": {
        "self": { "href": "/api/employees/241" },
        "manager": { "href": "/api/employees/117890" },
        "team": { "href": "/api/teams/f9134a81" },
        "projects": { "href": "/api/teams/f9134a81/projects" }
      }
    }
  ]
}
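A response like this is enough to answer the forensic questions from the earlier slide ("What team?", "Who was their manager?") by following the record's links. The sketch below works on a trimmed copy of the response shown above; the helper names are hypothetical and nothing here reflects an actual client library.

```python
import json

# Trimmed copy of the response from the slide (some fields omitted).
RESPONSE_JSON = """
{
  "employees": [
    {
      "id": "241",
      "username": "jevans",
      "jobTitle": "Director of Operations Engineering",
      "isManager": true,
      "isCurrent": true,
      "_links": {
        "self": { "href": "/api/employees/241" },
        "manager": { "href": "/api/employees/117890" },
        "team": { "href": "/api/teams/f9134a81" },
        "projects": { "href": "/api/teams/f9134a81/projects" }
      }
    }
  ]
}
"""


def find_employee(response, username):
    """Return the first current employee with this username, or None."""
    for employee in response.get("employees", []):
        if employee.get("username") == username and employee.get("isCurrent"):
            return employee
    return None


def related(employee, relation):
    """Return the href of a related resource such as 'manager' or 'team'."""
    return employee["_links"][relation]["href"]


response = json.loads(RESPONSE_JSON)
person = find_employee(response, "jevans")
print(related(person, "manager"))
print(related(person, "team"))
```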


Slide 58

Today – Targeted Coordination
Security vulnerabilities: Who owns this service?
Platform updates: Who is using this version of this library?
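The question "Who is using this version of this library?" amounts to a reverse lookup over per-service dependency data. A minimal sketch under stated assumptions: the service names, versions, and the shape of the `dependencies` mapping are all hypothetical, and the slides do not say how this data is actually stored or queried.

```python
def users_of(dependencies, library, version):
    """Return the sorted names of services that depend on exactly
    this version of this library."""
    return sorted(
        service
        for service, libs in dependencies.items()
        if libs.get(library) == version
    )


# Hypothetical dependency data: service name -> {library: version}.
dependencies = {
    "service-a": {"guava": "14.0.1", "jackson": "2.4.3"},
    "service-b": {"guava": "18.0"},
    "service-c": {"guava": "14.0.1"},
    "service-d": {"jackson": "2.4.3"},
}

print(users_of(dependencies, "guava", "14.0.1"))
```

Combined with an ownership lookup, a list like this tells a change campaign which teams to contact about an upgrade.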


Slide 59

Future – Change Campaigns
Automated, efficient technical project management
Communication, guidance, tracking
Low tax for TPMs & engineers
Security Fix, Java 9, Guava


Slide 60

5. Develop Partnerships
Beyond supply & demand


Slide 61

Spinnaker 1.0 – 1H 2015
Nearing completion
Aggressive schedule
Unexpected delays
Commitment to June delivery


Slide 62

Edge Engineering
Built their own continuous delivery solution
Not positioned for engineering-wide support
Believes in common solutions


Slide 63

Partnership in Action
Strong relationship
Open discussions about concerns
Decision - leaned forward
+2 engineers on Spinnaker
Successful 1.0 launch


Slide 64

Moving Forward Together
Containers?
Achieving alignment
Collaborative exploration
Edge, Platform, Operations
A new paved road?


Slide 65

Payoffs
Paved Road adopted
Adding new ones
Production Ready ongoing
Migrations easier
Reputation improving
Improved: service uptime, rate of change


Slide 66

Putting it to the test in 2016
Streaming production & test - EC2 Classic to VPC
Highly cross-functional
Complex dependencies
Zero downtime
Stay tuned…


Slide 67

Five Strategies
1. Reach out
2. Make an impact
3. Make it easy to do the right thing
4. Reduce the cost of change
5. Develop partnerships


Slide 68

Open Sourced! https://netflix.github.io/


Slide 69

Questions?
Josh Evans
[email protected]
@ops_engineering

