by Jeff Geerling · 9 Oct 2015 · 313pp · 75,583 words
and configuration management for humans. Jeff Geerling. This book is for sale at http://leanpub.com/ansible-for-devops. This version was published on 2015-05-17. This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing
…
, but I may have an occasional typo. You’ve been warned! About the Author Jeff Geerling is a developer who has worked in programming and devops for companies with anywhere from one to thousands of servers. He also manages many virtual servers for services offered by Midwestern Mac, LLC and has
…
software developer by trade, and a sysadmin by necessity, I have seen the power in uniting development and operations—more commonly referred to now as DevOps. When developers begin to think of infrastructure as part of their application, stability and performance become normative. When sysadmins (most of whom have intermediate to
…
is improved, and more time can be spent doing ‘fun’ activities like performance tuning, experimentation, and getting things done, and less time putting out fires. DevOps is a loaded word; some people argue that using the word to identify both the movement of development and operations working more closely to automate
…
future. Ansible Examples There are many Ansible examples (playbooks, roles, infrastructure, configuration, etc.) throughout this book. Most of the examples are in the Ansible for DevOps GitHub repository, so you can browse the code in its final state while you’re reading the book. Some of the line numbering may not
…
a repeatable and centrally managed way. Ansible also has other tricks up its sleeve, making it a true Swiss Army knife for people involved in DevOps (not just the operations side). One of Ansible’s greatest strengths is its ability to run regular shell commands verbatim, so you can take existing
…
-hoc commands alone can make Ansible a powerful tool; playbooks turn Ansible into a top-notch server provisioning and configuration management tool. What attracts most DevOps personnel to Ansible is the fact that it is easy to convert shell scripts (or one-off shell commands) directly into Ansible plays. Consider the
…
. The rest of this chapter uses more realistic Ansible playbooks. All the examples in this chapter can be found in Jeff Geerling’s Ansible for DevOps GitHub repository, and you can clone that repository to your computer (or browse the code online) to follow along more easily. Real-world playbook: CentOS
…
YAML! You can find the entire example Node.js app server playbook in this book’s code repository at https://github.com/geerlingguy/ansible-for-devops, in the nodejs directory. Real-world playbook: Ubuntu LAMP server with Drupal At this point, you should be getting comfortable with Ansible playbooks and the
…
, Laravel, etc. You can find the entire example Drupal LAMP server playbook in this book’s code repository at https://github.com/geerlingguy/ansible-for-devops, in the drupal directory. Real-world playbook: Ubuntu Apache Tomcat server with Solr Apache Solr is a fast and scalable search server optimized for full
…
search indexes. You can find the entire example Apache Solr server playbook in this book’s code repository at https://github.com/geerlingguy/ansible-for-devops, in the solr directory. Summary At this point, you should be getting comfortable with Ansible’s modus operandi. Playbooks are the heart of Ansible’s
…
can find the entire example Drupal LAMP server playbook using include files in this book’s code repository at https://github.com/geerlingguy/ansible-for-devops, in the includes directory. You can’t use variables for task include file names (like you could with include_vars directives, e.g. include_vars
…
/GPG setup, firewall configuration... roles: - nodejs tasks: # Node.js app deployment tasks... You can view the full example of this playbook in the ansible-for-devops code repository. Once you finish reformatting the main playbook, everything would run exactly the same during an ansible-playbook, with the exception of the tasks
…
> 91 </html> Don’t try transcribing this example manually; you can get the code from this book’s repository on GitHub. Visit the ansible-for-devops repository and download the source for index.php.j2 As this is the heart of the example application we’re deploying to the infrastructure, it
…
need to do and manage the provider’s services within your playbooks. You can find the entire contents of this example in the Ansible for DevOps GitHub repository, in the lamp-infrastructure directory. ELK Logging with Ansible Though application, database, and backup servers may be some of the most mission-critical
…
can’t run docker commands, you may need to use sudo with this playbook. The code example above can be found in the Ansible for DevOps GitHub repository. Building a Flask app with Ansible and Docker Let’s build a more useful Docker-powered environment, with a container that runs our
…
more creative endeavors, and the developers see their code go live in near-real-time! Etsy’s production deployment schedule is enabled by a strong DevOps-oriented culture (with robust code repository management, continuous integration, well-tested code, feature flags, etc.). While it may not be immediately possible to start deploying
…
user: sudo su - awx
Go to Tower’s default projects directory: cd /var/lib/awx/projects
Create a new project directory: mkdir ansible-for-devops && cd ansible-for-devops
Create a new playbook file, main.yml, within the new directory, with the following contents:
---
- hosts: all
  gather_facts: no
…
web browser and get everything set up to run the test playbook inside Ansible Tower’s web UI: Create a new Organization, called ‘Ansible for DevOps’. Add a new User to the Organization, named John Doe, with the username johndoe and password johndoe1234. Create a new Team, called
…
‘DevOps Engineers’, in the ‘Ansible for DevOps’ Organization. Under the Team’s Credentials section, add in SSH credentials by selecting ‘Machine’ for the Credential type, and setting ‘Name’ to Vagrant, ‘Type
…
. Under the Team’s Projects section, add a new Project. Set the ‘Name’ to Tower Test, ‘Organization’ to Ansible for DevOps, ‘SCM Type’ to Manual, and ‘Playbook Directory’ to ansible-for-devops (Tower automatically detects all folders placed inside /var/lib/awx/projects, but you could also use an alternate Project Base
…
you want to store projects elsewhere). Under the Inventories section, add an Inventory. Set the ‘Name’ to Tower Local, and set ‘Organization’ to Ansible for DevOps. Once the inventory is saved: 1. Add a ‘Group’ with the Name localhost. Click on the group once it’s saved. 2. Add a ‘Host
…
Modules” to “Ansible Cookbooks” due to reader interest. Cleaned up Vagrantfile in chapter 3, as well as throughout ansible-for-devops git repo. Added “Web Architecture Example” example to ansible-for-devops git repo. Built structure of chapter 8 (“Ansible Cookbooks”). Added cowsay to chapter 8. Added information about add_host and
…
c. Removed ‘Variables’ chapter (variables will be covered in-depth elsewhere). Added Appendix B - Ansible Best Practices and Conventions. Started tagging code in Ansible for DevOps GitHub repository to match manuscript version (starting with this version, 0.50). Fixed various layout issues. Version 0.49 (2014-04-24) Completed history of
…
SSH in chapter 10. Clarified definition of the word ‘DevOps’ in chapter 1. Added section “Testing Ansible Playbooks” in chapter 14. Added links to Ansible for DevOps GitHub repository in the introduction and chapter 4. Version 0.47 (2014-04-13) Added Apache Solr
by Gene Kim, Kevin Behr and George Spafford · 14 Jul 2013 · 395pp · 110,994 words
write a book, describing the Three Ways and how other people can replicate the transformation you’ve made here at Parts Unlimited. Call it The DevOps Cookbook and show how IT can regain the trust of the business and end decades of intertribal warfare. Can you do that for me?” Write
…
says sternly, “Learn.” Shaking my head for a moment, I finally say, “Of course. It would be an honor and a privilege to write The DevOps Cookbook for you while I embark on what will probably be the most challenging three years of my entire career.” “Very good. It’ll be
…
was bigger than just Dev or Ops or Security. There’s a term that we’re hearing more lately: something called “DevOps.” Maybe everyone attending this party is a form of DevOps, but I suspect it’s something much more than that. It’s Product Management, Development, IT Operations, and even Information
…
?” Turning to her, I say with a sense of calm and inner peace, “What have we got, Patty?” To access more free resources on IT, DevOps, and helping your business win, visit: http://itrevolution.com/next Join us in spreading the word by leaving a review on Amazon or GoodReads, writing
…
. Gene Spafford from CERIAS at Purdue University, and Michael Krigsman from Asuret. I also want to attribute the contributions of my fellow coauthors of The DevOps Cookbook, Patrick Debois, John Willis, and Mike Orzen. Among others, they helped crystallize the practices that became The Three Ways that Erik talked about. I
…
quest to keep improving in all aspects of my life. George Spafford Saint Joseph, MI, June 1, 2012 The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win © 2013 Gene Kim, Kevin Behr & George Spafford All rights reserved. ISBN13: 978-0-9882625-7-7 IT Revolution Press Portland
by Thomas A. Limoncelli, Strata R. Chalup and Christina J. Hogan · 27 Aug 2014 · 757pp · 193,541 words
5 Design Patterns for Scaling Chapter 6 Design Patterns for Resiliency Part II Operations: Running It Chapter 7 Operations in a Distributed World Chapter 8 DevOps Culture Chapter 9 Service Delivery: The Build Phase Chapter 10 Service Delivery: The Deployment Phase Chapter 11 Upgrading Live Services Chapter 12 Automation Chapter
…
survive failure. Part II Operations: Running It Chapter 7: Operations in a Distributed World Overview of how distributed systems are run. Chapter 8: DevOps Culture Introduction to DevOps culture, its history and practices. Chapter 9: Service Delivery: The Build Phase How a service gets built and prepared for production. Chapter 10
…
large-scale operations is through “quality management” techniques such as W. Edwards Deming’s “Shewhart cycle” (Deming 2000) or “Lean Manufacturing” (Spear & Bowen 1999). DevOps’s key principles are an application of these principles to web system administration. The book The Phoenix Project (Kim, Behr & Spafford 2013) explains these principles
…
system to verify that they are detected and handled. By embracing failure this way, we go from optimized to resilient. 8.3 History of DevOps The term “DevOps” was coined by Patrick Debois in 2008. Debois noticed that some sites had evolved the practice of system administration into something fundamentally different.
…
that some sysadmins have always had an emphasis on automation, though outside of web environments management did not value such skills. From their perspective, DevOps was driven by sysadmins who concentrated on their coding skills and began collaborating with the developers on deployment and code testing. Taking these steps into
…
Integration • Automation • Continuous improvement 8.4.1 Relationships In a traditional environment, the tools and scripts are seen as the primary focus of operational maintenance. DevOps gives more weight to the relationships among the teams and the various roles in the organization. Developers, release managers, sysadmins, and managers—all need to
…
networking, or security, this is a sign that integration of teams has not been achieved. 8.4.3 Automation Under the auspices of automation, DevOps strives for simplicity and repeatability. Configurations and scripts are handled as source code and kept under version control. Building and management of the source code
…
commitment that could be focused on the problem. Buying an expensive third-party system that would have needed configuration and outside management was avoided. * * * DevOps: Not Just for the Web A Practical Approach to Large-Scale Agile Development: How HP Transformed HP LaserJet FutureSmart Firmware by Gruver, Young, and Fulghum
…
with business needs, the agility of the development process is maintained and development can respond to changing business needs easily without wasted effort and rework. DevOps is the application of Agile methodology to system administration. 8.6.2 What Is Continuous Delivery? Continuous delivery (CD) is a set of principles
…
This shift brings measurable efficiency and increased uptime when it is strongly applied, and you should explore it for yourself. Theo Schlossnagle (2011) describes DevOps as the natural response to “the operationalism of the world.” As velocity increases in companies of every type, the most successful competitors will be those
…
are many possible service delivery strategies. Most methodologies fall along the continuum between the older waterfall methodology and the more modern methodology associated with the DevOps world. We recommend the latter because it gets better results and encourages faster rates of innovation. It focuses on automation, instrumentation, and improvement based
…
on data. 9.1.1 Pattern: Modern DevOps Methodology The DevOps methodology divides the platform into two phases: the build phase and the deployment phase. The build phase is concerned with taking the source code
…
between steps marks the flow of work, not organizational boundaries. 9.1.2 Anti-pattern: Waterfall Methodology The waterfall methodology works differently from the modern DevOps methodology. It is predicated on multiple phases, each controlled by a different organization. Handoffs not only mark the flow of work, but also indicate
…
capacity-planning-in-the-real-world Index A-B testing, 232–233 AAA (authentication, authorization, and accounting), 222 Abbott, M., 99–100 Abstracted administration in DevOps, 185 Abstraction in loosely coupled systems, 24 Abts, D., 137 Access Control List (ACL) mechanisms description, 40 Google, 41 Access controls in design for
…
270 code reviews, 268–269 compensatory principle, 246–247 complementarity principle, 247–248 continuous delivery, 190 crash data collection and analysis, 129 creating, 255–258 DevOps, 182, 185–186 exercises, 272–273 goals, 252–254 hidden costs, 250 infrastructure strategies, 217–220 issue tracking systems, 263–265 language tools, 258
…
338 BASE (Basically Available Soft-state services) databases, 24 Baseboard Management Controller (BMC), 218 Basecamp application, 55 bash (Bourne Again Shell), 259 Batch size in DevOps, 178–179 Bathtub failure curve, 133 Beck, K., 189 Behaviors in KPIs, 390–391 Behr, K., 172 Bellovin, S. M., 79 Benchmarks in service
…
Blog Search, upgrading, 226 Blue-green deployment, 230 BMC (Baseboard Management Controller), 218 Botnets, 140 Bots in virtual offices, 166–167 Bottlenecks automation for, 257 DevOps, 179 identifying, 96 Bourne Again Shell (bash), 259 Bowen, H. K., 172 Boyd, John, 296 BSD UNIX, 460 Buckets in histograms, 361 Buffer thrashing
…
assessments, 442–443 build console, 205 build step, 203–204 commit step, 202–203 continuous deployment, 237 continuous integration, 205–207 develop step, 202 DevOps, 185–186 exercises, 209 overview, 195–196 package step, 204 packages as handoff interface, 207–208 register step, 204 service delivery strategies, 197–200
…
Code review system (CRS), 268–269 Cognitive systems engineering (CSE) approach, 248 Cold caches, 106 Cold storage factor in service platform selection, 54 Collaboration in DevOps, 183 Collection systems, 345 central vs. regional collectors, 352–353 monitoring, 349–353 protocol selection, 351 push and pull, 350–351 server component vs.
…
continuous deployment, 237 Computation, monitoring, 353–354 Confidence in service delivery, 200 Configuration automating, 254 deployment phase, 213–214 in designing for operations, 33–34 DevOps, 185 four-tier web service, 80 monitoring, 345–346, 362–363 Configuration management (CM) languages, 260–262 Configuration Management Database (CMDB), 222 Configuration management
…
code review system), 268–269 CSE (cognitive systems engineering) approach, 248 Current usage in capacity planning, 368–369 Customer functionality, segmentation by, 103 Customers in DevOps, 177 Cycle time, 196 Daemons for containers, 61 Daily oncall schedules, 289 Dark launches, 233, 383–384 Dashboards for alerts, 293 Data analysis in capacity
…
launches, 158 Deployment and deployment phase, 195, 197, 211 approvals, 216–217 assessments, 444–445 configuration step, 213–214 continuous delivery, 221 defined, 196 DevOps, 185 exercises, 223 frequency in service delivery, 201 infrastructure as code, 221–222 infrastructure automation strategies, 217–220 installation step, 212–213 installing OS and
…
resiliency. See Resiliency Design patterns for scaling. See Scaling Details design documents, 278 postmortems, 302 Develop step in build phase, 202 Developers for oncall, 287 DevOps, 171–172 Agile, 188–189 approach, 175–176 automation, 182, 185–186 batch size, 178–179 build phase, 197–198 business level, 187–188
…
, 181 starting, 187 strategy adoption, 179–180 summary, 192 vs. traditional approach, 173–175 values and principles, 181–186 workflow, 176–177 DevOps Cafe Podcast, 180, 200 DevOps culture, 171 “DevOps Days” conferences, 180 Diagnostics, monitoring, 337 Dickson, C., 345 Dickson model, 334 diff tool, 33 Differentiated services, 233 Direct measurements, 347
…
system, 24 “Each Necessary, But Only Jointly Sufficient” article, 302 ECC (error-correcting code) memory, 131–132 Edge cases, 153 Edwards, Damon DevOps benefits, 172–173 DevOps Cafe podcast, 180, 188, 200 Effectiveness of caches, 105 80/20 rule for operational features, 47 Elements of Programming Style, 11 Eliminating tasks, 155
…
EMA (exponential moving average), 367, 379 Email alerts, 292–293 archives, 277 Embedded knowledge in DevOps, 177–178, 187 Emergency hotfixes, 240 Emergency issues, 160 Emergency Response (ER), 403, 426–428 Emergency tasks, 156 Employee human resources data updates example,
…
, and Deploying Messaging Solutions, 87 Environment-related files, 220 Ephemeral computing, 67 Ephemeral machines, 58 Erlang language, 236 Error Budgets, 152 case study, 396–399 DevOps, 184 Error-correcting code (ECC) memory, 131–132 Escalation alert messages, 345, 354–357 automated, 128–129 monitoring, 333 third-party, 298 Etsy blog,
…
43–44 Exceptional situations. See Oncall Execution in service delivery, 201 Executive summaries in design documents, 277, 282 Expand/contract technique, 234–235 Experimentation in DevOps, 178 Expertise of cloud providers factor in service platform selection, 66 Explicit oncall handoffs, 299 Exponential moving average (EMA), 367, 379 Exponential scaling, 476
…
Face-to-face discussions in DevOps, 187 Facebook chat dark launch, 384 recommended reading, 488 Factorial scaling, 477 Fail closed actions, 40 Fail open actions, 40 Failed code pushes, 239
…
operations, 46 building, 45 toggles, 39, 230–232 writing, 47–48 Federal Emergency Management Administration (FEMA) web site, 324 Feedback design for operations, 47–48 DevOps, 177–178, 186–187 Feeding from queues, 113 Felderman, B., 137 FEMA (Federal Emergency Management Administration) web site, 324 Files, environment-related, 220 Finance
…
Hudson tool, 205 Human error, 141–142 Human processes, automating, 154 Human resources data updates example, 89–90 Humble, J. continuous delivery, 190, 223 DevOps Cafe Podcast, 188, 200 HVMs (hardware virtual machines), 58 Hybrid load balancing strategy, 75 Hyper-Text Transfer Protocol (HTTP) load balancing, 75 overview, 69 IaaS
…
Incident Commanders, 324–325, 328 Index lookup speed, 28 Individual training for disaster preparedness, 311–312 Informal review workflows, 280 Infrastructure automation strategies, 217–220 DevOps, 185 service platform selection, 67 Infrastructure as a Service (IaaS), 51–54 Infrastructure as code, 221–222 Inhibiting alert messages, 356–357 Initial level
…
148 Input/output (I/O) overload, 13 virtual environments, 58–59 Installation in deployment phase, 212–213 OS and services, 219–220 Integration in DevOps, 182 Intel OKR system, 389 Intentional delays in continuous deployment, 238 Intermodal shipping, 62 Internal backbones in cloud-scale service, 83–85 Internet Protocol (IP
…
, 222 load balancers, 72–73 restrictions on, 40 Introducing new features, flag flips for, 232 Introspection, 10 Invalidation of cache entry, 108 Involvement in DevOps, 183 IP (Internet Protocol) addresses deployment phase, 222 load balancers, 72–73 restrictions on, 40 Isolation in ACID term, 24 ISPs for cloud-scale
…
, 59 Netflix Aminator framework, 219 Netflix Simian Army, 315 Networks access speed, 26–27 counters, 349 interface failures, 133 protocols, 489 New feature reviews in DevOps, 183 New Product Introduction and Removal (NPI/NPR) assessments, 435–436 operational responsibility, 404 New services, launching, 382–384 Nielsen, Jerri, 225 Non-blocking
…
bandwidth, 137 Non-functional requirements term, 32 Non-goals in design documents, 277 Nonemergency tasks, 156 Nontechnical DevOps practices, 183–184 Normal growth in capacity planning, 369 Normal requests, 161 NoSQL databases, 24 Notification types in oncall, 292–293 Objectives in Incident Command
…
coordination, 294 alert responsibilities, 295–296 alert reviews, 302–304 benefits, 152 calendar, 290–291, 355 continuous deployment, 238 defined, 148 designing, 285–286 DevOps, 183 end-of-shift responsibilities, 299 excessive paging, 304–305 exercises, 306 frequency, 291–292 long-term fixes, 299–300 notification types, 292–293 onduty
…
Privacy in platform selection, 63 Private cloud factor in platform selection, 62 Private sandbox environments, 197 Proactive scaling solutions, 97–98 Problems to solve in DevOps, 187 Process watchers, 128 Processes automation benefits, 253 containers, 60 instead of threads, 114 Proctors for Game Day, 318 Product Management (PM) monitoring, 336
…
351 network, 489 Prototyping, 258 Provider comparisons in service platform selection, 53 Provisional end-of-shift reports, 299 Provisioning in capacity planning, 384–385 in DevOps, 185–186 Proxies monitoring, 352 reverse proxy service, 80 Public cloud factor in platform selection, 62 Public Information Officers in Incident Command System, 325–326
…
97 Regional collectors, 352–353 Registering packages, 204, 206 Regression analysis, 375–376 Regression lines, 376 Regression tests for performance, 156, 215 Regular meetings in DevOps, 187 Regular oncall responsibilities, 294–295 Regular software crashes, 128 Regular Tasks (RT) assessments, 423–425 operational responsibility, 403 Regulating system integration, 250 Relationships in
…
in continuous deployment, 237 Requests in updating state, 18 “Resilience Engineering: Learning to Embrace Failure” article, 320 Resiliency, 119–120 capacity planning, 370–371 DevOps, 178 exercises, 143 failure domains, 126–128 human error, 141–142 malfunctions, 121–123 overload failures, 138–141 physical failures. See Physical failures software failures
…
(SQS), 86 Simplicity importance, 11 review workflows, 280 Singapore MAS requirements, 43 Single-machine web servers, 70–71 Site Reliability Engineering (SRE), 147–148 DevOps, 181 overview, 151–152 vs. traditional enterprise IT, 148–149 Site reliability practices, 151–152 Size batches, 178–179 caches, 108–110 SLAs (service
…
Terminology for Incident Command System, 324 Test-driven development (TDD), 267–268 Tests vs. canarying, 228–229 continuous deployment, 237 deployment phase, 215–216 DevOps, 186 disaster preparedness. See Disaster preparedness early and fast, 195 environments, 197 flag flips for, 232–233 Text-chat, 167 Text files for configuration,
by David N. Blank-Edelman · 16 Sep 2018
introducing SRE into your enterprise. SRE has an exciting future in small and large organizations alike. Further Reading Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, by Nicole Forsgren, Jez Humble, and Gene Kim (IT Revolution Press, 2018) Increment On-Call Starting and
…
reference to teach new SRE teams how to think and work like that. The Kata approach is already becoming a common practice in Agile and DevOps efforts looking to improve development and delivery processes. However, operations processes are rarely deemed worthy of the effort, especially in legacy operations cultures. This
…
cofounder of Rundeck Inc., the makers of Rundeck, the popular open source operations management platform. Damon was previously a managing partner at DTO Solutions, a DevOps and IT operations improvement consultancy focused on large enterprises. Damon is a frequent conference speaker, writer, and podcast host. 1 Mean Time to Detect (
…
”. 11 Netflix Technology blog, “Introducing Bolt: On Instance Diagnostic and Remediation Platform” 12 Kim, Gene, Patrick Debois, John Willis, and Jez Humble. (2017). The DevOps Handbook. Portland, OR: IT Revolution Press, LLC. 13 Site Reliability Engineering: How Google Runs Production Systems, Introduction. 14 https://www.youtube.com/watch?v=CFMJ3V4VakA
…
”, YouTube video, 45:57, posted by USENIX, January 12, 2012) Figure 11-2. The handoffs readiness review at Google (source: “SRE@Google: Thousands of DevOps Since 2004”, YouTube video, 45:57, posted by USENIX, January 12, 2012) As Tom Limoncelli, coauthor of The Practice of Cloud System Administration (Addison-Wesley
…
Eran Messeri, “What Goes Wrong When Thousands of Engineers Share the Same Continuous Build?” Aarhus, Denmark, October 2, 2013. Tom Limoncelli, “SRE@Google: Thousands Of DevOps Since 2004”, YouTube video of USENIX Association Talk, NYC, posted by USENIX, 45:57, posted January 12, 2012. Ben Treynor, “Keys to SRE” (presentation,
…
, as they are considered inefficient. Though this is not absolute, just a trend. SRE is programming the operations to create reliable and efficient infrastructure. DevOps focuses on breaking down cultural silos and increasing efficiency or velocity of deployment (CI/CD) pipeline, from development to delivery; this includes building and pre
…
testing before and after the artifact is built), thus CT, or continual testing. It takes over where Agile left off and embraces aspects of Lean. DevOps would work with optimization and integration upstream (build, test) and downstream toward deployment and delivery. There is overlap, where deployment/delivery to an operational site
…
there is a defined SRE role with a set of responsibilities. That is another fundamental difference between SRE and DevOps; there is not a clearly defined “DevOps Engineer” role. — Jayne Groll, CEO, DevOps Institute ◆ ◆ ◆ DevOps is underpinned by three principles: systems thinking (looking at the whole system not just your slice), amplifying feedback
…
its components, down to the underlying infrastructure, using methodology adopted and recognized as a fit for accomplishing that task. — Keith McDuffee, senior manager, infrastructure & DevOps, Cardinal Health ◆ ◆ ◆ DevOps and SRE are related, and we are all developers again. In my first job at IBM in the early 80’s, three kinds of
…
a meaningful difference between the two. We are all developers, again. — Paula Paul, technology principal, ThoughtWorks ◆ ◆ ◆ Site Reliability Engineering: we don’t know what DevOps is, but we know we’re something slightly different. — Mike Doherty, successful reliability escapee ◆ ◆ ◆ If you look at job advertisements the answer is quite clear
…
: the industry has decided that DevOps engineers focus on the SDLC pipeline with occasional responsibilities for production operations. Job advertisements for SREs focus on production operations with occasional responsibilities for the
…
SRE manager, Stack Overflow, Inc., Google SRE Alum ◆ ◆ ◆ They are similar in that both have a heavy focus on automation around developer and operations workflows. DevOps focuses on upgrading old workflows and cultures with efficient tools and strategies that scale. SRE focuses on preventing customer downtime by autohealing on events, knowing
…
increase discussion of topics such as repetitive manual work (toil) and release management (launch configuration engineer). — Mark Rendell, principal director, Accenture ◆ ◆ ◆ Reliability engineering and DevOps aim to solve the same problem set that most of the world is now realizing they are faced with: keeping digital services always online and
…
available while improving functionality and operability over time. While DevOps remains elusive in a manifesto-type definition, reliability engineering assigns a more concrete role and responsibility to the term in many ways. At its
…
core, site reliability engineering embodies and encourages the same exact principles that have been associated with DevOps since the term began entering the web operations lexicon. Building and operating digital services with 24/7 availability expectations has become a necessity for more
…
Advocate, Microsoft (formerly VictorOps) ◆ ◆ ◆ At PayPal, we believe that site reliability engineers are both the ultimate enablers as well as the ideal practitioners of DevOps. To that end, we engineer for reliability in two major ways. The first, as platform providers, building and constantly improving the key tools that enable
…
application deployments or write patches for infrastructure tools in ways that the traditional system administrator could not. — John Siegrist, release engineer, Deswik Mining ◆ ◆ ◆ While DevOps and SRE roles overlap a lot technical execution–wise, the distinction likely comes from the size and scale of the organization. Since software and its
…
it’s a low wall!). It’s no wonder we struggle to differentiate the two. — Dave Mangot, former head of site reliability engineering, SolarWinds Cloud ◆ ◆ ◆ DevOps is really about Dev and Ops working together, with complementary and overlapping skills, but really focused on different areas. The main goal is supporting developers
…
SLOs. Both aim for the same goals but take different paths. The SLOs help steer effort and investment and there’s no similar instrument in DevOps. DevOps is more common for startups, where the incentive to reach production as fast as possible clearly outweighs reliability or availability. SRE makes more sense on
…
well-established businesses when the conflict between innovation and reliability starts to emerge. — Luis Mineiro, principal site reliability engineer, Zalando ◆ ◆ ◆ In my view, DevOps is a set of practices organizations can adopt to better enable the operation and delivery of products to their customers. The simplest summary of these
…
Vertically integrated staff are generally integrated directly with engineering teams and remain dedicated to said team. — Aaron Blew, director of service reliability engineering (SRE), iovation ◆ ◆ ◆ DevOps 10 years ago to me was a way to express the need for a better, nonfunctional and nonpolitic agreement between who made software and changes
…
] class of operational user with a strong development background along with a deep understanding of operational principles and designs. SRE teams should closely embody the DevOps philosophy, using its basic tenets to provide the operational integrity and stability to their environment through contact with service owners and other operational teams. —
…
innovation can thrive. With the recent publication of Google’s SRE book as well as a fair number of other publications and conferences about SRE, DevOps, and related movements, there’s a fairly active conversation about reliability engineering across the industry. As an inherently more sensitive topic, privacy engineering is
…
, William. (2009). Statistics for Engineers and Scientists. New York: McGraw-Hill Education. Limoncelli, Thomas A., et al. (2014). The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Vol. 2. Boston: Addison-Wesley Professional. Contributor Bio Theo Schlossnagle has been architecting, coding, building, and operating scalable systems
…
not measuring, sharing responsibility for, and attempting to remediate risks from outside your own systems to successful customer outcomes. At the core of what the DevOps philosophy teaches us is the realization that operational silos result in missed SLOs. The effects of laggy communication boundaries and “Somebody Else’s Problem”
…
to relentlessly winnowing out operational toil, delay, and human error through process and software engineering; and to shared responsibility for outcomes. As with Lean and DevOps, with both of which SRE ideally has many characteristics in common, it is an ongoing process that requires dedicated attention not just from the expert team
…
,”19 Matt Klein’s “Introduction to Modern Network Load Balancing and Proxying,”20 and the “Application Architectures” chapter in Practice of Cloud System Administration: DevOps and SRE Practices for Web Services.21 Contributor Bio Emil Stolarsky is an infrastructure engineer with a passion for load balancers, performance, and DNS tooling
…
nearby rock climbing gym. 1 Limoncelli, Thomas A., Strata R. Chalup, and Christina J. Hogan. (2014). “Application Architectures.” In Practice of Cloud System Administration: DevOps and SRE Practices for Web Services. Boston: Addison-Wesley Professional. 2 Shuff, Patrick. (2015). “Building a Billion User Load Balancer.” Talk at SREcon15 EU. 3
…
Matt. (2017). “Introduction to Modern Network Load Balancing and Proxying.” 21 Limoncelli, Thomas A., et al. (2014). “Application Architectures.” In Practice of Cloud System Administration: DevOps and SRE Practices for Web Services. Boston: Addison-Wesley Professional. Chapter 26. The Service Mesh: Wrangler of Your Microservices? Matt Klein, Lyft Over the past
…
Pattern 1: Birth of Automated Testing at Google Blank-Edelman, David Context Versus Control in SRE, Context Versus Control in SRE-Context Versus Control in SRE DevOps and SRE, Background-Replies Production Engineering at Facebook, Production Engineering at Facebook-Production Engineering at Facebook Blew, Aaron, Replies blind resume review, Biases blue/
…
for scale, Self-Service for Scale tools for, Tools de-provisioning, Provisioning, Change Management, and Velocity Debois, Patrick, SRE Patterns Loved by DevOps, SRE Patterns Loved by DevOps People Everywhere-Conclusion decision trees, Why Now? What Changed for Us?, Decision trees-Decision trees decision-making, uncertainty and, Critical Decisions Made Under
…
teams into, The Embedded SRE interaction with SRE teams, Choose SRE for the Right Reasons DevOps and enterprise operations model–SRE transition, Leverage Existing Enthusiasm for DevOps and SRE origins, Where Did SRE Come From? automated testing at Google, Pattern 1: Birth of Automated Testing at Google-Pattern 1: Birth of
…
Get in the Way enterprise operations model–SRE transition, Clearing the Way for SRE in the Enterprise-Join the Movement DevOps and, Leverage Existing Enthusiasm for DevOps error budgets, Error Budgets-Error Budgets Lean manufacturing concepts applied to, Start by Leaning on Lean-Start by Leaning on Lean minimizing handoffs, Get
…
at, Pattern 1: Birth of Automated Testing at Google-Pattern 1: Birth of Automated Testing at Google data center cooling management with AI, Success Stories DevOps–SRE relationship, Replies documentation at, Do Docs Better: Integrating Documentation into the Engineering Workflow, The Google Experience: g3doc and EngPlay-Integrations are key to
…
Team-Operations teams are bad at estimating their level of psychological safety human-factors research, Beyond Burnout Humble, Jez, SRE Patterns Loved by DevOps, SRE Patterns Loved by DevOps People Everywhere-Conclusion I immutable infrastructure, Immutable Infrastructure and SRE-Disadvantages base image construction, Building the Base Image continuous integration/continuous deployment with confidence
…
SRE Come From? Key Performance Indicators (KPIs), Where Did SRE Come From?, Monitoring, Metrics, and KPIs Kim, Gene, SRE Patterns Loved by DevOps, SRE Patterns Loved by DevOps People Everywhere-Conclusion Kissner, Lea, The General Landscape of Privacy Engineering Klein, Matt, SRE throughout the development cycle on service meshes, The Service Mesh
…
State-To establish a roadmap for what products SRE will be responsible for, survey the current infrastructure landscape defining SRE for, Defining SRE-Defining SRE DevOps–SRE relationship, Replies identifying/educating stakeholders, Identifying and Educating Stakeholders implementing the SRE team, Implementing the SRE Team-Defining the role of supporting divisions
…
-Conclusion LGBTQ+ inclusivity, Mental Disorders Are Missing from the Diversity Conversation, Benefits Lightweight Directory Access Protocol (LDAP), Understanding External Dependencies Limoncelli, Thomas A. on DevOps–SRE relationship, Replies on LRR, Pattern 2: Launch and Handoff Readiness Review at Google on shared source repository, Pattern 3: Create a Shared Source Code
…
, Training on-call, Training transactions, as availability metric, Transactions transgender inclusivity, Benefits Treat, Robert, Replies Treynor Sloss, Benjamin, Leverage Existing Enthusiasm for DevOps, SRE Patterns Loved by DevOps People Everywhere triage, First, Do No Harm trust, forgiveness as corollary to, The corollary to trust is forgiveness 2001: A Space Odyssey (movie
…
in SRE-Context Versus Control in SRE Wheel of Misfortune (game), Active Learning Example: Wheel of Misfortune Willis, John, SRE Patterns Loved by DevOps, SRE Patterns Loved by DevOps People Everywhere-Conclusion window of vulnerability, Window of Vulnerability Woods' Theorem, Mental Models work-life balance, Developers’ Productivity and Health Versus the
…
Zwieback, Dave, Approaching Operations as an Engineering Problem About the Editor David N. Blank-Edelman has over 30 years of experience in the SRE/DevOps/sysadmin field in large multiplatform environments. He currently works for Microsoft as a senior cloud operations advocate focusing on site reliability engineering. He is a
…
Google Pattern 2: Launch and Handoff Readiness Review at Google Pattern 3: Create a Shared Source Code Repository Conclusion Further Reading and Source Material 12. DevOps and SRE: Voices from the Community Background Method Results Replies 13. Production Engineering at Facebook II. Near Edge SRE 14. In the Beginning, There
by Matthew Skelton and Manuel Pais · 16 Sep 2019
and what behaviors you encourage. Despite this, few have attempted to catalog and analyze the organizational design patterns of IT organizations going through digital, DevOps, and SRE transformations. Skelton and Pais have not only accepted this bold challenge, but they’ve also hit the mark by creating an indispensable
…
organizational anti-patterns as excellently described in the models. . . . I am extremely pleased that Matthew and Manuel are growing on the success of the DevOps Topologies and turning their further learnings into the far-reaching book Team Topologies for organization design.” —Crystal Hirschorn, VP of Engineering, Global Strategy and Operations
…
to think about the relationships between teams and how their interactions influence both organizational culture and software architecture. Over time, we realized that the original DevOps Topologies presented a static view of team interrelationships that, while useful for initial discussions, was quite limited in scope. Through our combined experience with
…
aligned to the flow of business, developing and releasing in small, iterative cycles, and course correcting based on feedback from users. Lean IT and DevOps also encouraged big strides in telemetry and metrics tooling for both systems and teams, helping people building and running software to make proactive, early decisions
…
valid historical reasons for the predominance of monolithic databases (such as the rise in specialism of people and teams on technical stack layers) up until DevOps and microservices gained traction. Factors such as project orientation, cost cutting via outsourcing, or junior teams without sufficient experience have contributed to the perpetuation of
…
these problems early and thereby improve the effectiveness of IT as a whole. (We will look further at this organizational sensing in Part III.) DevOps and the DevOps Topologies This kind of organizational sensing “nirvana,” with cross-functional teams that build, test, and operate their own software, was an unfamiliar concept
…
operations teams but in interactions with many other teams involved in software delivery, like QA, InfoSec, networking, and more. Even though many people see DevOps as fundamentally addressing technological aspects of automation and tooling, only organizations that also address fundamental misalignments between teams are able to achieve the full potential
…
benefits from adopting DevOps. DevOps Topologies The DevOps Topologies catalog, originally created by Matthew Skelton in 2013 and later expanded by Manuel Pais, is an online collection of team design and
…
and operational experience. The implicit idea was that teams should evolve and morph over time. This chapter presents some of the patterns in the DevOps Topologies catalog to help illustrate the thinking around choosing team structures with organization context and needs in mind. It is not intended to be a
…
result from the cumulative effects of several teams working on the same codebase unless inter-team discipline is high. Around 2015, Ericsson moved to a DevOps approach to building and running software for emerging telecom business areas, such as “Software-Defined Networking” or “Network Functions Virtualization.”7 Teams in this
…
the mature engineering practices required to keep a sustainable pace over time (such as automated testing, deployment, or monitoring). They could benefit from a temporary DevOps team with battle-tested engineers to bring in expertise and, more importantly, bring teams together by collaborating on shared practices and tools. However, without
…
and teams within it. At the very least, organization size (or software scale) and engineering maturity should influence which topologies are chosen in a DevOps context, as shown in Figure 4.4. Figure 4.4: Influence of Size and Engineering Maturity on Choice of Topologies Organization size (or software
…
to help support the infrastructure while this happened.15 A third stage of evolution aimed to build on their earlier success and fully transition the DevOps team from an execution role to an evangelizing one, so that development and operations teams would become self-sufficient and collaborate around automation of
…
based analytical solutions. We knew that in order to scale effectively, we had to consider the interrelationships between different technology teams. We turned to the DevOps Topologies patterns to help us plan out our digital transformation. We wanted to bring development (Dev) and operations (Ops) closer together but avoid a
…
our digital transformation would happen and accelerated our adoption of cloud technologies and automation approaches. The patterns helped us avoid some pitfalls, like a separate DevOps team, and helped define team responsibilities more effectively. We’ve been able to scale our technology division significantly over the past four years, with
…
The result is team structures optimized for problems that are temporary or limited in scope, rather than adaptive to new problems over time. The “DevOps team” anti-pattern is a quintessential example. On paper, it makes sense to bring automation and tooling experts in house to accelerate the delivery and
…
evolve slowly), rather than adapting those that solve a particular problem or need in a given moment in time. In particular, within a DevOps context the DevOps Topologies can help shed some light on which topologies work well for which contexts. Forward-thinking organizations take a multi-stage approach to their
…
were successfully being “Agile” in our software development, we realized we were having problems delivering it to live efficiently and reliably. Later, we used DevOps practices as a core part of squads, and eventually we evolved dedicated reliability teams, combined with operations people being embedded in the delivery teams. During
…
the tools needed to maintain a disaster recovery (DR) environment in sync with production systems. Drawing from the lessons of how we started our DevOps journey, we created a platform evolution team drawn from a mixture of software and infrastructure backgrounds. The platform evolution team’s first priority was implementing
…
techniques like TDD [test-driven development], retrospectives, and product owners), and some focused on day-to-day operational activities. We don’t have any “DevOps” people; instead, we have experienced operations engineers who do in-depth analysis of live service problems. This means that our focus in the infrastructure area
…
and benefit the business (while allowing a monolith split along more valuable fracture planes, like business-aligned bounded contexts). For example, in his book DevOps for the Modern Enterprise, Mirco Hering explains how to apply good coding and version-control practices when dealing with proprietary COTS products.9 Fracture Plane
…
and autonomous. As we hired more engineers, the architecture and practices which worked for a single team were not going to scale. We put DevOps and continuous delivery practices at the center of our design choices and started transitioning to a microservices-type architecture from our existing (successful) monolithic system
…
work we need to do now.” This level of specialization introduces bottlenecks in delivery (as storied in The Phoenix Project and explained in The DevOps Handbook). The routine aspect can also negatively affect individual motivation. Another aspect at play occurs when the team no longer holds a holistic view
…
: consuming or providing something with minimal collaboration. RECOMMENDED READING Key Management Concepts and Practices for Reliable, Fast Flow Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim (Portland, Oregon: IT Revolution, 2018). Designing Delivery: Rethinking
…
Academy of Management Executive 13 no. 3 (1999), 36–49. Forsgren, PhD, Nicole, Jez Humble, and Gene Kim. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. Portland, Oregon: IT Revolution Press, 2018. Fowler, Martin. “Bliki: BoundedContext.” MartinFowler.com (blog), January 15, 2014. https
…
.com/@pingles/convergence-to-kubernetes-137ffa7ea2bc. innolution. n.d. “Feature Team Definition | Innolution.” Accessed October 14, 2018. https://innolution.com/resources/glossary/feature-team “DevOps Over Coffee—Adidas.” YouTube video, 32:03, posted by IT Revolution, July 3, 2018. https://www.youtube.com/watch?v=oOjdXeGp44E&feature=youtu.be&t
…
the Age of Digital Disruption with the Flow Framework. Portland, OR: IT Revolution Press, 2018. Kim, Gene, Jez Humble, Patrick Debois, and John Willis. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. Portland, OR: IT Revolution Press, 2016. Kim, Dr. Kyung Hee, and
…
Carayannis, 245–250. New York: Springer, 2013. Kitagawa, Justin. “Platforms at Twilio: Unlocking Developer Effectiveness.” InfoQ, October 18, 2018. https://www.infoq.com/presentations/twilio-devops Kitson, Jon. “Squad Health Checks.” Sky Betting & Gaming Technology (blog), February 1, 2017. https://technology.skybettingandgaming.com/2017/02/01/squad-health-checks/. Kniberg, Henrik
…
.” SlideShare, posted by Pivotal, August 10, 2016. https://www.slideshare.net/Pivotal/microservices-organizing-large-teams-for-rapid-delivery. Mihaljov, Timo. “Having a Dedicated DevOps Person Who Does All the DevOpsing Is like Having a Dedicated Collaboration Person Who Does All the Collaborating.” Tweet. @noidi. April 14, 2017. https://twitter
…
Leeds, UK: Conflux Digital, 2019. Morris, Kief. Infrastructure as Code: Managing Servers in the Cloud. Sebastopol, CA: O’Reilly Media, 2016. Munns, Chris. “Chris Munns, DevOps @ Amazon: Microservices, 2 Pizza Teams, & 50 Million Deploys per Year.” SlideShare.net, posted by TriNimbus, May 6, 2016. http://www.slideshare.net/TriNimbus/chris-munns
…
Liberators.” BarryOvereem.com (blog), August 7, 2015. http://www.barryovereem.com/how-i-used-the-spotify-squad-health-check-model/. Pais, Manuel. “Damon Edwards: DevOps is an Enterprise Concern” InfoQ, May 31, 2014. https://www.infoq.com/interviews/interview-damon-edwards-qcon-2014. Pais, Manuel. “Prezi’s CTO on How
…
resources-pioneering-role-agile-ing.aspx. Schwartz, Mark, Jason Cox, Jonathan Snyder, Mark Rendell, Chivas Nambiar, and Mustafa Kapadia. Thinking Environments: Evaluating Organization Models for DevOps to Accelerate. Portland, OR: IT Revolution Press, 2016. Seiter, Courtney. “We’ve Changed Our Product Team Structure 4 Times: Here’s Where We Are
…
successful-engineering-8a9b6a4d2c8. Simenon, Stefan, and Wiebe de Roos. “Transforming CI/CD at ABN AMRO to Accelerate Software Delivery and Improve Security.” SlideShare, posted by DevOps.com, March 27, 2018. https://www.slideshare.net/DevOpsWebinars/transforming-cicd-at-abn-amro-to-accelerate-software-delivery-and-improve-security. Sinha, Harsh. “Harsh
…
Quality & Safety in Health Care 13 Suppl. 2 (1961): ii22–27. https://doi.org/10.1136/qshc.2003.009522. “What Team Structure is Right for DevOps to Flourish?” DevOpsTopologies.com, accessed March 21, 2019. http://web.devopstopologies.com. Wiley, Evan. “Scaling XP Through Self-Similarity at Pivotal Cloud Foundry.” Agile
…
. Software Architecture.” 9. Raymond, The New Hacker’s Dictionary, 124. 10. Lewis, “Microservices and the Inverse Conway.” 11. Pink, Drive, 49. Chapter 2 1. “DevOps Over Coffee – Adidas;” Fernando Cornago, personal email communication with the authors, March 2019. 2. MacCormack et al., “Exploring the Structure of Complex Software Designs,” 1015
…
“The Rule of 5, 15 & 150;” Karlgaard and Malone, Team Genius, 201–205. 8. Lewis, “Microservices and the Inverse Conway Manoeuvre.” 9. Munns, “Chris Munns, DevOps @ Amazon.” 10. Brooks, The Mythical Man-Month. 11. Tuckman, “Developmental Sequence in Small Groups,” 384–399. 12. Kelly, Project Myopia, 72. 13. Helfand, Dynamic Reteaming
…
@ Spotify.” 4. Kniberg and Ivarsson, “Scaling Agile @ Spotify.” 5. Forsgren et al., Accelerate, 63. 6. Skelton, “What Team Structure Is Right for DevOps to Flourish?” 7. John, “DevOps for Service Providers—Next Generation Tools.” 8. Hastie, “An Interview with Sam Guckenheimer on Microsoft’s Journey to Cloud Cadence.” 9. Ben Treynor
…
Flow, 265. 3. Lane, “The Secret to Amazon’s Success—Internal APIs;” Hoff, “Amazon Architecture.” 4. Crawford, “Amazon’s ‘Two-Pizza Teams;’” Munns, “Chris Munns, DevOps @ Amazon.” 5. Kramer, “The Biggest Thing Amazon Got Right.” 6. Sussna, Designing Delivery, 148. 7. Pink, Drive, 49. 8. Eckstein, “Architecture in Large Scale
…
Communities of Practice, 11. 12. Bottcher, “What I Talk About When I Talk About Platforms.” 13. Eckstein, Agile Development in the Large, 53. 14. Neumark, “DevOps & Product Teams—Win or Fail?” 15. Reinertsen, The Principles of Product Development Flow, 292. 16. Womack and Jones, Lean Thinking. 17. Urquhart, “IT Operations
…
BoundedContext.” 6. Tune and Millett, Designing Autonomous Teams and Services, 38. 7. Nygard, “The Perils of Semantic Coupling.” 8. Helfand, Dynamic Reteaming, 203. 9. Hering, DevOps for the Modern Enterprise, 45. 10. Phillips, “Testing Observability.” Chapter 7 1. Bernstein et al., “How Intermittent Breaks in Interaction Improve Collective Intelligence,” 8734–8739
…
6. Drucker, The Daily Drucker, 291. 7. Stanford, Guide to Organisation Design, 17. 8. Narayan, Agile IT Organization Design, 65. 9. Kim et al., The DevOps Handbook, 11. 10. Sussna, Designing Delivery, 58. 11. Narayan, Agile IT Organization Design, 31. Conclusion 1. Conway, “How do Committees Invent?” 31. 2. Manns
…
184–185 “Capturing the Complexity in Advanced Technology Use: Adaptive Structuration Theory” (DeSanctis and Poole), xxi case studies complicated-subsystems teams, 94–95, 97–99 DevOps Topologies, 75–77 enabling teams, 88–90 fracture planes, 121–125 organizational sensing, 154–155, 157–159, 162–164 software boundaries, 121–125 static topologies
…
62 anti-patterns, 62 case study, 75–77 cloud teams, 69–70 compatible support systems, 69 credit reference agencies, 76–77 dependencies, 74–75 DevOps, 65–67 DevOps Topologies, 66–67, 75–77 engineering maturity, 73–74 feature teams, 67–68 flow of change, designing for, 63–64 healthcare organizations, 75–76
…
, Miguel Antunes, Paul Ingles, Pulak Agrawal, Robin Weston, Stephanie Sheehan, and Wolfgang John. We’d like to thank everyone who contributed to the original DevOps Topologies patterns, especially James Betteley, Jamie Buchanan, John Clapham, Kevin Hinde, and Matt Franz. A special thanks goes to John Cutler for a passionate outsider
by Tim O'Reilly · 9 Oct 2017 · 561pp · 157,589 words
make Internet sites run faster and more effectively. The Velocity Conference brought together a community working on a new discipline that came to be called DevOps, a portmanteau word combining software development and operations. (The term was coined a few months after the first Velocity Conference by Patrick Debois and Andrew
…
“Clay” Shafer, who ran a series of what they called “DevOps Days” in Belgium.) The primary insight of DevOps is that there were traditionally two separate groups responsible for the technical infrastructure of modern web applications: the developers who build the
…
it runs. And those two groups typically didn’t talk to each other, leading to unforeseen problems once the software was actually deployed at scale. DevOps is a way of seeing the entire software life cycle as analogous to the lean manufacturing processes that Toyota had identified for manufacturing
…
. DevOps takes the software life cycle and workflow of an Internet application and turns it into the workflow of the organization, building in measurement, identifying key
…
choke points, and clarifying the network of essential communication. In an appendix to The Phoenix Project, a novelized tutorial on DevOps created by Gene Kim, Kevin Behr, and George Spafford as homage to The Goal, the famous novel about the principles of lean manufacturing, Gene Kim
…
notes that speed is one of the key competitive advantages that DevOps brings to an organization. A typical enterprise might deploy new software once every nine months, with a lead time of months or quarters. At companies
…
,” he writes, “continuous experimentation . . . improve[s] the way we optimize business processes in our organizations.” But DevOps also brings higher reliability and better responsiveness to customers. Gene Kim characterizes what happens in a high-performance DevOps organization: “Instead of upstream Development groups causing chaos for those in the downstream work centers (e
…
each other’s time and contributions but also relentlessly injects pressure into the system of work to enable organizational learning and improvement.” The practices of DevOps have continued to evolve. Google calls its version of the discipline “Site Reliability Engineering” (SRE). As described by Benjamin Treynor Sloss, who coined the term
…
find more ways to make the consequences of bad action systemic, part of a high-velocity workflow akin to the way that Internet companies use DevOps to streamline and accelerate their internal business processes. This isn’t to say that we should throw out the concept of “due process” that is
…
; the Internet; new programming languages like Java, Perl, Python, and JavaScript; the best practices of the world’s leading programmers; and more recently, big data, DevOps, and AI. When, in 2000, our ad on the cover of Publishers Weekly baldly stated, “The Internet Was Built on O’Reilly Books,” everyone accepted
…
: The New Secret Sauce,” O’Reilly Radar, July 10, 2006, http://radar.oreilly.com/2006/07/operations-the-new-secret-sauc.html. 122 advantages that DevOps brings to an organization: Gene Kim, Kevin Behr, and George Spafford, The Phoenix Project, rev. ed. (Portland, OR: IT Revolution Press, 2014), 348–50. 122
…
1-Click e-commerce patent, 71–75 accessibility of data leads to AWS, 110–13 Andon Cord, 117–18 continuous improvement, 120–21, 122 and DevOps, 121–23 and electricity, 121, 124 on Linux operating system, 24 long-term investment priority, 245 and machine learning, 166 as a platform, 111–13
…
and centralization, 105–8 DeepMind, 165, 167, 168–69, 235 de Havilland Comet commercial jet, 217–18 Dell, Michael, 12 DeLong, Brad, 134 Denmark, 268 DevOps, 121–23 Dickens, Charles, 346 Dickerson, Mikey, 118–19, 146–47, 148 DiGiammarino, Frank, 129 digital footprint, physical assets with, 66–67 Digital Millennium Copyright
…
grant for, 132 and online mapping dominance, 127–28 pay-per-click ad auction, 161–62 and public sentiment about privacy, 82 Site Reliability Engineering/DevOps, 123 stock-based compensation, 280, 289–90 Street View cars collecting data, 33, 34–35 Google AlphaGo, x Google Book Search, 170–71 Google Finance
…
also individual platforms “Social Responsibility of Business Is to Increase Its Profits” (Friedman), 240 software, 15, 35 continuous improvement process, 30, 119–21, 122 and DevOps, 121–23 generative design, 327–28 MapReduce, 325 as organizational structure, 113–19 Perl, 10–11, 15, 16–17, 120–21 programmers as managers of
by Jan Kunigk, Ian Buss, Paul Wilkinson and Lars George · 8 Jan 2019 · 1,409pp · 205,237 words
Team Setup Compartmentalization of IT Revised Team Setup for Hadoop in the Enterprise Solution Overview with Hadoop New Team Setup Split Responsibilities Do I Need DevOps? Do I Need a Center of Excellence/Competence? Summary 6. Datacenter Considerations Why Does It Matter ? Basic Datacenter Concepts Cooling Power Network Rack Awareness and
…
footprint of roles (as well as your flexibility and control over infrastructure components). Do I Need DevOps? In a sense, the rise of the term DevOps mirrors the rise of corporate distributed computing. In some definitions DevOps signals a merging of the developer and operator roles into one, often skipping any compartmentalization as
…
the initiative on certain tasks around deployment automation and platform performance optimization. For the most part, however, our experience is that a full implementation of DevOps only occurs in companies whose entire business is centered around a few large-scale, web-based services; classic enterprise IT as a service to the
…
to provide strong isolation guarantees. Summary The ease of containerized application pods combined with horizontal scalability makes OpenShift intriguing for rapid development/deployment cycles and DevOps-driven environments. The current focus of OpenShift is to excel in scalability and provide orchestration for microservices. Hadoop is typically not a good fit for
…
data? technology primer, Big Data Technology Primer-Summary big data architects, Big data architect big data engineers, Big data engineer, Summary DevOps responsibilities, Do I Need DevOps? skill profile, Big data engineer split responsibilities with other team roles, Split Responsibilities bigdata-interop project (Google), Hadoop integration binary releases of Hadoop, Installation Choices
…
Need a Center of Excellence/Competence? center for excellence/competence in solution with Hadoop, Do I Need a Center of Excellence/Competence? DevOps and solution with Hadoop, Do I Need DevOps? new team setup for solution with Hadoop, New Team Setup solution overview with Hadoop, Solution Overview with Hadoop split responsibilities in
…
for software development, Replication for Software Development multiple clusters for software development, Multiple Clusters for Software Development variations in cluster sizing, Variation in cluster sizing DevOps, Do I Need DevOps? digest scheme (ZooKeeper), ZooKeeper direct bind (LDAP), LDAP Authentication disaster recovery, Multiple Clusters for Resiliency, Data Replication, Alternative solutions (see also backups and
…
Need a Center of Excellence/Competence? center of excellence/competence in solution with Hadoop, Do I Need a Center of Excellence/Competence? DevOps and solution with Hadoop, Do I Need DevOps? new team setup for solution with Hadoop, New Team Setup solution overview with Hadoop, Solution Overview with Hadoop split responsibilities in
…
team setup in business intelligence solution with Hadoop, New Team Setup center of excellence or competence, Do I Need a Center of Excellence/Competence? DevOps and, Do I Need DevOps? split responsibilities, Split Responsibilities revised team setup for Hadoop in the enterprise, Big data architect big data architect, Big data architect big data engineer
by Sean Kane and Karl Matthias · 14 May 2023 · 433pp · 130,334 words
of containers allows development teams to take ownership of their work in production, making many facets of DevOps a reality. In a world where runtimes upgrade major versions regularly, teams and organizations are polyglot, DevOps practices like blue-green and canary releases are the norm, and scale is unprecedented, the technology that
…
to solve the complex workflow problems involved in developing and deploying software to production at scale. If you’re interested in Linux containers, Docker, Kubernetes, DevOps, and large, scalable, software infrastructures, then this book is for you. Why Read This Book? Today there are many conversations, projects, and articles on the
…
that fill similar roles. Instead of suffering from a complete lack of tools, there are now many robust choices for almost every aspect of the DevOps workflow. Wrapping your arms around the scope of what Linux containers and Docker provide, understanding how they fit into your workflow, and getting all the
…
integration and continuous delivery (CI/CD) workflow should function. Instead of each step involving a time-consuming process managed by specialists, most people expect a DevOps pipeline to be fully automated and flow from one step to the next without any human intervention. The technologies in that list are also generally
…
-new application into production can take the better part of a week for a complex new system. That’s not very productive, and even though DevOps practices work to alleviate many of the barriers, it often still requires a lot of effort and communication between teams of people. This process can
…
the development teams. The Docker Ecosystem Over the years, a wide community has formed around Docker, driven by both developers and system administrators. Like the DevOps movement, this has facilitated better tools by applying code to operations problems. Where there are gaps in the tooling provided by Docker, other companies and
…
final building technique that we will cover here is related to keeping build times as fast as possible. One of the important goals of the DevOps movement is to keep feedback loops as tight as possible. This means that it is important to try to ensure that problems are discovered and
…
a nice bridge between a single Linux container and a production orchestration system. It delivers a lot of value in development environments and throughout the DevOps pipeline. Chapter 8. Exploring Docker Compose At this point, you should have a good feel for the docker command and how to use it to
…
job, but having flexibility is a great place to be, and Linux containers are increasingly supported by more and more powerful tooling. Docker and the DevOps Pipeline So once we have considered and implemented all of that functionality, we should have our production environment in robust shape. But how do we
…
their dependencies. The Docker Workflow Docker’s workflow helps organizations tackle really hard problems—some of the same problems that DevOps processes are aimed at solving. A major problem in incorporating DevOps successfully into a company’s processes is that many people have no idea where to start. Tools are often incorrectly
…
to travel through their whole lifecycle, from conception to retirement, within one ecosystem. Unlike other tools that often target only a single aspect of the DevOps pipeline, Docker significantly improves almost every step of the process. That workflow is often opinionated, but it simplifies the adoption of some of the core
…
principles of DevOps. It encourages development teams to understand the whole lifecycle of their application and allows operations teams to support a much wider variety of applications on
…
image registry for hosting images. While not revolutionary on the face of it, the registry helps split team responsibilities clearly along the lines embraced by DevOps principles. Developers can build their application, test it, ship the final image to the registry, and deploy the image to the production environment, while the
…
production, Packaging and Delivery dependencies about Docker, Process Simplification-Process Simplification BuildKit directory caching, Directory Caching Docker’s role in production, Configuration Docker and the DevOps pipeline, Docker and the DevOps Pipeline explicitly declare and isolate, Dependencies filesystem layers of Linux containers, Filesystem layers filesystem state storage, Externalizing State persistent storage, Storage Volumes reliability
…
cycle twelve factors (see twelve-factor app) development/production parity, Development/Production Parity Device Mapper, Working with Docker Images, Storage DevOps pipeline and Docker about, The Promise of Docker, Docker and the DevOps Pipeline external dependencies, Outside Dependencies overview, Quick Overview-Quick Overview directory caching, Directory Caching-Directory Caching documentation, Directory Caching disposability of
…
container, Anatomy of a Dockerfile stateless, Processes production about container platform design, Container Platform Design development/production parity, Development/Production Parity Docker and the DevOps pipeline about, Docker and the DevOps Pipeline external dependencies, Outside Dependencies overview, Quick Overview-Quick Overview Docker’s role about, Docker’s Role in Production Environments-Docker’s Role in
…
traffic, Network Ports and Unix Sockets tcpdump command, Network Inspection terminology, Important Terminology testing Docker and the DevOps pipeline about, Docker and the DevOps Pipeline external dependencies, Outside Dependencies overview, Quick Overview-Quick Overview Docker workflow, Testing Docker and the DevOps pipeline, Quick Overview testing your private registry, Testing the private registry time command, Utilizing the
…
Simplification-Process Simplification revision control about, Revision Control filesystem layers, Filesystem layers image tags, Image tags starting with default networking, Container Networking testing, Testing Docker and the DevOps pipeline, Quick Overview tools for additional capabilities, The Docker Ecosystem-Additional tools immutable atomic hosts, Immutable atomic hosts orchestration, Orchestration X xhyve, GUI installer Y YAML
…
P. Kane is the founder of techlabs.sh and a principal production operations engineer at SuperOrbital. Sean specializes in engineering, teaching, and writing about modern DevOps processes, including Kubernetes, Docker, Terraform, and more. He has had a long career in production operations, with many diverse roles across a broad range of
by Sam Newman · 14 Nov 2019 · 355pp · 81,788 words
of our development, technology, and architecture choices. Some work is already being done to this end in the form of things like “The State of DevOps Report”, but that looks at architecture only in passing. In lieu of these rigorous studies, we should at least strive to apply more critical thinking
…
past for many organizations. Instead, test specialists are becoming an integrated part of delivery teams, enabling developers and testers to work more closely together. The DevOps movement has also led in part to many organizations shifting away from centralized operations teams, instead pushing more responsibility for operational considerations onto the delivery
…
change you want to bring about, just as with our software, you can make this happen in an incremental fashion. DevOps Doesn’t Mean NoOps! There is widespread confusion around DevOps, with some people assuming that it means that developers do all the operations, and that operations people are not needed. This
…
is far from the case. Fundamentally, DevOps is a cultural movement based on the concept of breaking down barriers between development and operations. You may still want specialists in these roles, or
…
, no matter what their specific responsibilities are. For more on this, I recommend Team Topologies,9 which explores DevOps organizational structures. Another excellent resource on this topic, albeit broader in scope, is The DevOps Handbook.10 Making a Change So if you shouldn’t just copy someone else’s structure, where should
…
for the names here! 9 Manuel Pais and Matthew Skelton, Team Topologies (IT Revolution Press, 2019). 10 Gene Kim, Jez Humble, and Patrick Debois, The DevOps Handbook (IT Revolution Press, 2016). 11 Yes, this has happened. It’s not all fun and games and Kubernetes…. Chapter 3. Splitting the Monolith In
…
than to say they run contrary to the principles of continuous integration. I could also throw in that data gathered from “The 2017 State of DevOps Report” shows that embracing trunk-based development (where changes are made directly on the main line and branches are avoided) and using short-lived branches
…
Edition. Addison-Wesley, 1995. Bryant, Daniel. “Building Resilience in Netflix Production Data Migrations: Sangeeta Handa at QCon SF.” http://bit.ly/2m1EwHT. DevOps Research & Assessment. Accelerate: State of DevOps Report 2018. http://bit.ly/2nPDNLe. Evans, Eric. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003. Feathers, Michael
…
. Humble, Jez. “Make Large-Scale Changes Incrementally with Branch by Abstraction.” http://bit.ly/2p95lv7. Kim, Gene, Patrick Debois, Jez Humble, and John Willis. The DevOps Handbook. IT Revolution Press, 2016. Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017. Kniberg, Henrik, and Anders Ivarsson. “Scaling Agile @ Spotify.” October 2012. http
…
Show Itself? developers local developer experience problem, Local Developer Experience-Potential Solutions scaling number of, How else could you do this? DevOps, Shifting Structures not meaning NoOps, It’s Not One Size Fits All DevOps Handbook, The (Kim, Humble, and Debois), It’s Not One Size Fits All distributed monoliths, The Distributed Monolith distributed
…
All Spotify UI, componentized, And mobile applications Square orders example, tracer write pattern, Example: Orders at Square-Migrating consumers startups, microservices and, Startups State of DevOps Report (2017), Pattern: Branch by Abstraction static reference data library pattern, Pattern: Static reference data library, Pattern Index static reference data service pattern, Pattern: Static
…
Lacking Atomicity? avoiding use of distributed transactions, Distributed Transactions—Just Say No compensating, Saga rollbacks two-phase commits, Two-Phase Commits “The 2017 State of DevOps Report”, Pattern: Branch by Abstraction two-phase commits (2PCs), Two-Phase Commits U UI composition pattern, Pattern: UI Composition-Where to Use It, Pattern Index example
by Casey Rosenthal and Nora Jones · 27 Apr 2020 · 419pp · 102,488 words
many different environments. John’s publications include The Art of Capacity Planning and Web Operations (both O’Reilly) as well as the foreword to The DevOps Handbook (IT Revolution Press). His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation” helped start the
…
DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University. 1 David D. Woods, STELLA:
…
likely to adopt Chaos Engineering or seek out the discipline for obvious reasons. This is often followed by internal championing, with advocacy often found in DevOps, SRE, and Incident Management teams. In more traditional organizations, this often falls to Operations or IT. These are the teams that understand the pressure of
…
organization, at every level of the hierarchy, even if a centralized team provides tooling to make it easier. This is similar to how in a DevOps-activated organization, every team is responsible for the operational properties of their software, even though centralized teams may specifically contribute to improving some of those
…
require practices that favor the exploration of the unknown (experimentation) over the known (testing). Tools are required in order to scale, whereas methodologies (Agile, DevOps, SRE, etc.) necessitate digital transformations and cultural changes along with expensive investments in staff to implement them. Verification addresses business outcomes, which are a more
…
and process improvement methods such as Lean, Six Sigma, knowledge management, training, and so on. In the software world, this would include XP, Agile, and DevOps, but the term HPT usually applies to manufacturing and is not commonly applied to software. Chapter 19. Chaos Engineering on a Database Liu Tang and
…
secure their systems. The goal of these experiments is to move security in practice from subjective assessment into objective measurement. As they do in the DevOps world, chaos experiments allow security teams to reduce the “unknown unknowns” and replace “known unknowns” with information that can drive improvements to security posture. By
…
own series of security chaos experiments using the framework provided by the project as a guide. Conclusion As enterprises adopt cloud native stacks and the DevOps model, their security programs must evolve to meet new demands such as the frequency of system-wide changes enabled by continuous deployment. Traditional security testing
by Rennay Dorasamy · 2 Dec 2021 · 328pp · 77,877 words
by Gourav Shah · 29 Jul 2015 · 178pp · 33,275 words
by Will Larson · 19 May 2019 · 227pp · 63,186 words
by Harihara Subramanian · 31 Jan 2019 · 422pp · 86,414 words
by Sean P. Kane and Karl Matthias · 15 Mar 2018 · 350pp · 114,454 words
by Ron Jeffries · 14 Aug 2015 · 444pp · 118,393 words
by Yevgeniy Brikman · 13 Mar 2017
by Jeff Lawson · 12 Jan 2021 · 282pp · 85,658 words
by Richard A. Clarke and Robert K. Knake · 15 Jul 2019 · 409pp · 112,055 words
by Dominica Degrandis and Tonianne Demaria · 14 May 2017 · 153pp · 45,721 words
by Heather Adkins, Betsy Beyer, Paul Blankinship, Ana Oprea, Piotr Lewandowski and Adam Stubblefield · 29 Mar 2020 · 1,380pp · 190,710 words
by Eric Brechner · 25 Feb 2015
by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy · 15 Apr 2016 · 719pp · 181,090 words
by Sebastien Donadio · 7 Nov 2019
by Marianne Bellotti · 17 Mar 2021 · 232pp · 71,237 words
by John Arundel · 16 Apr 2013 · 241pp · 43,073 words
by Felix Frank · 20 Nov 2014 · 234pp · 63,522 words
by Valliappa Lakshmanan, Sara Robinson and Michael Munn · 31 Oct 2020
by Harry J. W. Percival · 10 Jun 2014 · 779pp · 116,439 words
by Jesse Keating
by Martin Kleppmann · 16 Mar 2017 · 1,237pp · 227,370 words
by James Turnbull · 1 Dec 2014 · 514pp · 111,012 words
by M. Omar Faruque Sarker · 15 Feb 2014 · 234pp · 57,267 words
by Lorin Hochstein · 8 Dec 2014 · 761pp · 80,914 words
by Jo Rhett · 24 Mar 2016
by Paul Scharre · 23 Apr 2018 · 590pp · 152,595 words
by James Higginbotham · 20 Dec 2021 · 283pp · 78,705 words
by Frank J. Ohlhorst · 28 Nov 2012 · 133pp · 42,254 words
by James Turnbull · 13 Jul 2014 · 265pp · 60,880 words
by Martin Kleppmann · 17 Apr 2017
by Sean Ellis and Morgan Brown · 24 Apr 2017 · 344pp · 96,020 words
by John Arundel · 25 Aug 2013 · 274pp · 58,675 words
by Aaron Dignan · 1 Feb 2019 · 309pp · 81,975 words
by James Silver · 15 Nov 2018 · 291pp · 90,771 words
by Q. Ethan McCallum · 14 Nov 2012 · 398pp · 86,855 words
by Nadia Eghbal · 139pp · 35,022 words
by VM (Vicky) Brasseur · 266pp · 79,297 words
by Steve Klabnik and Carol Nichols · 14 Jun 2018 · 821pp · 178,631 words
by Alasdair Gilchrist · 27 Jun 2016
by Robert C. Martin · 13 Oct 2019 · 333pp · 64,581 words
by David Reed · 31 Aug 2021 · 168pp · 49,067 words
by Steve Klabnik and Carol Nichols · 27 Feb 2023 · 648pp · 183,275 words
by Andrew McAfee · 14 Nov 2023 · 381pp · 113,173 words
by Fabio Nelli · 27 Sep 2018 · 688pp · 107,867 words
by Brigid Schulte · 11 Mar 2014 · 455pp · 133,719 words
by Drew Neil · 2 May 2018 · 241pp · 43,252 words
by Eric Ries · 15 Mar 2017 · 406pp · 105,602 words
by Mitchell Hashimoto · 29 May 2013 · 192pp · 44,789 words
by Fabio Alessandro Locati · 21 Nov 2016
by Nigel Poulton · 10 May 2020
by Susan Fowler · 18 Feb 2020 · 205pp · 71,872 words
by Laura Shin · 22 Feb 2022 · 506pp · 151,753 words
by Ash Fontana · 4 May 2021 · 296pp · 66,815 words
by Matt Copperwaite and Charles Leifer · 26 Nov 2015
by Lee Atchison · 25 Jul 2016 · 255pp · 55,018 words
by Jono Bacon · 1 Aug 2009 · 394pp · 110,352 words