An Elegant Puzzle

Systems of Engineering Management

William Larson

Managers should support six to eight engineers. This gives them enough time for active coaching, coordinating, and furthering their team’s mission by writing strategies, leading change, and so on.

219 ↱

Tech Lead Managers (TLMs). Managers supporting fewer than four engineers tend to function as TLMs, taking on a share of design and implementation work. For some folks this role can uniquely leverage their strengths, but it’s a role with limited career opportunities. To progress as a manager, they’ll want more time to focus on developing their management skills. Alternatively, to progress toward staff engineering roles, they’ll find it difficult to spend enough time on the technical details.

223 ↱

For production on-call responsibilities, I’ve found that two-tier 24/7 support requires eight engineers.

240 ↱

Teams with fewer than four individuals are a sufficiently leaky abstraction that they function indistinguishably from individuals. To reason about a small team’s delivery, you’ll have to know about each on-call shift, vacation, and interruption. They are also fragile, with one departure easily moving them from innovation back into toiling to maintain technical debt.

249 ↱

Teams should be six to eight during steady state. To create a new team, grow an existing team to eight to ten, and then bud into two teams of four or five. Never create empty teams. Never leave managers supporting more than eight individuals.
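The sizing rules in the highlights above can be condensed into a small classifier. This is a minimal Python sketch, not the book's method: it assumes a team is described only by its headcount, takes the thresholds from the text (fewer than four is fragile, six to eight is steady state, grow to eight to ten and then bud into two teams of four or five), and uses the hypothetical function names `classify_team` and `bud`.

```python
from __future__ import annotations


def classify_team(size: int) -> str:
    """Classify a team by headcount, using the rules of thumb quoted above."""
    if size < 4:
        return "fragile"       # behaves like a set of individuals
    if size < 6:
        return "small"         # viable (e.g. just after budding) but below steady state
    if size <= 8:
        return "steady"        # six to eight engineers
    return "ready_to_bud"      # past eight: split rather than stretch one manager


def bud(size: int) -> tuple[int, int]:
    """Split a team of eight to ten into two teams of four or five (never an empty team)."""
    if not 8 <= size <= 10:
        raise ValueError("bud from a team of eight to ten engineers")
    first = size // 2
    return first, size - first
```

For example, `bud(9)` yields `(4, 5)`: both new teams start at or above the fragile threshold, and neither manager supports more than eight people.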

258 ↱

You’ll also have limited resources to apply, and they’ll usually be insufficient to simultaneously move every team down the continuum. Many folks try to move all teams at the same time, peanut buttering their limited resources, but resist that indecision-framed-as-fairness: it’s not a fair outcome if no one gets anything. For each constraint, prioritize one team at a time. If most teams are falling behind, then hire onto one team until it’s staffed enough to tread water, and only then move to the next. While this is true for all constraints, it’s particularly important for hiring. Adding new individuals to a team disrupts that team’s gelling process, so I’ve found it much easier to have rapid growth periods for any given team, followed by consolidation periods during which the team gels. The organization will never stop growing, but each team will.
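To make the contrast with peanut buttering concrete, here is a minimal Python sketch that assigns a batch of hires to one team at a time, in priority order, filling each team to a tread-water headcount before moving on. The target headcount, the team names, and the function `allocate_hires` are hypothetical illustrations, not something the book specifies.

```python
from __future__ import annotations


def allocate_hires(teams: list[tuple[str, int]], hires: int, target: int) -> dict[str, int]:
    """Assign hires to one team at a time instead of spreading them evenly.

    teams:  (name, current_headcount) pairs, highest priority first.
    hires:  number of new engineers available to place.
    target: hypothetical headcount at which a team can tread water.
    """
    allocation = {name: 0 for name, _ in teams}
    for name, headcount in teams:
        if hires == 0:
            break                         # nothing left; lower-priority teams wait
        needed = max(0, target - headcount)
        assigned = min(needed, hires)     # fill this team before touching the next
        allocation[name] = assigned
        hires -= assigned
    return allocation


# Hypothetical example: five hires, three teams of three, target of six.
print(allocate_hires([("payments", 3), ("search", 3), ("infra", 3)], hires=5, target=6))
# {'payments': 3, 'search': 2, 'infra': 0}
```

Spreading the same five hires evenly (two, two, one) would leave every team below six. Allocating sequentially gets the highest-priority team to the target and limits the gelling disruption to two teams instead of three.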

324 ↱

Fundamentally, I believe that sustained productivity comes from high-performing teams, and that disassembling a high-performing team leads to a significant loss of productivity, even if the members are fully retained. In this worldview, high-performing teams are sacred, and I’m quite hesitant to disassemble them. Teams take a long time to gel. When a group has been working together for a few years, they understand each other and know how to set each other up for success in a truly remarkable way. Shifting individuals across teams can reset the clock on gelling, especially for teams in the early stages of gelling.

344 ↱

I’ve found it most fruitful to move scope between teams, preserving the teams themselves. If a team has significant slack, then incrementally move responsibility to them, at which point they’ll start locally optimizing their expanded workload. It’s best to do this slowly to maintain slack in the team, but if it’s a choice of moving people rapidly or shifting scope rapidly, I’ve found that the latter is more effective and less disruptive. Shifting scope works better than moving people because it avoids re-gelling costs, and it preserves system behavior.

380 ↱

The other approach that I’ve seen work well is to rotate individuals for a fixed period into an area that needs help. The fixed duration allows them to retain their identity and membership in their current team, giving their full focus to helping out, rather than splitting their focus between performing the work and finding membership in the new team. This is also a safe way to measure how much slack the team really has!

385 ↱

With high interview loads, you’ll sometimes notice last year’s solid interviewer giving a poor experience to a candidate or rejecting every incoming candidate. If your engineer is doing more than three interviews a week, it is a useful act of mercy to give them a month off every three or four months.
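The interview-load rule can be tracked with a simple check over weekly counts. This is a minimal Python sketch under stated assumptions: averaging over a rolling 13-week window (roughly three months) is this example's choice, not the book's, and `needs_interview_break` is a hypothetical name.

```python
from __future__ import annotations


def needs_interview_break(weekly_interviews: list[int], threshold: int = 3,
                          window_weeks: int = 13) -> bool:
    """Return True when recent interview load suggests giving a month off.

    weekly_interviews: interviews per week, oldest first.
    threshold:    "more than three interviews a week", from the text.
    window_weeks: about three months, the low end of "every three or four months".
    """
    if len(weekly_interviews) < window_weeks:
        return False                      # not enough history to judge yet
    recent = weekly_interviews[-window_weeks:]
    return sum(recent) / window_weeks > threshold
```

An average smooths over a single heavy week; a stricter variant could require every week in the window to exceed the threshold.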

475 ↱

There are a non-zero number of companies that do internal documentation well, but I’m less sure if there are a non-zero number of companies with more than 20 engineers that do this well. If you know any, please let me know so that I can pick their brains.

499 ↱

Finally, a related antipattern is the gatekeeper pattern. Having humans who perform gatekeeping activities creates very odd social dynamics, and is rarely a great use of a human’s time. When at all possible, build systems with sufficient isolation that you can allow most actions to go forward. And when they do occasionally fail, make sure that they fail with a limited blast radius.

510 ↱

If you really want a solid grasp on systems thinking fundamentals, you should read Thinking in Systems: A Primer by Donella H. Meadows.

609 ↱

An effective vision helps folks think beyond the constraints of their local maxima, and lightly aligns progress without requiring tight centralized coordination. You should be writing from a place far enough out that the error bars of uncertainty are indisputably broad, where you can focus on the concepts and not the particulars. Visions should be detailed, but the details are used to illustrate the dream vividly, not to prescriptively constrain its possibilities.

811 ↱

Good goals are a composition of four specific kinds of numbers: A target states where you want to reach. A baseline identifies where you are today. A trend describes the current velocity. A time frame sets bounds for the change. Put these all together, and a well-structured goal takes the form of: “In Q3, we will reduce time to render our frontpage from 600ms (p95) to 300ms (p95). In Q2, render time increased from 500ms to 600ms.” The two tests of an effective goal are whether someone who doesn’t know much about an area can get a feel for a goal’s degree of difficulty, and whether afterward they can evaluate if it was successfully achieved. If you define all four aspects, typically your goal will fulfill both criteria.
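The four-number structure maps naturally onto a small record type. The Python sketch below is an illustration, not a prescribed format: the `Goal` class and its field names are hypothetical. It reassembles the frontpage example from the text and adds the check someone would run afterward to see whether the goal was achieved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Goal:
    """A goal assembled from the four numbers: target, baseline, trend, time frame."""
    metric: str         # what is measured, e.g. "time to render our frontpage"
    unit: str           # suffix printed after each value, e.g. "ms (p95)"
    time_frame: str     # bounds for the change, e.g. "Q3"
    baseline: float     # where you are today
    target: float       # where you want to reach
    trend_period: str   # period the trend was observed over, e.g. "Q2"
    trend_start: float  # value at the start of that period; it ended at the baseline

    def _fmt(self, value: float) -> str:
        return f"{value:g}{self.unit}"

    def statement(self) -> str:
        """Render the goal so a newcomer can judge its difficulty."""
        verb = "reduce" if self.target < self.baseline else "increase"
        if self.baseline == self.trend_start:
            trend = f"{self.metric} held steady at {self._fmt(self.baseline)}"
        else:
            direction = "increased" if self.baseline > self.trend_start else "decreased"
            trend = (f"{self.metric} {direction} from "
                     f"{self._fmt(self.trend_start)} to {self._fmt(self.baseline)}")
        return (f"In {self.time_frame}, we will {verb} {self.metric} from "
                f"{self._fmt(self.baseline)} to {self._fmt(self.target)}. "
                f"In {self.trend_period}, {trend}.")

    def achieved(self, measured: float) -> bool:
        """Let someone check afterward whether the target was reached."""
        if self.target < self.baseline:
            return measured <= self.target
        return measured >= self.target


frontpage = Goal("time to render our frontpage", "ms (p95)", "Q3",
                 baseline=600, target=300, trend_period="Q2", trend_start=500)
print(frontpage.statement())
```

Leaving any of the four fields out makes the constructor fail, which is a crude but useful nudge toward defining all four aspects.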

862 ↱

A plug for Ryan Lopopolo’s amazing blog post on “Effectively Using AWS Reserved Instances.”

905 ↱

Migrations are both essential and frustratingly frequent as your codebase ages and your business grows: most tools and processes only support about one order of magnitude of growth before becoming ineffective, so rapid growth makes migrations a way of life. This isn’t because you have bad processes or poor tools; quite the opposite. The fact that something stops working at significantly increased scale is a sign that it was designed appropriately to the previous constraints rather than being over-designed.
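The one-order-of-magnitude rule of thumb turns into simple arithmetic. If every tool stops working at around ten times the scale it was introduced at, then the number of migrations to budget for is roughly the number of factor-of-ten steps between today's scale and the scale you expect. A minimal sketch, assuming exactly 10x headroom per tool (the text says "about"), with the hypothetical function name `expected_migrations`:

```python
def expected_migrations(current_scale: int, projected_scale: int, headroom: int = 10) -> int:
    """Count how many times a tool must be replaced as scale grows.

    Assumes each tool, and each replacement, supports `headroom` times the
    scale it was introduced at. Integer arithmetic avoids float log10 rounding.
    """
    migrations = 0
    supported = current_scale * headroom
    while supported < projected_scale:
        migrations += 1               # outgrew the current tool; migrate
        supported *= headroom         # the replacement buys another order of magnitude
    return migrations
```

For example, growing from 100 to 50,000 requests per second crosses the 1,000 and 10,000 limits, so this sketch predicts two migrations.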

966 ↱

In general, the actual tactics for doing this are:
1. Discuss with heavily impacted individuals in private first.
2. Ensure that managers and other key individuals are prepared to explain the reasoning behind the changes.
3. Send an email out documenting the changes.
4. Be available for discussion.
5. If necessary, hold an organization all-hands, but probably try not to. People don’t process well in large groups, and the best discussions take place in small rooms.
6. Double down on doing skip-level one-on-ones.

1119 ↱

Managers tend to have a strong sense of the business’s needs, and that gives them the superpower of finding the intersection of your interests and the business’s priorities. That translation is a creative pursuit, so don’t leave this entirely to your manager: participate as well!

1211 ↱

Start with the conclusion. Particularly in written communication, folks skim until they get bored and then stop reading. Accommodate this behavior by starting with what’s important, instead of building toward it gradually.

1373 ↱

My general approach to presenting to senior leaders is:
1. Tie topic to business value. One or two sentences to answer the question “Why should anyone care?”
2. Establish historical narrative. Two to four sentences to help folks understand how things are going, how we got here, and what the next planned step is.
3. Explicit ask. What are you looking for from the audience? One or two sentences.
4. Data-driven diagnosis. Along the lines of a strategy’s diagnosis phase, explain the current constraints and context, primarily through data. Try to provide enough raw data to allow people to follow your analysis. If you only provide analysis, then you’re asking folks to take you on trust, which can come across as evasive. This should be a few paragraphs, up to a page.
5. Decision-making principles. Explain the principles that you’re applying against the diagnosis, articulating the mental model you are using to make decisions.
6. What’s next and when it’ll be done. Apply your principles to the diagnosis to generate the next steps. It should be clear to folks reading along how your actions derive from your principles and the data. If it’s not, then either rework your principles or your actions!
7. Return to explicit ask. The final step is to return to your explicit ask and ensure that you get the information or guidance you need.

1406 ↱

Decouple participation from productivity. As you grow more senior, you’ll be invited to more meetings, and many of those meetings will come with significant status. Attending those meetings can make you feel powerful, but you have to keep perspective about whether you’re accomplishing much by attending. Sometimes, being able to convey important context to your team is super valuable, and in those cases you should keep attending, but don’t fall into the trap of assuming that attendance is valuable.

1464 ↱

The fixed cost of creating and maintaining a policy is high enough that I generally don’t recommend writing policies that do little to constrain behavior. In fact, that’s a useful definition of bad policy. In such cases, I instead recommend writing norms, which provide nonbinding recommendations. Because they’re nonbinding, they don’t require escalations to address ambiguities or edge cases.

1561 ↱

Policy success is directly dependent on how we handle requests for exception. Granting exceptions undermines people’s sense of fairness, and sets a precedent that undermines future policy. In environments where exceptions become normalized, leaders often find that issuing writs of exception—for policies they themselves have designed—starts to swallow up much of their time. Organizations spending significant time on exceptions are experiencing exception debt. The escape is to stop working the exceptions, and instead work the policy.

1577 ↱

The next time you’re about to dive into fixing a complicated one-off situation, consider taking a step back and documenting the problem but not trying to solve it. Commit to refreshing the policy in a month, and batch all exception requests until then. Merge the escalations and your current policy into a new revision. This will save your time, build teams’ trust in the system, and move you from working the exceptions to working the policy.

1594 ↱

Expressing your priorities convincingly can be a difficult, daunting task. I recommend breaking it down into three discrete steps: document all your incoming asks, develop guiding principles for how work is selected, and then share subsets of tasks you’ve selected based on those guiding principles.

1644 ↱

“With the right people, any process works, and with the wrong people, no process works.”

1694 ↱

Process is a tool to make it easy to collaborate, and the process that the team enjoys is usually the right process. If your process is failing somehow, it’s worth really digging into how it’s failing before you start looking for another process to replace it. As you start homing in on the problem (maybe it’s you!), honestly ask yourself if a different process would address it, or if you’re moving around the food on your plate. My experience is that a different process probably isn’t the solution you’re looking for.

1695 ↱

Long bones have growth plates at their ends, which is where the growth happens, and the middle doesn’t grow. This is a pretty apt metaphor for rapidly growing companies, and a useful mental model when trying to understand why your behaviors might not be resonating in a new role.

1737 ↱

You’d expect that novel ideas would be heavily valued in these circumstances, but, interestingly, it’s the opposite: execution is the primary currency in the growth plates. That’s because you typically have a surplus of fairly obvious ideas to try, and there is constrained bandwidth for evaluating those ideas. It’s common for well-meaning individuals from outside the growth plates to jump in to help by supplying more ideas, but that’s counterproductive. What folks in the growth plates need is help reducing and executing the existing backlog of ideas, not adding more ideas that must be evaluated. Teams in these scenarios are missing the concrete resources necessary to execute, and supplying those resources is the only way to help. Giving more ideas feels helpful, but isn’t. Finally, I think it’s important to recognize that, in the growth plates, you are focused on surviving to the next round, which might be a different growth challenge, or might be the team stabilizing. It is extremely hard to consistently do the basics well in these circumstances, because you simply won’t have enough time to do them well. You’ll have to get comfortable doing as well as time constraints allow, and sometimes that will lead to being mediocre at things you’re passionate about.

1743 ↱

An inclusive organization is one in which individuals have access to opportunity and membership. Opportunity is having access to professional success and development. Membership is participating as a version of themselves that they feel comfortable with.

1970 ↱

Tom DeMarco’s Slack has an excellent suggestion for a good starting state between positive and negative freedoms for engineering teams: generally follow the standard operating procedure (i.e., keep doing what you’re already doing, the way you’re doing it), but always change exactly one thing for each new project. Perhaps use a new database, a new web server, a different templating language, a static JavaScript front-end, or something else, but always change exactly one thing.
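DeMarco's rule is easy to check mechanically if a project's stack is written down as a mapping from component to choice. A minimal Python sketch; treating a stack as a flat dictionary is an assumption of this example, and the component names and products are hypothetical placeholders.

```python
from __future__ import annotations


def changed_components(standard: dict[str, str], proposal: dict[str, str]) -> list[str]:
    """List the components where a project departs from the standard operating procedure."""
    keys = sorted(set(standard) | set(proposal))   # added or dropped components count too
    return [key for key in keys if standard.get(key) != proposal.get(key)]


def changes_exactly_one_thing(standard: dict[str, str], proposal: dict[str, str]) -> bool:
    return len(changed_components(standard, proposal)) == 1


# Hypothetical standard stack and a project proposal that swaps the database.
sop = {"database": "PostgreSQL", "web_server": "nginx", "templating": "Jinja2"}
proposal = {**sop, "database": "CockroachDB"}
print(changed_components(sop, proposal))   # ['database']
```

A proposal that changes nothing also fails the check, which matches the "always change exactly one thing" framing.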

2218 ↱

Working at a company isn’t a single continuous experience. Rather it’s a mix of stable eras and periods of rapid change that bridge between eras. Thriving requires both finding a way to succeed in each new era and successfully navigating the transitional periods. You yourself trigger some transitions, like switching companies. Others happen on their own schedule: a treasured coworker leaves, your manager moves on, or the company runs out of funding.

2316 ↱

Both the stable eras and the transitions are great opportunities for growing yourself. Transitions are opportunities to raise the floor by building competency in new skills, and in stable periods you can raise the ceiling by developing mastery in the skills that the new era values. As the cycle repeats, your elevated floor will allow you to weather most transitions, and you’ll thrive in most eras by leveraging your expanding masteries.

2329 ↱

If you’re hitting your hiring goals (and with enough dedicated sourcers, any process will hit your hiring goals), then it can be hard to prioritize improving your process.

2353 ↱

While interviewing well is far from easy, it is fairly simple. Be kind to the candidate. Ensure that all interviewers agree on the role’s requirements. Understand the signal your interview is checking for (and how to search that signal out). Come to your interview prepared to interview. Deliberately express interest in candidates. Create feedback loops for interviewers and the loop’s designer. Instrument and optimize as you would any conversion funnel.
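Treating the loop as a conversion funnel mostly means counting candidates at each stage and watching the step-to-step rates. A minimal Python sketch; the stage names and counts are hypothetical, and a real funnel would likely also be segmented by role and candidate source.

```python
from __future__ import annotations


def conversion_rates(stages: list[tuple[str, int]]) -> list[tuple[str, float]]:
    """Return the stage-to-stage conversion rate for an ordered hiring funnel."""
    rates = []
    for (name, count), (next_name, next_count) in zip(stages, stages[1:]):
        rate = next_count / count if count else 0.0   # guard against empty stages
        rates.append((f"{name} -> {next_name}", rate))
    return rates


# Hypothetical quarter of candidate counts per stage.
funnel = [("phone screen", 100), ("onsite", 40), ("offer", 10), ("accepted", 8)]
rates = conversion_rates(funnel)
weakest = min(rates, key=lambda pair: pair[1])
print(weakest)   # ('onsite -> offer', 0.25)
```

The lowest rate is where to look first, but a low rate is not automatically a problem: an onsite is meant to filter, whereas a low offer acceptance rate may point at the candidate experience.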

2355 ↱

Almost every unkind interviewer I’ve worked with has been either suffering from interview burnout after doing many interviews per week for many months or has been busy with other work to the extent that they have started to view interviews as a burden rather than a contribution. To fix that, give them an interview sabbatical for a month or two, and make sure that their overall workload is sustainable before moving them back into the interview rotation. Identifying interview burnout is also one of the areas where having a strong relationship with open communication between engineering managers and recruiters is important. Having two sets of eyes looking for these signals helps.

2372 ↱

It’s easy to work for a long time without building up a large personal network if you work in a smaller market or at a series of small companies. (One of the side benefits of working at a large company early in your career, beyond name recognition, is kickstarting your personal network.)

2481 ↱

What I’ve seen work best is to be tolerant of career ladder proliferation—really try to make a ladder for each unique role—but to only invest significant time in refining any given ladder as it becomes applicable for more employees. As a rule of thumb, any ladder with more than 10 individuals should probably be fully fleshed out, but smaller functions can probably survive with something rough.
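The ladder rule of thumb can be applied across an organization's roles in one pass. A minimal Python sketch; the role names are hypothetical, and the function `ladder_plan` simply encodes "every role gets a ladder, roles with more than 10 individuals get a fully fleshed-out one".

```python
from __future__ import annotations


def ladder_plan(role_headcounts: dict[str, int], threshold: int = 10) -> dict[str, str]:
    """Decide how much to invest in each role's career ladder.

    Every unique role gets a ladder; only roles with more than `threshold`
    people get a fully fleshed-out one.
    """
    return {
        role: "full" if count > threshold else "rough"
        for role, count in role_headcounts.items()
    }


# Hypothetical organization.
print(ladder_plan({"backend engineer": 42, "data engineer": 11, "developer advocate": 3}))
# {'backend engineer': 'full', 'data engineer': 'full', 'developer advocate': 'rough'}
```

Rerunning this as headcounts change shows which rough ladders are due for refinement as they become applicable to more employees.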

2659 ↱

A good ladder allows individuals to accurately self-assess; these ladders are self-contained and short. A bad ladder is ambiguous and requires deep knowledge of precedent to apply correctly. If there is one component of performance management that you’re going to invest into doing well, make it the ladders: everything else builds on this foundation.

2675 ↱

Most companies start out using a single scale to represent performance designations, often whole numbers from one to five. Over time, these often move toward the nine-blocker format, a three-by-three grid with one axis representing performance and the other representing trajectory. Having used a number of systems, I prefer to use the simplest representation possible. The extra knobs in more complicated systems support more granularity, but my sense is that they simply create the impression of rigor while remaining equally challenging to implement in a consistent, fair way.

2685 ↱

If a company is experiencing particularly frequent level expansion, it is usually a sign that progression, compensation, or recognition has been overly tied to your level system, and you should identify mechanisms to reduce pressure on leveling.

2781 ↱

Folks often look at new roles as less important, framing them as service roles to absorb work they’re not interested in. Sometimes roles are even explicitly designed this way, intended to reduce work for another role as opposed to having an empowering mission of their own.

2845 ↱

As you move away from generalized roles and toward specialists, an unexpected consequence is that your organization has far more single points of failure. Where everyone on a team was once able to perform all tasks fairly effectively, now if your project manager leaves, you’ll find that no one is able to fill the role very capably. This brittleness is particularly acute in organizations with frequent structural changes.

2848 ↱

When a new role is created, the role’s designers have a very clear vision of how they want the new function to work. Many other individuals are not particularly concerned with how the creators want the function to work, and will view it as an opportunity to offload tasks that they find challenging, difficult, or uninteresting. This can lead to new roles being immediately underwater, which often feels like success to leaders attempting to grow the size of their organization. However, that can easily translate into an unlovable work experience for those performing the role.

2856 ↱

People want the first hires they make into a new role to be strong role models for the entire function. This often leads to a proliferation of requirements until it’s impossible for any candidate to pass the bar.

2871 ↱

New roles are frequently described in terms of how they’ll impact other functions, rather than in terms of what they’ll accomplish. For example, you might describe technical program managers (TPMs) as offloading project management responsibilities from engineering managers. This approach frames the role as an auxiliary, support function, which makes it difficult to recognize the work’s impact. You must be able to frame the role’s work without referencing other existing roles in order for it to succeed long-term.

2888 ↱

As a rule of thumb, I would always create a new role if it immediately covered 20 individuals; would reluctantly create a new role if it would cover 20 individuals within two years; and would be skeptical of creating a new role that couldn’t meet one of those two conditions!
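The rule of thumb reads as a three-way decision. A minimal Python sketch; the function name `new_role_stance` is hypothetical, and reading "covered 20 individuals" as "at least 20" is an assumption.

```python
def new_role_stance(covered_now: int, covered_in_two_years: int, threshold: int = 20) -> str:
    """Apply the rule of thumb for creating a new role."""
    if covered_now >= threshold:
        return "create"                   # immediately covers enough people
    if covered_in_two_years >= threshold:
        return "create reluctantly"       # will cover enough people within two years
    return "be skeptical"                 # meets neither condition
```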

2939 ↱

Writing a great interview loop is almost identical to writing a great career ladder. If you’ve already written expectations for the role, reuse those as much as possible.

2994 ↱

Teams and organizations have a very limited appetite for new process; try to roll out one change at a time, and don’t roll out the next change until the previous change has enthusiastic compliance. Process needs to be adapted to its environment, and success comes from blending it with your particular context.

3019 ↱

The other mechanism I’ve found to be exceptionally useful at this point is team snippets. These come out every two to four weeks and give snapshots of each team’s sprints: what they’re doing, why they’re doing it, and what they’re planning to do next. These are valuable for you to retain a sense of what your teams are working on, but they are invaluable for decentralizing coordination and communication between teams in your organization, as you become increasingly ineffective in that role.

3084 ↱
