How to rate people and teams

Notes from June 2022 on how to rate people and teams, written as I read Nine Lies About Work. The notes are my own (although the quotes are not), and I may have misrepresented some of what the book says; some of the examples are ones I made up.

Lie: People can reliably rate other people (on an abstract attribute such as performance or strategic thinking)

Summary: Although we can’t reliably rate other people, we can reliably rate our own experience of, and our own intentions toward, those people “right now.” (Sidenote: we aren’t reliable at rating our own personality or performance either.)

Why:

  1. The Idiosyncratic Rater Effect: the rater’s own rating pattern accounts for 54% of how people rate others (according to the book)
  2. Data insufficiency: we don’t “see” people often enough to rate them accurately
  3. There’s no single definition of “performance” shared between people, let alone between teams

“Lucy’s pattern of rating does not change when she rates two different people. Instead her ratings stay just about the same – her ratings pattern travels with her, regardless of who she’s rating, so her ratings reveal more about her than they do about her team members. We think that rating tools are windows that allow us to see out to other people, but they’re really just mirrors.”
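To make the “mirror” point concrete, here is a toy sketch with invented ratings from three hypothetical raters about three hypothetical team members. It uses a plain two-way sum-of-squares split for a complete rater-by-ratee table; that is my choice of illustration, not the method behind the book’s 54% figure. It only shows how much of the spread in a set of ratings can be explained by who did the rating rather than by who was rated.

```python
from statistics import fmean

# Hypothetical ratings on a 1-5 scale: ratings[rater][ratee].
# Invented for illustration; not data from the book.
ratings = {
    "Lucy": {"Ann": 5, "Ben": 5, "Cal": 4},
    "Raj": {"Ann": 2, "Ben": 3, "Cal": 2},
    "Mia": {"Ann": 4, "Ben": 4, "Cal": 3},
}


def variance_shares(ratings):
    """Split the total sum of squares of a complete (balanced) rater-by-ratee
    table into the share explained by the rater and the share explained by
    the person being rated."""
    raters = list(ratings)
    ratees = list(ratings[raters[0]])
    values = [ratings[r][p] for r in raters for p in ratees]
    grand = fmean(values)

    ss_total = sum((v - grand) ** 2 for v in values)
    # Each rater mean averages len(ratees) ratings, and each ratee mean
    # averages len(raters) ratings, hence the multipliers.
    ss_rater = len(ratees) * sum(
        (fmean(ratings[r][p] for p in ratees) - grand) ** 2 for r in raters
    )
    ss_ratee = len(raters) * sum(
        (fmean(ratings[r][p] for r in raters) - grand) ** 2 for p in ratees
    )
    return ss_rater / ss_total, ss_ratee / ss_total


rater_share, ratee_share = variance_shares(ratings)
print(f"explained by the rater: {rater_share:.0%}")
print(f"explained by the person rated: {ratee_share:.0%}")
```

With these made-up numbers about 80% of the spread is explained by the rater and about 15% by the person being rated (the rest is leftover noise): Lucy is simply a lenient rater and Raj a harsh one, whoever they happen to be rating.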


  • Crowds are not automatically smarter than a few people. The truth: well-informed crowds are wise; ill-informed crowds are not.
  • Noise plus noise never equals signal.
  • Bad data contaminates all the good data.
  • Adding bad data to good, or the other way around, doesn’t improve the quality of the data or make up for its inherent shortcomings.

Good data has three distinct characteristics:

  • it is reliable
  • it is variable
  • it is valid

Reliable:

  • stable and predictable measurements
  • something that can be counted
  • doesn’t mean accurate; it means the data doesn’t fluctuate randomly
  • Your height is reliable data: measure it twice and you get the same answer, so we can trust the data-gathering tool

Variable:

  • The range of data reflects the natural world.
  • E.g. when every customer rates a car salesperson 5/5, there’s no variation, so the rating can’t tell a strong salesperson from a weak one.
  • E.g. a thermometer that bottoms out or maxes out within the range of real-world temperatures isn’t variable enough.
  • When it comes to rating our own human experience, “to produce a range in our rating tools, we have to create questions that contain extreme wording.”

Valid: a.k.a. criterion-related validity

  • One piece of reliable data predicts another piece of reliable data
  • Does a high score on the measurement tool predict a high score on something else in the real world?
  • When a scatter plot of the data shows that it clusters to one end of the scale, it’s probably not good data.

A more reliable way to capture data about people: “People can reliably rate their own experience,” along with their own intentions “right now.”

We can ask a manager or peer questions like these, and ask them promptly rather than expecting people to reflect four months later:

  • Do you always go to this team member when you need extraordinary results?
  • Do you choose to work with this team member as much as you possibly can?
  • Would you promote this person today if you could?
  • Do you think this person has a performance problem that you need to address immediately?

Thought: Despite our limitations in rating others accurately, we do it anyway. Recognizing that our ratings of others are biased and inaccurate is hopefully a step toward humility, and thus toward loving others more fully and avoiding incorrect ratings.


Here are three brief pages from the author’s website. The first explains the problem, the second describes an approach to solving it, and the third gives a concrete example of questions that he asserts “predict high-performing teams” in four areas: Purpose, Excellence, Support, and Future.

https://www.marcusbuckingham.com/good-data-bad-data-idiosyncratic-rater-effect

https://www.marcusbuckingham.com/data-fluency-series-current-tools-good-data

https://www.marcusbuckingham.com/data-fluency-series-case-study