Enum values: Integer or string?

I’ve been writing software for a few decades, and the historical momentum has been to define enum values as integers, probably because earlier languages supported only integers as enum values, not strings.

from enum import Enum

class Color(Enum):
    UNINITIALIZED = 0
    RED = 1
    GREEN = 2
    BLUE = 3

So it was a surprise when a discussion among engineers went the other direction, encouraging strings as enum values for newly introduced enums. It seems that industry momentum may have shifted in this direction. Here’s a contrived example that I do not recommend:

from enum import Enum

class Color(Enum):
    UNKNOWN = ""
    RED = "rood"          # Note: Dutch
    GREEN = "verde"       # Note: Spanish
    BLUE = "bleu"         # Note: French
    LAVINDRE = "lavndir"  # Note: Purposefully misspelled

Benefits of using string enum values include:

  • Clarity when viewing logs. However, it is possible to log integer enum values as strings.
  • Clarity when debugging. However, some debuggers may already add clarity for integer-based enums.
  • Disparate teams across the organization can more readily understand the data without having to refer to documentation.
  • API interoperability and clarity.
  • Ease of adding new items in the middle of the enum, whereas with sequential integers, inserting in the middle is not possible without renumbering the values that follow.
  • Default value as empty string. It has always been a pain when an engineer decides to use 0 to indicate anything other than “default”, “uninitialized” or “unknown”, because 0 is often the default value for uninitialized variables and properties. With strings, an empty string is ideal.
  • It’s still possible to represent string-value enums as integers when persisting to a database or interoperating with other systems — although it may require additional work.
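To make a few of the benefits above concrete, here is a minimal sketch in Python. The names ColorInt, ColorStr, COLOR_TO_DB_ID, to_db and from_db are hypothetical and exist only for illustration. It compares how the two styles look once serialized (as they might appear in a log line or API response), and shows one possible way to map string values back to integers for persistence.

```python
import json
from enum import Enum


class ColorInt(Enum):
    UNINITIALIZED = 0
    RED = 1
    GREEN = 2
    BLUE = 3


class ColorStr(Enum):
    UNKNOWN = ""  # the empty string doubles as the default
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


# Serialized payloads, as they might appear in a log or API response.
int_payload = json.dumps({"color": ColorInt.GREEN.value})  # '{"color": 2}'
str_payload = json.dumps({"color": ColorStr.GREEN.value})  # '{"color": "green"}'

# Hypothetical lookup table for storing string enums in an integer column.
COLOR_TO_DB_ID = {
    ColorStr.UNKNOWN: 0,
    ColorStr.RED: 1,
    ColorStr.GREEN: 2,
    ColorStr.BLUE: 3,
}
DB_ID_TO_COLOR = {db_id: color for color, db_id in COLOR_TO_DB_ID.items()}


def to_db(color: ColorStr) -> int:
    """Translate a string-valued enum member to its integer storage ID."""
    return COLOR_TO_DB_ID[color]


def from_db(db_id: int) -> ColorStr:
    """Translate a stored integer back; unrecognized IDs fall back to UNKNOWN."""
    return DB_ID_TO_COLOR.get(db_id, ColorStr.UNKNOWN)
```

The string payload can be read without a lookup table, while the integer payload cannot. The mapping table is the "additional work" mentioned in the last bullet: it has to be maintained alongside the enum.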

Caveats and contraindications:

  • Legacy. Keep what works and don’t break things, especially when disparate systems depend on existing enumerated integers. Translation layers can be a solution.
  • Lack of programming language support, although there are often alternatives including use of constants.
  • When the enum string values are misnamed or poorly named, it can cause confusion and bugs due to misunderstandings.
  • Who gets to decide which human language we use for the string values? Who will be consuming the values? Do they understand the language? I gave my example above with three different language values to emphasize that it’s best to stick with a single language.
  • Misspelled values may need to remain misspelled forever to prevent breaking disparate consumers. Software engineers are human, and they make mistakes. We don’t usually misspell integer values, and when we do, the software either won’t compile or won’t run. Translation layers can be a solution.
  • Upper or lower case? Camel case or snake case? Decide on a standard.
  • Performance-critical systems and RAM- or storage-constrained systems may do better with integer values.
  • Some databases support enums natively: the underlying data is an integer, and the database represents it as a string, including in queries and exports. That could make it possible to get the best of both worlds: performance and clarity for humans.
  • Isolation from the churn of ever-changing names. With integers, the values can stay the same while the enum keys change to reflect current needs. For example, marketing names tend to change.
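As one possible shape for the translation layer mentioned in the caveats above, the sketch below accepts a legacy misspelled value and inconsistent casing on input while keeping a single canonical spelling internally. It relies on Python's Enum `_missing_` hook, which is called when a lookup by value fails. LEGACY_ALIASES is a hypothetical table introduced for illustration, reusing the misspelling from the contrived example.

```python
from enum import Enum

# Hypothetical table of legacy spellings that consumers may still send.
LEGACY_ALIASES = {"lavndir": "lavender"}


class Color(Enum):
    UNKNOWN = ""
    RED = "red"
    LAVENDER = "lavender"

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook only when a direct lookup by value fails.
        if isinstance(value, str):
            # Normalize case and whitespace, then apply legacy aliases.
            normalized = value.strip().lower()
            normalized = LEGACY_ALIASES.get(normalized, normalized)
            for member in cls:
                if member.value == normalized:
                    return member
        # Returning None makes Enum raise ValueError for unknown values.
        return None
```

With this in place, Color("lavndir") and Color("lavender") resolve to the same member, and Color(" RED ") resolves to Color.RED. This only helps on the reading side: if existing consumers expect the misspelled value on output, the writing side would still need to emit it.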