Iterating the Python 3 exception chain

I use the Python requests library to make HTTP requests. Handling exceptions and giving the non-technical end user a friendly message can be a challenge when the original exception is wrapped up in an exception chain. For example:

import requests

url = "http://one.two.threeFourFiveSixSevenEight"
try:
    resp = requests.get(url)
except requests.RequestException as e:
    print("Couldn't contact", url, ":", e)

Prints:

Couldn't contact http://one.two.threeFourFiveSixSevenEight : HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f527329c978>: Failed to establish a new connection: [Errno -2] Name or service not known',))

And that’s a mouthful.

I want to tell the end user that DNS isn’t working, rather than showing the ugly stringified error message. How do I do that in Python 3? Exceptions are not iterable, so I searched the internet and found inspiration in the raven project. I adapted their code in two different ways to get the result I wanted.

Update Aug 10: See the end of this blog post for a more elegant solution.

import socket
import requests
import sys

def chained_exceptions(exc_info=None):
    """
    Adapted from: https://github.com/getsentry/raven-python/pull/811/files?diff=unified

    Return a generator iterator over an exception's chain.

    The exceptions are yielded from outermost to innermost (i.e. last to
    first when viewing a stack trace).
    """
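    if not exc_info or exc_info is True:
        exc_info = sys.exc_info()

    # sys.exc_info() returns (None, None, None) when no exception is being
    # handled, and that tuple is truthy, so check the exception value itself.
    if exc_info[1] is None:
        raise ValueError("No exception found")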

    yield exc_info
    exc_type, exc, exc_traceback = exc_info

    while True:
        if exc.__suppress_context__:
            # Then __cause__ should be used instead.
            exc = exc.__cause__
        else:
            exc = exc.__context__
        if exc is None:
            break
        yield type(exc), exc, exc.__traceback__

def chained_exception_types(e=None):
    """
    Return a generator iterator of exception types in the exception chain

    The exceptions are yielded from outermost to innermost (i.e. last to
    first when viewing a stack trace).

    Adapted from: https://github.com/getsentry/raven-python/pull/811/files?diff=unified
    """
    if not e or e is True:
        e = sys.exc_info()[1]

    if not e:
        raise ValueError("No exception found")

    yield type(e)

    while True:
        if e.__suppress_context__:
            # Then __cause__ should be used instead.
            e = e.__cause__
        else:
            e = e.__context__
        if e is None:
            break
        yield type(e)

saved_exception = None
try:
    resp = requests.get("http://one.two.threeFourFiveSixSevenEight")
except Exception as e:
    saved_exception = e
    if socket.gaierror in chained_exception_types(e):
        print("Found socket.gaierror in exception block via e")
    if socket.gaierror in chained_exception_types():
        print("Found socket.gaierror in exception block via traceback")
    if socket.gaierror in chained_exception_types(True):
        print("Found socket.gaierror in exception block via traceback")

if saved_exception:
    print("\nIterating exception chain for a saved exception...")
    for t, ex, tb in chained_exceptions((type(saved_exception), saved_exception, saved_exception.__traceback__)):
        print("\ttype:", t, "Exception:", ex)
        if t == socket.gaierror:
            print("\t*** Found socket.gaierror:", ex)
    if socket.gaierror in chained_exception_types(saved_exception):
        print("\t*** Found socket.gaierror via chained_exception_types")

Here’s the output:

Found socket.gaierror in exception block via e
Found socket.gaierror in exception block via traceback
Found socket.gaierror in exception block via traceback

Iterating exception chain for a saved exception...
    type: <class 'requests.exceptions.ConnectionError'> Exception: HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    type: <class 'requests.packages.urllib3.exceptions.MaxRetryError'> Exception: HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    type: <class 'requests.packages.urllib3.exceptions.NewConnectionError'> Exception: <requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known
    type: <class 'socket.gaierror'> Exception: [Errno -2] Name or service not known
    *** Found socket.gaierror: [Errno -2] Name or service not known
    *** Found socket.gaierror via chained_exception_types

Now I can write the following code:

url = "http://one.two.threeFourFiveSixSevenEight"
try:
    resp = requests.get(url)
except requests.RequestException as e:
    if socket.gaierror in chained_exception_types(e):
        print("Couldn't get IP address for hostname in URL", url, " -- connect device to Internet")
    else:
        raise

Very nice — just what I wanted.

Note that Python 2 does not support exception chaining, so this only works in Python 3.
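
For context, here is a minimal sketch of where the chain comes from. The function names are made up for illustration; the attributes it inspects (__cause__, __context__ and __suppress_context__) are the same ones the code above walks.

```python
def implicit_chain():
    """Raise inside an except block: Python records __context__."""
    try:
        int("not a number")
    except ValueError:
        raise RuntimeError("implicit")


def explicit_chain():
    """Use 'raise ... from ...': Python also records __cause__."""
    try:
        int("not a number")
    except ValueError as err:
        raise RuntimeError("explicit") from err


def describe(func):
    """Call func and report which chaining attributes were set."""
    try:
        func()
    except RuntimeError as exc:
        return {
            "cause": type(exc.__cause__).__name__,
            "context": type(exc.__context__).__name__,
            "suppress_context": exc.__suppress_context__,
        }


print(describe(implicit_chain))
print(describe(explicit_chain))
```

In the implicit case only __context__ is set. In the explicit case __cause__ is set as well and __suppress_context__ becomes True, which is why the walkers above follow __cause__ when that flag is set.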

Aug 10: A colleague of mine, Lance Anderson, came up with a far more elegant solution:

import requests
import socket

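class IterableException(object):
        """Iterate over an exception and the chain of exceptions behind it."""

        def __init__(self, ex):
                self.ex = ex

        def __iter__(self):
                self.next = self.ex
                return self

        def __next__(self):
                if self.next is None:
                        raise StopIteration
                # Return the current exception before advancing, so the
                # outermost exception is part of the iteration too.
                current = self.next
                if current.__suppress_context__:
                        # "raise ... from ..." was used, so follow __cause__.
                        self.next = current.__cause__
                else:
                        self.next = current.__context__
                return current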

url = "http://one.two.threeFourFiveSixSevenEight"

try:
        resp = requests.get(url)
except requests.RequestException as e:
        ie = IterableException(e)
        if socket.gaierror in [type(x) for x in ie]:
                print("Couldn't get IP address for hostname in URL", url, " -- connect device to Internet.")
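
The same walk can also be written as a plain generator function. This is only a sketch: iter_chain and friendly_message are hypothetical names, and the DNS failure is simulated with built-in exceptions so that it runs without requests or a network connection.

```python
import socket


def iter_chain(exc):
    """Yield exc, then each exception behind it, outermost first."""
    while exc is not None:
        yield exc
        # "raise ... from ..." sets __suppress_context__, so follow __cause__ then.
        exc = exc.__cause__ if exc.__suppress_context__ else exc.__context__


def friendly_message(exc):
    """Return a user-facing hint for a known root cause, else None."""
    for link in iter_chain(exc):
        if isinstance(link, socket.gaierror):
            return "Couldn't look up the hostname; check the Internet connection."
    return None


# Simulate a wrapped DNS failure without touching the network.
try:
    try:
        raise socket.gaierror(-2, "Name or service not known")
    except socket.gaierror as inner:
        raise ConnectionError("Max retries exceeded") from inner
except ConnectionError as outer:
    print(friendly_message(outer))
```

Using isinstance rather than comparing types also matches subclasses of the exception being looked for.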

An abundance of databases

We live in an age of an abundance of database choices. The databases have trade-offs in terms of work to implement, rigidity vs flexibility, write performance, read performance, query performance, maintenance, support, robustness, security, and so on. It seems that many databases can be tuned to meet requirements, but it may require hiring an expert to get the most out of it, or to tell you that a given database may not be the right fit.

I recently learned of the existence of MemSQL, Aerospike, CockroachDB, Clustrix, VoltDB and NuoDB. Several of these came to my attention from reading an InfoWorld article, although what I cover here doesn’t exactly overlap.

MemSQL

  • Commercial only, with gratis community edition.
  • It supports a JSON column type, and can index, query and update data within the JSON.
  • Keen insights from their team of engineers. See http://blog.memsql.com/cache-is-the-new-ram/. “Throughput and latency always have the last laugh.” I.e. locality still matters.
  • “As various NoSQL databases matured, a curious thing happened to their APIs: they started looking more like SQL. This is because SQL is a pretty direct implementation of relational set theory, and math is hard to fool.”
  • “We realized that caching cost at least as much RAM as the working set (otherwise it was ineffective), plus the nearly unbearable headache of cache consistency.”
  • http://blog.memsql.com/bpf-linux-performance/

Aerospike

  • AGPL NoSQL database, led by a former CEO of Salesforce.com. http://stackoverflow.com/questions/25208914
  • Key-value store, although since it supports nested key-values, it may be somewhat equivalent to MongoDB’s schemaless JSON document storage.
  • Scalable. Far better than Redis when it’s time to scale.
  • Aerospike is reportedly faster than MongoDB (in 2014, that is)
  • Needs fewer nodes than MongoDB, and so it reportedly costs less.

CockroachDB

  • Apache License 2.0
  • survivable
  • scalable (distributed)
  • SQL
  • beta software
  • Higher write latencies. Built on RocksDB from Facebook.

Clustrix

  • Proprietary drop-in replacement for MySQL.
  • 540 million transactions per minute.
  • Higher write throughput than MongoDB (reportedly).
  • Not a document store. It’s an RDBMS

VoltDB

NuoDB

  • ACID compliant, SQL RDBMS
  • Memory centric
  • Scalable, without sharding. (how does that work?)
  • More than 1 million transactions per second
  • Flexible schema
  • Java stored procedures
  • Despite claims that it “automatically adjusts for optimal workload”, my guess is that one must monitor and tune it. Computer algorithms are smart… until they’re not.

Runtime debugging tools for Linux

Here’s a useful presentation on Linux debugging tools — tools that don’t require source code, additional prints or logging.

http://jvns.ca/blog/2016/09/17/strange-loop-talk/

  • strace has a new flag that I didn’t know about: -y, which prints the paths that are associated with file descriptors.

  • opensnoop lets you see the details of open() calls across the entire system, or for an individual process, or for paths containing certain characters, or it can print the file paths that couldn’t be opened.

  • pstack shows the stack trace of a running process, which can be useful to get an idea of what a program spends most of its time doing.

  • dstat shows system resource stats. It is a replacement for vmstat, iostat and ifstat.

  • htop — a more beautiful ‘top’, and easier to use. I still mostly use ‘top’ because it is installed by default. Other great tools I use include ‘powertop’ and ‘iotop’.

  • ngrep — an alternative to tcpdump, but allows the use of regexes to match plain-text data in packets.

  • tcpdump — useful when troubleshooting network connections between servers.

  • wireshark — a more UI-friendly tool than tcpdump, with dissectors for most protocols

Python attrs library; stackoverflow documentation

Article: The One Python Library Everyone Needs: attrs

Some people are excited about eventually being able to program in Python 3 everywhere. What I’m looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I’ve seen it used in.

Or, for those who want more power (and complexity) than the attrs module, there’s macropy and its case classes.
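
For a flavor of what attrs provides, here is a minimal sketch. It assumes the attrs package is installed; the Point class is a made-up example.

```python
import attr


@attr.s
class Point(object):
    # attr.ib() declares a field; attrs generates __init__, __repr__
    # and comparison methods from these declarations.
    x = attr.ib()
    y = attr.ib(default=0)


p = Point(1, 2)
print(p)                  # Point(x=1, y=2)
print(p == Point(1, 2))   # True
print(Point(5).y)         # 0
```

The class body only lists its fields, and the boilerplate methods that would otherwise be written by hand come from the decorator.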


Stack Overflow has introduced a new tech documentation tool that focuses on providing examples, rather than merely sparsely documenting an API. The one on Python string formatting is quite useful.

Idioms facilitate communication

No matter what you think of a computer language, you ought to respect its idioms for the same reason one has to know idioms in a human language—they facilitate communication, which is the true purpose of all languages, programming or otherwise.

George V. Neville-Neil

George also explains that “a single cache miss is more expensive than many instructions, so optimizing away a few instructions is not really going to win your software any speed tests”.

HTML Subresource Integrity

LWN covers the new W3C spec for HTML subresource integrity (SRI):

SRI is designed to combat injection attacks that come through third-party content. The originating site can include cryptographic hashes of third-party script and image files, enabling the user’s browser to hash the corresponding files it receives from the third-party servers and verify that the hashes match.

Most browsers already support SRI, including Firefox, Chrome and Opera.
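
As a rough sketch of what goes into the integrity attribute: the value is the name of the hash algorithm, a dash, and the base64-encoded digest of the file. The sri_hash helper and the example.com URL below are hypothetical.

```python
import base64
import hashlib


def sri_hash(content, algorithm="sha384"):
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return algorithm + "-" + base64.b64encode(digest).decode("ascii")


# Stand-in for the bytes of the third-party file being referenced.
script = b"alert('Hello, world.');"

print('<script src="https://example.com/hello.js" integrity="%s" '
      'crossorigin="anonymous"></script>' % sri_hash(script))
```

The browser computes the same digest over the file it downloads and refuses to run the script if the values differ.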

How to store passwords: Use Argon2

If you’re designing a service that requires passwords for authentication, store them using the Argon2 or bcrypt password hashing functions. Don’t use MD5, SHA-1, SHA-2 or SHA-3 — they’re not designed to keep passwords secure against attackers that gain access to your password database.

Reference article: How LinkedIn’s password sloppiness hurts us all by Jeremi M. Gosney

If [online services] aren’t using something like bcrypt or Argon2 for password storage, then they’re doing things very, very wrong. But slow hashing is no longer as effective of a solution as it could have once been had it only been adopted sooner.

When you suspect a password database has been compromised, even just in part, you cash in on that insurance policy [of using forced password resets] immediately by activating your incident response team and your public relations team.

What is Argon2? It’s the winning algorithm from the Password Hashing Competition. Argon2 has been added to recent versions of libsodium.
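
Argon2 is not in the Python standard library, so the sketch below uses hashlib.scrypt as a stand-in. It is a different memory-hard function, used here only to show the general shape: a random salt per password, a deliberately slow hash, and a constant-time comparison. The function names and cost settings are illustrative assumptions, not recommendations. For real use, an Argon2 binding such as argon2-cffi is the better choice.

```python
import hashlib
import hmac
import os

# Illustrative cost settings, not tuned recommendations.
SCRYPT_PARAMS = {"n": 2 ** 14, "r": 8, "p": 1}


def hash_password(password):
    """Return (salt, digest) for storage, with a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Unlike a bare MD5 or SHA hash, each guess against a stolen database costs the attacker noticeable time and memory.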

URL shorteners can compromise security

It’s useful to shorten long URLs, especially when sending them in tweets and in text messages. An LWN.net article helped me learn that they can be a security risk:

URL shorteners such as bit.ly and goo.gl perform a straightforward task: they turn long URLs into short ones, consisting of a domain name followed by a 5-, 6-, or 7-character token. This simple convenience feature turns out to have an unintended consequence. The tokens are so short that the entire set of URLs can be scanned by brute force. The actual, long URLs are thus effectively public and can be discovered by anyone with a little patience and a few machines at her disposal.

Around 7% of the OneDrive folders discovered in this fashion allow writing. This means that anyone who randomly scans bit.ly URLs will find thousands of unlocked OneDrive folders and can modify existing files in them or upload arbitrary content

— VITALY SHMATIKOV
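
To get a feel for how small the token space is, here is some back-of-the-envelope arithmetic. The 62-character alphabet (a-z, A-Z, 0-9) and the scan rate are my assumptions, not figures from the article.

```python
ALPHABET_SIZE = 62          # assumption: a-z, A-Z and 0-9
REQUESTS_PER_SECOND = 1000  # hypothetical aggregate scan rate


def token_space(length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct tokens of the given length."""
    return alphabet_size ** length


for length in (5, 6, 7):
    tokens = token_space(length)
    days = tokens / REQUESTS_PER_SECOND / 86400.0
    print("%d characters: %d tokens, about %.1f days at %d requests/s"
          % (length, tokens, days, REQUESTS_PER_SECOND))
```

Under these assumptions the 5-character space is under a billion tokens, which is well within reach of a handful of machines.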

KeyCzar: Encryption made easy

Encrypting sensitive data-at-rest (i.e. in a database) is a good idea, but how does one manage the encryption keys, and rotate keys or start using a new algorithm down the road without orphaning or migrating the old data? Use KeyCzar.

Cryptography is easy to get wrong. Developers can choose improper cipher modes, use obsolete algorithms, compose primitives in an unsafe manner, or fail to anticipate the need for key rotation. Keyczar abstracts some of these details by choosing safe defaults, automatically tagging outputs with key version information, and providing a simple programming interface.

Keyczar is designed to be open, extensible, and cross-platform compatible. It is not intended to replace existing cryptographic libraries like OpenSSL, PyCrypto, or the Java JCE, and in fact is built on these libraries.

Or learn from what Google did with KeyCzar, and implement the same ideas (key rotation and key version info) using a more modern encryption library, like libsodium.
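
Here is a rough sketch of the key-version idea. It is not Keyczar's actual format or API: the Keyring class is hypothetical, and it uses HMAC signing from the standard library instead of encryption, because the standard library has no cipher. The point is only that every output records which key version produced it, so old data stays usable after a rotation.

```python
import base64
import hashlib
import hmac
import os


class Keyring(object):
    """Hypothetical keyring: old keys still verify, only the primary key signs."""

    def __init__(self):
        self.keys = {}     # version number -> key bytes
        self.primary = 0   # version used for new output

    def rotate(self):
        """Add a fresh random key and make it the primary one."""
        self.primary += 1
        self.keys[self.primary] = os.urandom(32)

    def sign(self, message):
        """Return 'version:tag' so the verifier knows which key was used."""
        tag = hmac.new(self.keys[self.primary], message, hashlib.sha256).digest()
        return "%d:%s" % (self.primary, base64.b64encode(tag).decode("ascii"))

    def verify(self, message, token):
        """Check a token against the key version recorded in its prefix."""
        version, _, encoded = token.partition(":")
        key = self.keys.get(int(version))
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, base64.b64decode(encoded))


ring = Keyring()
ring.rotate()
old_token = ring.sign(b"payload")
ring.rotate()
print(ring.verify(b"payload", old_token))        # token from key 1 still verifies
print(ring.sign(b"payload").startswith("2:"))    # new output is tagged with key 2
```

With an encryption library such as libsodium, the same version prefix would be stored alongside each ciphertext, and old keys would be kept for decryption only.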