Captive portal detection

I did a wireshark dump on my Ubuntu 18.04 laptop and noticed that both Firefox and Ubuntu do captive portal detection. Of the two, I think the Firefox method is simpler to implement and use.

Firefox does an HTTP GET on http://detectportal.firefox.com/success.txt
The server responds with HTTP 200 OK, a Content-Type of text/plain, and a body of “success\n”

Ubuntu does an HTTP GET on http://connectivity-check.ubuntu.com
The server responds with HTTP 204 No Content and the header X-NetworkManager-Status: online\r\n

Notice that captive portal detection uses an unencrypted transport: HTTP, not HTTPS. That’s deliberate, because a captive portal can intercept and rewrite plain HTTP, whereas intercepting HTTPS would trigger certificate errors.
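
Here’s a minimal sketch of the Firefox-style check in Python, using the requests library. The endpoint and expected body come from the capture above; the function name, timeout, and redirect handling are my own choices:

import requests

def is_online(url="http://detectportal.firefox.com/success.txt", timeout=5):
    """Return True if we appear to have unfiltered Internet access."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=False)
    except requests.RequestException:
        return False
    # A captive portal typically redirects (3xx) or rewrites the body,
    # so insist on an exact match.
    return resp.status_code == 200 and resp.text == "success\n"

print("online" if is_online() else "behind a captive portal (or offline)")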

Programmer Productivity

Twenty years ago, an extended family relation, a patent lawyer, expressed his opinion that there’s not that much variance between engineers — at least, not as much as people suppose. Companies draw from the same pool of talent, and the idea that one company has the bulk of talent is a misconception.

This article by Bill Nichols confirms that idea in the realm of programmers.

Programmer Moneyball: Challenging the Myth of Individual Programmer Productivity

My view is that hard work, good health, persistence, consistency, and the ability to work with others make a big difference. On the other hand, poor health, inconsistency, and confused priorities lead to mediocre results.

Key takeaways from the article, with my commentary below each one:

  1. Keep tasks small.
    • Reworking the design or the infrastructure (e.g. build system) is rarely small, but it can often be done in parallel with the existing solution.
  2. Plan for uncertainty by leaving adequate margins.
    • Planning with adequate margins often comes from similar experience, especially within an organization.
  3. Start critical work early since almost half the time it will take longer than expected, sometimes much longer.
    • There are edge cases that must be handled, and we discover things that we didn’t know up-front: new difficulties, new requirements, etc.
  4. Don’t be fooled by short-term progress.
  5. Provide a quiet work place so that programmers can focus.
    • Carve out times of day where there are no meetings.
  6. Design to control the complexity and size of solutions.
    • Peer feedback is a good way to simplify a design, especially when a first attempt has become overly complex or doesn’t work as intended.
  7. Encourage frequent peer review.
    • Agreed. Note that not all code changes need the same level of peer feedback; saddling low-risk changes (README updates, spelling fixes, comments that clarify code) with heavyweight process works against both progress and quality. Find a balance that works well.
  8. Automate routine tasks such as regression test and deployment.
    • I mostly agree. Beware of automating everything, especially one-off, temporary solutions. Manual testing has its place.
  9. Develop talent with training, such as for design, review, and test.
    • Not all training is of equal value.
  10. Since quality can be taught and benefits apply to the total lifecycle cost, emphasize quality rather than speed.
    • For a revenue producing product, I tend to agree. When there’s a critical time-to-market component that has been adequately quantified, it can make sense to initially prioritize speed over quality — with the understanding that it may require all-hands-on-deck to handle quality failures.

He concludes, “the most motivating and humane way to improve average performance is to find ways to improve everyone’s performance.”

When management doesn’t prioritize the items on the above list, it’s important for a software engineer to manage themselves and make those items a priority.

Python: How to reduce memory usage

Useful information for reducing memory usage of Python programs:

  • https://m.habr.com/en/post/458518/
  • https://stackoverflow.com/questions/472000/usage-of-slots

TL;DR: Dictionaries use a lot of memory. Possible remedies include a class with __slots__, namedtuple, recordclass, Cython, or NumPy.
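
To illustrate the __slots__ approach, here’s a small sketch. The class names are mine, and the byte counts are examples that vary by Python version and platform:

import sys

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = PointDict(1, 2)
p2 = PointSlots(1, 2)

# The __slots__ instance has no per-instance __dict__, which is where
# most of the savings comes from when you have millions of instances.
print(sys.getsizeof(p1) + sys.getsizeof(p1.__dict__))  # e.g. 160 bytes
print(sys.getsizeof(p2))                               # e.g. 48 bytes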

shellcheck

If you write Linux shell scripts (bash), you should use https://www.shellcheck.net/ to improve the quality of your scripts.

Code reviews: Benefits and Contraindications

Microsoft has a long article in ACM Queue on what people think they’re getting out of code reviews, what they’re actually getting, as well as a list of benefits a code review tool should provide.

My main takeaways:

  • Requiring two sign-offs is too many for low-risk changes such as renaming internal (not API) methods or local variables; one reviewer is enough
  • Tag the files/changes that are at the heart of the change
  • Small reviews get better feedback. More than 20 files, and a code review isn’t going to provide much, if any, value.
  • If code reviews are important, doing reviews should be tracked and rewarded, just like anything else that has value.
  • Reviews tend to focus on: comments about maintainability, documentation, alternative solutions, validation, and API usage
  • Reviews only identify bugs ~15% of the time, so some other form of validation is important
  • A good code review tool can help greatly by recommending reviewers — lightening the burden and getting knowledgeable people involved
  • Show entire file to give reviewers context
  • There’s no one-size-fits-all solution for code reviews; each team and code base has different needs and a different culture.

Models of code ownership

As I’ve worked as a software engineer, I’ve noticed that individuals, teams and companies approach things differently with regard to code ownership. Most engineers and teams take pride in quality code, in solving real problems, and in shipping software.

The model of code ownership affects the dynamics of a team, of a company, and of the software that is created and maintained. I’m strongly in favor of collaborative ownership, with the understanding that individual engineers are stronger in some areas than others. I appreciate collaborative ownership because it engenders a culture of inclusion, participation, cross-functional learning, and openness to a diversity of ideas and experience.

On the other hand, I’ve seen individual ownership of certain components of a software stack work well, as long as there’s a way for others to give feedback constructively and the owner takes pride in their code, wanting to learn from and accept feedback so that the code can be the best it can be.

It drives a wedge in working relationships when a code owner says “hands off!” or “how dare you touch my code without consulting me first!”. It also creates problems when a contributor says, “I’ll make changes when I want to, whether you like it or not!”. Such attitudes indicate a lack of trust and respect. Perhaps this is why distributed version control (with pull requests or patch submission) works well compared to the anyone-can-commit model more common with Subversion: with GitHub or GitLab, anyone can contribute, but the code stewards decide whether or not to accept the request.

The same principles apply when members of one team contribute code to another team. Ideally, the recipient team documents its coding standards and design decisions; at a minimum, it communicates those ideals during code review.

Here are some articles that talk about the styles of code ownership, and the pros and cons:

Show which git branches have been merged and can be deleted

At work, we generate quite a few feature branches, which get tested and then merged into “develop”. The feature branches don’t get cleaned up frequently. Here’s a series of shell commands I cobbled together to show the most recent person to commit to each branch, and which branches have been merged into develop.

git checkout develop
git pull -r
(for branch in $(git branch -r --merged | grep -vP "release|develop|master") ; do
    git log -1 --pretty=format:'%an' $branch | cat
    echo " $branch"
done) | sort | sed -e 's#origin/##'

The output looks something like this:

Jane Doe feature/something
Jane Doe feature/another-thing
Jane Doe feature/yet-another-something
Zane Ears feature/howdy

And they can be deleted as follows:

git push origin --delete feature/something

Iterating the python3 exception chain

I use the python requests library to make HTTP requests. Handling exceptions and giving the non-technical end user a friendly message can be a challenge when the original exception is wrapped up in an exception chain. For example:

import requests

url = "http://one.two.threeFourFiveSixSevenEight"
try:
    resp = requests.get(url)
except requests.RequestException as e:
    print("Couldn't contact", url, ":", e)

Prints:

Couldn't contact http://one.two.threeFourFiveSixSevenEight : HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f527329c978>: Failed to establish a new connection: [Errno -2] Name or service not known',))

And that’s a mouthful.

I want to tell the end user that DNS isn’t working, rather than showing the ugly stringified error message. How do I do that, in python3? Are python3 exceptions iterable? No. So I searched the internet, and found inspiration from the raven project. I adapted their code in two different ways to give me the result I wanted.

Update Aug 10: See the end of this blog post for a more elegant solution.

import socket
import requests
import sys

def chained_exceptions(exc_info=None):
    """
    Adapted from: https://github.com/getsentry/raven-python/pull/811/files?diff=unified

    Return a generator iterator over an exception's chain.

    The exceptions are yielded from outermost to innermost (i.e. last to
    first when viewing a stack trace).
    """
    if not exc_info or exc_info is True:
        exc_info = sys.exc_info()

    if not exc_info:
        raise ValueError("No exception found")

    yield exc_info
    exc_type, exc, exc_traceback = exc_info

    while True:
        if exc.__suppress_context__:
            # Then __cause__ should be used instead.
            exc = exc.__cause__
        else:
            exc = exc.__context__
        if exc is None:
            break
        yield type(exc), exc, exc.__traceback__

def chained_exception_types(e=None):
    """
    Return a generator iterator of exception types in the exception chain

    The exceptions are yielded from outermost to innermost (i.e. last to
    first when viewing a stack trace).

    Adapted from: https://github.com/getsentry/raven-python/pull/811/files?diff=unified
    """
    if not e or e is True:
        e = sys.exc_info()[1]

    if not e:
        raise ValueError("No exception found")

    yield type(e)

    while True:
        if e.__suppress_context__:
            # Then __cause__ should be used instead.
            e = e.__cause__
        else:
            e = e.__context__
        if e is None:
            break
        yield type(e)

saved_exception = None
try:
    resp = requests.get("http://one.two.threeFourFiveSixSevenEight")
except Exception as e:
    saved_exception = e
    if socket.gaierror in chained_exception_types(e):
        print("Found socket.gaierror in exception block via e")
    if socket.gaierror in chained_exception_types():
        print("Found socket.gaierror in exception block via traceback")
    if socket.gaierror in chained_exception_types(True):
        print("Found socket.gaierror in exception block via traceback")

if saved_exception:
    print("\nIterating exception chain for a saved exception...")
    for t, ex, tb in chained_exceptions((type(saved_exception), saved_exception, saved_exception.__traceback__)):
        print("\ttype:", t, "Exception:", ex)
        if t == socket.gaierror:
            print("\t*** Found socket.gaierror:", ex)
    if socket.gaierror in chained_exception_types(saved_exception):
        print("\t*** Found socket.gaierror via chained_exception_types")

Here’s the output:

Found socket.gaierror in exception block via e
Found socket.gaierror in exception block via traceback
Found socket.gaierror in exception block via traceback

Iterating exception chain for a saved exception...
    type: <class 'requests.exceptions.ConnectionError'> Exception: HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    type: <class 'requests.packages.urllib3.exceptions.MaxRetryError'> Exception: HTTPConnectionPool(host='one.two.threeFourFiveSixSevenEight', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known',))
    type: <class 'requests.packages.urllib3.exceptions.NewConnectionError'> Exception: <requests.packages.urllib3.connection.HTTPConnection object at 0x7fae7d0bfa20>: Failed to establish a new connection: [Errno -2] Name or service not known
    type: <class 'socket.gaierror'> Exception: [Errno -2] Name or service not known
    *** Found socket.gaierror: [Errno -2] Name or service not known
    *** Found socket.gaierror via chained_exception_types

Now I can write the following code:

url = "http://one.two.threeFourFiveSixSevenEight"
try:
    resp = requests.get(url)
except requests.RequestException as e:
    if socket.gaierror in chained_exception_types(e):
        print("Couldn't get IP address for hostname in URL", url, " -- connect device to Internet")
    else:
        raise

Very nice — just what I wanted.

Note that Python 2 does not support exception chaining, so this only works in Python 3.
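
For reference, here’s how Python 3 builds the chain these helpers walk: raising a new exception inside an except block sets __context__ implicitly, while raise ... from ... sets __cause__ and __suppress_context__ (which is why the code above prefers __cause__ when that flag is set). A minimal sketch, with names of my own choosing:

try:
    try:
        int("not a number")                        # raises ValueError
    except ValueError as inner:
        raise RuntimeError("wrapped") from inner   # sets __cause__ and __suppress_context__
except RuntimeError as outer:
    print(type(outer.__cause__))         # <class 'ValueError'>
    print(outer.__suppress_context__)    # True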

Aug 10: A colleague of mine, Lance Anderson, came up with a far more elegant solution:

import requests
import socket

class IterableException(object):

    def __init__(self, ex):
        self.ex = ex

    def __iter__(self):
        self.next = self.ex
        return self

    def __next__(self):
        if self.next.__suppress_context__:
            self.next = self.next.__cause__
        else:
            self.next = self.next.__context__
        if self.next:
            return self.next
        else:
            raise StopIteration

url = "http://one.two.threeFourFiveSixSevenEight"

try:
    resp = requests.get(url)
except requests.RequestException as e:
    ie = IterableException(e)
    if socket.gaierror in [type(x) for x in ie]:
        print("Couldn't get IP address for hostname in URL", url, " -- connect device to Internet.")

An abundance of databases

We live in an age of abundant database choices. Databases differ in their trade-offs: implementation effort, rigidity vs. flexibility, write performance, read performance, query performance, maintenance, support, robustness, security, and so on. Many databases can be tuned to meet requirements, but it may take an expert to get the most out of a given database, or to tell you that it isn’t the right fit.

I recently learned of the existence of MemSQL, Aerospike, CockroachDB, Clustrix, VoltDB, and NuoDB. Several of these came to my attention from reading an InfoWorld article, although what I cover here doesn’t exactly overlap.

MemSQL

  • Commercial only, with a gratis community edition.
  • It supports a JSON column type, and can index, query, and update data within the JSON.
  • Keen insights from their team of engineers. See http://blog.memsql.com/cache-is-the-new-ram/. “Throughput and latency always have the last laugh.” I.e. locality still matters.
  • “As various NoSQL databases matured, a curious thing happened to their APIs: they started looking more like SQL. This is because SQL is a pretty direct implementation of relational set theory, and math is hard to fool.”
  • “We realized that caching cost at least as much RAM as the working set (otherwise it was ineffective), plus the nearly unbearable headache of cache consistency.”
  • http://blog.memsql.com/bpf-linux-performance/

Aerospike

  • AGPL NoSQL db, led by a former CEO of Salesforce.com. http://stackoverflow.com/questions/25208914
  • key-value store, although since it supports nested key-values, it may be roughly equivalent to MongoDB’s schemaless JSON document storage.
  • Scalable. Far better than Redis when it’s time to scale.
  • Aerospike is reportedly faster than MongoDB (in 2014, that is)
  • Needs fewer nodes than MongoDB, and so it reportedly costs less.

CockroachDB

  • Apache License 2.0
  • survivable
  • scalable (distributed)
  • SQL
  • beta software
  • Higher write latencies. Built on RocksDB from Facebook.

Clustrix

  • Proprietary drop-in replacement for MySQL.
  • 540 million transactions per minute.
  • Higher write throughput than MongoDB (reportedly).
  • Not a document store. It’s an RDBMS.

VoltDB

NuoDB

  • ACID compliant, SQL RDBMS
  • Memory centric
  • Scalable, without sharding. (How does that work?)
  • More than 1 million transactions per second
  • Flexible schema
  • Java stored procedures
  • Despite claims that it “automatically adjusts for optimal workload”, my guess is that one must monitor and tune it. Computer algorithms are smart… until they’re not.