Python Magic Methods

Intro

In his excellent Fluent Python book, Luciano Ramalho talks about Python’s “data model” and gives some excellent examples of how the language’s internal consistency is achieved via the judicious use of a well-defined API and, in particular, how Python’s “magic methods” enable the construction of elegant solutions that are concise and highly readable.

And while you can find countless examples online of how to implement the iteration magic methods (__iter__() and friends), here I wanted to present an example of how to use two of the lesser-known magic methods: __del__() and __call__().

For those familiar with C++, these implement two very familiar patterns: the destructor and the function object (aka, operator()).

Implement a self-destructing key

Note

The full code is available in the filecrypt GitHub repository, and it is explained more fully in this blog entry.

Let’s say that we want to design an encryption key which will be in turn encrypted with a master key and whose “plaintext” value will only be used “in flight” to encrypt and decrypt our data, but will otherwise only be stored encrypted.

There are many reasons why one may want to do this, but the most common is when the data to be encrypted is very large and time-consuming to encrypt: should the master key be compromised, we could revoke it and re-encrypt the (possibly many, one-time) encryption keys with a new master key, without incurring the time penalty of decrypting and re-encrypting possibly several TBs of data.

In fact, re-encrypting the encryption keys may be so inexpensive (computationally or time-wise) that this could be done on a regular basis, rotating the master key at frequent intervals (e.g., weekly).

If we use OpenSSL command-line tools to do all the encryption and decryption tasks, we need to temporarily store the encryption key as “plaintext” in a file, which we will securely destroy (using the shred Linux tool).

Note

We use the term “plaintext” to signify that the contents are decrypted, not to mean plain text format: the key is still binary data, but, if gotten at that stage by an attacker, it would not be protected with encryption.

However, just implementing the call to the shredding utility as the last step in our encryption algorithm would not be sufficient to ensure that it is executed under all possible code paths: there may be errors or exceptions raised, the user may terminate the program gracefully (Ctrl-C) or abruptly (SIGKILL), and so on.

Guarding against all possibilities is not only tiresome, but also error-prone: how about instead having the Python interpreter do the hard work for us, and ensure that certain actions are always undertaken when the object is garbage collected?

Note

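The technique shown here will not work for the SIGKILL case (aka kill -9): that signal cannot be caught or handled by the process, so no cleanup code gets a chance to run. Catchable signals such as SIGTERM need a more advanced technique (signal handlers).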
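
SIGKILL cannot be intercepted at all, but a catchable signal such as SIGTERM can be turned into an ordinary interpreter exit, which gives cleanup code a chance to run. The following is a minimal sketch of that idea, written for Python 3; the names exit_on_signal and install_handlers are illustrative and not part of the filecrypt code.

```python
import signal
import sys


def exit_on_signal(signum, frame):
    """Turn a catchable signal into a normal SystemExit.

    SystemExit unwinds the stack like any other exception, so `finally`
    blocks and object finalizers get a chance to run.
    """
    # 128 + signal number is the conventional shell exit status.
    sys.exit(128 + signum)


def install_handlers(signals=(signal.SIGTERM,)):
    """Register the handler; this must be called from the main thread."""
    for signum in signals:
        signal.signal(signum, exit_on_signal)
```

Calling install_handlers() once at program start-up would be enough; Ctrl-C already raises KeyboardInterrupt by default, so it needs no extra handler.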

The idea is to create a class which implements the __del__() magic method, which is invoked when there are no further references to the object and it is garbage-collected. The exact timing is implementation-dependent: CPython uses reference counting and normally calls it as soon as the last reference goes away, but the language does not guarantee that __del__() runs for objects that are still alive when the interpreter exits.

This is what happens on a macOS laptop, running El Capitan and Python 2.7:

$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
>>> class Foo():
...     def __del__(self):
...         print("I'm gone, goodbye!")
... 
>>> foo = Foo()
>>> bar = foo
>>> foo = None
>>> bar = 99
I'm gone, goodbye!
>>> another = Foo()
>>> ^D
I'm gone, goodbye!
$

As you can see, the “destructor” method is invoked either when there are no longer any references to the object (the first instance, once both foo and bar have been rebound) or when the interpreter exits (another).
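
The same experiment can be written as a small script. This is a sketch for Python 3 on CPython, where reference counting runs __del__() as soon as the last reference disappears; other interpreters may delay it. The Tracked class and the events list are made up for illustration.

```python
events = []


class Tracked:
    """Records a message when the instance is finalized."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("{} is gone".format(self.name))


foo = Tracked("first")
bar = foo          # two references to the same object
foo = None         # one reference left: nothing happens yet
assert events == []
bar = 99           # last reference dropped: __del__() runs (on CPython)
print(events)      # ['first is gone']
```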

The following code fragment shows how we ended up implementing our “self-encrypting” key (I called it SelfDestructKey because the real feature is that it destructs the plaintext version of the encryption key upon exit):

This is a much simplified version of the code, focusing only on the __del__() method; please refer to the full version in the repository for the complete code.

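import os
from tempfile import mkstemp

# Assumption: `openssl`, `shred` and `ErrorReturnCode` come from the `sh` library, as the
# `_in=`/`_out=` keyword arguments suggest; see the repository for the actual imports.
from sh import ErrorReturnCode, openssl, shred


class SelfDestructKey(object):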
    """A self-destructing key: it will shred its contents when it gets deleted.

       This key also encrypts itself with the given key before writing itself out to a file.
    """

    def __init__(self, encrypted_key, keypair):
        """Creates an encryption key, using the given keypair to encrypt/decrypt it.

        The plaintext version of this key is kept in a temporary file that will be securely
        destroyed upon this object becoming garbage collected.

        :param encrypted_key the encrypted version of this key is kept in this file: if it
            does not exist, it will be created when this key is saved
        :param keypair a tuple containing the (private, public) key pair that will be used to
            decrypt and encrypt (respectively) this key.
        :type keypair collections.namedtuple (Keypair)
        """
        self._plaintext = mkstemp()[1]
        self.encrypted = encrypted_key
        self.key_pair = keypair
        if not os.path.exists(encrypted_key):
            openssl('rand', '32', '-out', self._plaintext)
        else:
            with open(self._plaintext, 'wb') as self_decrypted:
                openssl('rsautl', '-decrypt', '-inkey', keypair.private, _in=encrypted_key,
                        _out=self_decrypted)

    def __str__(self):
        return self._plaintext

    def __del__(self):
        try:
            if not os.path.exists(self.encrypted):
                self._save()
            shred(self._plaintext)
        except ErrorReturnCode as rcode:
            raise RuntimeError(
                "Either we could not save the encrypted key to file {enc}, or we could not "
                "shred the plaintext key in file {plain}.  You will have to securely delete "
                "the plaintext version using something like `shred -uz {plain}`".format(
                    plain=self._plaintext, enc=self.encrypted))

    def _save(self):
        """Encrypts the contents of the key and writes it out to disk.

        The key is encrypted with the PUBLIC part of the key pair and written to the
        file named by ``self.encrypted``.

        :return: None
        """
        if not os.path.exists(self.key_pair.public):
            raise RuntimeError("Encryption key file '%s' not found" % self.key_pair.public)
        with open(self._plaintext, 'rb') as selfkey:
            openssl('rsautl', '-encrypt', '-pubin', '-inkey', self.key_pair.public,
                    _in=selfkey, _out=self.encrypted)

Also, note how I have implemented the __str__() method, so that I can get the name of the file containing the plaintext key by just invoking:

passphrase = SelfDestructKey(secret_file, keypair=keys)
encryptor = FileEncryptor(
    secret_keyfile=str(passphrase), 
    plain_file=plaintext,
    dest_dir=enc_cfg.out)

Obviously, we could have just as easily implemented the __str__() method to return the actual contents of the encryption key.

Be that as it may, if you look at the code that uses the encryption key, at no point do we need to invoke the _save() method or call the shred utility directly; this is all taken care of by the interpreter, either when passphrase goes out of scope or when the script terminates (normally, or because of an unhandled exception).
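
To experiment with the pattern without OpenSSL or a key pair, here is a stripped-down sketch for Python 3. The SelfDestructFile class is hypothetical and not part of filecrypt; it overwrites the temporary file with zeros before removing it, which only stands in for a real secure-deletion tool such as shred.

```python
import os
import tempfile


class SelfDestructFile:
    """Holds secret bytes in a temporary file that is wiped on finalization."""

    def __init__(self, secret):
        fd, self._path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(secret)

    def __str__(self):
        # Same convention as SelfDestructKey: str() yields the file name.
        return self._path

    def __del__(self):
        # __init__ may have failed before _path was set.
        path = getattr(self, "_path", None)
        if path and os.path.exists(path):
            size = os.path.getsize(path)
            with open(path, "r+b") as tmp:
                tmp.write(b"\0" * size)  # naive overwrite, NOT a real shred
            os.remove(path)


key = SelfDestructFile(b"not-a-real-key")
path = str(key)
print(os.path.exists(path))   # True
del key                       # last reference gone: CPython calls __del__()
print(os.path.exists(path))   # False on CPython
```

As in the full class, the calling code never cleans up explicitly; dropping the last reference is what triggers the wipe.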

Implement the Command Pattern with a Callable

Python has the concept of a callable, which is essentially “something that can be invoked as if it were a function” (this follows the Duck Typing approach: “if it looks like a function, and can be called like a function, then it is a function”).

To make instances of a class behave as callables, all we need to do is define a __call__() method and implement it like any other “ordinary” method.
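
As a minimal illustration (Python 3), the hypothetical Greeter class below is not part of the command runner; it only shows that an instance defining __call__() passes the callable() check and can be invoked with ordinary function-call syntax.

```python
class Greeter:
    """Instances can be called like a function."""

    def __init__(self, greeting):
        self._greeting = greeting

    def __call__(self, name):
        return "{}, {}!".format(self._greeting, name)


hello = Greeter("Hello")
print(callable(hello))   # True
print(hello("world"))    # Hello, world!
```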

Say that we want to implement a “command runner” script that (similarly to, for example, git) can take a set of sub-commands and execute them: one approach could be to use the Command Pattern in our CommandRunner class:

class CommandRunner(object):
    """Implements the Command pattern, with the help of the __call__() magic method."""

    def __init__(self, config):
        """Initialize the Runner with the configuration from parsing the command line.

           :param config the command-line arguments, as parsed by ``argparse``
           :type config Namespace
        """
        self._config = config

    def __call__(self):
        method = self._config.cmd
        if hasattr(self, method):
            callable_meth = self.__getattribute__(method)
            if callable_meth:
                callable_meth()
        else:
            raise RuntimeError('Unexpected command "{}"; not found'.format(method))

    def run(self):
        # Do something with the files
        pass

    def build(self):
        # Call an external method that takes a list of files
        build(self._config.files)

    def diff(self):
        """Will compute the diff between the two files passed in"""
        if self._config.files and len(self._config.files) == 2:
            file_a, file_b = tuple(self._config.files)
            diff_files(file_a, file_b)
        else:
            raise RuntimeError("Not enough arguments for diff: 2 expected, {} found".format(
                len(self._config.files) if self._config.files else 'none'))

    def diff_all(self):
        # This will take a variable number of files and will diff them all
        pass

The config initialization argument is a Namespace object as returned by the argparse library:

import argparse


def parse_command():
    """Parses command line arguments and returns a configuration object

    :return: the configured options, or `None` if just printing help.
    :rtype: Namespace or None
    """
    parser = argparse.ArgumentParser()

    # Removed the `help` argument for better readability; make sure you
    # always include that to help your user, when they invoke your script
    # with the `--help` flag.
    parser.add_argument('--host', default='localhost')
    parser.add_argument('-p', '--port', type=int, default=8080,)
    parser.add_argument('--workdir', default=default_wkdir)

    parser.add_argument('cmd', default='run', choices=['run', 'build', 'diff', 'diff_all'])
    parser.add_argument('files', nargs=argparse.REMAINDER)
    return parser.parse_args()

To invoke this script we would use something like:

$ ./main.py run my_file.py

or:

$ ./main.py diff file_1.md another_file.md
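
To see what the parser hands over to CommandRunner for such an invocation, the arguments can be passed to parse_args() as an explicit list. This is a cut-down sketch (Python 3) that keeps only some of the arguments from the listing above; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Builds a reduced version of the parser used by parse_command()."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-p', '--port', type=int, default=8080)
    parser.add_argument('cmd', choices=['run', 'build', 'diff', 'diff_all'])
    parser.add_argument('files', nargs=argparse.REMAINDER)
    return parser


# Equivalent to: ./main.py diff file_1.md another_file.md
cfg = build_parser().parse_args(['diff', 'file_1.md', 'another_file.md'])
print(cfg.cmd)     # diff
print(cfg.files)   # ['file_1.md', 'another_file.md']
print(cfg.port)    # 8080
```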

It is worth pointing out how we also protect against errors using the built-in hasattr() function and another "magic" method, __getattribute__():

if hasattr(self, method):
    callable_meth = self.__getattribute__(method)

Note that we could have used the __getattr__() magic method to define the behavior of the class when attempting to access non-existent attributes, but in this case it was probably easier to do that at the point of call.
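
For comparison, here is a sketch of the __getattr__() alternative (Python 3). The method is only consulted when normal attribute lookup fails, so it can turn an unknown command name into a clear error. LenientRunner is a hypothetical, cut-down stand-in for CommandRunner.

```python
class LenientRunner:
    """Dispatches to a method by name; unknown names end up in __getattr__()."""

    def __init__(self, cmd):
        self._cmd = cmd

    def run(self):
        return "running"

    def __getattr__(self, name):
        # Only reached when `name` is not found by normal attribute lookup.
        raise RuntimeError('Unexpected command "{}"; not found'.format(name))

    def __call__(self):
        return getattr(self, self._cmd)()


print(LenientRunner("run")())   # running
```

One drawback of this design: because __getattr__() raises something other than AttributeError, hasattr() on such an object propagates the error instead of returning False, which is another reason why checking at the point of call is the simpler option.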

Given the fact that we are telling argparse to limit the possible values to the choices when parsing the cmd argument, we are guaranteed that we will never get an “unknown” command; however, the CommandRunner class does not need to know this, and it can be used in other instances where we do not have such a guarantee (not to mention that we are only one typo away from some very puzzling bug, if we didn’t do our homework in __call__()).
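
One way to make that defensive check a little stricter is sketched below (Python 3): it uses the getattr() built-in with a default, rejects names starting with an underscore, and verifies that the attribute really is callable. StrictRunner and its single toy command are illustrative only.

```python
from argparse import Namespace


class StrictRunner:
    """Command dispatcher that does not rely on argparse validating `cmd`."""

    def __init__(self, config):
        self._config = config

    def __call__(self):
        name = self._config.cmd
        # Refuse private/dunder names such as "_config" or "__init__".
        target = None if name.startswith('_') else getattr(self, name, None)
        if not callable(target):
            raise RuntimeError('Unexpected command "{}"; not found'.format(name))
        return target()

    def run(self):
        return "run: {}".format(self._config.files)


print(StrictRunner(Namespace(cmd='run', files=['a.py']))())   # run: ['a.py']
```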

To make all this work, then we only need to implement a trivial __main__ snippet:

if __name__ == '__main__':
    cfg = parse_command()

    try:
        runner = CommandRunner(cfg)
        runner()  # Looks like a function, let's use it like one.
    except Exception as ex:
        logging.error("Could not execute command `{}`: {}".format(cfg.cmd, ex))
        raise SystemExit(1)

Note how we invoke the runner as if it were a function: this will in turn execute the __call__() method and run the desired command.

We truly hope everyone agrees this is far more pleasant code to look at than monstrosities such as:

# DON'T DO THIS AT HOME
# Please avoid castle-of-ifs, they are just plain ugly.
if cfg.cmd == "build":
    pass  # do something to build
elif cfg.cmd == "run":
    pass  # do something to run
elif cfg.cmd == "diff":
    pass  # do something to diff
elif cfg.cmd == "diff_all":
    pass  # do something to diff_all
else:
    print("Unknown command", cfg.cmd)

Conclusion

Learning about Python’s “magic methods” will make your code not only easier to read and re-use in different situations, but also more “pythonic” and immediately recognizable to other fellow pythonistas, thus making your intent clearer to understand and reason about.
