HOW-TO Publish a Python Package on PyPi

Create a setup.py file

The arguments for setup() are documented here and are non-trivial: a good example is my filecrypt's setup.py file.

NOTE
Do not confuse setuptools with distutils – this is the correct import for setup.py:

from setuptools import setup

The trickiest part is figuring out the packages, modules and script files: it is probably best to think about them in advance, but it is possible to rectify choices later during setup.
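By way of illustration, here is a minimal sketch (the names are hypothetical placeholders, not taken from filecrypt):

from setuptools import find_packages, setup

setup(
    name='my-project',        # must not clash with an existing PyPI package
    version='0.1.0',
    packages=find_packages(exclude=['tests*']),
    entry_points={
        # installs a `my-tool` executable that invokes main() in my_project/cli.py
        'console_scripts': ['my-tool=my_project.cli:main']
    }
)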

The biggest challenge is to come up with a top-level package name that does not conflict with an existing one.

As far as I can tell, it’s currently mostly a process of trial-and-error, see below.

Once the setup.py is in decent shape, you can try and build a wheel:

python setup.py bdist_wheel

After doing that, it’s good practice to create a new virtualenv, and try to install the new package in that one:

pip install dist/my-project.whl

This is particularly useful to test whether the console_scripts have been correctly configured.

If you use classifiers such as in:

    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: System Administrators',
        'License :: OSI Approved :: Apache Software License',
        'Programming Language :: Python :: 3'
    ]

then make sure to consult the classifiers list as anything else will cause an error and prevent registration.

Register your Project

NOTE

The instructions given to use twine for this step did not work for me. YMMV

Unless you already have an account on PyPi, you will need to create one and log in.

You can then head to the Registration Form and upload your PKG-INFO file: this is created in a [prj name].egg-info/ directory. Expect a bit of back and forth, while you try to appease the Gods of PyPi into accepting your configuration choices.

In particular, coming up with a non-conflicting-yet-meaningful package name may take more trials than one may expect – planning ahead is highly advised, as I have been unable to find an easy way to list all package names (if you do know of one, please do get in touch) and…

There are currently 88906 packages here.

(“here” being PyPi, as of 09/16/2016).

Upload to PyPi

Once registration succeeds, the actual upload is rather easy, using twine:

twine upload dist/*

Provided you have a valid ~/.pypirc, it will just ask for the password and do the needful:

$ cat ~/.pypirc
[distutils]
index-servers=pypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = [your username]

Docker for Mac and insecure registries

Aside

If you are using a private registry with a self-signed certificate, and need to connect to it from a macOS laptop, you are likely to run into the following error:

$ docker-compose up -d
 Pulling search (docker.registry.mydomain.io:5000/image:2.0.1)...
 ERROR: Get https://docker.registry.mydomain.io:5000/v1/_ping: x509:
 certificate signed by unknown authority

It turns out that (a) there is not much applicable information out there and (b) it is dead easy to fix.

Simply click on the menu bar icon and choose Preferences (or hit Cmd-,) and in the Advanced pane, add the full hostname:port for your private registry.

Then Apply & Restart and you’re good to go.

A python notebook to experiment with the Apache Mesos HTTP API – Part 3 of 3

This is the third and final part of a three-part series: Part 1 describes the required setup and how to get Apache Mesos Master and Agent running in two Vagrant VMs; Part 2 shows how to connect to the HTTP API and accept resource offers.

Data center stock image

This series is an extended (and updated) version of the talk I gave at MesosCon Europe 2015 updated for Apache Mesos 1.0.0, which has just been released (August 2016) – you can also find the slides there.

Recap

By the end of the last part, we had a running “framework” connected to a Mesos Master via the HTTP API, with an open (“subscribed”) connection, running in a background thread, and uniquely identified by a Stream-id:

Connecting to Master: http://192.168.33.10:5050/api/v1/scheduler
body type:  SUBSCRIBE
The background channel was started to http://192.168.33.10:5050/api/v1/scheduler
Stream-id:  31e0c731-f055-4588-b0f0-5cdfaed5260c
Framework 474970d2-1b5e-40f9-82a2-135c71cd1448-0000 registered with Master at (http://192.168.33.10:5050/api/v1/scheduler)

Further, we had just been offered resources from the running Agent, via the Master:

{"offers": [
   {...
    "attributes": [ {
            "name": "rack",
            "text": {
                "value": "r2d2"
            },
            "type": "TEXT"
        },
        {
            "name": "pod",
            "text": {
                "value": "demo,dev"
            },
            "type": "TEXT"
        }
    ],
    ...
    "resources": [
        {
            "name": "ports",
            "ranges": {
                "range": [ {"begin": 9000, "end": 10000}]
        },
        {
            "name": "cpus",
            "role": "*",
            "scalar": {"value": 2.0},
        },
        {
            "name": "mem",
            "role": "*",
            "scalar": {"value": 496.0},
        },
        {
            "name": "disk",
            "role": "*",
            "scalar": { "value": 4930.0 },
        }
    ],
    ...
] }

Next, we are going to use some of these resources to launch a container on the Agent.

Accepting Offers

Recently, Mesos has evolved its model to allow frameworks to pre-emptively reserve resources and stash them aside for peak demand (or for launching high-priority workloads), as well as to allocate unused resources, "best-effort", to low-priority tasks that don't mind being booted out if those reserved resources are claimed by their rightful owners; however, we will not address those use cases here.

For those interested, I would recommend reading the Reservation and Oversubscription documents, as well as following the development activity on the Mesos mailing lists: both features have been introduced only recently and are likely to continue evolving in the future.

Obviously, if there is enough interest in such topics, we could be convinced to write a dedicated series on the subject…

At any rate, to "accept" offers, all we need to do is tell Master what we would like to do with them (namely, run an Nginx container) and how much of what's been offered we'd like to take: in a shared, high-load environment such as a production data center, it is usually good manners to use only as little (or as much) as actually needed, and to rely on the "elasticity" of the underlying resources to deal with sudden increases in load.

The file resources/container.json has the full body of the request (of type ACCEPT) that we will send to Master; as you can see, several fields are marked as null, because they contain dynamically generated values that we need to fill in so that Master can reconcile our request with the offers, frameworks and tasks it has pending – in a realistic production environment, a Mesos Master could be handling upwards of hundreds of frameworks, thousands of Agents and many tens of thousands of tasks (even though the tasks themselves are actually managed by the Agents).
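Abbreviated, and rendered as a Python dict, the ACCEPT body looks something like the sketch below – the None values are the placeholders we fill in next (the task name is a hypothetical example; see resources/container.json for the real thing):

ACCEPT_BODY = {
    "type": "ACCEPT",
    "framework_id": {"value": None},       # filled in with our framework_id
    "accept": {
        "offer_ids": [],                   # the offer ID(s) we are accepting
        "operations": [{
            "type": "LAUNCH",
            "launch": {
                "task_infos": [{
                    "name": "nginx-demo",          # hypothetical task name
                    "task_id": {"value": None},    # unique per framework
                    "agent_id": {"value": None},   # copied from the offer
                    "resources": [],               # cpus/mem, see task_resources.json
                    "container": {}                # Docker image, network, etc. (elided)
                }]
            }
        }]
    }
}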

container_launch_info = get_json(DOCKER_JSON)

# Need to update the fields that reflect the offer ID / agent ID and a random, unique task ID:
task_id = str(random.randint(1, 100))
agent_id = offers.get('offers')[0]['agent_id']['value']
offer_id = offers.get('offers')[0]['id']

container_launch_info["framework_id"]["value"] = framework_id
container_launch_info["accept"]["offer_ids"].append(offer_id)

task_infos = container_launch_info["accept"]["operations"][0]["launch"]["task_infos"][0]
task_infos["agent_id"]["value"] = agent_id
task_infos["task_id"]["value"] = task_id
task_infos["resources"].append(get_json(TASK_RESOURCES_JSON))

The code above does the following:

  • load the DOCKER_JSON file and build a dict out of it;
  • extract the agent_id and offer_id from the OFFERS response we received on the "streaming" channel (see the post() method, as well as the full breakdown of the response in the notebook output – this was described in Part 2) and put them back in our request: this enables Master to reconcile the two; failing to do so will cause Master to refuse the request;
  • finally, take a reference to the task_info field of the request, and then update its fields to contain the agent_id and task_id (this is just a convenience to avoid some typing and a very long statement).

Task ID

You may have noticed this strange line in the snippet above:

task_id = str(random.randint(1, 100))

The TaskID is used by the Agent to uniquely identify the tasks from a framework and it’s also meant for the users (“operators”) to be able to track tasks that are launched on Mesos: it can be really anything (Mesos doesn’t care, so long as it’s unique per framework) and you will see its use in a second.

I have chosen to use a random integer between 1 and 100, but note Mesos expects a field of type string, so we convert it via the str() method.
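For anything beyond a demo, something like a UUID would be a safer bet, as it is unique by construction:

import uuid

task_id = str(uuid.uuid4())  # e.g., 'de305d54-75b4-431b-adb2-eb6b9e546014'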

Launching a container

With all the above done, it’s time to fire it off:

    r = post(API_URL, container_launch_info)

which is a bit of an anti-climax, but will result (unless you’ve messed around with the JSON format and made it fail some syntax check or the conversion to Protocol Buffer, the format used internally by Mesos) in a

Connecting to Master: http://192.168.33.10:5050/api/v1/scheduler
body type:  ACCEPT
Result: 202

Note that you will get a 202 regardless of whether the task was successfully launched, or failed post launch, or anything else, for that matter, that could have happened to it.

As mentioned in the previous post, the API model is an asynchronous one and you will get the actual outcome of your request (beyond a simple syntactic validation) over the “streaming” channel (the equivalent of the callback methods of the original C++ Scheduler API).

In a “real life” Framework, after sending a request to launch a task, we would wait for an “event” posted back by the Master and specifically wait for an UPDATE or a new OFFER message to discover whether the task was successfully launched or had failed.

See the Scheduler API for more details and a full list of messages and events.
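Purely as an illustration, reacting to such an UPDATE event could look something like the sketch below (the field names follow the v1 scheduler API; handle_update is a hypothetical addition, not part of the notebook):

def handle_update(event):
    """A sketch: inspect an UPDATE event received on the streaming channel."""
    status = event['update']['status']
    task_id = status['task_id']['value']
    state = status['state']  # e.g., TASK_RUNNING, TASK_FINISHED, TASK_FAILED
    if state == 'TASK_RUNNING':
        print("Task {} is running".format(task_id))
    elif state in ('TASK_FAILED', 'TASK_ERROR', 'TASK_LOST'):
        print("Task {} failed: {}".format(task_id, status.get('message')))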

For our part, we will explore the success (or otherwise) of our launch by using the Master UI:

http://192.168.33.10:5050

should show a list of Active/Completed/Orphan tasks and, with any luck, a new one should be listed in the Active pane, with TASK_RUNNING status:

Nginx task running

Alternatively, you can click on the Agents button on the tab bar at the top and then follow the link to the one running agent, to see a list of Frameworks whose tasks are running on the Agent:

Frameworks

and clicking on the active one (you can see I’ve been experimenting…) takes you to a list of “Executors”

Executors

where you can see the task just launched (notice the ID – this is where we would see whatever we were to set for the task_id: in our case just an integer under 100):

Tasks

and, finally, we could hit the Sandbox link to look at the STDOUT/STDERR from the task execution.

Notice how we only requested a tiny fraction of what we were offered: just 20% CPU and 100MB of RAM – and, in fact, using even less.

The requested resources were defined in the "resources" field:

task_infos["resources"].append(get_json(TASK_RESOURCES_JSON))

and are defined in the resources/task_resources.json file:

    [
        {
            "name": "cpus",
            "role": "*",
            "scalar": {"value": 0.2},
            "type": "SCALAR"
        },
        {
            "name": "mem",
            "role": "*",
            "scalar": {"value": 100},
            "type": "SCALAR"
        }
    ]

If all looks well and the task is in the TASK_RUNNING state (if not, there is a “Troubleshooting” section at the end), you should now be able to access the just-launched Nginx server on its default HTTP (80) port:

http://192.168.33.11

Notice how this is the Agent’s IP address, not the Master’s: the container is running on the Agent and using its resources:

Nginx default home page

Obviously, this just shows the default page: in a real-life situation, we could take one of two approaches:

  • build our own container image, with the desired static pages (and whatever other custom Nginx configuration made sense) and possibly even with some proxying logic to a dynamic online app;

  • download and extract a tarball with the website’s static contents (not just HTML/CSS but Javascript too, possibly), as described below.

Downloading content to Mesos Agents

One of the fields of the Accept Protocol Buffer is an Offer.Operation, which in turn can be of several “types”, one of which (LAUNCH) implies that its content is a ‘Launch’ message, in turn containing an array (or list, if you are into C++) of TaskInfo dictionaries, each of which contains a (required) CommandInfo field, called "command".

If you are not confused yet, you’ve not been following closely… but the bottom line is, at any rate, that by setting the content of command to something like this:

    ...,
    "command": {
      "uris": [
        {
          "value": "http://192.168.33.1:9000/content.tar.gz",
          "extract": true
        }
      ],
      "shell": true,
      "value": "cd /var/local/sandbox && python -m SimpleHTTPServer 9090"
    },
    ...

we can specify a list of URIs (which must be reachable by the Agent) whose contents will be fetched and, optionally (if extract is true), uncompressed into the container “sandbox” – the directory we specified at launch (see the run-agent.sh script):

--sandbox_directory=/var/local/sandbox

This finally closes this, admittedly convoluted, detour: if we were to configure our Nginx server (or whatever other application we were to launch) to use that directory as the source for the site pages, those would be served instead.

An example of how to do this, using the much simpler Python SimpleHTTPServer, is demonstrated in the notebook’s section called “Launch a Task using the given offers”, alongside the alternative method of executing a command on the Agent (as opposed to launching a container – the dynamics of downloading a remote file remain the same).

This will require that you somehow set up a server to serve the content.tar.gz file from your local dev box (or wherever you feel like – an AWS S3 bucket would be a perfect candidate in a real production environment) and that you change the JSON in the request to use resources/container.old.json from the repository: maybe a good “exercise for the reader,” but maybe one step too far for some.
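For what it’s worth, on the dev box something as simple as this would do (assuming 192.168.33.1 is the host’s address on the Vagrant private network, and that content.tar.gz sits in the directory you serve from):

    cd /path/to/content   # the directory containing content.tar.gz
    python -m SimpleHTTPServer 9000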

A couple of considerations:

  • a recently implemented feature allows Mesos to cache content downloaded dynamically (including container images): for more information, please see the excellent Fetcher Cache document;

  • directly launching a binary on an Agent (as opposed to downloading and running a container image) runs counter, in my view, to the whole spirit of treating your machines “as a herd, not a pet” in a modern, DevOps-oriented production environment: it requires choosing between two equally bad options:

    • use some form of packaging and configuration management (e.g., Chef or Puppet) to ensure all your Agents have the same set of binaries and will be able to run them upon request; or

    • use a (possibly, cumbersome) combination of roles and attributes (see Part 2) to ensure frameworks are only able (or allowed) to launch certain binaries on certain agents (in addition to the above provisioning and configuration management approach).

Compare that with the simplicity of having a cluster (possibly running in the 000’s of machines, all identically provisioned), letting Mesos just pick one for you, and then downloading and running a container (again, possibly from a private image registry, managed via a CI/CD infrastructure): the latter seems to me a much more scalable, agile and definitely less error-prone approach.

But, there you have it: in case you wish to run a binary, the code for how to do it is shown there.

Troubleshooting

If your Master server dies, or you can’t launch the container, there are a few steps that I’ve found useful when creating the code and this blog entry:

1. restart the Master:

    cd vagrant
    vagrant ssh master
    ps aux | grep mesos | grep -v grep
    # Is there anything running here? if not:
    sudo ./run-master.sh
    # you can happily ignore Docker whining about ZK already running (also see below)

2. is Zookeeper still alive?

    cd vagrant
    vagrant ssh master
    sudo docker ps
    # is the zk container running? if not, you can restart it:
    sudo docker rm $(sudo docker ps -qa)
    sudo docker run -d --name zookeeper -p 2181:2181 \
        -p 2888:2888 -p 3888:3888 jplock/zookeeper:3.4.8
    # Alternatively, as ZK going away means that usually Master terminates too, use
    # the solution above to restart Master.

3. Is the Agent active and running?

Start from http://192.168.33.11:5051/state – if nothing shows, then follow a similar solution as to the above to restart the agent:

    cd vagrant
    vagrant ssh agent
    ps aux | grep mesos | grep -v grep
    # Is there anything running here? if not:
    sudo ./run-agent.sh

Head back to your browser and check whether the UI is responsive again.
If the server is running but unresponsive, or seems to be unable to connect to the Agent or whatever, you can just kill it:

    ubuntu@mesos-master:~$ ps aux | grep mesos | grep -v grep
    root     18630  0.0  4.3 1318424 43728 pts/1   Sl   20:29   0:02 mesos-master
    ubuntu@mesos-master:~$ sudo kill 18630

(and resort to a -9, SIGKILL if it really has become completely unresponsive).

4. Keep an eye on the logs

As mentioned also in Part 2, it is always useful to keep a browser window open on the Master and Agent logs – this can be done via the LOG link in their respective UI pages:

Mesos UI Logs

They usually provide a good insight into what’s going on (and what went wrong); alternatively, you can always less them in a shell (via the vagrant ssh shown above): the log directories are set in the launch shell scripts:

    # In run-master.sh:
    LOG_DIR=/var/local/mesos/logs/master

    # In run-agent.sh
    LOG_DIR=/var/local/mesos/logs/agent

and you will see something like this (I usually find the INFO logs the most useful, YMMV):

    ubuntu@mesos-master:~$ ls /var/local/mesos/logs/master/
    mesos-master.ERROR
    mesos-master.INFO
    mesos-master.mesos-master.root.log.ERROR.20160828-211508.2991
    mesos-master.mesos-master.root.log.ERROR.20160902-080919.17820
    mesos-master.mesos-master.root.log.INFO.20160827-210041.30302
    mesos-master.mesos-master.root.log.INFO.20160828-210703.2991
    mesos-master.mesos-master.root.log.INFO.20160902-054452.17820
    mesos-master.mesos-master.root.log.INFO.20160903-202955.18630
    mesos-master.mesos-master.root.log.WARNING.20160827-210041.30302
    mesos-master.mesos-master.root.log.WARNING.20160828-210703.2991
    mesos-master.mesos-master.root.log.WARNING.20160902-054452.17820
    mesos-master.mesos-master.root.log.WARNING.20160903-202955.18630
    mesos-master.WARNING

Note that the mesos-master.{ERROR,INFO,WARNING} are just symlinks to the files being currently appended to by the running processes.

5. Dammit, all else failed!

Ok, time to bring the heavy artillery out: destroy the boxes and rebuild them from scratch:

    cd vagrant
    vagrant destroy
    vagrant up

This is nasty and time-consuming, but it guarantees a clean slate and should set you back into a working environment.

Conclusion

This has been a long ride: between updating the Python notebook, creating and fine-tuning the Vagrant boxes and drafting these blog entries, it has taken me almost a month and many hours of experimenting with Mesos and its HTTP API.

It has been overall a fun experience and I believe it has helped me learn more about it; it also provides a good foundation upon which to build a Python library to abstract away much of the low-level complexity: I will continue to work on and update the zk-mesos repository; please come back from time to time to check on progress or, even better, feel free to fork and contribute back your code via pull requests.

Just remember, no smelly code!

A python notebook to experiment with the Apache Mesos HTTP API – Part 2 of 3

This is the second part of a three-part series: Part 1 describes the required setup and how to get Apache Mesos Master and Agent running in two Vagrant VMs.

stock image

This series is an extended (and updated) version of the talk I gave at MesosCon Europe 2015 updated for Apache Mesos 1.0.0, which has just been released (August 2016) – you can also find the slides there.

Recap

So, by the end of the first part, you should have two running VMs, to which you can connect; the following assumes that you can successfully point your browser to:

http://192.168.33.10:5050

and see this page:

Master UI

Further, you have an active virtualenv environment (I called it demo, but feel free to give it whichever name you like) and you can run Jupyter notebooks with it: in other words, running this from a terminal:

source ./demo/bin/activate
jupyter notebook

will show the Jupyter home page, from which you can load the notebooks/Demo-API.ipynb, which will show up looking something like this:

Jupyter Notebook UI

Python Notebook primer

A “notebook” is a mixture of Markdown and code which can be executed inside a Python kernel; the output of each command’s execution is shown below the “cell”, which is essentially a region of contiguous text.

The code itself (or the Markdown, for that matter) can be edited directly (double-click on it, or hit Enter) and executed by pressing Shift-Enter (or Alt-Enter, to also create an empty cell below the current one).

A full tutorial on Python notebooks is vastly beyond the scope of this post; suffice to say that you can:

  • Follow along by hitting Shift-Enter to execute the cell’s code and move to the next; and
  • Modify and re-execute the code to experiment further with the concepts shown here.

Only one caveat: because the code assumes (and, indeed, the Mesos HTTP API requires) an ongoing background thread and some amount of shared data, some parts of the code may be “unsafe” to run multiple times; but no fear: if you get stuck, just “Stop and Restart” the kernel (sometimes simply closing and re-opening the notebook may not be sufficient).

In the following, I will simply reproduce brief snippets of the notebook and provide some clarification comments: for the actual output, the full code and the comments there, please see the actual notebook.

Mesos HTTP API

As described in Mesos documentation, the API does not follow REST principles, relying instead on a “callback” mechanism that closely resembles the original libmesos Scheduler API.

This means, in practice, that any client (“framework” in Mesos parlance) will need to maintain two concurrent connections:

  • one “permanent,” streaming data back to the framework; and
  • one “ephemeral,” executing multiple requests from the framework to Master, and carrying the former’s commands (e.g., Accept Offer).

The asynchronous results of the ephemeral requests (beyond the immediate, synchronous validation of the request itself) are carried back via the permanent channel.

Please read the docs (or consult the Mesos in Action book) for more details.

In our notebook, we open (and keep streaming) the connection using the post method defined in the cell entitled “POST helper method”:

def post(url, body, sid=None, **kwargs):
    """ POST `body` to the given `url`.

        @return: the Response from the server.
        @rtype: requests.Response
    """

When we pass a stream argument (with whatever value), it will assume that this is the “permanent” connection and will keep it open, streaming back the Master’s responses and extracting values depending on the type of the response:

if body.get("type") == "ERROR":
    print("[ERROR] {}".format(body))
    global last_error
    last_error = body
# etc ...

Please don’t do this at home: see my other post as to why a “castle of ifs” is a Truly Bad Thing, and for a much better way of handling cases such as this.
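For reference, here is a minimal sketch of what post() does under the hood (assuming the requests library; the real notebook version does more – note that messages on the streaming channel are RecordIO-framed, each JSON message preceded by its length in bytes and a newline):

import json
import requests

def post(url, body, sid=None, **kwargs):
    """A much-simplified sketch of the notebook's POST helper."""
    hdrs = {'Content-Type': 'application/json', 'Accept': 'application/json'}
    if sid:
        hdrs['Mesos-Stream-Id'] = sid
    streaming = 'stream' in kwargs
    r = requests.post(url, headers=hdrs, data=json.dumps(body), stream=streaming)
    if streaming:
        buf = b''
        for chunk in r.iter_content(chunk_size=None):
            # RecordIO framing: b'<length>\n<json blob>'
            buf += chunk
            while b'\n' in buf:
                size, _, rest = buf.partition(b'\n')
                if len(rest) < int(size):
                    break  # incomplete message, wait for more data
                msg, buf = rest[:int(size)], rest[int(size):]
                handle(json.loads(msg.decode('utf-8')))  # hypothetical dispatcher
    return r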

API Requests / Responses format

As you can see, the format of both requests and responses is JSON; however, Mesos will also “understand” serialized Protocol Buffers. In fact, the format of the JSON requests looks a bit awkward because it is actually generated automatically from the Proto format (all of the API messages can be seen in the Mesos code repository).

So, for example, to subscribe our framework to Master, we send a SUBSCRIBE message:

SUBSCRIBE_BODY = {
    "type": "SUBSCRIBE",
    "subscribe": {
        "framework_info": {
            "user" :  "ubuntu",
            "name" :  "Demo Mesos HTTP API Framework"
        },
        "force" : True
    }
}

and we expect back a SUBSCRIBED response, that will carry back a Stream ID – to see this in action, run the cell (and all those above first) entitled Registering a Framework:

try:
    kwargs = {'stream':True, 'timeout':30}
    persistent_channel = Thread(target=post, args=(API_URL, SUBSCRIBE_BODY), kwargs=kwargs)
    persistent_channel.daemon = True
    persistent_channel.start()
    print("The background channel was started to {}".format(API_URL))
except Exception as ex:
    print("An error occurred: {}".format(ex))

Connecting to Master: http://192.168.33.10:5050/api/v1/scheduler
body type:  SUBSCRIBE
The background channel was started to http://192.168.33.10:5050/api/v1/scheduler
Stream-id:  31e0c731-f055-4588-b0f0-5cdfaed5260c
Framework 474970d2-1b5e-40f9-82a2-135c71cd1448-0000 registered with Master at (http://192.168.33.10:5050/api/v1/scheduler)

As you can see, the snippet above starts a background thread and executes the post method, sending a SUBSCRIBE_BODY message: what we get back from Mesos is:

  • a Framework ID (which uniquely identifies our “framework”); and
  • a Stream-id, which we will need to store and reuse in every subsequent request.
    # See in the post() method
    if body.get("type") == "SUBSCRIBED":
        global framework_id, stream_id, headers
        stream_id = r.headers['Mesos-Stream-Id']
        headers['Mesos-Stream-Id'] = stream_id

We extract the Stream-id from the response (r) headers and insert it into the headers dict that will be reused in every subsequent request.

Consuming Data Center resources

This is all very exciting and much fun, but it wouldn’t be much use to anyone if it didn’t give us the means to do what Mesos was meant to do: namely, provide orderly access to distributed computing resources (more specifically, compute (CPU), storage (disk) and networking (essentially, ports)).

A complete discussion of Mesos resources management and its allocation strategy (currently based on Dominant Resource Fairness, or DRF) is certainly outside the scope of this series, but suffice to say that Mesos aims at providing fair access to a set of shared resources by ensuring that each framework is only offered a “fair fraction” of the available total.

By setting various flags (around roles and weights) it is possible to fine-tune the allocation of certain resources across frameworks while keeping them isolated (via Linux cgroups), thus allowing low-priority workloads (e.g., development or testing applications) to run alongside high-priority (e.g., production) ones – in turn, this allows for denser packing of runtime binaries and a more efficient utilization of computing resources (not to mention greatly simplifying the operation and management of DC resources).

The first step in launching a “task” (in Mesos parlance, this can be a binary runtime or a container) is to accept an Offer from one of the Agents: here we only have one, so there is not much to choose from, but in a real production environment we could be more selective about which Agents we’d be prepared to accept offers from: this would entail using the Agent’s attributes.

Just for the fun of it, I have set our Agent to have the following attributes (see the run-agent.sh script):

--attributes="rack:r2d2;pod:demo,dev" 

which you can also see when hitting the /state endpoint:

Agent state

As we will shortly see, these are in the Offer response too, so we could use them to filter out those agents which we don’t quite like or trust.

Agent attributes can be any key:value pair and have no intrinsic meaning for Mesos: they are given semantics by the way they are used – for example, to selectively run workloads only on certain clusters/racks, or to single out Agents for maintenance purposes.
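For example, a framework could use them to only consider offers coming from a given rack – a sketch, using the structure of the OFFERS message shown below:

def offers_from_rack(offers, rack):
    """Yields only the offers whose agent advertises the given `rack` attribute."""
    for offer in offers.get('offers', []):
        for attr in offer.get('attributes', []):
            if attr['name'] == 'rack' and attr['text']['value'] == rack:
                yield offer
                break

# e.g., to only consider agents in rack "r2d2":
good_offers = list(offers_from_rack(offers, "r2d2"))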

Run the Wait for Offers cell, and you should see a response that looks something like this (the below is much simplified):

{"offers": [
   {...
    "attributes": [ {
            "name": "rack",
            "text": {
                "value": "r2d2"
            },
            "type": "TEXT"
        },
        {
            "name": "pod",
            "text": {
                "value": "demo,dev"
            },
            "type": "TEXT"
        }
    ],
    ...
    "resources": [
        {
            "name": "ports",
            "ranges": {
                "range": [ {"begin": 9000, "end": 10000}]
        },
        {
            "name": "cpus",
            "role": "*",
            "scalar": {"value": 2.0},
        },
        {
            "name": "mem",
            "role": "*",
            "scalar": {"value": 496.0},
        },
        {
            "name": "disk",
            "role": "*",
            "scalar": { "value": 4930.0 },
        }
    ],
    ...
] }

You can see there both the attributes as well as the resources that are available from the Agent (the VM that we started, with 2 cores, approximately 500MB RAM and a 5GB hard disk).

For more info about Mesos resources and the syntax used to describe them, see this document.

The role:"*" means essentially that those resources are not reserved for any specific role (e.g., “Prod” or “Marketing” or whatever) and can be allocated to any framework that requests them.

For more info about Mesos “roles” and their changing nature, as well as a discussion around Access Control Lists, or ACLs, see this document.

Note that, whilst the title says “waiting for offers”, in reality those were already there waiting for us: shortly after registering, Master would have selected (using DRF) the most appropriate set of offers (made available by the Agents connected to it) and sent them to us via the “streaming” channel.

Be that as it may, we gladly accept them and, in the next and last part of this series, we will use them to run an Nginx container and serve static pages from our Agent VM.

A python notebook to experiment with the Apache Mesos HTTP API – Part 1 of 3

This is the first of a series of three articles that shows how to setup a Vagrant-based Apache Mesos test/development environment on your laptop; then how to run a Python notebook against the HTTP API; and finally, how to launch Docker containers on the running Agent VM.

Datacenter image

It is pretty jam-packed and requires a certain amount of familiarity with some concepts around containers, VMs and Mesos, but I am taking the time to show all the intermediate steps (hence, the 3 parts) and it should be easy to follow even if you have never used Vagrant, Mesos or Jupyter notebooks before.

A certain degree of familiarity with Python, requests and handling HTTP responses is certainly going to be helpful, as we will not be going into too much detail there.

All the code is available on my zk-mesos git repository:

git clone git@github.com:massenz/zk-mesos.git

and you can also see the README there.

This series is an extended (and updated) version of the talk I gave at MesosCon Europe 2015 updated for Apache Mesos 1.0.0, which has just been released (August 2016) – you can also find the slides there.

Getting Started

In order to follow along, you will need to clone the repository (as shown above) and install Virtualbox and Vagrant: they are both super-easy to get going, please follow the instructions on their respective sites and you’ll be up and running (literally) in no time.

I also recommend quickly scanning the Vagrant documentation: knowledge of Vagrant beyond vagrant up is not really required to get the most out of this series, but it may help if you get stuck (or would like to experiment and improve on our Vagrantfile).

If you are not familiar with Apache Mesos, I would recommend having a look at the project’s site: there are also a couple of good books out there, Mesos in Action being the one I would recommend (having also been one of the manuscript’s reviewers).

We will not be building it from source here; we will instead use Mesosphere packages: you don’t need to download them, as the Vagrantfile will automatically download and install them on the VMs.

To run the Python notebook, we will take advantage of the Jupyter packages and use a virtualenv to run all our code: the latter is not strictly necessary, but it will prevent you from messing up your system Python.

The steps are pretty simple, and YMMV, but if you have never used virtualenv before:

$ sudo pip install virtualenv

and then create and run a virtualenv:

$ cd zk-mesos
$ virtualenv mesos-demo
$ source mesos-demo/bin/activate
$ pip install -r requirements.txt

Finally, verify that you can run and load the Jupyter notebook:

$ jupyter notebook

this should automatically open your browser and point it to http://localhost:8888, from where you can select the notebooks/Demo-API.ipynb — don’t run it just yet, but if it shows up, it will confirm that your Python setup is just fine.

Building and installing Apache Mesos

Here is where the beauty of Vagrant shows in all its glory: installing Apache Mesos Master and Agent is not trivial, but in our case it’s simply a matter of:

$ cd vagrant
$ vagrant up

(make sure to be in the same directory as the Vagrantfile when issuing any of the Vagrant commands, or it will complain about it).

It is worth noting that we are building two Vagrant boxes, so any command will operate on both unless specified; to avoid this, you can specify the name of the VM after the command; for example, to SSH onto the Agent:

$ vagrant ssh agent

should log you in on that box, from where you can explore, experiment and diagnose any issues.

The vagrant up command will take some time to execute, but it should eventually leave your Virtualbox with two VMs, named respectively mesos-master and mesos-agent – incidentally, you should never need to use VBox to manage them (all the tasks can be undertaken via Vagrant commands), but you can do that too, if necessary or desired.

Once the VMs are built, ensure you can access the Mesos HTTP UI at http://192.168.33.10:5050;
you should also see one agent running, accessible either via the Master UI, or directly at http://192.168.33.11:5051.

NOTE

the Agent runs at a different IP (obviously) than the Master, but also on a different port (5051 instead of 5050): look into vagrant/run-agent.sh to see a few of the command line flags that we use to run the Agent (and in run-master.sh for the Master).

Zookeeper

It’s worth noting that we are also running an instance of Zookeeper (for Leader election and Master/Agent coordination) on the mesos-master VM, inside a Docker container: partly because we can, but also to show how easy it is to do so using containers.

This one line (in run-master.sh) will give you a perfectly good ZK instance (albeit a catastrophically unreliable one for a production environment, where you want to run at least 3-5 nodes, on physically separate machines/racks):

docker run -d --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 jplock/zookeeper:3.4.8

Wrap up

That’s pretty much it: you are now the proud owner of a Master/Agent 2-node Apache Mesos deployment: welcome to the same league as the Twitter and Airbnb production wizards.

In Part 2, we will run our Python notebook against the Master API and will accept the Agent’s offers to launch a Docker container.

Python Magic Methods

Intro

In his excellent Fluent Python book, Luciano Ramalho talks about Python’s “data model” and gives some excellent examples of how the language internal consistency is achieved via the judicious use of a well-defined API and, in particular, how Python’s “magic methods” enable the construction of elegant solutions, which are concise and highly readable.

And while you can find countless examples online of how to implement the iterative magic methods (__iter__() and friends), here I wanted to present an example of how to use two of the lesser known magic methods: __del__() and __call__().

For those familiar with C++, these implement two very familiar patterns: the destructor and the function object (aka, operator()).

Implement a self-destructing key

Note

The full code is available at filecrypt Github repository, and it has been more fully explained in this blog entry.

Let’s say that we want to design an encryption key which will be in turn encrypted with a master key and whose “plaintext” value will only be used “in flight” to encrypt and decrypt our data, but will otherwise only be stored encrypted.

There are many reasons why one may want to do this, but the most common is when the data to be encrypted is very large and time-consuming to encrypt: should the master key be compromised, we could revoke it, re-encrypt the (possibly, many, one-time) encryption keys with a new master key without incurring the time penalty of having to decrypt and re-encrypt possibly several TB’s of data.

In fact, re-encrypting the encryption keys may be so inexpensive (computationally or time-wise) that this could be done on a regular basis, rotating the master key at frequent intervals (e.g., weekly).

If we use OpenSSL command-line tools to do all the encryption and decryption tasks, we need to temporarily store the encryption key as “plaintext” in a file, which we will securely destroy (using the shred Linux tool).

Note

We use the term “plaintext” to signify that the contents are decrypted, not to mean plain text format: the key is still binary data, but, if gotten at that stage by an attacker, it would not be protected with encryption.

However, just implementing the call to the shredding utility as the last step in our encryption algorithm would not be sufficient to ensure that it is executed under all possible code paths: there may be errors, exceptions raised, the user may terminate the program gracefully (Ctrl-C) or abruptly (SIGKILL), and so on.

Guarding against all possibilities is not only tiresome, but also error-prone: how about instead having the Python interpreter do the hard work for us, and ensure that certain actions are always undertaken when the object is garbage collected?

Note

The technique shown here will not work for the SIGKILL case (aka kill -9) for which a more advanced technique (signal handlers) needs to be employed.

The idea is to create a class which implements the __del__() magic method, which is guaranteed to be invoked when there are no further references to the object and it is garbage-collected (the exact timing of that happening is implementation-dependent, but if you try it in common Python interpreters, it seems to be almost instantaneous).

This is what happens on a macOS laptop, running El Capitan and Python 2.7:

$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
>>> class Foo():
...     def __del__(self):
...         print("I'm gone, goodbye!")
... 
>>> foo = Foo()
>>> bar = foo
>>> foo = None
>>> bar = 99
I'm gone, goodbye!
>>> another = Foo()
>>> ^D
I'm gone, goodbye!
$

As you can see, the “destructor” method will be invoked either when there are no longer references to the object (foo) or when the interpreter exits (bar).

The following code fragment shows how we ended up implementing our “self-encrypting” key (I called it SelfDestructKey because the real feature is that it destructs the plaintext version of the encryption key upon exit):

This is a much simplified version of the code, focusing only on the __del__() method; please refer to the full version in the repository for the complete code.

import os
from tempfile import mkstemp

# `openssl` and `shred` are command wrappers from the `sh` library,
# as is the `ErrorReturnCode` exception used below.
from sh import ErrorReturnCode, openssl, shred


class SelfDestructKey(object):
    """A self-destructing key: it will shred its contents when it gets deleted.

       This key also encrypts itself with the given key before writing itself out to a file.
    """

    def __init__(self, encrypted_key, keypair):
        """Creates an encryption key, using the given keypair to encrypt/decrypt it.

        The plaintext version of this key is kept in a temporary file that will be securely
        destroyed upon this object becoming garbage collected.

        :param encrypted_key the encrypted version of this key is kept in this file: if it
            does not exist, it will be created when this key is saved
        :param keypair a tuple containing the (private, public) key pair that will be used to
            decrypt and encrypt (respectively) this key.
        :type keypair collections.namedtuple (Keypair)
        """
        self._plaintext = mkstemp()[1]
        self.encrypted = encrypted_key
        self.key_pair = keypair
        if not os.path.exists(encrypted_key):
            openssl('rand', '32', '-out', self._plaintext)
        else:
            with open(self._plaintext, 'w') as self_decrypted:
                openssl('rsautl', '-decrypt', '-inkey', keypair.private, _in=encrypted_key,
                        _out=self_decrypted)

    def __str__(self):
        return self._plaintext

    def __del__(self):
        try:
            if not os.path.exists(self.encrypted):
                self._save()
            shred(self._plaintext)
        except ErrorReturnCode:
            raise RuntimeError(
                "Either we could not save the encrypted key to file {enc}, or we could not "
                "shred the plaintext passphrase in file {plain}.  You will have to securely "
                "delete the plaintext version using something like `shred -uz {plain}`".format(
                    plain=self._plaintext, enc=self.encrypted))

    def _save(self):
        """ Encrypts the contents of the key and writes it out to disk.

        :param dest: the full path of the file that will hold the encrypted contents of this key.
        :param key: the name of the file that holds an encryption key (the PUBLIC part of a key pair).
        :return: None
        """
        if not os.path.exists(self.key_pair.public):
            raise RuntimeError("Encryption key file '%s' not found" % self.key_pair.public)
        with open(self._plaintext, 'rb') as selfkey:
            openssl('rsautl', '-encrypt', '-pubin', '-inkey', self.key_pair.public,
                    _in=selfkey, _out=self.encrypted)

Also, note how I have implemented the __str__() method, so that I can get the name of the file containing the plaintext key by just invoking:

passphrase = SelfDestructKey(secret_file, keypair=keys)
encryptor = FileEncryptor(
    secret_keyfile=str(passphrase), 
    plain_file=plaintext,
    dest_dir=enc_cfg.out)

Obviously, we could have just as easily implemented the __str__() method to return the actual contents of the encryption key.
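Had we wanted that, a minimal sketch could have looked like this (fine in Python 2, where str and bytes coincide):

    def __str__(self):
        # Alternative (NOT what SelfDestructKey above does): return the
        # plaintext contents of the key, rather than the file name.
        with open(self._plaintext, 'rb') as pf:
            return pf.read()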

Be that as it may, if you look at the code that uses the encryption key, at no point do we need to invoke the _save() method or directly invoke the shred utility; this is all taken care of by the interpreter when either passphrase goes out of scope, or the script terminates (normally or abnormally).

Implement the Command Pattern with a Callable

Python has the concept of callable which is essentially “something that can be invoked as if it were a function” (this follows the Duck Typing approach: “if it looks like a function, and can be called like a function, then it is a function”).

To make a class object behave as a callable all we need to do is to define a __call__() method and then implement it as any other “ordinary” class method.

Say that we want to implement a “command runner” script that (similarly to, for example, git) can take a set of sub-commands and execute them: one approach could be to use the Command Pattern in our CommandRunner class:

class CommandRunner(object):
    """Implements the Command pattern, with the help of the __call__() magic method."""

    def __init__(self, config):
        """Initiailize the Runner with the configuration from parsing the command line.

           :param config the command-line arguments, as parsed by ``argparse``
           :type config Namespace
        """
        self._config = config

    def __call__(self):
        method = self._config.cmd
        if hasattr(self, method):
            callable_meth = self.__getattribute__(method)
            if callable_meth:
                callable_meth()
        else:
            raise RuntimeError('Unexpected command "{}"; not found'.format(method))

    def run(self):
        # Do something with the files
        pass

    def build(self):
        # Call an external method that takes a list of files
        build(self._config.files)

    def diff(self):
        """Will compute the diff between the two files passed in"""
        if self._config.files and len(self._config.files) == 2:
            file_a, file_b = tuple(self._config.files)
            diff_files(file_a, file_b)
        else:
            raise RuntimeError("Not enough arguments for diff: 2 expected, {} found".format(
                len(self._config.files) if self._config.files else 'none'))

    def diff_all(self):
        # This will take a variable number of files and will diff them all
        pass

The config initialization argument is a Namespace object as returned by the argparse library:

def parse_command():
    """ Parse command line arguments and returns a configuration object

    :return: the configured options, or `None` if just printing help.
    :rtype: Namespace or None
    """
    parser = argparse.ArgumentParser()

    # Removed the `help` argument for better readability; make sure you
    # always include that to help your user, when they invoke your script
    # with the `--help` flag.
    parser.add_argument('--host', default='localhost')
    parser.add_argument('-p', '--port', type=int, default=8080,)
    parser.add_argument('--workdir', default=default_wkdir)

    parser.add_argument('cmd', default='run', choices=['run', 'build', 'diff', 'diff_all'])
    parser.add_argument('files', nargs=argparse.REMAINDER)
    return parser.parse_args()

To invoke this script we would use something like:

$ ./main.py run my_file.py

or:

$ ./main.py diff file_1.md another_file.md

Worth pointing out how we also protect against errors using two other "magic" methods:

if hasattr(self, method):
    callable_meth = self.__getattribute__(method)

note that we could have used the __getattr__() magic method to define the behavior of the class when attempting to access non-existing attributes, but in this case it was probably easier to do that at the point of call.
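To make that concrete, a hypothetical variant using __getattr__() might look like this (a sketch, not the approach used above):

class SafeCommandRunner(CommandRunner):
    """A sketch: trap unknown commands in __getattr__ instead of hasattr()."""

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        raise RuntimeError('Unexpected command "{}"; not found'.format(name))

    def __call__(self):
        # No hasattr() guard needed: a missing cmd lands in __getattr__().
        getattr(self, self._config.cmd)()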

Given that we are telling argparse to limit the possible values to the given choices when parsing the cmd argument, we are guaranteed never to get an “unknown” command; however, the CommandRunner class does not need to know this, and it can be used in other instances where we do not have such a guarantee (not to mention that we would be only one typo away from some very puzzling bug, had we not done our homework in __call__()).

To make all this work, then we only need to implement a trivial __main__ snippet:

if __name__ == '__main__':
    cfg = parse_command()

    try:
        runner = CommandRunner(cfg)
        runner()  # Looks like a function, let's use it like one.
    except Exception as ex:
        logging.error("Could not execute command `{}`: {}".format(cfg.cmd, ex))
        exit(1)

Note how we invoke the runner as if it were a method: this will in turn execute the __call__() method and run the desired command.

We truly hope everyone agrees this is much more pleasant code to look at than monstrosities such as:

# DON'T DO THIS AT HOME
# Please avoid castle-of-ifs, they are just plain ugly.
if cfg.cmd == "build":
    # do something to build
elif cfg.cmd == "run":
    # do something to run
elif cfg.cmd == "diff":
    # do something to diff
elif cfg.cmd == "diff_all":
    # do something to diff_all
else:
    print("Unknown command", cfg.cmd)

Conclusion

Learning about Python’s “magic methods” will make your code not only easier to read and re-use in different situations, but also more “pythonic” and immediately recognizable to other fellow pythonistas, thus making your intent clearer to understand and reason about.

filecrypt – OpenSSL file encryption

overview

Uses OpenSSL library to encrypt a file using a private/public key pair and a one-time secret.

A full description of the process can be found here.

configuration

This uses a YAML file to describe the configuration; by default it assumes it is in /etc/filecrypt/conf.yml but its location can be specified using the -f flag.

The structure of the conf.yml file is as follows:

keys:
    private: /home/bob/.ssh/secret.pem
    public: /home/bob/.ssh/secret.pub
    secrets: /opt/store/

store: /home/bob/encrypt/stores.csv

# Where to store the encrypted file; the folder MUST already exist and the user
# have write permissions.
out: /data/store/enc

# Whether to securely delete the original plain-text file (optional, default true).
shred: false

The private/public keys are a key-pair generated using the openssl genrsa command; the encryption key used to actually encrypt the file will be created in the secrets folder, and afterward encrypted using the public key and stored in the location provided.
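For example, a suitable key pair can be generated with something like (the paths match the configuration example above):

openssl genrsa -out /home/bob/.ssh/secret.pem 2048
openssl rsa -in /home/bob/.ssh/secret.pem -pubout -out /home/bob/.ssh/secret.pub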

The name will be pass-key-nnn.enc, where nnn is a random value between 000 and 999 that has not already been used for a file in that folder.
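In other words, the selection logic is equivalent to this sketch (illustrative only, not the actual implementation):

import os.path
import random

def new_keyfile(secrets_dir):
    """Picks a pass-key-nnn.enc name not already in use in `secrets_dir`."""
    while True:
        name = os.path.join(secrets_dir,
                            "pass-key-{:03d}.enc".format(random.randint(0, 999)))
        if not os.path.exists(name):
            return name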

The name of the secret passphrase can also be defined by the user, using the --secret option (specify the full path, it will be left unmodified):

  • if it does not exist a random secure one will be created, used for encryption, then encrypted and saved with the given path, while the plain-text temporary version securely destroyed; OR

  • if it is the name of an already existing file, it will be decrypted, used to encrypt the file,
    then left unchanged on disk.

NOTE we recommend NOT to re-use encryption passphrases, but always generate a new secret.

NOTE it is currently not possible to specify a plain-text passphrase: we always assume that
the given file has been encrypted using the keypair's public key.

The store file is a CSV list of:

"Original archive","Encryption key","Encrypted archive"
201511_data.tar.gz,/opt/store/pass-key-001.enc,201511_data.tar.gz.enc

a new line will be appended at the end; any comments will be left unchanged.

usage

Always use the --help option to see the most up-to-date options available; anyway, the basic
usage is (assuming the example configuration shown above is saved in /opt/enc/conf.yml):

filecrypt.py -f /opt/enc/conf.yml /data/store/201511_data.tar.gz

will create an encrypted copy of the file, stored as /data/store/201511_data.tar.gz.enc;
the original file will not be securely destroyed (using shred) and the new encryption key will be stored, encrypted, in /opt/store/pass-key-778.enc.

A new line will be appended to /home/bob/encrypt/stores.csv:

/data/store/201511_data.tar,pass-key-778.enc,/data/store/201511_data.tar.gz.enc

IMPORTANT

We recommend testing your configuration and command-line options on test files: shred erases files in a terminal way that is not recoverable: if you mess up, you will lose data.

You have been warned.

code

The code has been uploaded to github.

See the requirements.txt to install required Python libraries:

pip install -r requirements.txt

(the use of a virtualenv is recommended here).

To install OpenSSL in Ubuntu see this page, but it boils down essentially to:

 sudo apt-get install -y openssl

references