Monday, October 15, 2012

Bundling Python files into a stand-alone executable


One of the problems with building a medium- to large-sized program in Python (or a similar scripting language) is distributing it to users. When a Python script grows beyond a couple hundred lines, most programmers prefer to split that single script file into multiple Python modules and packages. For an individual developer, modules and packages are primarily an aid in mental organization, though they also make it easier to navigate the project. For a large Python program being developed by a team, modules and packages are an important way to communicate the structure and intent of the code.

Unfortunately, distributing a multi-module Python program has a number of problems. First, you must carefully assemble all your program's dependencies in a single directory tree. Second, you need to make a zip or tarball of the directory tree for distribution. Third, you need to instruct your end users on how to unpack the zipped or tarballed program, how to correctly set their PYTHONPATH, and which Python file or shell script in the directory tree to invoke to run your program.

Python has long included the distutils module to help developers distribute Python code. Distutils is focused on distributing Python modules and packages for use by other Python developers and is great for its intended purpose; it can also install shell scripts in the standard operating system command directory (such as /usr/local/bin on most UNIX-derived systems). It has a big problem though: Python libraries installed by distutils are made available to all Python code unless special care is taken. If you include any common third-party libraries in your program, you run the risk that your end user may have a different, possibly incompatible version of that library already on their system. You risk breaking other Python programs, and being broken in turn if you share libraries. Windows users have long dealt with DLL Hell, a similar problem where different Windows applications would install incompatible versions of shared libraries.

Today even the computer in your pocket has dozens of gigabytes of storage, so modern development has moved away from sharing library code between programs. For Python developers, virtualenv allows you to quickly and easily create separate virtual Python installations on a single computer, each one isolated from the others and from the "real" Python installation. You can install Python modules and packages in one virtualenv without affecting the others. Used along with the pip package manager, virtualenv makes it easy to document and recreate a Python environment, which is a boon to Python web developers.

Virtualenv is still overkill for end users, technical or not, who simply want to run your program in order to get their work done. Fortunately, Python quietly added a new feature in 2.6 that makes it possible to bundle up a directory full of Python code into a single executable file. I say "quietly" because Python 2.6 was released in 2008 and I only heard about this feature now in 2012, four years later. (Okay, it's possible I wasn't paying close attention. :-) Typical of Python, the feature isn't pretty but it has a certain elegance to it: the __main__.py file.

How to use a __main__.py file
The Python documentation for the __main__.py file explains its purpose succinctly but barely hints at the possibilities. I'll try to do a better job. Let's start by creating a directory for our Python application named app:
$ mkdir app
Now open your favorite text editor and create the file app/__main__.py. Add the following code to it:
# file app/__main__.py

def main():
  print('The rain in Spain falls mainly in the plain.')

if __name__ == '__main__':
  main()
If you've done some Python programming, you'll recognize the __name__ == '__main__' idiom used to determine whether a Python module is being executed directly rather than imported as a module. When it's executed directly, the example simply calls the main() function, which prints "The rain in Spain falls mainly in the plain." to standard output.

Now let's run this program. Instead of calling __main__.py directly, we can treat the app directory as our Python program:
$ python app
The Python interpreter sees that app is a directory and checks for a __main__.py file inside it. Note that Python only checks the top level of the directory; it doesn't search subdirectories. Since there is a __main__.py directly in app, the interpreter runs it and the output is:
The rain in Spain falls mainly in the plain.

In addition, the Python interpreter will add the directory to the start of sys.path so that all imports will check that directory first. By placing all of the modules and packages that our program depends on in the directory, we stay isolated from whatever versions the end user may have installed, and we keep our dependencies from leaking into the end user's system.
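
Here is a minimal sketch of that import behavior, in case you'd like to see it for yourself. It builds a throwaway app directory in a temporary location, drops a helper module next to __main__.py, and runs the directory with the current interpreter. The module name greeting and its contents are made up for this illustration; they are not part of the example app.

```python
import os
import subprocess
import sys
import tempfile

# Source for the throwaway __main__.py. It imports a sibling module and
# reports the first entry on sys.path.
MAIN_SOURCE = """\
import os
import sys
import greeting  # hypothetical helper, resolved from the app directory

print(os.path.basename(sys.path[0]))
print(greeting.TEXT)
"""

workdir = tempfile.mkdtemp()
app_dir = os.path.join(workdir, "app")
os.mkdir(app_dir)

# The helper module sits alongside __main__.py at the top level.
with open(os.path.join(app_dir, "greeting.py"), "w") as f:
    f.write("TEXT = 'hello from the bundled module'\n")

with open(os.path.join(app_dir, "__main__.py"), "w") as f:
    f.write(MAIN_SOURCE)

# Equivalent to typing "python app" in the parent directory.
raw = subprocess.check_output([sys.executable, app_dir])
demo_lines = raw.decode().splitlines()
print(demo_lines)
```

The first line printed by the child process should be the name of the app directory, which shows that the directory itself is what landed at the front of sys.path.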

Zip it up
Python has supported loading modules and packages out of a zip file since 2.3. Just as it now looks in a directory for __main__.py, Python will also look in a zip file for __main__.py. Let's zip up the app directory and test this.

Note that the __main__.py file needs to be at the top level in the zip container, not in a subdirectory. This makes creating the zip file a little tricky. We want to recursively zip up everything in our app directory, but not include the app directory itself. (Windows users will need a command line zip program to follow along.)
$ cd app
$ zip -r ../app.zip *
$ cd ..
(Use *.* instead of * on Windows.)

To test that you've zipped things up correctly, run your Python program directly from the zip file:
$ python app.zip
You should see the expected output:
The rain in Spain falls mainly in the plain.

Python will place the zip file first on sys.path just as it does for a directory; all module and package imports will search the zip file first. Be sure to place your modules and packages at the top level of your directory alongside the __main__.py file.
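
If you don't have a command line zip program handy, the same archive can be built with Python's own zipfile module. What follows is a rough sketch, not the only way to do it: the zip_app function is a name I made up, and the demo works on a temporary copy of the one-file app rather than your real directory. The important detail is that each file is stored under its path relative to the app directory, so __main__.py ends up at the top level of the archive.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def zip_app(app_dir, zip_path):
    """Zip the contents of app_dir so __main__.py sits at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subdirs, files in os.walk(app_dir):
            for name in files:
                full_path = os.path.join(folder, name)
                # Store the path relative to app_dir, so "app/" is not a prefix.
                archive.write(full_path, os.path.relpath(full_path, app_dir))


# Demo on a throwaway version of the one-file app.
workdir = tempfile.mkdtemp()
app_dir = os.path.join(workdir, "app")
os.mkdir(app_dir)
with open(os.path.join(app_dir, "__main__.py"), "w") as f:
    f.write("print('The rain in Spain falls mainly in the plain.')\n")

zip_path = os.path.join(workdir, "app.zip")
zip_app(app_dir, zip_path)

with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()

# Equivalent to typing "python app.zip".
output = subprocess.check_output([sys.executable, zip_path]).decode().strip()
print(names)
print(output)
```

Against your own project you would call zip_app('app', 'app.zip') from the directory that contains app.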

Load a resource
If you've put all your Python code in the right place using this scheme, everything pretty much just works as you expect it to. But some programs depend on resources aside from Python code, and need to load various data files that come bundled with the program. The easiest way to find and load resources from a program bundle like this is to use the pkg_resources module. The pkg_resources module does a lot of things, but you'll want to look first at the ResourceManager API, which has the most common functions for finding and loading resource files.

Let's add a resource file to our little app and load it using the pkg_resources.resource_string function. Create a subdirectory under app called resources.
$ mkdir app/resources
Using your favorite text editor again, create the file app/resources/inFrance.txt and add some text to it:
But the ants in France are mainly in your pants.
Now edit app/__main__.py so that it looks like this:
# file app/__main__.py

import pkg_resources

def main():
  print('The rain in Spain falls mainly in the plain.')
  print(pkg_resources.resource_string('resources', 'inFrance.txt'))

if __name__ == '__main__':
  main()
You may already have pkg_resources.py installed on your system. If you don't, you'll find it's part of the distribute package. Download the latest version of distribute, unpack the tarball and find pkg_resources.py inside. Copy pkg_resources.py to app/pkg_resources.py. (Even if you already have the pkg_resources module on your system, if you use it in your program, you should add it to your bundle before distributing it to others.)

Now when you run the program:
$ python app
You should see this output:
The rain in Spain falls mainly in the plain.
But the ants in France are mainly in your pants.
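
If you'd rather not bundle pkg_resources, the standard library's pkgutil.get_data function is one possible alternative. The sketch below is only that, and it rests on an assumption that isn't part of the example above: resources is made a regular package by adding an empty __init__.py. It builds a throwaway copy of the app, zips its contents with shutil.make_archive, and runs both the directory and the zip to check that they print the same thing. Note that get_data returns bytes, so the text is decoded before printing.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# A variant of app/__main__.py that uses pkgutil instead of pkg_resources.
MAIN_SOURCE = """\
import pkgutil

def main():
    print('The rain in Spain falls mainly in the plain.')
    # get_data returns bytes, so decode before printing.
    data = pkgutil.get_data('resources', 'inFrance.txt')
    print(data.decode('utf-8').strip())

if __name__ == '__main__':
    main()
"""

workdir = tempfile.mkdtemp()
app_dir = os.path.join(workdir, "app")
os.makedirs(os.path.join(app_dir, "resources"))

with open(os.path.join(app_dir, "__main__.py"), "w") as f:
    f.write(MAIN_SOURCE)
# Assumption: resources is a regular package, so it gets an empty __init__.py.
open(os.path.join(app_dir, "resources", "__init__.py"), "w").close()
with open(os.path.join(app_dir, "resources", "inFrance.txt"), "w") as f:
    f.write("But the ants in France are mainly in your pants.\n")

# Zip the contents of app_dir (not the directory itself), then run both forms.
zip_path = shutil.make_archive(os.path.join(workdir, "app"), "zip", root_dir=app_dir)
from_dir = subprocess.check_output([sys.executable, app_dir]).decode().splitlines()
from_zip = subprocess.check_output([sys.executable, zip_path]).decode().splitlines()
print(from_dir == from_zip)
print(from_zip)
```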

Make it executable
Finally, you can turn your zipped program bundle into a stand-alone executable on UNIX-like systems using a couple of commands. Zip up the latest version of the program in the app directory and name it app2.zip.
$ cd app
$ zip -r ../app2.zip *
$ cd ..
Now use a bit of UNIX magic to turn app2.zip into an executable.
$ echo '#!/usr/bin/env python' | cat - app2.zip > app2
$ chmod +x app2
The first command inserts a UNIX shebang at the start of the zip file and writes it to a new file called simply app2. The zip file format is designed to allow a small executable program to be inserted at the front (that's how self-extracting zip files are created), so this is kosher and doesn't corrupt the zip file. The second command sets the executable bits on app2.

Now you can simply run app2 like any executable.
$ ./app2
And you should see the expected output.
The rain in Spain falls mainly in the plain.
But the ants in France are mainly in your pants.
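
The two shell commands can also be approximated in Python, which may be handy in a build script. This is a sketch under a few assumptions: make_executable is a name I invented, the demo uses a one-file archive built in a temporary directory, and the result is run through the current interpreter rather than through the shebang so that the check doesn't depend on what "python" means on your PATH.

```python
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def make_executable(zip_path, target_path, interpreter="/usr/bin/env python"):
    """Write a shebang line followed by the zip bytes, then set execute bits."""
    with open(zip_path, "rb") as src, open(target_path, "wb") as dst:
        dst.write(b"#!" + interpreter.encode("utf-8") + b"\n")
        dst.write(src.read())
    # Same effect as "chmod +x": add execute permission for user, group, other.
    mode = os.stat(target_path).st_mode
    os.chmod(target_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demo with a throwaway one-file archive.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "app2.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr(
        "__main__.py",
        "print('The rain in Spain falls mainly in the plain.')\n",
    )

target = os.path.join(workdir, "app2")
make_executable(zip_path, target)

# The prefixed file is still recognized as a zip archive, and Python runs it.
still_zip = zipfile.is_zipfile(target)
output = subprocess.check_output([sys.executable, target]).decode().strip()
print(still_zip)
print(output)
```

On Python 3.5 and later, the standard library's zipapp module packages a directory into the same kind of file.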

9 comments:

Davide said...

Nice and useful post! Is there a way to solve that last step (making the executable) in Windows?

Anonymous said...

The quick and easy way is to give the zip file a unique extension, like ".pyexe" and create an association between Python and that extension.

The slicker way would be to create a small executable in C that searches the directories on the PATH environment variable until it finds Python.exe, then executes Python with the path to the small executable. Then you would embed the small executable at the start of the zip file (I think the DOS copy command can be used to do this, but the zip command may also be able to do this). Now instead of a self-extracting zip, you have a self-launching Python program.

WinCrazy said...

The best way to make an EXE in MSW is to use PY2EXE. It packages the Python interpreter DLL and every dependent file. It also makes this entire discussion moot!

The downside to PY2EXE is that the simplest EXE will be 3 or 4 MB in size. However, disk space is extremely inexpensive these days.

Anonymous said...

@WinCrazy That's good to know. Is PY2EXE part of the standard Python distribution on Windows?

Given that Windows doesn't include Python by default like OS X and most Linux/UNIX systems, bundling Python.exe and the needed standard library files is probably the best way to go for distributing to Windows users.

Unknown said...

Don't feel bad about missing the __main__.py feature. It came out in python 2.6 (the docs saying 2.5 are in error) and they didn't do a great job of advertising and explaining it.

Thanks for the post though. It gave me a good feeling of "suffering together" i.e. I'm not doing this part completely wrong, it's just weird in ways.

Rhitik said...

Cool stuff.

Unknown said...

to avoid having to include pkg_resources just use pkgutil.get_data() to access resources:

print(StringIO.StringIO(pkgutil.get_data('resources', 'inFrance.txt')).read())

pkg_resources uses pkgutil.get_data() anyway .. and watch out if loading unicode in python2, pkgutil.get_data returns binary data!

Nick Kay said...

Ok and if you want to ensure you only include compiled bytecode instead of exposing .py files....is this still the way to go?

Anonymous said...

I haven't tested that, but I don't see why not. Just compile all your .py files to .pyc and only include the compiled ones in the zip.