Singletons and their Problems in Python
The infamous Singleton design pattern is now widely considered stupid and evil, and it attracts a fair amount of hatred. Fortunately, singletons are not that common in Python and few people use them. Not creating a singleton class seems to come naturally.
But beware. Not implementing the singleton design pattern does not mean you avoid the core problem of a singleton. The core problem of a singleton is global, shared state. A singleton is nothing more than a glorified global variable, and in languages like Java there are many reasons why you would want to use something like one. In Python we have something different for singletons, and it has a very innocent name that hides the gory details: the module.
That's right, folks: a Python module is a singleton. And it shares the problems of the singleton pattern, except that it's a little bit worse.
Namespaces
So let's dive into the problems of Python modules by looking at a completely different language. Let's compare our beloved Python modules with C# namespaces for a moment. If you don't know C#, let me show you how they are declared:
namespace MyNamespace {
    class MyClass {
    }
}
So as you can see, a namespace in C# is something you specify explicitly. On the surface that looks like the big difference between a Python module and a C# namespace, but that's really just the surface. The biggest difference is that a C# namespace is something like a folder: you put stuff into it so that you can organize it better. And the only stuff you can stuff into a C# namespace are classes, structs, interfaces, other namespaces, enums and delegates (something like a function pointer type in C).
In Python a module is an object. That object is an instance of a class called module, and it can have as many attributes as you like; you can put whatever you want on it. What's stored on it is usually the imported objects, the classes and functions declared in that module, and other global variables or constants.
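To make that concrete, here is a minimal sketch, assuming Python 3. It builds a module object by hand with types.ModuleType and stores state on it as you would on any other object. The module name meh is made up for the example.

```python
import types

# A module is an ordinary object: an instance of types.ModuleType,
# whose class is literally named "module".
meh = types.ModuleType("meh")  # hypothetical module name

# Like any object, it accepts arbitrary attributes, including mutable state.
meh.counter = 0
meh.greet = lambda name: "Hello " + name


def bump():
    # Anything holding a reference to the module can change its state.
    meh.counter += 1


bump()
bump()
print(type(meh).__name__, meh.counter)  # module 2
print(meh.greet("Zine"))                # Hello Zine
```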
So the big difference is that a Python module can store state and a C# namespace cannot. There is nothing you can store on a C# namespace that could change at runtime. The only thing “stored” on a C# namespace is compiled code that was loaded from an assembly (something like a .pyc file in Python, just more portable).
So what are the implications?
There can only be one…
I have already told you that modules in Python are simple objects with attributes. So what happens if you write import meh (ignoring the obscure details of the Python import system)? First the Python interpreter checks whether the module was already imported. If it was, it uses the already imported module; otherwise it creates a new module object and executes the module's code to populate it.
The already imported modules are stored in a special dictionary inside the interpreter. sys.modules points to this dictionary, so you can access it from Python code. Every module that was already imported (and also modules that are known not to exist) is stored in there to ensure there will only ever be one of each. So, as you might have guessed, it's what we call a singleton.
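Here is a small sketch of that caching behaviour, assuming Python 3. It uses the standard library's colorsys module only because it is a convenient example, and the some_flag attribute is invented for the demo. A second import hands back the very same object from sys.modules. Only deleting the entry gets you a fresh module with fresh state.

```python
import importlib
import sys

# The first import executes the module code and caches the module object.
import colorsys
first = colorsys

# A second import is just a lookup in sys.modules.
second = importlib.import_module("colorsys")
print(first is second)                   # True
print(sys.modules["colorsys"] is first)  # True

# State put on the module is visible to every importer, because they all
# share the single object stored in sys.modules.
colorsys.some_flag = "set by library A"  # hypothetical attribute
print(importlib.import_module("colorsys").some_flag)  # set by library A

# Removing the entry forces a new module object (and re-execution of the
# module code) on the next import; the old state is not carried over.
del sys.modules["colorsys"]
third = importlib.import_module("colorsys")
print(third is first)               # False
print(hasattr(third, "some_flag"))  # False
```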
The second step, executing the code that creates the module attributes, is the second “problem” here. It is what creates, or can create, the shared state. To avoid talking about irrelevant things, let's have a look at one of the modules from the standard library: the mimetypes module.
Have a look:
inited = False

def init(files=None):
    global inited
    db = MimeTypes()
    ...
This is actual code from the mimetypes module that ships with Python, just with the more gory details removed. The point is, there is shared state: a boolean flag that is True if the module was initialized and False if it was not. Now that particular case is probably not that problematic (trust me, it is) because mimetypes initializes itself, but you can see that there is a files parameter to the init function. If you pass a list of files to that function, it will reinitialize the in-memory mime database with the mime information from those files. Now imagine what would happen if two libraries initialized mimetypes with two different sources …
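Here is a sketch of that scenario, assuming Python 3. Instead of init(files=...) it uses mimetypes.add_type, which writes to the same module-level database. The two "library" functions and the .mylib extension are hypothetical.

```python
import mimetypes


# Two independent (hypothetical) libraries that both configure mimetypes.
# There is only one mimetypes module per interpreter, so both of them
# write into the same module-level database.
def library_a_setup():
    mimetypes.add_type("application/x-library-a", ".mylib")


def library_b_setup():
    mimetypes.add_type("application/x-library-b", ".mylib")


def library_a_detect(filename):
    # Library A asks the shared database and trusts the answer.
    return mimetypes.guess_type(filename)[0]


library_a_setup()
print(library_a_detect("data.mylib"))  # application/x-library-a

library_b_setup()  # library B silently overrides A's registration
print(library_a_detect("data.mylib"))  # application/x-library-b
```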
Now obviously, that's a problem of the library that implements it, not of Python itself. Nobody should keep shared state in module scope. Unfortunately many standard library modules do (cgi, logging, mimetypes, csv, …) and it seems to be standard practice in the Python world. There is a lot of shared state in Django and nearly all modern frameworks, not just web frameworks.
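For mimetypes in particular there is a way out. The module also exposes a MimeTypes class, so a library can keep its own instance instead of changing the module-level database. Below is a minimal sketch. The library names and the .plugdata extension are hypothetical.

```python
import mimetypes

# Each (hypothetical) library owns its own MimeTypes instance. A fresh
# instance starts from the built-in defaults and keeps its own tables.
db_a = mimetypes.MimeTypes()
db_b = mimetypes.MimeTypes()

db_a.add_type("application/x-library-a", ".plugdata")
db_b.add_type("application/x-library-b", ".plugdata")

print(db_a.guess_type("file.plugdata")[0])  # application/x-library-a
print(db_b.guess_type("file.plugdata")[0])  # application/x-library-b

# The module-level database was never modified by the calls above.
print(mimetypes.guess_type("file.plugdata")[0])
```

Because the state now lives on objects the libraries own, two libraries can disagree without stepping on each other. This only helps if every library actually follows this approach.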
Let it be None?
Now before I ask for more than one, I want to ask for none, because this is the problem that freaks me out the most. I'm mainly doing Python web development, which means I have some long-running processes that are managed by an external server I don't really control. Not only do I work with Python, I am also obsessed with the idea of extensible systems, which is why a project of mine has a plugin interface. Users can upload new plugins through the web interface and activate and deactivate them.
What does all of that have to do with singletons and modules? Unfortunately, too much. I already told you that once a module is imported, it's stored in sys.modules. Now imagine a user uploads a new version of a plugin, that is, upgrades it. For the new code to load, you would first have to shut down the Python interpreter and restart it. Unfortunately there is no way for a WSGI application to request a restart from the webserver.
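The following sketch, assuming Python 3, simulates such an upgrade on disk. The plugin name myplugin and its contents are hypothetical. A repeated import keeps returning the old module from sys.modules. importlib.reload re-executes the new source into the same module object, but names copied out earlier with from ... import still refer to the old values.

```python
import importlib
import os
import sys
import tempfile

# Don't write .pyc files: a source rewrite within the same second and with
# the same size could otherwise be masked by the cached bytecode.
sys.dont_write_bytecode = True

# Hypothetical plugin directory and plugin module.
plugin_dir = tempfile.mkdtemp()
sys.path.insert(0, plugin_dir)
plugin_path = os.path.join(plugin_dir, "myplugin.py")


def write_plugin(version):
    """Write (or overwrite) the plugin source with the given version."""
    with open(plugin_path, "w") as f:
        f.write(f"VERSION = {version!r}\n")


write_plugin("1.0")
importlib.invalidate_caches()
import myplugin
from myplugin import VERSION as imported_version

# The user uploads version 2.0 of the plugin.
write_plugin("2.0")

# A plain import is answered from sys.modules: still the old code.
import myplugin
print(myplugin.VERSION)  # 1.0

# importlib.reload re-executes the new source into the same module object ...
importlib.reload(myplugin)
print(myplugin.VERSION)  # 2.0

# ... but names copied out of the module earlier still hold the old values.
print(imported_version)  # 1.0
```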
So how does one unload a module to load a new version? There is no documented way to do that, and what I'm doing is dangerous, not portable, kills little kittens and you should never, ever do it.
The road to insanity or code reloading in Python:
- Put your reloading code into a separate module, one with a special name (zine._core in my case).
- Have some sort of lock.
- Acquire that lock, and do that when you're sure no other thread is executing code from your package (haha, good luck).
- Clear all modules from sys.modules that belong to your code, except the one that implements the reloader.
- Import your package again and execute the code that sets up the application again.
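The purge step of that recipe could look roughly like this. This is a sketch, not the code Zine uses. The names myapp and myapp._core are hypothetical. The lock only serializes reloads and cannot tell you whether another thread is still running old code. Fake module objects stand in for a real package so the example runs on its own.

```python
import sys
import threading
import types

# Serializes reloads only; it does NOT stop other threads from running
# code that lives in the modules being purged.
RELOAD_LOCK = threading.Lock()


def purge_package(package, keep=()):
    """Remove `package` and its submodules from sys.modules.

    Modules named in `keep` (e.g. the reloader itself) stay in place.
    Returns the sorted list of removed module names.
    """
    removed = []
    with RELOAD_LOCK:
        for name in list(sys.modules):  # copy: we mutate while iterating
            if name in keep:
                continue
            if name == package or name.startswith(package + "."):
                del sys.modules[name]
                removed.append(name)
    return sorted(removed)


# Fake modules standing in for a hypothetical "myapp" package.
for name in ("myapp", "myapp.views", "myapp._core", "myappextra"):
    sys.modules[name] = types.ModuleType(name)

print(purge_package("myapp", keep={"myapp._core"}))
# ['myapp', 'myapp.views']
```

Re-importing the package afterwards, for example with importlib.import_module("myapp"), would then execute its code from scratch.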
This is dangerous and stupid. Imagine what happens if a thread is still active in the old code and you kick away the modules it's executing in. Because of weak references, you could get rid of the global (module) scope that a running function still refers to only weakly, and that function would then break with an obscure error.
Currently there is no solution for that problem, and I don't expect one to appear in Python anytime soon, at least not without breaking stuff. Because what we would need is …
… more Singletons
If one singleton does not solve the problem, a second one could. That's the point where you should disagree with me and call me names, but let me explain myself first. The problem is shared state, but why is shared state the problem? In Python development we seem to love shared state, a whole lot. It does make development simple and lets you learn and understand the language quickly. The shared state is usually stored on modules or on objects stored on modules, so modules seem to be the root of all evil. There can only be one version of a module. What does this mean for us? In a single running Python interpreter, the following things do not work:
- that interpreter runs application A and application B; A wants libfoo in version 1.0, B wants libfoo in version 2.0, and the two versions are API-incompatible
- we can't reload code on the fly, because we would have to tear everything down first and restart; we can't load the new version of the code, slowly move over to it, and let the garbage collector get rid of the old code once it's no longer needed
- we can't have two instances of the same application running in the same process that want different search paths for plugins loaded with the regular import API (instance 1 loading the modules below app.plugins from /var/www/instance1/plugins and instance 2 loading them from /var/www/instance2/plugins)
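The first case is easy to reproduce. In this sketch, assuming Python 3, libfoo and both of its versions are hypothetical and generated on the fly. Application B puts its own libfoo first on sys.path. Because sys.modules is keyed by the module name alone, B still gets application A's copy.

```python
import importlib
import os
import sys
import tempfile


def make_libfoo(version):
    """Create a throwaway directory containing a hypothetical libfoo.py."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "libfoo.py"), "w") as f:
        f.write(f"VERSION = {version!r}\n")
    return directory


dir_v1 = make_libfoo("1.0")
dir_v2 = make_libfoo("2.0")

# Application A puts version 1.0 first on the search path and imports it.
sys.path.insert(0, dir_v1)
importlib.invalidate_caches()
import libfoo as foo_for_a

# Application B wants 2.0 and puts its directory first ...
sys.path.insert(0, dir_v2)
importlib.invalidate_caches()
import libfoo as foo_for_b

# ... but the import is answered from sys.modules, keyed by name only.
print(foo_for_a is foo_for_b)  # True
print(foo_for_b.VERSION)       # 1.0
```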
The funny (and sad) part is that all these nice things do not work just because of one single object: sys.modules, the übersingleton of Python.
But we can't get rid of it because our modules are objects and we want to get the same module back if we import it in two different modules. So if we can't get rid of the singleton, add some more!
This solution would solve the problems in the three cases outlined above, but many problems would still be left. Also, there is no way this could be implemented in a backwards compatible fashion in Python, because of how pickle imports objects and how we refer to objects. But this is how it could work:
Tagging sys.modules
Currently the key for the items in sys.modules is the name of the module. In an ideal world, the keys would be tuples in the form (module_name, tag) where tag could be used for the following things:
- specify a specific version of the library (like '1.0')
- a secondary import of a library (like a mimetypes import for library B)
- a random ad-hoc identifier to enforce fresh imports (think about test suites and benchmarks that need to work on clean imports of a library because of … well … shared state …)
How to express which tag to use?
# a string literal as tag
from sqlalchemy['0.6'] import create_engine

# the contents of a variable as tag
from zine.plugins[my_instance] import myrtle_theme
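That syntax is not valid Python. As a rough approximation, here is a hypothetical tagged_import() function. It keeps its own registry keyed by (module_name, tag) alongside sys.modules, and it uses importlib.util to execute each tagged copy in a fresh module object. The sketch rests on strong assumptions. It only handles single-file modules, and any imports those modules perform still go through the one global sys.modules.

```python
import importlib.util
import os
import tempfile

# Toy stand-in for a tagged sys.modules: keys are (module_name, tag).
# It never touches sys.modules, so ordinary imports are unaffected.
TAGGED_MODULES = {}


def tagged_import(name, tag, search_dir):
    """Return module `name` for `tag`, loading it from `search_dir` once."""
    key = (name, tag)
    if key not in TAGGED_MODULES:
        path = os.path.join(search_dir, name + ".py")
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)  # fresh module object
        spec.loader.exec_module(module)  # run the module code inside it
        TAGGED_MODULES[key] = module
    return TAGGED_MODULES[key]


def make_lib(version):
    """Create a throwaway directory with a hypothetical libfoo.py."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, "libfoo.py"), "w") as f:
        f.write(f"VERSION = {version!r}\n")
    return directory


old = tagged_import("libfoo", "1.0", make_lib("1.0"))
new = tagged_import("libfoo", "2.0", make_lib("2.0"))
print(old.VERSION, new.VERSION)                          # 1.0 2.0
print(tagged_import("libfoo", "1.0", "/unused") is old)  # True (cached per tag)
```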
What if no tag is provided? No idea man.
What's your Point?
I guess … there is none. It shows a problem I have with Python and provides the first part of a solution. It explains why Zine is doing funny things and why there can only be one Zine instance per interpreter. It's some brainstorming I wanted to share with the world, and maybe someone can use it to implement a new dynamic language that fixes this problem. It's not like it's a problem only Python has …
Some of the things I hate most about Python are the module and import systems - they have many corner cases (for example, it's not possible to do circular imports in some cases). I have done some pretty lame stuff to bypass circular imports, and I have seen that quite a few others are facing the same problems...
I think all the problems would be solved if modules could not execute code and did not have shared state (like in C# or Java). Of course, this would make Python a lot different, but I think it would produce cleaner code and force people not to do ugly hacks (for example, early versions of CherryPy altered the importer's environment).
— amix on Friday, July 24, 2009 23:37
Yeah, I've run into problems with the mimetypes library module state before. Nice post.
— Michael Foord on Friday, July 24, 2009 23:42
Isn't it called SINGLETRON?
— Ruben on Saturday, July 25, 2009 1:33
Nice article!! Very
By the way I think you missed the R in singletron.
— Rolv Wesenlund on Saturday, July 25, 2009 1:33
Erlang has the best "story" for real-world code reloading on the fly.
The gross way to get the best of both worlds (Erlang reloading plus conventional procedural coding in Python) is to have (1) an Erlang session running always, (2) a thin wrapper of Erlang code that (3) farms out the real work to a separate Python process, where all communication is done with message passing. It is disgusting, but it avoids all the failure modes listed. It will not win any beauty awards, and people will curse the inefficiency, but what good is efficiency plus tempting failure modes? You can trip over dozens of failure modes when you make the changes you already know you have to make on a regular basis, all in the name of "efficiency". Your latency and throughput will still be good; you pay the price in more processes.
— manuelg on Saturday, July 25, 2009 3:13
Not to distract from the problems associated with modules being singletons, but shouldn't the reload builtin handle most of the issue with updating modules?
To solve the legacy versioning issue, how about convincing module maintainers to keep older versions of a module in an appropriate subdirectory so that one could:
from foo.version_y import func as func_y
— Benjamin on Saturday, July 25, 2009 6:18
I agree with the main point of imports being somewhat restricting. And these restrictions can be remedied on the language level. And these restrictions force us to implement dirty hacks. Therefore they should be remedied.
OTOH, wouldn't it be safer to have a class singleton (Borg?) instead of treating modules as singletons. This way you can avoid import oddities.
I would move all state in the module inside a class that adheres to a specific API (load/unload?). Then wouldn't it be less complicated to upgrade?
Am I making any sense? Or is this something you have considered before and found out not to be feasible?
— Atamert Ölçgen on Saturday, July 25, 2009 7:23
Nice brainstorming. Maybe I'm missing something, but doesn't the builtin function reload do exactly what you want?
— Domen Kožar on Saturday, July 25, 2009 7:49
reload is fairly limited and stupid
it will certainly break in the face of complex applications
— Ronny Pfannschmidt on Saturday, July 25, 2009 9:21
I believe a "simple" solution would be if the Python interpreter allowed creating your own import system.
That should make it possible to avoid the import statement, the default search paths and the sys.modules singleton.
Any ideas how complicated this is?
— Damjan on Saturday, July 25, 2009 14:25
I haven't actually tried this, but mod_wsgi 2.5 will restart the WSGI daemon process if you touch the WSGI handler, so perhaps you could get your web application to do that. I'd like to think it should never be necessary, though.
— Scott Sadler on Saturday, July 25, 2009 15:32
So you want dynamic module reloading and an externally managed process? Sounds like you get what you ask for.
Normally you just wrap your process with a supervisor that restarts on a signal.
— Frog on Sunday, July 26, 2009 5:04
When I had a closer look at zine, I noticed the code you mentioned in zine._core. I'm happy you see this as critical as I did, when I saw it the first time. ;)
— Pascal on Sunday, July 26, 2009 11:45
There seems to be a similar, but easily avoidable problem/bad practice in Lua (if I got it right) discussed at lua-users.org/wiki/LuaModuleFunctionCritiqued
@11: Yes, that works (since mod_wsgi 1.0 or before, IIRC) and allows for a user to restart an app without requiring (root) administration privileges for the Apache HTTPd.
@3, @4: No, it's "singleton". The one with the 'r' is the cult movie "Tron" :)
— Jochen Kupperschmidt on Sunday, July 26, 2009 13:03
You could borrow the code that EVE uses to do their module reloading: blip.tv/file/1949687
I have used "reload()" to good effect in the past as well. If you are careful it can work well.
Python import is pretty bad though, it needs to be reworked.
— patrick on Monday, July 27, 2009 3:35
Those of you who are on "just use reload" didn't get the point...
We're after module versions and "namespace" collisions here...
— Markus on Monday, July 27, 2009 15:49
While singletons are obviously an issue sometimes, and everything you said up to the part about reloading is true,
I reckon you started to be wrong at this point: "Unfortunately there is no way for a wsgi application to request a restart from the webserver."
Who told you there is no way? Why not create an API to webserver to make restarts? Simplest possible: kill -HUP self.pid
Of course, you'd need something a bit more complex to make it work right way. But that would really be small work!
In our production environment we run several instances of WSGI webserver (at least 2, at most number of cores on server). Frontend balancer (nginx in our case) proxies to all WSGI apps in turn. To reload code:
No downtime, no mess with reload. Step 5 may introduce a bit of stress on a single instance. If that's a problem, you need to install another server anyway.
— temoto on Monday, July 27, 2009 16:08
I did not say reloading does not work. If you control your server there are ways. But GUI applications might want to reload parts of the system at runtime too.
Erlang shows how that can be done for example.
Anyway. Point is: it could be fixed in the language, and that would also allow things like multiple library versions at once. Reloading is just part of the issue.
— Armin Ronacher on Monday, July 27, 2009 19:21
PyGTK and the GStreamer bindings for Python do such a thing by issuing some kind of require(version) to load the appropriate version of the library. It's ugly.
— zgoda on Tuesday, July 28, 2009 18:12
Very good article. Thank you for the post.
Just to say u, u miss the r in Singlertons.
— Marie Rencontre on Wednesday, July 29, 2009 9:53
Nice insights! But the correct spelling is Singletoon, in opposition to Marriedtoon.
— Kevin Flynn on Wednesday, July 29, 2009 21:44
The real solution is not to use import machinery for plugins, at least not directly. Use exec or execfile. Inject your own __import__ function into namespace of the plugin to control its imports. Now you can have two different versions of the plugin loaded at the same time.
— Serge on Saturday, August 1, 2009 9:56
Yeah, exec is a great solution too.
— temoto on Sunday, August 2, 2009 12:52
You pinpointed one big singleton problem, but there is another one called the GIL. For modules one could use something like Tcl's interpreters (each has its own module loader and shares nothing; just put each plugin in its own interpreter and it gets a totally clean environment that can easily be reloaded AND sandboxed), but Python broke that early, because it's not needed and does not work with the way Python works.
— schlenk on Monday, August 3, 2009 8:02
I believe Trac handles module (plugin) reloading in some way, even when running as a WSGI application. If you reconfigure Trac by installing, removing, or upgrading plugins, they will eventually become visible on the site without even a server restart. I believe what happens is that as the current WSGI processes run, they notice there has been a change to the configuration and kill themselves. The supervisor, the web server in this case, starts new processes to fill the process pool and the new processes load the new modules. While it may not be totally ideal from the server and application point of view, it's pretty much completely transparent to the user.
— Kamil Kisiel on Thursday, August 6, 2009 0:23
Nice article. The C# reference has me somewhat baffled however - as I thought C#'s namespaces were fairly similar to C++'s.
A C++ namespace is nothing like a directory - because there is no way to go from the namespace to its contents. Indeed, that's one of its virtues - I can create symbols inside a given namespace from different compilation units which don't know about each other.
A C++ namespace is literally just that - a way to make names unique.
I thought C#'s namespaces were similar. If so, you should note the fault in your analogy.
— Tom Ritchford on Thursday, August 6, 2009 15:45
Hmm, that came off as meaner than I meant. But I really did enjoy the article - very into the tech specifics - and just wanted to add a little correction.
— Tom Ritchford on Thursday, August 6, 2009 15:48
There is ``reload()`` to reload modules.
docs.python.org/library/functions.html#reload
— snafu on Friday, September 4, 2009 9:56
Sorry.
— snafu on Friday, September 4, 2009 9:59
I like the analysis, but I'd rather think about a radically different solution: avoid the import statement altogether and perform all module instantiation through a loader object. The loader:
that can be created multiple times
This means that all module-global state is effectively tied to a particular loader instance. A nice property of this scheme is that it requires no syntax changes, and I believe it could be made compatible with all Python versions. Applications could incrementally move to being written in this style.
— holger.krekel on Wednesday, October 21, 2009 15:49