written on September 21, 2011
Turns out, this does not work reliably, in fact it will only work when packages are involved. I originally wrote the core for Flask extensions and it appeared to work, but I never verified that it works without extensions being involved. And in fact the module cleanup breaks it. Apparently Python does clean it up on module deallocation.
For a long time Python's import system was (although customizable) at the very core a black box. You could hook into some parts of it but others were hidden from you. On top of that the only signalling that the import system has is “here is your module, be happy” or “oh look, an import error”. Unfortunately Python's exceptions are an example of a stringly typed API, and one of the worst.
But one step after another. What's the actual problem of that black box. it works, right?
The problem arises when you start doing things and want to respond to errors. A good example are imports where you try to import something and if that fails you want to do something else. For instance you have a module name as a string and you want to try to import that. If that module does not exist (not if it fails to import!) you want to do something else. Django's middlewares for instance are defined as strings in the configuration module and if there is a typo you want to tell the users where the problem is.
If you import module A and if that does not exist you want to fall back to module B, you don't want to swallow the import error of module A since that one might have been a dependency that failed loading.
Consider you have a module called foo
that depends on a module named
bar
. If foo
does not exist you want to retry with simplefoo
. This
is what nearly everybody is doing:
try:
import foo
except ImportError:
import simplefoo as foo
However if now foo
is failing to import because bar
is missing you get
the import error “No module named simplefoo” even though the correct error
would have been “No module named bar”.
The problem is that Python does not provide you with information if the
module was not found or failed to import. In theory you could build
yourself something with the imp
module that splits up finding and
loading but there are a handful of problems with that:
The Python import process is notoriously underspecified and exploited
in various ways. Just because an importer says it finds a module it
does not mean it can properly import it. For instance there are many
finders that will tell you that find_module
succeeded just to fail
later with an error on load_module
.
The Python import machinery is complex and even with the new
importlib
module everything but easy to use. To replicate the logic
that Python is applying to locate modules you need around 80 lines of
code, even with importlib
available.
The import process is highly dynamic and there are various ways in
which people can customize the importing, going beyond what is
possible with regular import hooks by overriding __import__
.
The second possibility that is actually in use sometimes is parsing the error message of the import error. This however is a lost cause because the error message is implementation defined and differs quite often. On top of that is the import machinery in Python a recursive process and gives very awkward results:
>>> import missing_module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named missing_module
>>> import missing_package.missing_module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named missing_package.missing_module
>>> import xml.missing_module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named missing_module
As you can see, the error message does not even include the whole import path at all times. Sometimes the error message is something completely unrelated, sometimes the whole error message is just the module name. Sometimes it's “No module named %s”, sometimes the module name is on quotes. This is because various parts of the system can abort an import process and since this is customizable …
The way imports work is that at a very early point an entry in
sys.modules
is created for the new module. When the module code is
executed it will be executed in a frame where the globals of the frame are
the dictionary of the module in sys.modules
. As such this is valid in
Python:
import sys
a_value = [1, 2, 3]
this = sys.modules[__name__]
assert a_value is this.a_value
Now in theory one could think that if an import fails we will have a
partial entry in sys.modules
left to introspect if the import failed at
a later point. This however is usually not the case because on import
errors caused by the actual importers an importer is required to remove
the entry in sys.modules
again so we don't have much luck there.
Consider this fail_module.py
:
import sys
# this works
this = sys.modules['fail_module']
# this fails
import missing_module
If we however attempt to access fail_module
later it will be gone:
>>> import sys
>>> import fail_module
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fail_module.py", line 7, in <module>
import missing_module
ImportError: No module named missing_module
>>> import sys
>>> 'fail_module' in sys.modules
False
Since we also can't replace sys.modules
with a custom data structure
where we get callbacks when things are inserted we have no chance there.
I had to solve this problem again yesterday when I worked on a way to get rid of namespace packages in Flask without pissing existing users off. I think I found something that works reliable enough where I don't want to shoot myself for writing the code.
The idea is that if you get an import error you don't only get an import error but also a traceback object if you want. And that traceback object has all the frames of the traceback linked to it. If you walk the traceback you can find out if at any point the module you attempted to import was involved. If that was the case, the module succeeded in loading and something that it did resulted in an import error.
Now obviously there are downsides of this approach, so let's go over them:
It assumes that the module we import does not override __name__
.
Since that is a horrible idea anyways that's something we can ignore.
It assumes that there will be at least one traceback frame originating from that module. This will not be the case if that module was a C module that dynamically imported another module. This however is negligible since this is on the one hand a very uncommon thing to do and secondly this comes with its own set of problems.
It walks a traceback so your JIT will not be happy with that. On the other hand you should only import modules in non critical code paths anyways.
So how does the code look?
import sys
def import_module(module_name):
try:
__import__(module_name)
except ImportError:
exc_type, exc_value, tb_root = sys.exc_info()
tb = tb_root
while tb is not None:
if tb.tb_frame.f_globals.get('__name__') == module_name:
raise exc_type, exc_value, tb_root
tb = tb.tb_next
return None
return sys.modules[module_name]
You can use it like this:
json = import_module('simplejson')
if json is None:
json = import_module('json')
if json is None:
raise RuntimeError('Unable to find a json implementation')
Generally the implementation is straightforward. Try to import with
__import__
, if that fails get the current traceback and see if any of
the frames originated in the module we tried to import. If that is the
case, we reraise the exception with the original traceback, otherwise just
return None
to mark a missing module.
Since None
has a special meaning in sys.modules
which marks an import
error we know that an imported module never is None
and we can use this
as return value to indicate a module that does not exist. If we would
instead raise an exception we would have the very same problem again since
exceptions bubble up and we don't know if someone would handle it. So
raising something like ModuleNotFound
instead of returning None
would
cause troubles if the module we import recursively imports something with
import_module
and does not handle the exception.
Now you would think this only makes sense that it works, but it actually
surprised me that it does. The reason it surprises me is that Python
normally shuts down modules in a very weird way by setting all the values
in the global dictionary to None
. Since the actual modules is long
gone when you get the import error you would think that the reference to
the globals you have is full of None