Armin Ronacher

Whitespace Sensitivity

written by Armin Ronacher, on Tuesday, July 1, 2008 12:26.

I was reading a thread on ruby-forum.com about Python that said the whitespace sensitivity of Python is from hell, or something like that. Programmers from nearly any language can rant about whitespace sensitivity in Python, but clearly not Ruby programmers. Why? Because Python doesn't care about whitespace at all. The only thing that has anything to do with whitespace is the indentation, which the lexer converts into indent and outdent tokens. After that there is no whitespace any more; the parser doesn't know anything about it.

That, however, is not true for Ruby! foo[42] does a completely different thing than foo [42]. The first calls foo without arguments and calls the [] method of the return value with 42 as the argument; the latter calls foo with [42] as the argument, which happens to be an Array with one element. But there are more examples.
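To make the foo[42] case concrete, here is a minimal sketch. It assumes a hypothetical foo that accepts one optional argument, so that both spellings are legal; the names indexed and called are also mine, chosen only for illustration.

```ruby
# Hypothetical method: without an argument it returns a Hash whose default
# block reports the key it was indexed with; with an argument it reports
# what it was called with.
def foo(arg = nil)
  return [:called_with, arg] unless arg.nil?

  Hash.new { |_hash, key| [:indexed_with, key] }
end

# No space: foo is called without arguments, then [] is called on the result.
def indexed
  foo[42]
end

# One space: [42] is an Array literal passed to foo as its only argument.
def called
  foo [42]
end

p indexed  # [:indexed_with, 42]
p called   # [:called_with, [42]]
```

Note that this only holds while foo is a method. If foo were a local variable, both spellings would index it.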

Take this example:

foo = 23
def bar
  42
end

puts bar/foo

That prints “1”.

However take this minor modification:

foo = 23
def bar
  42
end

puts bar /foo

Now this gives you an error saying that the regular expression literal is unterminated. That's what I call whitespace sensitivity :)

You're wondering why I'm using a method for “bar” and not a local variable? Because the parser keeps track of all assigned local variables or methods (I'm not sure what exactly it does) and the syntax ambiguities are resolved that way. With “bar” as a local variable, bar /foo would simply be parsed as a division.
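The method versus local variable difference can be checked with a small sketch. It assumes a hypothetical helper, run_snippet, which uses eval so that the parse error can be caught instead of aborting the whole script. The snippet strings mirror the examples above.

```ruby
# bar as a method, as in the examples above.
def bar
  42
end

# Hypothetical helper: evaluates a snippet and returns its value, or
# :syntax_error if Ruby rejects the snippet at parse time.
def run_snippet(source)
  eval(source)
rescue SyntaxError
  :syntax_error
end

# bar is a method and there is no space before the slash: division.
p run_snippet("foo = 23\nbar/foo")             # 1

# bar is a method and only the left side has a space: the slash starts a
# regular expression literal that is never closed.
p run_snippet("foo = 23\nbar /foo")            # :syntax_error

# bar is assigned as a local variable first: division again.
p run_snippet("foo = 23\nbar = 42\nbar /foo")  # 1
```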

Comments

  1. Fortran is probably the only language which will pass your white-space insensitivity test for all values of test.

    But I haven't done much research on the topic... I just know that Fortran can deal with whitespace inside variable names etc. because it was run from punch cards, which can accidentally have whitespace added.

    —  Mark Bradley on Tuesday, July 1, 2008 14:12 #

  2. You're grasping at straws. Find a code sample from a reputable programmer that has separated an array from its index. Or find an example where space around an operator is uneven.

    I've never seen:

    a = b +5
    or worse...
    a = b *5 //am I multiplying or asking for an address

    in a reputable code sample.

    The criticism of python for whitespace is silly. I don't use python much, but I have no issue with it. But your response is even sillier.

    —  tim on Tuesday, July 1, 2008 14:47 #

  3. The example you give above is a pretty common one. Ruby is pretty expressive, so of course you're going to run into these cases occasionally; the key is being able to identify them. Does this mean that Ruby is whitespace sensitive? To some degree, as all languages are.

    (Your argument that the indentation found in Python only "somewhat has to do with whitespace" is much more interesting.)

    —  Garren on Tuesday, July 1, 2008 14:47 #

  4. @tim: That was one example. There are a lot more. The unary/binary operator thing is actually a very good one. Because the ambiguities involved there are resolved by whitespace:

    paste.pocoo.org/show/78278/

    @Garren: I consider a language as whitespace insensitive if whitespace is not part of the grammar. I doubt that there is an official definition of what a whitespace sensitive language is :-)

    —  Armin Ronacher on Tuesday, July 1, 2008 15:11 #

  5. The point here is that every language that I am aware of is "whitespace" significant.

    Some more than others.

    —  markus on Tuesday, July 1, 2008 15:24 #

  6. Hmm...

    In the .NET world (and amongst a few Python programmers) it is common to use:

    something.some_method ( args )

    Rather than (the preferred in my opinion):

    something.some_method(args)

    Is there potential ambiguity there?

    I also see very different habits in expressions. Some people prefer:

    a + b * c

    Whilst others prefer:

    a+b*c

    So accidentally ending up with 'a = b +5' or 'a = b *5' doesn't seem unlikely at all.

    Michael

    —  Michael Foord on Tuesday, July 1, 2008 15:30 #

  7. Ok, so don't call python's issue a whitespace issue. Many programmers don't like blocks delineated by indentation. Coming up with an obscure whitespace issue in ruby is irrelevant and doesn't change the critique of python.

    I can't imagine good programmers having the issues you raised in ruby, or at the very least, not fixing them pretty quickly.

    I can see good programmers getting hosed up because they are sharing a python file and one guy uses tabs and the other uses an editor that converts tabs to spaces. That sounds like a real problem.

    Again, it wouldn't stop me from using python, but I don't see how your example is supposed to be equivalent to the issue some people have with python. You like the language, obviously, so use it.

    —  tim on Tuesday, July 1, 2008 16:32 #

  8. @tim: Which indentation issue? I never heard of a Python programmer that had a real problem with the indentation. Even if you are working with someone that indents using tabs a good editor will adapt automatically.

    I never said that this is an issue with Ruby, I just said that Ruby is whitespace sensitive. Nothing more and nothing less.

    —  Armin Ronacher on Tuesday, July 1, 2008 16:56 #

  9. Actually, that setup is pretty common in an editor, especially general programmer's editors like vi and emacs. I configure how I want mine to behave when I type, but I don't want that retroactively applied when I open up someone else's work.

    Regardless, lots of people don't like python's block handling. They may be misusing the word "whitespace" when they make that criticism. But pointing out that `deffoo()5.timesdoputs"hello"endend` isn't valid ruby doesn't discount their criticism, just their semantics.

    —  tim on Tuesday, July 1, 2008 17:18 #

  10. I found this very issue to be very frustrating in Ruby. The optional closing parens for function calls is a strange decision IMHO. For example:

    def x(n) 2 * n end
    def y(n) 3 * n end
    def z(n) 4 * n end
    a = x y z 2

    What I really wanted was something like:

    a = x(y(z(2)))

    When I was programming Ruby, I ended up just using parens almost all the time simply because allowing an opportunity for misunderstood syntax just didn't seem worth it. For the very same reasons, I always use curly braces for if statements in C/Javascript/Java/C#, etc.

    —  Eric Larson on Tuesday, July 1, 2008 18:12 #

  11. @Eric Larson: because of the ambiguity you mention, not using parentheses in such cases produces a warning.

    —  Sunny on Tuesday, July 1, 2008 21:29 #

  12. "Because Python doesn’t care about Whitespace at all."

    Which is why you can write:
    defbar():return32

    Right?

    —  agnoster on Tuesday, July 1, 2008 22:37 #

  13. check this out, this won't work either:

    foo = 12

    puts f o o

    —  your a dumb on Tuesday, July 1, 2008 23:48 #

  14. I am currently coding in Ruby but recently I was doing a little Python programming. I was cutting and pasting some code from a webpage and the indentation was screwed up after the pasting. Arghhh!

    —  PC on Wednesday, July 2, 2008 3:23 #

  15. @12: I was referring to the actual grammar. After the lexing, all that is left are indent/outdent/newline tokens. I admit that the statement above was a bit too harsh :)

    —  Armin Ronacher on Wednesday, July 2, 2008 3:32 #

  16. And `def bar` is not the same as `defbar` is not the same as `de fb ar`! WAKE UP SHEEPLE

    —  web design on Wednesday, July 2, 2008 14:45 #

  17. dumb blogger...

    —  dumb on Thursday, July 3, 2008 5:26 #

  18. I just recently came across "Python: Myths about Indentation" (which also has been posted by GvR on a forum) and just want to share it with you: www.secnetix.de/olli/Python/block_indentation.hawk So far I think it's one of the better explanations and has good potential to settle the often fanatic discussions.

    Python's requirements on the programmer in this case turned out to be a great relief for both writing and reading, but they are themselves derived from languages that influenced Python (especially ABC, as far as I can tell). Actually, I'd sometimes be happy to trade curly braces as block delimiters for Python's style in quite a few other languages (including Ruby and Perl, but why not also nearly every common language, including Java and maybe plain C?).

    —  Jochen Kupperschmidt on Saturday, July 5, 2008 12:57 #

  19. Is it just you thats retarded, or python programmers in general?

    —  LOL on Saturday, July 19, 2008 7:55 #

  20. Looks like you annoyed a few Ruby fanboys here. Way to make your language community seem mature guys.

    —  Michael Foord on Thursday, July 31, 2008 9:57 #

  21. The Ruby ambiguities remind me of C. The designers of Java had been hurt by C and paid a lot of attention to potential parsing gotchas. I think Java will qualify as non-whitespace sensitive.

    I would say my definition of a whitespace sensitive language is one where the inclusion of extra whitespace between two tokens that could otherwise be told apart alters the semantics. I hope that's clear!

    —  David Roussel on Friday, September 26, 2008 22:25 #
