Kind of like "hey guys, check it out you can just duct tape down the dead-man's switch on this power tool and use it one handed". In Python.
Thursday, November 30, 2017
python3 set literals in 3, 2, 1....
>>> {1,2}.add(3)
>>> {1}.add(2)
>>> {}.add(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'add'
Why no empty set literal? The python-3000 archives explain:
https://mail.python.org/pipermail/python-3000/2006-April/001286.html
https://mail.python.org/pipermail/python-3000/2006-May/001666.html
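For completeness, the workaround: {} was already taken by dicts, so the only way to spell an empty set is the set() constructor.

```python
# {} is an empty dict, so spell an empty set with set().
empty = set()
empty.add(1)
print(empty)         # {1}
print(type({}))      # <class 'dict'>
print(type({1, 2}))  # <class 'set'>
```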
Wednesday, November 29, 2017
a __main__ by any other __name__
$ cat <<EOF > what_is_happening.py
if __name__ == "__main__":
    import what_is_happening
else:
    print("what is happening?")
EOF
$ python what_is_happening.py
what is happening?
Ambiguous entrypoints can create a maze of state in your program. In case the above example doesn't seem so bad, let's make it worse.
$ cat <<EOF > innocent_bystander.py
import what_is_happening
def func(): raise what_is_happening.TrustFall('catch me!')
EOF
$ cat <<EOF > what_is_happening.py
import innocent_bystander
class TrustFall(Exception): pass
if __name__ == "__main__":
    try:
        innocent_bystander.func()
    except TrustFall:
        print("gotcha!")
    except Exception as e:
        print('oops, butterfingers!')
        print('{} is not {}.... what have I done?'.format(
            type(e), TrustFall))
EOF
$ python what_is_happening.py
oops, butterfingers!
<class 'what_is_happening.TrustFall'> is not <class '__main__.TrustFall'>.... what have I done?
What happened? This is executing the main module twice, a special case of double import.
One solution is to put import guards in all entrypoint scripts:
if __name__ != "__main__":
    raise ImportError('double import of __main__')
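Here's a self-contained sketch that reproduces the whole thing: it writes both modules to a scratch directory, runs the entrypoint, and shows the guard turning the silent class mismatch into a loud ImportError. (The helper run_entry and the trimmed-down module sources are just for this demo.)

```python
import os
import subprocess
import sys
import tempfile
import textwrap

GUARD = ('if __name__ != "__main__":\n'
         '    raise ImportError("double import of __main__")\n')

ENTRY = textwrap.dedent('''\
    import innocent_bystander
    class TrustFall(Exception): pass
    if __name__ == "__main__":
        try:
            innocent_bystander.func()
        except TrustFall:
            print("gotcha!")
        except Exception:
            print("oops, butterfingers!")
    ''')

BYSTANDER = ('import what_is_happening\n'
             'def func(): raise what_is_happening.TrustFall("catch me!")\n')

def run_entry(entry_source):
    # Write both modules to a temp directory and run the entrypoint there.
    with tempfile.TemporaryDirectory() as d:
        for name, src in [('what_is_happening.py', entry_source),
                          ('innocent_bystander.py', BYSTANDER)]:
            with open(os.path.join(d, name), 'w') as f:
                f.write(src)
        proc = subprocess.run([sys.executable, 'what_is_happening.py'],
                              cwd=d, capture_output=True, text=True)
        return proc.stdout + proc.stderr

print(run_entry(ENTRY))          # the wrong except clause fires
print(run_entry(GUARD + ENTRY))  # ImportError: double import of __main__
```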
UnicodeDecode SyntaxError
When the '+' operation executes as bytecode, an invalid byte raises UnicodeDecodeError. But when adjacent string and unicode constants are concatenated, it's a SyntaxError instead. (Presumably because no bytecode is executing; the concatenation happens at compile time.)
>>> u'a' + '\xff'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
>>> u'a' '\xff'
File "<stdin>", line 1
SyntaxError: (unicode error) 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
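For comparison, Python 3 refuses to mix str and bytes at all: it's a TypeError at runtime, and you decode explicitly with a codec of your choice.

```python
# Python 3: no implicit ascii decoding; mixing str and bytes is a TypeError.
try:
    'a' + b'\xff'
except TypeError as e:
    print('TypeError:', e)

# Decoding explicitly picks the codec and sidesteps the ambiguity.
print('a' + b'\xff'.decode('latin-1'))  # 'aÿ'
```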
Thursday, August 24, 2017
sqlite does what
>>> import sqlite3
>>> c = sqlite3.connect(':memory:')
>>> c.execute('select null and 1').fetchall()
[(None,)]
>>> c.execute('select null and 0').fetchall()
[(0,)]
>>> c.execute('select null or 1').fetchall()
[(1,)]
>>> c.execute('select null or 0').fetchall()
[(None,)]
SQLite's docs are fantastic: https://sqlite.org/nulls.html
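The flip side of the same three-valued logic: NULL compared to anything, even NULL, yields NULL, so identity checks go through IS / IS NOT instead.

```python
import sqlite3

# NULL = NULL is NULL, not true; IS / IS NOT do the identity check.
c = sqlite3.connect(':memory:')
print(c.execute('select null = null').fetchall())       # [(None,)]
print(c.execute('select null is null').fetchall())      # [(1,)]
print(c.execute('select null is not null').fetchall())  # [(0,)]
```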
Saturday, April 29, 2017
a return to yield
I remember when, almost a decade ago, I was first discovering generators. It was a heady time, and I saw applications everywhere.
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return
I also remember the first time I tried to mix a return value into my generator.
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True
Imagine my surprise, as I'm sure countless others experienced as well:
SyntaxError: 'return' with argument inside generator
A rare compile-time error! Only the decorative, bare return is allowed in generators, where it serves to raise StopIteration.
Now, imagine my surprise, so many years later when I import that same code in Python 3.
...
Nothing! No error. So what happened?
Turns out that the coroutine and asyncio machinery of Python 3 has repurposed this old impossibility.
If we manually iterate to skip over our yield:
fib_iter = fib_gen()
for i in range(11):
    next(fib_iter)
next(fib_iter)
We see what's really happening with our return:
Traceback (most recent call last):
  File "fib_gen.py", line 13, in <module>
    next(fib_iter)
StopIteration: True
That's right, returns in generators now raise StopIteration with a single argument of the return value.
Most of the time you won't see this. StopIterations are automatically consumed and handled correctly by for loops, list comprehensions, and sequence constructors (like list). But it's yet another reason to be extra careful when writing your own generators, specific to Python 3.
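The supported way to actually capture that return value is `yield from` (PEP 380): it re-yields the inner generator's values and evaluates to the StopIteration payload.

```python
def fib_gen():
    x, y = 1, 1
    while x < 100:
        x, y = y, x + y
        yield x
    return True

def consumer():
    # `yield from` forwards fib_gen's values and hands us its return value.
    result = yield from fib_gen()
    print('fib_gen returned:', result)

print(list(consumer()))
```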
Wednesday, April 12, 2017
Bit by bit: CPU architecture
There are a variety of reasons you might want to know how many bits the architecture of the CPU running your Python program has. Maybe you're about to use some statically-compiled C, or maybe you're just taking a survey.
Either way, you've got to know. One historical way is:
import sys
IS_64BIT = sys.maxint > 2 ** 32
Except that sys.maxint is specific to Python 2. Being the crossover point where ints transparently become longs, sys.maxint doesn't apply in Python 3, where int and long have been merged into a single type: int (even though the C implementation still calls it PyLongObject). And Python 3's introduction of sys.maxsize doesn't help much if you're trying to support Python <2.6, where it doesn't exist.
So instead we can use the struct module:
import struct
IS_64BIT = struct.calcsize("P") > 4
This is a little less clear, but being backwards and forwards compatible, and given struct is still part of the standard library, it's a pretty good approach, and is the one taken in boltons.ecoutils.
But let's say you really wanted to get it down to a single line, and even standard library imports were out of the question, for some reason. You could do something like this:
IS_64BIT = tuple.__itemsize__ > 4
While not extensively documented, a lot of built-in types have __itemsize__ and __basicsize__ attributes, which describe the memory requirements of the underlying structure. For tuples, each item requires a pointer. Pointer size * 8 = bits in the architecture: 4 * 8 = 32-bit architecture, and 8 * 8 = 64-bit architecture.
Even though documentation isn't great, the __itemsize__ approach works back to at least Python 2.6 and forward to Python 3.7. Memory profilers like pympler use __itemsize__ and it might work for you, too!
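A quick sanity check, assuming CPython (where a tuple item is one pointer): the approaches above should all agree with each other, and with sys.maxsize on Python 3.

```python
import struct
import sys

# All three detection methods should give the same answer on CPython.
pointer_size = struct.calcsize('P')  # size of a C pointer in bytes
print(pointer_size, tuple.__itemsize__)
print('64-bit' if pointer_size > 4 else '32-bit')
```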
Tuesday, March 21, 2017
When you can update locals()
There are two built-in functions, globals and locals. These return dicts of the contents of the global and local scope.
locals() usually refers to the contents of a function's scope, in which case it returns a one-time copy. Updates to the dict do not change the local scope:
>>> def local_fail():
...     a = 1
...     locals()['a'] = 2
...     print 'a is', a
...
>>> local_fail()
a is 1
However, in the body of a class definition, locals points to the __dict__ of the class, which is mutable.
>>> class Success(object):
...     locals().update({'a': 1})
...
>>> Success.a
1
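The same trick still works on CPython 3, where the class-body namespace is a real, writable dict while the body executes. Here's a sketch (the Palette class is made up) that generates attributes programmatically:

```python
# Mutating locals() works in a class body: it is the class namespace dict.
class Palette:
    locals().update({name: i for i, name in enumerate(['red', 'green', 'blue'])})

print(Palette.red, Palette.green, Palette.blue)  # 0 1 2
```

For anything beyond a party trick, `type('Palette', (), {...})` is the tidier way to build a class from a dict.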
Monday, March 13, 2017
identity theft
>>> class JSON(int):
...     from json import *
...
>>> json = JSON()
>>> json.dumps()
'0'
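What the quirk is doing: a function imported into a class body becomes a method, so the instance is passed as the first argument, and json.dumps ends up serializing the instance itself. Since JSON subclasses int, it serializes as its integer value.

```python
# A function in a class namespace is a descriptor: accessed through an
# instance, it binds the instance as the first argument.
class JSON(int):
    from json import dumps  # JSON(42).dumps() == json.dumps(JSON(42))

print(JSON(42).dumps())  # '42'
```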