Tuesday, 5 July 2016

Why is Python slow? « kmod's blog

Why is Python slow « kmod's blog: "Python spends almost all of its time in the C runtime



 This means that it doesn't really matter how quickly you execute the "Python" part of Python.  Another way of saying this is that Python opcodes are very complex, and the cost of executing them dwarfs the cost of dispatching them.  Another analogy I give is that executing Python is more similar to rendering HTML than it is to executing JS -- it's more of a description of what the runtime should do rather than an explicit step-by-step account of how to do it.



Update: why is the Python C runtime slow?

Here's the example I gave in my talk illustrating the slowness of the C runtime.  This is a for loop written in Python, but that doesn't execute any Python bytecodes:

import itertools
sum(itertools.repeat(1.0, 100000000))
The amazing thing about this is that if you write the equivalent loop in native JS, V8 can run it 6x faster than CPython.  In the talk I mistakenly attributed this to boxing overhead, but Raymond Hettinger kindly pointed out that CPython's sum() has an optimization to avoid boxing when the summands are all floats (or ints).  So it's not boxing overhead, and it's not dispatching on tp_as_number->tp_add to figure out how to add the arguments together.

My current best explanation is that it's not so much that the C runtime is slow at any given thing it does, but it just has to do a lot.  In this itertools example, about 50% of the time is dedicated to catching floating point exceptions.  The other 50% is spent figuring out how to iterate the itertools.repeat object, and checking whether the return value is a float or not.  All of these checks are fast and well optimized, but they are done every loop iteration so they add up.  A back-of-the-envelope calculation says that CPython takes about 30 CPU cycles per iteration of the loop, which is not very many, but is proportionally much more than V8's 5."



'via Blog this'

No comments:

Post a Comment