The philosophy behind list indexes in Python

I bet this is asked quite frequently, but after quite a few hours of searching I haven't found an answer.

What is the thinking behind stopping 'one short' when slicing or iterating through lists?

For example:

>>> a=[0,1,2,3,4,5,6]
>>> a
[0, 1, 2, 3, 4, 5, 6]
>>> a[2:5]
[2, 3, 4]

To my mind, it makes more sense to go to 5. I'm sure there's a good reason, but I'm worried it will result in a lot of off-by-one errors for me, so I need to get my head around the philosophy of this behaviour, and where else it is observed (or not observed).

I'm just a hobbyist who likes to learn things, and the Raspberry Pi has me interested in Python. I have dabbled in QB, VB, Spin (Parallax), Bascom, and Arduinos in the past.

There are two equally plausible ways to identify positions in a list/string/whatever. One is to number the elements, the other to number the gaps between them. I'll try my hand at some ASCII art... be sure to view this in a monospaced font.

  [ 0 , 1 , 2 , 3 , 4 , 5 , 6 ]
  |   |   |   |   |   |   |   |
  0   1   2   3   4   5   6   7
 -7  -6  -5  -4  -3  -2  -1  (0)

When you ask for the slice from 2 to 5, you get the elements between those slot markers. That's [2,3,4].

When you ask for negative indices, the same applies, only there's no parallel way to ask for negative 0 aka end of list. [1]

>>> a[2:-2]
[2, 3, 4]
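
As a quick check of the gap-numbering picture, here is a small sketch using the same list a from the question. The -0 and None cases illustrate the footnote, and the range() comparison is my own addition, to show one other place the same convention turns up.

```python
a = [0, 1, 2, 3, 4, 5, 6]

# Slice bounds name gaps, not elements: from gap 2 up to gap 5.
middle = a[2:5]               # [2, 3, 4]

# Negative bounds count gaps from the right-hand end.
same_middle = a[2:-2]         # [2, 3, 4]

# There is no "negative 0": -0 is just 0, the leftmost gap,
# so this slice is empty rather than running to the end.
empty = a[2:-0]               # []

# None (or omitting the bound) means "the end of the list".
to_end = a[2:None]            # [2, 3, 4, 5, 6]

# range() stops one short in the same way.
indices = list(range(2, 5))   # [2, 3, 4]
```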

There are a number of reasons for working this way. For instance, the length of the range a[x:y] is simply y-x, negative indices aside. It's even more significant when you look at something that doesn't have discrete units, such as times. Suppose you invent a data type to represent a time range. You might describe a TV show as lasting from 10:00 till 10:30; but what do you really mean by those times? Do you mean from the start of 10:00 until the end of 10:30? When is the end of 10:30? Is it the end of the minute 10:30, the end of the second 10:30:00, the end of the millisecond 10:30:00.000? Easier to describe it as the beginning of that moment, because that has the same meaning regardless of your resolution. You can always add more trailing zeroes to either the start time or the end time, without changing the meaning of the range.
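
To make those two points a bit more concrete, here is a rough sketch. The first half checks the length property on a list, plus a related one (splitting at any gap loses nothing). The second half treats two TV shows as half-open time ranges using the standard datetime module; the overlaps() helper and the date are made up for illustration, not taken from any library.

```python
from datetime import datetime, timedelta

a = [0, 1, 2, 3, 4, 5, 6]
x, y = 2, 5

# The length of a half-open slice is just the difference of its bounds.
length = len(a[x:y])              # 3, i.e. y - x

# Splitting at any gap k loses nothing and duplicates nothing.
k = 4
rejoined = a[:k] + a[k:]          # equal to a


def overlaps(start1, end1, start2, end2):
    """Hypothetical helper: do the half-open ranges [start, end) overlap?"""
    return start1 < end2 and start2 < end1


# Two back-to-back shows; the date itself is arbitrary.
show1_start = datetime(2024, 1, 1, 10, 0)
show1_end = show1_start + timedelta(minutes=30)     # 10:30
show2_start = show1_end                             # next show starts at 10:30
show2_end = show2_start + timedelta(minutes=30)     # 11:00

# The shared boundary belongs only to the second show, so no overlap.
back_to_back = overlaps(show1_start, show1_end, show2_start, show2_end)

# The duration is end minus start, with no "plus one minute" fudge.
duration = show1_end - show1_start
```

With closed ranges, the first show would have to end at 10:29, or 10:29:59, or 10:29:59.999, depending on the resolution, which is exactly the problem described above.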

The same applies to the generation of random numbers. If you have a function that generates a random number uniformly in the range [0,1) (that is, including 0 but not including 1) and you multiply it by an integer x and truncate the fractional part, you get a random integer uniformly in the range [0,x), which is an extremely useful thing. You don't even need to care what the actual resolution of the RNG is (does it produce 0.000 through 0.999, or 0.000000 through 0.999999?), as long as it has significantly more distinct values than your target range. But if the RNG could return 1.0, then you would need to deal with the possibility of getting x itself in your result, which frankly isn't much use.
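
A minimal sketch of that trick, using random.random() from the standard library, which is documented as returning a float in [0.0, 1.0). The random_below() name is made up for this example; in real code, random.randrange(x) does the same job.

```python
import random


def random_below(x):
    """Hypothetical helper: a random integer in the half-open range [0, x)."""
    # random.random() is in [0.0, 1.0), so the product is in [0, x),
    # and truncating it can never produce x itself.
    return int(random.random() * x)


# A thousand die rolls numbered 0 through 5; 6 should never appear.
rolls = [random_below(6) for _ in range(1000)]
```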

It takes some getting used to, perhaps, given that most people in the real world work with closed ranges; but ultimately it makes far more sense. And if it weren't for a huge case of lock-in, I would wish we could change the way Scripture references are interpreted, for the same reasons. Taking examples from tomorrow's church service, two of the readings are Matthew 18:15-20 and Philippians 3:8-10. When you look in a Bible, you'll find verse numbers preceding the verses (at least, that's the convention in most editions). If the ranges were written as half-open (e.g. Matt 18:15-21), it would be simply from verse marker 15 to verse marker 21; and "to the end of the chapter" or "to the end of the book" would have obvious notations (e.g. Matt 18:15-19:1 or Matt 27-Mark 1). Of course, this would make for a huge amount of confusion, since the present system has been around for centuries... but it would make more sense, so I'm very glad it's the way Python chose to do it :)

[1] You can use None or omit the index, but there's no "negative 0" integer to use.
