• Members 46 posts
    Oct. 15, 2019, 6:26 p.m.

    This article is an example of why making it easy for someone to code something they don't truly understand can be devastating in the long run. It is just as important, if not more important, to understand what a program is doing and exactly how it does it as it is to be able to write the program easily.

    The fact that this code was simply taken and reused without anyone checking it first is a huge oversight as well. Additionally, this code leaves it up to the operating system to order the files, and in doing so it destroys determinism and goes against some of the core principles of software engineering and science. This is why professional computer scientists should be involved in any study that relies on computer-aided data analysis.
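    A minimal sketch of the fix implied here, assuming the input files are found with a pattern such as `*.out` (the directory layout, the pattern, and the `list_input_files` helper are hypothetical): sort the glob results explicitly so the processing order no longer depends on the file system.

```python
import glob
import os


def list_input_files(directory, pattern="*.out"):
    """Return matching files in a fixed, platform-independent order.

    Hypothetical helper; the "*.out" pattern is only an assumed example.
    glob.glob() returns paths in whatever order the file system yields them,
    so we sort explicitly instead of relying on that order.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

    Every later step can then iterate over `list_input_files(...)` and see the same order on every machine.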

  • Oct. 15, 2019, 8:43 p.m.

    Oh, I heard about this on Twitter! That said, this is definitely not a bug, as the article purports; as the first paragraph of the documentation for that module says:

    Whether or not the results are sorted depends on the file system.

    These results are deterministic on a given file system, but not across file systems. Python is likely deferring to a file-system (or, more likely, OS) implementation of the directory listing and passing those results straight through.
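    One way to see this for yourself, sketched under the assumption that you are running CPython, where (as an implementation detail, not a documented guarantee) glob reads the directory through `os.scandir` and keeps the order it gets back. The `listing_orders` helper is hypothetical; it returns both orders so you can compare them.

```python
import glob
import os


def listing_orders(directory):
    """Return (glob order, raw directory order) for the names in `directory`.

    Hypothetical helper. In CPython, glob reads the directory with os.scandir
    and keeps that order, so it is stable for an unchanged directory on one
    file system but is not promised to be sorted.
    """
    from_glob = [os.path.basename(p) for p in glob.glob(os.path.join(directory, "*"))]
    with os.scandir(directory) as entries:
        # glob's "*" skips hidden names (leading "."), so skip them here as well
        from_scandir = [e.name for e in entries if not e.name.startswith(".")]
    return from_glob, from_scandir
```

    Printing both lists on different machines is a quick way to check whether a particular file system happens to return names alphabetically.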

    I saw an interesting idea floated in response to this: volunteer code review for scientific code. I personally enjoy participating in code reviews and would love to see this happen, and I think it would be enlightening for both sides. Doing that might require more labs to open-source their code, though, which is a touchy topic since most labs aren't even practicing open science!

  • Members 46 posts
    Oct. 15, 2019, 10:38 p.m.

    The documentation states that the OS handles the ordering of the files, so yes, it is deterministic on a given OS. The problem, though, is that only one order is correct; any other order produces wrong results. The person who wrote the code basically said that, at the time, it worked on the OSes they used. We now know that this is not good enough for code like this and that the program needs to handle the sorting itself. The problem is that many people who pick up programming just to automate something don't understand the pitfalls they need to avoid, and the result is stuff like this.
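    If the program imposes the order itself, it also has to decide what the correct order is. Plain `sorted()` compares strings character by character, which may not match the intended order when file names contain numbers. Below is a minimal sketch of a "natural" sort key; the `natural_key` helper and the `frameN.out` names are hypothetical, and whether natural ordering is the right one depends on the data.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so "frame10" sorts after "frame2".

    Hypothetical helper. re.split with a capturing group keeps the digit runs,
    which are then compared numerically; text chunks compare case-insensitively.
    """
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


files = ["frame10.out", "frame2.out", "frame1.out"]
print(sorted(files))                   # ['frame1.out', 'frame10.out', 'frame2.out']
print(sorted(files, key=natural_key))  # ['frame1.out', 'frame2.out', 'frame10.out']
```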

    Yeah, it is just straight-up bad coding practice.

  • Oct. 16, 2019, 8:15 a.m.

    I was going to disagree and say it's a design decision, but then I looked at the source code for the glob function. I thought it might be doing lazy iteration for performance, which would explain why the results aren't sorted. That would be true for iglob, but apparently glob just calls the list constructor on the results of iglob 🤦‍♀️. Since we're already putting the entire result in memory with the list call (and Python's sorted function returns a list anyway), why it isn't sorted(iglob(<x>)) is beyond me.
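    The change suggested here is small. A sketch, assuming you just want sorted results without patching the standard library; `sorted_glob` is a hypothetical helper, while `glob.iglob` and `sorted` are standard.

```python
import glob


def sorted_glob(pattern, **kwargs):
    """Hypothetical helper: materialise iglob's lazy results in sorted order.

    glob.glob already builds a full list from glob.iglob, so sorting costs one
    O(n log n) pass over a list of the same size, not an extra copy.
    Extra keyword arguments (e.g. recursive=True) are passed through to iglob.
    """
    return sorted(glob.iglob(pattern, **kwargs))
```

    With `pathlib`, the equivalent is `sorted(Path(directory).glob(pattern))`.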

  • Members 46 posts
    Oct. 16, 2019, 9:36 a.m.

    @Aether Yeah, the problem stems directly from the fact that the order is left up to the OS, which is outside the program's control. Given that the order is critical to the program executing correctly, I can't imagine a scenario where that is even close to best practice. If this were a tool that only ran on one OS, maybe, since the ordering would be somewhat predictable, but Python is cross-platform by design, so things like this need to be addressed in the code, not left up to the operating system.
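    Another way to take the OS out of the picture is to make correctness independent of order altogether, for example by pairing related files by name rather than by their position in a list. This is a sketch under an assumed naming scheme (`sample1.in` paired with `sample1.out`); the suffixes and the `pair_by_stem` helper are hypothetical and not taken from the script discussed in the article.

```python
from pathlib import Path


def pair_by_stem(directory, first_suffix=".in", second_suffix=".out"):
    """Pair files that share a stem, e.g. sample1.in with sample1.out.

    Hypothetical helper with assumed suffixes. Matching by name instead of by
    list position means the pairing no longer depends on directory order.
    """
    base = Path(directory)
    firsts = {p.stem: p for p in base.glob("*" + first_suffix)}
    seconds = {p.stem: p for p in base.glob("*" + second_suffix)}
    # Stems present on only one side mean the inputs are incomplete: fail loudly.
    missing = sorted(firsts.keys() ^ seconds.keys())
    if missing:
        raise ValueError(f"unpaired files for stems: {missing}")
    # Iterate stems in sorted order so the output order is reproducible too.
    return [(firsts[s], seconds[s]) for s in sorted(firsts)]
```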

  • Oct. 16, 2019, 9:53 a.m.

    In most cases, yes, but in some cases there are non-trivial differences in the OS-level implementations that force things to behave differently. Case in point: subprocesses and asynchronous event loops. At a high level, Python provides a cross-platform interface for both, but many aspects of them are system-specific. There's no getting away from that in those cases (and I'm sure there are others I'm not thinking of).
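    As a small illustration of an interface that is uniform while the machinery underneath is not, here is a sketch using `asyncio.create_subprocess_exec`. The call is the same everywhere, but on Windows it only works with the Proactor event loop (the default since Python 3.8; the selector loop there does not support subprocesses), while Unix event loops rely on different OS mechanisms to track child processes. The `run_python_snippet` helper is hypothetical.

```python
import asyncio
import sys


async def run_python_snippet(code):
    """Hypothetical helper: run a tiny Python program in a child process.

    The asyncio call is identical on every OS; what differs is the event loop
    machinery that starts the child and notices when it exits.
    """
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,  # sys.executable avoids guessing "python" vs "python3"
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()


if __name__ == "__main__":
    print(asyncio.run(run_python_snippet("print('hello from a child process')")))
```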

  • Members 46 posts
    Oct. 16, 2019, 3:55 p.m.

    Just to be clear, I don't think we disagree on anything. I mean, if we do, I am happy to discuss it, but I don't think we do...

  • Oct. 16, 2019, 4:14 p.m.

    👍🐍