In a recent tool I developed, I needed to split a list (or any iterable) into equal-sized chunks. In this post, I document how I reached that goal, which steps it took, and the decisions behind them.
The StackOverflow way
The most comfortable way to solve problems is often to just search on the internet. When looking for "python chunk list", one of the first results is an already answered question on StackOverflow.
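The snippet from that answer is not reproduced here. The widely cited solution looks roughly like the following sketch, which relies on len() and slicing (the names chunks, lst and n are assumptions for illustration):

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    # Step through the list in strides of n and slice out each chunk.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
```

Called as list(chunks([1, 2, 3, 4, 5], 2)), the last chunk is simply shorter when the length is not a multiple of n.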
While this tiny function solves the original question in a beautiful and simple way, there is still room for improvement.
Iterate over anything
A shortcoming of the solution above: it only works with sequences such as lists, which have a known length and support slicing. If your source data is an iterator or generator, which offers neither, it won't work. As I'm a great fan of composable functions, I refactored the solution to allow that:
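A minimal sketch of such a refactoring, not necessarily the exact code from the tool. The size check and the decision to skip an empty trailing chunk are assumptions added for illustration:

```python
from collections import deque


def chunks(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    # Assumption: reject sizes below 1 instead of silently misbehaving.
    if size < 1:
        raise ValueError("size must be at least 1")

    buffer = deque()
    for item in iterable:
        buffer.append(item)
        # Emit a chunk as soon as the buffer is full, then start over.
        if len(buffer) == size:
            yield list(buffer)
            buffer.clear()

    # Assumption: only emit the remainder if there is one.
    if buffer:
        yield list(buffer)
```

Because the function only iterates over its input, it accepts lists, iterators and generators alike, for example chunks((x * x for x in range(10)), 4).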
In comparison with the (much smaller) previous approach, this one uses a deque as a temporary buffer until enough items have been collected.
As soon as the iterator has ended, whatever is left is yielded as the last chunk. You cannot use return for that, as the value would be dropped: a for loop over the generator never sees a generator's return value.
Adding and verifying types
Over the last few releases, Python has gained great support for type hints, which add clarity about what functions receive and return, and allow for nice integration with your IDE of choice.
Because we don't know exactly what data comes in and goes out, we need to define a generic type variable that serves as a placeholder for the type checker to fill in. The checker is then able to correctly infer the result type and use it to verify what the caller does with the result later.
I'm not using the Generator type here, as we don't use any generator-specific features, and the Python docs recommend using the simpler and more generic Iterable type instead.
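With type hints, the function could look like the sketch below. The type variable name T and the parameter names are assumptions; the body follows the buffering approach described earlier:

```python
from collections import deque
from typing import Deque, Iterable, List, TypeVar

# Placeholder for "whatever item type the caller passes in".
T = TypeVar("T")


def chunks(iterable: Iterable[T], size: int) -> Iterable[List[T]]:
    """Yield lists of up to `size` items from any iterable."""
    buffer: Deque[T] = deque()
    for item in iterable:
        buffer.append(item)
        if len(buffer) == size:
            yield list(buffer)
            buffer.clear()
    # Emit the remainder, if any, as the last chunk.
    if buffer:
        yield list(buffer)
```

For a call such as chunks(["a", "b", "c"], 2), a type checker should infer Iterable[List[str]] and can flag callers that treat the items as anything other than strings.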
Testing it
To ensure that the written method works in all cases, I added a unit test to the project's test suite (using pytest).
I often write parametrized test functions, which allow pytest to generate a separate test for each case and give nicer output in case something goes wrong. The use of dataclasses and typing again allows for a nicer IDE experience.
In the test I call the function twice, once with the input directly and once wrapped with iter(), to ensure that both cases work correctly.
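A sketch of what such a test might look like. The Case dataclass, its field names and the concrete cases are assumptions for illustration; in a real project, chunks would be imported from the module under test rather than defined next to the test:

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, TypeVar

import pytest

T = TypeVar("T")


def chunks(iterable: Iterable[T], size: int) -> Iterable[List[T]]:
    """Function under test, repeated here to keep the example self-contained."""
    buffer: Deque[T] = deque()
    for item in iterable:
        buffer.append(item)
        if len(buffer) == size:
            yield list(buffer)
            buffer.clear()
    if buffer:
        yield list(buffer)


@dataclass
class Case:
    name: str
    values: List[int]
    size: int
    expected: List[List[int]]


CASES = [
    Case("empty", [], 3, []),
    Case("exact_multiple", [1, 2, 3, 4], 2, [[1, 2], [3, 4]]),
    Case("remainder", [1, 2, 3, 4, 5], 2, [[1, 2], [3, 4], [5]]),
    Case("size_larger_than_input", [1, 2], 5, [[1, 2]]),
]


# One generated test per case; the ids make failures easy to identify.
@pytest.mark.parametrize("case", CASES, ids=[case.name for case in CASES])
def test_chunks(case: Case) -> None:
    # Once with the list directly, once wrapped in an iterator.
    assert list(chunks(case.values, case.size)) == case.expected
    assert list(chunks(iter(case.values), case.size)) == case.expected
```

Each case then shows up under its own name in the pytest output, so a failure points directly at the offending input.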
And with the right invocation of pytest, we can see that we tested all of the code (and ran all of the test code):
100% test coverage for our tiny helper function