Understanding iter_rows in openpyxlJuly 15 2017
Let me say first that I love
openpyxl. The more I work with it, the more I admire its power and ruthless elegance. But occasionally the documentation leaves me wanting.
Reading data with
Let’s say I have a file called
example.xlsx with some simple data, shown here.
As with any two-dimensional array, the most obvious way to store this data in Python is a nested list. And thankfully,
openpyxl makes this easy. To start, import the library and open the file.
from openpyxl import load_workbook wb = load_workbook('_path_to_file') ws = wb['sheet1']
We can access individual cell values easily with A1 notation like so.
>>> ws["A1"].value First >>> ws["C3"].value Ninth
For this specific problem, we could make a nested list by manually typing each cell reference as a list element assignment.
myRange = [  ] myRange = ws["A1"].value myRange = ws["B1"].value myRange = ws["C1"].value ...
But this method is time-consuming and inflexible. If we want to reuse this code later we’d need to manually type width * height cell references by hand. Instead we’ll use the
iter_rows method. It looks like this:
1 2 3 4 5 6 myRange =  for row in ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=3): rowList =  for cell in row: rowList.append(cell.value) myRange.append(rowList)
Lines 1 & 3 we declare our target lists for cell values.
Line 2 begins a
for loop that loops through the output of the
iter_rows returns a tuple of tuples. The parent tuple represents the entire range. The child tuples contain rows of cell objects. We select our desired cells using the
min arguments. Think of these like (1,1) notation from Excel VBA or
min_col = 1translates to column A
max_col = 3translates to column C
Line 4 begins a for loop to cycle through each cell object. We append each cell value to our nested list in line 5. Once the row has been fully iterated we append it to the parent list.
myRange now contains a nested list of the values in the range A1:C3.
>>> myRange [['First', 'Second', 'Third'], ['Fourth', 'Fifth', 'Sixth'], ['Seventh', 'Eight', 'Ninth']]
If we need to rotate the data we could use the
iter_cols method instead.
>>> myRange_columns [['First', 'Fourth', 'Seventh'], ['Second', 'Fifth', 'Eight'], ['Third', 'Sixth', 'Ninth']]
Neither method requires arguments. The default
min value is 1. If no
min is provided, the method begins looking for cell objects at A1. And neither method requires a
max keyword. If no
max is provided, the method continues sucking up cell objects until it reaches an empty cell. In the code sample above we can remove the arguments to
iter_rows and would still receive the same results. Only use the
min arguments if you must specifically exclude cells adjacent to your target range.
Reading ranges with A1 notation
If A1 notation makes sense for your application, keep in mind that
iter_cols underpin the A1 notation system as well. Whether your call to ws[“range”] returns a column or row depends on the kind of range you call.
|Range Type||Example||Method Return Type|
Always remember that both methods return tuples. And if there are multiple rows or columns selected, the tuple will be nested.
Writing data with
Let’s say we need to write our nested list back to example.xlsx after having converted each value from ordinal number format to the informal format. My list now looks like this:
[['1st', '2nd', '3rd'], ['4th', '5th', '6th'], ['7th', '8th', '9th']]
To write this back to the my Excel file in place, I’ll use the following.
1 2 3 4 5 6 7 8 9 10 11 myRange = [['1st', '2nd', '3rd'], ['4th', '5th', '6th'], ['7th', '8th', '9th']] targetRange = ws.iter_rows() for tRange, mRange in zip(targetRange, myRange): for cell, val in zip(tRange, mRange): cell.value = val wb.save('example_update.xlsx')
In lines 1 - 3 I declare my nested list of values to be written. In a real application, this would probably be generated programmatically elsewhere.
In line 5 I set
targetRange equal to a nested tuple of cell objects from my worksheet. As before, I don’t need arguments for
iter_rows because I do not need to exclude any cells from my file write.
Line 7 says “Set my nested list and nested tuple beside each other (zip). Call the target range tRange and the source range mRange.”1
Line 8 says “One by one, take a cell object from the target range and set its contents equal to the value of the adjacent value in mRange (zip again).”
In line 11 I save my workbook as a new file called example_update.xlsx.
This approach is a lot different than how VBA or libraries like
xlwings treat ranges. Those tools allow you to references cells with (1,1) notation.
openpyxl doesn’t let you do this directly. A call to
iter_rows returns a generator that makes tuples, instead of a tuple directly. As a result, you can’t slice it with square brackets unless you do some extra hacking. I’ve experimented witha few models but have yet to find one that works as efficiently with as few lines of code.
Despite the added mental gymnastics needed to grok how
iter_rows works, it is a superior function. It economizes on memory use and it’s blazing fast. I wrote a process in
xlwings a few weeks ago that transforms data from one format to another. When I rewrote it in
openpyxl the script run time decreased by 2500%.
“When I learned this process I found it best to translate these nested
for loopsinto plain English. So I’ll do that here.” ↩