Jesse Whitehouse Developer | Analyst

Understanding iter_rows in openpyxl

Let me say first that I love openpyxl. The more I work with it, the more I admire its power and ruthless elegance. But occasionally the documentation leaves me wanting.

Reading data with iter_rows

Let’s say I have a file called example.xlsx with some simple data, shown here.

  A B C D
1 First Second Third  
2 Fourth Fifth Sixth  
3 Seventh Eight Ninth  
4        

As with any two-dimensional array, the most obvious way to store this data in Python is a nested list. And thankfully, openpyxl makes this easy. To start, import the library and open the file.

from openpyxl import load_workbook

wb = load_workbook('_path_to_file')
ws = wb['sheet1']

We can access individual cell values easily with A1 notation like so.

>>> ws["A1"].value
First
>>> ws["C3"].value
Ninth

For this specific problem, we could make a nested list by manually typing each cell reference as a list element assignment.

myRange = [ [] ]
myRange[0][0] = ws["A1"].value
myRange[0][1] = ws["B1"].value
myRange[0][2] = ws["C1"].value

...

But this method is time-consuming and inflexible. If we want to reuse this code later we’d need to manually type width * height cell references by hand. Instead we’ll use the iter_rows method. It looks like this:

1
2
3
4
5
6
myRange = []
for row in ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=3):
    rowList = []
    for cell in row:
        rowList.append(cell.value)
    myRange.append(rowList)

Lines 1 & 3 we declare our target lists for cell values.

Line 2 begins a for loop that loops through the output of the iter_rows method. iter_rows returns a tuple of tuples. The parent tuple represents the entire range. The child tuples contain rows of cell objects. We select our desired cells using the max and min arguments. Think of these like (1,1) notation from Excel VBA or xlwings.

  • min_col = 1 translates to column A
  • max_col = 3 translates to column C

Line 4 begins a for loop to cycle through each cell object. We append each cell value to our nested list in line 5. Once the row has been fully iterated we append it to the parent list.

myRange now contains a nested list of the values in the range A1:C3.

>>> myRange
[['First', 'Second', 'Third'],
['Fourth', 'Fifth', 'Sixth'],
['Seventh', 'Eight', 'Ninth']]

If we need to rotate the data we could use the iter_cols method instead.

>>> myRange_columns
[['First', 'Fourth', 'Seventh'],
['Second', 'Fifth', 'Eight'],
['Third', 'Sixth', 'Ninth']]

Arguments optional

Neither method requires arguments. The default min value is 1. If no min is provided, the method begins looking for cell objects at A1. And neither method requires a max keyword. If no max is provided, the method continues sucking up cell objects until it reaches an empty cell. In the code sample above we can remove the arguments to iter_rows and would still receive the same results. Only use the max and min arguments if you must specifically exclude cells adjacent to your target range.

Reading ranges with A1 notation

If A1 notation makes sense for your application, keep in mind that iter_rows and iter_cols underpin the A1 notation system as well. Whether your call to ws[“range”] returns a column or row depends on the kind of range you call.

Range Type Example Method Return Type
Entire Column A:A iter_cols
Entire Row 1:1 iter_rows
Bounded Row A1:C3 iter_rows
Bounded Column A1:A3 iter_cols

Always remember that both methods return tuples. And if there are multiple rows or columns selected, the tuple will be nested.

Writing data with iter_rows

Let’s say we need to write our nested list back to example.xlsx after having converted each value from ordinal number format to the informal format. My list now looks like this:

[['1st', '2nd', '3rd'],
['4th', '5th', '6th'],
['7th', '8th', '9th']]

To write this back to the my Excel file in place, I’ll use the following.

1
2
3
4
5
6
7
8
9
10
11
myRange = [['1st', '2nd', '3rd'],
          ['4th', '5th', '6th'],
          ['7th', '8th', '9th']]

targetRange = ws.iter_rows()

for tRange, mRange in zip(targetRange, myRange):
    for cell, val in zip(tRange, mRange):
        cell.value = val

wb.save('example_update.xlsx')

In lines 1 - 3 I declare my nested list of values to be written. In a real application, this would probably be generated programmatically elsewhere.

In line 5 I set targetRange equal to a nested tuple of cell objects from my worksheet. As before, I don’t need arguments for iter_rows because I do not need to exclude any cells from my file write.

Line 7 says “Set my nested list and nested tuple beside each other (zip). Call the target range tRange and the source range mRange.”1

Line 8 says “One by one, take a cell object from the target range and set its contents equal to the value of the adjacent value in mRange (zip again).”

In line 11 I save my workbook as a new file called example_update.xlsx.

This approach is a lot different than how VBA or libraries like xlwings treat ranges. Those tools allow you to references cells with (1,1) notation. openpyxl doesn’t let you do this directly. A call to iter_rows returns a generator that makes tuples, instead of a tuple directly. As a result, you can’t slice it with square brackets unless you do some extra hacking. I’ve experimented witha few models but have yet to find one that works as efficiently with as few lines of code.

Despite the added mental gymnastics needed to grok how iter_rows works, it is a superior function. It economizes on memory use and it’s blazing fast. I wrote a process in xlwings a few weeks ago that transforms data from one format to another. When I rewrote it in openpyxl the script run time decreased by 2500%.

  1. “When I learned this process I found it best to translate these nested for loops into plain English. So I’ll do that here.”