As our business grows, our needs to analyse and understand it grow even faster. This means we must get better at generating reports so that more people can use them with less effort. One tool for improving long reports is to break them up in sections. In Excel spreadsheets, this can be done with the outline feature. This post looks at how we made the axlsx
gem show outlines the way we wanted.
This is a simple example of a table with 3 rows of detailed data and their sums at the end:
We can press the minus-in-a-box to collapse the detail:
With 10 sections like this in a report, the reader has the option to look at all the details or only an overview—with all the data fitting on one screen. The example was created using this snippet which should be fairly straight forward:
xls(name: "sums_below.xlsx") do |sheet|
sheet.add_row ["Lorem ipsum", 123, 234, 345]
sheet.add_row ["Lorem ipsum", 321, 432, 543]
sheet.add_row ["Lorem ipsum", 345, 456, 567]
sheet.add_row [
"Sums:", "=SUM(B1:B3)", "=SUM(C1:C3)", "=SUM(D1:D3)",
]
# Set rows R1 to R3 (zero-based: 0 to 2) to
# outline level 1 and do not hide them.
sheet.outline_level_rows(0, 2, 1, false)
end
Please ignore the xls helper function for the moment, but note that it uses the Axlsx gem which is a great tool.
Now, for various reasons I want the sums above the data instead of below, so I change the code:
xls(name: "sums_above.xlsx") do |sheet|
sheet.add_row [
"Sums:", "=SUM(B2:B4)", "=SUM(C2:C4)", "=SUM(D2:D4)",
]
sheet.add_row ["Lorem ipsum", 123, 234, 345]
sheet.add_row ["Lorem ipsum", 321, 432, 543]
sheet.add_row ["Lorem ipsum", 345, 456, 567]
sheet.outline_level_rows(1, 3, 1, false)
end
The sums are on top all right, but the collapse/expand button stays at the bottom, which is disappointing:
(If you think that is obvious, consider that if you create this sheet manually in Excel, the button will end up next to the sums automatically). So how do we make it go to the top? The documentation (and the vast collection of examples) have nothing to say on the subject, but they don’t go into a lot of detail about anything, so it doesn’t surprise me. On the other hand, past experience shows that the gem is better than the documentation, so I’m confident that we can find a way to do it. But how?
Use the Source, Luke!
Luckily, Microsoft replaced the binary file formats for the Office suite with standardised zip archives full of XML files years ago, so we can inspect the raw sheet to learn its secrets. Here’s what I did:
- Save “sums_above.xlsx” as “plus_below.xlsx” (using File > Save As). This gives us an official baseline. (This step is important! Axlsx creates great spreadsheets, but the source doesn’t look exactly like the one Excel creates.)
- Clear all outlines (Data > Group and Outline > Clear Outline).
- Turn on automatic outlines (Data > Group and Outline > Auto Outline). This creates the look we want:
- Save the file as “plus_above.xlsx”.
- Unzip the two xlsx files:
unzip -d below plus_below.xlsx && unzip -d above plus_above.xlsx
The hypothesis is that the only difference between the two sets of XML files will be whatever makes Excel put the collapse button before the rows that should be hidden. But how do we find the differences? Running diff
on the two sets (diff -u -r below above
) should do the trick, but the output is not very helpful. It finds differences in two files, not one, and the files are basically one long line and diff does not say where the differences are:
$ diff -u -r below above
diff -u -r below/docProps/core.xml above/docProps/core.xml
--- below/docProps/core.xml 1980-01-01 00:00:00.000000000 +0100
+++ above/docProps/core.xml 1980-01-01 00:00:00.000000000 +0100
@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><dc:creator>axlsx</dc:creator><cp:lastModifiedBy>Microsoft Office User</cp:lastModifiedBy><cp:revision>0</cp:revision><dcterms:created xsi:type="dcterms:W3CDTF">2018-03-02T18:07:18Z</dcterms:created><dcterms:modified xsi:type="dcterms:W3CDTF">2018-03-02T17:07:50Z</dcterms:modified></cp:coreProperties>
\ No newline at end of file
+<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><dc:creator>axlsx</dc:creator><cp:lastModifiedBy>Microsoft Office User</cp:lastModifiedBy><cp:revision>0</cp:revision><dcterms:created xsi:type="dcterms:W3CDTF">2018-03-02T18:07:18Z</dcterms:created><dcterms:modified xsi:type="dcterms:W3CDTF">2018-03-02T17:08:08Z</dcterms:modified></cp:coreProperties>
\ No newline at end of file
diff -u -r below/xl/worksheets/sheet1.xml above/xl/worksheets/sheet1.xml
--- below/xl/worksheets/sheet1.xml 1980-01-01 00:00:00.000000000 +0100
+++ above/xl/worksheets/sheet1.xml 1980-01-01 00:00:00.000000000 +0100
@@ -1,2 +1,2 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><dimension ref="A1:D4" /><sheetViews><sheetView tabSelected="1" showWhiteSpace="0" workbookViewId="0" /></sheetViews><sheetFormatPr baseColWidth="10" defaultColWidth="8.83203125" defaultRowHeight="14" outlineLevelRow="1" x14ac:dyDescent="0.15" /><cols><col min="1" max="1" width="11" bestFit="1" customWidth="1" /><col min="2" max="4" width="6.6640625" bestFit="1" customWidth="1" /></cols><sheetData><row r="1" spans="1:4" x14ac:dyDescent="0.15"><c r="A1" t="s"><v>0</v></c><c r="B1"><f>SUM(B2:B4)</f><v>789</v></c><c r="C1"><f>SUM(C2:C4)</f><v>1122</v></c><c r="D1"><f>SUM(D2:D4)</f><v>1455</v></c></row><row r="2" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>123</v></c><c r="C2"><v>234</v></c><c r="D2"><v>345</v></c></row><row r="3" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A3" t="s"><v>1</v></c><c r="B3"><v>321</v></c><c r="C3"><v>432</v></c><c r="D3"><v>543</v></c></row><row r="4" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A4" t="s"><v>1</v></c><c r="B4"><v>345</v></c><c r="C4"><v>456</v></c><c r="D4"><v>567</v></c></row></sheetData><pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5" /><pageSetup orientation="portrait" /></worksheet>
\ No newline at end of file
+<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"><sheetPr enableFormatConditionsCalculation="0"><outlinePr summaryBelow="0" /></sheetPr><dimension ref="A1:D4" /><sheetViews><sheetView tabSelected="1" showWhiteSpace="0" workbookViewId="0" /></sheetViews><sheetFormatPr baseColWidth="10" defaultColWidth="8.83203125" defaultRowHeight="14" outlineLevelRow="1" x14ac:dyDescent="0.15" /><cols><col min="1" max="1" width="11" bestFit="1" customWidth="1" /><col min="2" max="4" width="6.6640625" bestFit="1" customWidth="1" /></cols><sheetData><row r="1" spans="1:4" x14ac:dyDescent="0.15"><c r="A1" t="s"><v>0</v></c><c r="B1"><f>SUM(B2:B4)</f><v>789</v></c><c r="C1"><f>SUM(C2:C4)</f><v>1122</v></c><c r="D1"><f>SUM(D2:D4)</f><v>1455</v></c></row><row r="2" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>123</v></c><c r="C2"><v>234</v></c><c r="D2"><v>345</v></c></row><row r="3" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A3" t="s"><v>1</v></c><c r="B3"><v>321</v></c><c r="C3"><v>432</v></c><c r="D3"><v>543</v></c></row><row r="4" spans="1:4" outlineLevel="1" x14ac:dyDescent="0.15"><c r="A4" t="s"><v>1</v></c><c r="B4"><v>345</v></c><c r="C4"><v>456</v></c><c r="D4"><v>567</v></c></row></sheetData><pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5" /><pageSetup orientation="portrait" /></worksheet>
\ No newline at end of file
Let’s use xmllint
to format the XML files in a way that’s more readable for humans1:
tmpfile=$(mktemp -t xml)
for f in $(find above below -name "*.xml"); do
cp $f $tmpfile
xmllint --format -o $f $tmpfile
done
(I haven’t found a way to make xmllint
change files in-place, hence the use of a temp file.) Now the diff
output becomes more palatable:
$ diff -u -r below above
diff -u -r below/docProps/core.xml above/docProps/core.xml
--- below/docProps/core.xml 2018-03-01 15:29:39.000000000 +0100
+++ above/docProps/core.xml 2018-03-01 15:29:39.000000000 +0100
@@ -4,5 +4,5 @@
<cp:lastModifiedBy>Microsoft Office User</cp:lastModifiedBy>
<cp:revision>0</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2018-03-01T13:17:46Z</dcterms:created>
- <dcterms:modified xsi:type="dcterms:W3CDTF">2018-03-01T12:18:55Z</dcterms:modified>
+ <dcterms:modified xsi:type="dcterms:W3CDTF">2018-03-01T12:19:24Z</dcterms:modified>
</cp:coreProperties>
diff -u -r below/xl/worksheets/sheet1.xml above/xl/worksheets/sheet1.xml
--- below/xl/worksheets/sheet1.xml 2018-03-01 15:29:39.000000000 +0100
+++ above/xl/worksheets/sheet1.xml 2018-03-01 15:29:39.000000000 +0100
@@ -1,5 +1,8 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" …>
+ <sheetPr enableFormatConditionsCalculation="0">
+ <outlinePr summaryBelow="0"/>
+ </sheetPr>
<dimension ref="A1:D4"/>
<sheetViews>
<sheetView tabSelected="1" showWhiteSpace="0" workbookViewId="0"/>
The first chunk is a timestamp, so we can ignore that. The second has what we need: the good file has a sheetPr
section with a outlinePr
element with a summaryBelow
attribute—that has to be it!
Now let’s find out how to make Axlsx set that attribute the way we want. The gem is listed in my Gemfile
so I open it using VISUAL=subl bundle open axlsx
(I even have an alias, bopen='VISUAL=subl bundle open'
) and start looking. Searching for summaryBelow
yields two files that belong to the test suite but nothing more. But having used axlsx
for a while, I know that it has never made me use any CamelCase identifiers (it’s a Ruby library through and through) and it’s a small leap to instead search for summary_below
. This points to an OutlinePr
class, which (as a search reveals) is only ever used by the SheetPr
class, which mimics the structure we saw in the XML file. SheetPr
in its turn is only used by the Worksheet
class, and now we have enough information to string this together:
xls(name: "sums_above.xlsx") do |sheet|
sheet.add_row [
"Sums:", "=SUM(B2:B4)", "=SUM(C2:C4)", "=SUM(D2:D4)",
]
sheet.add_row ["Lorem ipsum", 123, 234, 345]
sheet.add_row ["Lorem ipsum", 321, 432, 543]
sheet.add_row ["Lorem ipsum", 345, 456, 567]
sheet.outline_level_rows(1, 3, 1, false)
sheet.sheet_pr.outline_pr.summary_below = false
end
We run it, and it works!