|
|
Why I Created Workbook Compare
Workbook Compare evolved out of real-world needs to compare Excel data. I am a software developer
and a strong proponent of testing -- from unit testing to regression testing which validates that no
unintended problems are introduced by changes to a spreadsheet solution.
After a series of arduous regression tests I performed to Excel spreadsheets over a couple of weeks,
it was finally time to build a comparison utility that could handle all of the situations I had encountered over the years.
Life Without Workbook Compare - An Example
Let's say, for example, that I have created an Excel reporting solution for my client.
The report is finance-related, with columns such as Investment, Market Value, Yield, Quantity, Price, etc.
The client has used it for a few months, it is reliable, and produces the correct numbers.
Over the last several months, however, the client has accumulated a lot more data to crunch.
Business is growing, more data is accumulating and the report is now looking at a much larger dataset
than a few months ago. When the report was first released, it only took a few seconds to generate. Now -- several months later
with more data to crunch -- it is time to optimize the performance a bit.
Moreover, the client requests some important changes be made, including new report columns, moving columns around,
and a new sort order.
So here is the $64,000 question: once these changes are made, how can I know if my new, faster report is correct? I am eyeballing the new report
and there are no glaring problems, but there are hundreds of rows of data and dozens of columns. I really need to check the data in
detail to know if the new report is working properly.
The old report has been used for months and the numbers are reliable and correct. One way to determine whether the new report breaks anything
is to compare it to the old report to see if there are any differences (regression testing).
Regression testing is especially crucial here because I have altered many of the formulas to make it calculate faster.
Even though the formulas are different, they should produce the same numbers. This allows me to leverage the accuracy of the data in the old report.
If I am not using Workbook Compare, I must "roll my own" manual comparison process. This might entail creating a new results worksheet with the formula
=IF('old report'!A1='new report'!A1,"","No Match")
then copy it for however many rows and columns.
Then I look at the "No Match" cells, which show how many differences exist.
I trace where the formula originates from and see where the problem is. I discover that one of the new formulas I created during the
optimization process is incorrect, and fix it. There are other miscellaneous differences too and I fix them as needed. To make sure the fixes are correct
I must re-run the new report. But to my dismay, I see that the report creation process prepares the report by deleting all rows, resulting in broken validation
formulas and the resultant #REF! errors. To get back to where I was, I have to re-create the validation formulas from scratch.
One of the fixes I make in the new report is to sort differently. This is one of the project change requirements requested by the client. So I modify the
new report to sort differently as requested. Also, the client had requested a different column order too, so I move the columns around in the new report as needed.
Now on my second iteration of testing, I realize the simple comparison formula
=IF('old report'!A1='new report'!A1,"",1) won't work anymore because the sort order is now different and columns are moved. To get around this,
I can either create a helper structure and use simple INDEX/MATCH formulas, or I can create a mega formula which is powerful enough to do this without a helper structure.
Now I can see that the "roll my own" process is getting a lot more complex, and I am spending a disproportionate amount of time just setting up an environment to find differences
for these spreadsheets.
Rows That Exist In One Sheet But Not The Other
With the "roll my own" comparison solution, a new problem manifests itself: knowing when rows exist in one sheet, but not the other.
What if a report has investments listed that the other does not have? I need to know those differences too.
This adds another layer into the increasingly-more-involved comparison process.
I can figure this out by performing two lookups against the primary column(s) which determines uniqueness in the data table.
The first lookup matches the old report's Investment column to the new, and the second lookup matches the new report's Investment column to the old.
Macros To Help With The Manual Process
As I progress through more work "rolling my own" comparison process, it is clear that this is not a trivial task and I can see
that I will need to perform several cycles of this process.
I decide to make my life easier by creating some helper macros, and spend a grueling hour or two of
writing some throwaway code to perform testing. It is so highly customized to the reports being compared that it
cannot be reused. Sure...I can decide to create better structured macros, parameterized functions, modular programming so
that I might be able to reuse some of this for another report, but that requires more time and I am just trying to validate that my report is correct.
Whatever I create is usable just for this report.
Since each report being compared has its own prep macro for the validation process, whenever I make the slightest change
to the new report layout, I have to modify the helper macros. This locks me in to the report layout. What if I want to modify
the new report layout a bit at this point? If I do, I have to also spend time keeping my prep macros in synch with the new report.
I repeat these steps for each iteration of running the report & then comparing the old VS new, until I have finally
identified the differences and fixed them.
At long last, after hours of data validation, I have resolved all of the differences, and now my new report matches my old report.
Why using Workbook Compare Is A Much Better Solution
In the Workbook Compare dialog, I select two workbooks -- the old_report.xls and new_reports.xls. When I do this, Workbook Compare
scans the workbooks and maps all of the worksheets automatically. It also maps all of the column names automatically.
I put a check mark next to the worksheet name I want to compare.
Since I have some new columns in the new report, I go to the column setup form for the worksheet being compared and uncheck
them. No need to see them because they are expected differences.
Since the new report has a different sort order, I need to create a "unique key" in the new report. So in the column setup
form I click the unique key checkbox next to the "Investment" column. Click OK to dismiss the column setup form.
On the main form, I click the "Compare" button, and Workbook Compare creates a highly structured Excel report with the differences.
If I want, I can select from different report outputs and run it again.
As was with the manual process, I see that there are some differences due to an error in one of the new report's formulas.
So I fix the formula and re-run the new report.
Freedom To Develop
If I want, I can shift the data in my new report down, add another report title, a date/time stamp, whatever.
I can move columns around too, or sort the report any way I want. Then, when I run Workbook Compare again
and it scans the workbooks,
it automatically maps the columns to the old report. This totally frees me up so that I can focus
on improvements to my report and I don't have to worry about how I am going to compare data after I do so.
Here's the cool part -- since Workbook Compare automatically saved the last comparison project I just ran, I simply click the Workbook Compare menu item
called "Open Most Recent Project". This configures the comparison exactly as I have set it up, with unique keys, columns unchecked, worksheets selected, etc.
Based on the difference report, I make further edits to the new report as needed, re-compare, and repeat until all of the differences are gone.
Workbook Compare never "touches" data during the comparison process. One of the core philosophies behind Workbook Compare is that data should never be
modified by any utility which is comparing it for accuracy. There are arguably two exceptions to this rule.
#1 is that during the column setup, Workbook Compare
"selects" the headers to make them stand out for you, and "tiles" the worksheets in a neat two-window view.
If you are using the Worksheet_SelectionChange event, it will get triggered by the selection of the headers. Therefore, this option may be turned OFF in the
advanced options.
The 2nd arguable exception is that you have the option to highlight the differences after your Excel report of differences is generated.
However, the fact remains that data is never modified, ever. No rows are ever inserted. No helper columns are ever created. No sorting is ever done.
Workbook Compare is very powerful and compares your entire workbooks with the lightest possible footprint.
SQL Data Too
While Workbook Compare was designed to compare Excel data, it more simply compares any data
you can throw into an Excel spreadsheet. I regularly use Workbook Compare to compare the results of
SQL Server stored procedures and views. If you are
making revisions to a stored procedure or a view, or an Access query for that matter, you can
copy the recordsets into Excel worksheets and find the differences between them.
With the power of Workbook Compare's "Unique Keys", you can treat Excel data as data in a database table.
Conclusion
When you use Workbook Compare, you will see how much easier it is to compare your Excel data. The steps needed to manually
compare Excel data take a long time to create as you saw above. Workbook Compare, on the other hand, was designed to do all of this work automatically.
It literally only requires a few seconds to set up the
comparison project using Workbook Compare, versus hours of doing it manually.
Workbook Compare is an essential tool which allows developers to release new versions of spreadsheet solutions and have
100% absolute certainty that there are no differences except those that are expected. Clients, bosses, managers, customers (and ourselves) appreciate that.
Tim Zych
|
|
|
|
|
Copyright ©2008-2009 HigherData
|
|