Excel Sheets Should be Immutable - 2017-05-22
Mutable data can be problematic.
If you have ever worked in an enterprise company you have probably witnessed usages of Microsoft Excel spreadsheets in ways you never would have imagined. I have. These spreadsheets could represent anything like budgets, resource planning, todo lists, project plans (Gantt), meeting protocols and requirements.
These ad-hoc solutions are signs of missing IT-systems in the organization. It is important to monitor the needs of users who invent their own solutions, especially when the number of users and/or the complexity of the spreadsheets becomes too large.
There are four important cases to consider:
- Few editors and few consumers.
- Few editors and many consumers.
- Many editors and few consumers.
- Many editors and many consumers.
If you are a small team of 2-3 people case 1 can work quite well. It is easy to adjust the content of the sheets and communicating changes to the other persons is done either by putting the file on a shared network drive or sending a new version by e-mail.
For case 2 there is no major editing problem. The editors can communicate with each other and synchronize their changes quite easily, but it is more difficult to communicate when changes happen to (in many cases unknown) consumers. If a consumer refers to a spreadsheet in a report its contents might have changed making the report outdated.
In the third and fourth cases, there is a concurrency problem regarding the editors. Excel spreadsheets only allow one editor at a time and all of the editors may not know about each other. This is when the spreadsheet should be migrated to a database with proper business logic.
With many editors, the consumers are exposed to frequent changes in the information. As in case 2, it is even more difficult to be able to understand what information each consumer is basing their decision on at a given point in time and what has changed since that decision was made.
One crude solution is to create a new file for each version of the spreadsheet where the filename contains a timestamp or version number. Unfortunately, it is still possible to change the contents of earlier files unless being write protected.
What is needed here is a way to make the spreadsheets immutable and to allow concurrent editing. Microsoft Sharepoint has some basic support for revision management, but in my experience, being used to Mercurial and Git, it is too simple. Decision documents should refer to specific revisions of other documents such that if requirements in other documents change the reader of the decision document should be notified of the presence of changed prerequisites.
The important point to make here is that old versions of documents should not be possible to change (mutate). Not having a reliable history would end up feeling like Winston Smith in the book 1984 who was altering history.
There is a database called Datomic which has applied the principles described above by letting the user go back in time and revisit the state at a particular timestamp. Regular databases mutate and lose old information unless history is designed into the database structure.
Companies would benefit from using the local Excel movements to identify the need for new IT-systems and to avoid having information mutated without other users being notified.