In case you have not yet heard it from every IT vendor, every non-IT vendor, neighbor, friend and analyst, we are in the age of the digital transformation. This age means that we do mobile first (or mobile only?), everything is software-defined, 40% of the Fortune 100 will fail in the next five years or six months or something like that, IT (including the CIO) will soon report to a Chief Digital Officer–the new CDO will be 23 years old but with deep experience with tech blogging. And, in case you have not yet heard, everything is now different and the new normal is that there is no normal.
All right, I am being a bit facetious about the hype around the digital transformation. That said, things really are different. Technology has changed and is changing the world. Everything moves at the pace of technology innovation and that pace is breakneck. Many of us (at least those not yet reporting to the new 23-year-old CDO) are driving organizational innovation and so have a sense of the changes taking place. Advances in technology generate new advances in technology, which generate new advances in technology and so on. With all that is changing, it is easy for us to forget about the basics of IT–the blocking and tackling that ensure that we deliver operational excellence. For me, that blocking and tackling is based on the principles of IT service management (ITSM). I have found that the better I am at service management, the better I will be at operational excellence, customer satisfaction and, more importantly, IT agility. And, in a world that moves quickly and changes constantly, IT agility just might be the most important thing I can develop. Perhaps my personal example will demonstrate how solid ITSM enables digital transformation.
We design, develop and support two SaaS products that our clients and their employees use. When I first inherited these products, they were overly complex, fragile and brittle. All by themselves, they broke several times a week. As you might expect, it is difficult to focus on innovation and digital transformation when we are not good at the basics of delivery. My first priority had to be to make these products solid, reliable and performant. I am happy to report that we have accomplished those tasks.
How did we pull that off? ITSM. We defined a service catalog and service levels and then used a quality incident response and resolution process to get to and resolve the root cause of our application issues. As we consistently applied ITSM to our products, performance improved and we were able to shift our efforts from firefighting to the proactive innovation of our products. In parallel, we got good at the process of production change. In my experience, the vast majority of system downtime comes from not-fully-verified changes to production systems. We defined a simple process (complex processes encourage workarounds and are themselves unreliable) in which every change, before we made the change, had to have a valid, executed test plan, a roll-back plan (just in case we missed something with our testing) and a communication plan (so that we would have visibility around the change and to make sure that we fully understood the reach and impact of the change on other things).
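To make the change rule concrete, here is a minimal sketch of how such a gate could be expressed in Python. It is an illustration only, not the tooling we actually used: the `ChangeRequest` class, its field names and the `missing_prerequisites` helper are all hypothetical. The only thing taken from the process above is the rule itself: no production change without an executed test plan, a roll-back plan and a communication plan.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed production change."""
    summary: str
    test_plan_executed: bool = False   # a valid test plan exists and has been run
    rollback_plan: str = ""            # how to undo the change if testing missed something
    communication_plan: str = ""       # who is told, and what the change may affect


def missing_prerequisites(change: ChangeRequest) -> list[str]:
    """Return the prerequisites the change still lacks (empty list means ready)."""
    missing = []
    if not change.test_plan_executed:
        missing.append("executed test plan")
    if not change.rollback_plan.strip():
        missing.append("roll-back plan")
    if not change.communication_plan.strip():
        missing.append("communication plan")
    return missing


def may_deploy(change: ChangeRequest) -> bool:
    """A change may go to production only when nothing is missing."""
    return not missing_prerequisites(change)


# Example: a change that has been tested but has no roll-back plan is blocked.
draft = ChangeRequest(
    summary="Raise connection pool size",
    test_plan_executed=True,
    communication_plan="Notify support desk and account managers",
)
blocked_by = missing_prerequisites(draft)
```

The check is deliberately small. A gate with three yes-or-no questions is hard to work around and easy to automate later, which is the point of keeping the process simple.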
Within a short period of time, our results improved as did our credibility (and credibility matters when we are trying to convince the rest of the organization to digitally transform, since we typically trust people who are demonstrably competent). Even as we got better at the business of IT, we retained our focus on our use of ITSM principles.
As our product foundation improved, we started to seize and realize opportunities to shorten our time to market. This led us to experiment with and start to use agile software development methods and DevOps. We moved from twice-yearly product releases to monthly releases, then to daily releases and now to multiple micro-releases a day (if we need to). Shrinking our release cycle has created significant value. Fixes, enhancements and good old innovations get to our customers much faster. And each release is smaller and therefore less complex and less risky. In order to pull this off, we had to be really good at service definition, delivery and management. We also had to eliminate exception handling so that we could standardize our processes before automating them. In other words, we had to continue to have our ITSM act together.
As we shorten our development cycles, we must shorten our incident response and root cause analysis cycles. In our case, we use production worthiness standards that define what we mean by a quality service. We meet weekly to review pending changes (including the week’s product release) and any open incidents (an open incident is one for which we have not yet identified the root cause or implemented the countermeasure). We keep incidents open for as long as it takes us to get to and resolve root cause. In one case, we had an open incident for over a year, waiting for us to learn enough about what happened so that we could ensure that nothing like it would ever happen again.
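The definition of an open incident is simple enough to write down as a rule. The Python sketch below is illustrative only; the `Incident` class and its fields are hypothetical names, not a description of our actual incident system. It encodes one thing from the paragraph above: an incident stays open until the root cause is identified and the countermeasure is implemented, however long that takes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record."""
    title: str
    root_cause: Optional[str] = None          # None until the root cause is identified
    countermeasure_implemented: bool = False  # True once the fix to the process is in place


def is_open(incident: Incident) -> bool:
    """Open means: root cause not yet identified, or countermeasure not yet implemented."""
    return incident.root_cause is None or not incident.countermeasure_implemented


def weekly_review_agenda(incidents: list[Incident]) -> list[Incident]:
    """Incidents that still belong on the weekly review, regardless of age."""
    return [incident for incident in incidents if is_open(incident)]


incidents = [
    Incident("Login timeouts"),                                    # no root cause yet
    Incident("Report export failure", root_cause="Untested schema change"),
    Incident("Slow search", root_cause="Missing index", countermeasure_implemented=True),
]
agenda_titles = [incident.title for incident in weekly_review_agenda(incidents)]
```

Note that restoring service does not appear anywhere in the rule. An incident whose symptoms have gone away but whose cause is unknown is still open.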
In searching for root cause, we focus our attention on the process rather than the specific incident. If the last release introduced a software bug, we ask ourselves how we need to change our software development processes in order to reduce the likelihood of releasing a bug. If there are too many open database connections, we do not “solve” this by rebooting the database server. Rather, we work until we get to the core issue. I don’t want to just find a memory leak, I want to find out what causes memory leaks.
Digital transformation depends on quality and agility. I have a strong opinion–formed over years of success and failure–that ITSM is the key to both quality and agility. And I need both if I want to lead an innovative approach to digital transformation.