Architects and senior engineers usually face not only technical but also organisational challenges when they extend or adjust existing enterprise solutions. In contrast to small and medium-sized companies, the IT governance process of a large enterprise typically involves a group of stakeholders who decide on pre-defined quality gates that must be passed before changes can go live. Some quality gates focus on business value, others on security, others on operational costs, and so forth. The documentation ranges from MS Word documents and MS PowerPoint or Keynote presentations to MS Excel cost calculation sheets. In the best case, all artifacts are collected in a folder on a shared drive accessible to project managers.
A different documentation approach
Recently faced with these challenges at one of our clients, I was pointed to the arc42 approach for documenting a 360° view of an architecture. I tried it, and I’m going to share my experience of using this template in Atlassian Confluence. Below, I walk through the chapter structure suggested by the arc42 template. For each chapter, the reader will find a very brief summary of the intended use as well as my experience working with the format.
Introduction and Goals
According to the documentation standard, this section should describe the underlying business goals, functional requirements and quality goals, as well as the stakeholders and their expectations.
Working in enterprises often means negotiating with many different business and technical groups who need to contribute to get a feature done. Providing not only information about the intended solution, but also background on why this feature was chosen and how it aligns with the company strategy, saved me a lot of discussion and arguing time.
The stakeholder list became one of the most valuable parts. Simply having all the stakeholders, their incentives and their contact information in a single place opened quite a few doors more easily.
Constraints
According to the standard, this section should cover any kind of constraint that limits the architect in design and implementation decisions.
By splitting the constraints into organisational and technical ones, I was able to address the two target groups, decision makers and technical people, quite easily. Spelling out the organisational constraints helped the steering committee remove blockers where possible. The technical constraints helped engineers reach clear decisions on particular technologies, services etc.
Context and Scope
The standard suggests providing a clear picture of the boundaries of the system to be built and of its external dependencies.
As with the constraints, I split this section into a business and a technical part. The business context was used for any communication with product managers and stakeholders to emphasize how the new feature would be integrated into existing solutions. Showing the missing links within the existing landscape was easier with diagrams and high-level blocks at hand.
The technical context was used to explain the existing technical landscape to engineers. It helped to highlight information gaps in understanding the status quo. In contrast to the business context, the technical context changed continuously during the project. As is often the case in enterprises, the knowledge about existing services, technical interfaces etc. is distributed among various departments. Therefore, the depth of information varies a lot. Using the context and scope document to record the current level of understanding of the existing service landscape helped to keep an overview of progress and "black holes".
Solution Strategy
The standard suggests documenting the fundamental decisions and solution strategies that shape the final architecture here. This section is supposed to describe the decomposition of the system into components as well as the technology decisions.
Following the pattern of a high-level and a technical view, I split this section into two parts: an executive summary and a detailed technical section. The executive summary served in discussions with enterprise architects and in several board and panel presentations. It mainly listed the basic technical concepts, e.g. using a single-page web application (SPA) backed by an engagement service that shields the business service layers within a DMZ. Furthermore, it listed tech stack information and named the available managed 3rd party services.
The detailed section provided the reasons for each technology and service. In addition, it described risk mitigation and migration patterns. The engineering team used this section to validate the decisions it made during implementation.
Building Block View
This section is supposed to provide a static view of the various components.
Following the standard, I simply placed the different components into logical layers and showed the high-level relationships among them. The figures and explanations here turned out to be largely redundant with the context and scope section. Thus, a key learning is to focus only on the external dependencies around a black box in the context and scope section. The details of the intended solution should go here.
Runtime View
The standard suggests describing the concrete behaviour and interactions of the system’s building blocks here.
Following this advice, I documented the domain data transferred between the various services in this section. In addition, I added detailed information about session management and caching for the engineers. A short description of the intended system behaviour in case of unreachable services completed this section.
The audience of the runtime view consisted mainly of developers and DevOps engineers. They used this section to get a basic understanding of how the feature and its use cases spread across actual services and components.
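The behaviour for unreachable services can also be backed by a small code sketch next to the prose description. The following is a minimal illustration only, not the implementation used in the project: the names `ServiceUnavailable` and `call_with_fallback`, the retry count and the `degraded` flag are all assumptions made for this example.

```python
from typing import Callable


class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) client when a downstream service cannot be reached."""


def call_with_fallback(primary: Callable[[], dict], fallback: dict, retries: int = 2) -> dict:
    """Call the primary service; after repeated failures return a degraded default."""
    for _ in range(retries + 1):
        try:
            return primary()
        except ServiceUnavailable:
            # Try again until the retry budget is used up.
            continue
    # Mark the response so the caller can show a reduced feature set.
    return {**fallback, "degraded": True}
```

A caller would pass the actual service call as `primary` and a safe default payload as `fallback`, then check the `degraded` flag before rendering the feature.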
Deployment View
In accordance with the standard, this section describes the technical infrastructure. The standard suggests listing all deployment environments in use, from development through test to production.
I used this section to describe the overall setup of cloud and on-premise components. The chapter provided a high-level view of infrastructure elements such as firewalls, virtual private networks and proxies. The overall setup was fairly simple. Thus, I omitted concrete machine requirements and network configurations.
Looking back, I would provide more detailed information here in my next projects, such as machine sizing (CPU, RAM, I/O), logging infrastructure, monitoring infrastructure etc. This eases cost calculations as well as preparations for the DevOps team.
Concepts
According to the standard, this section covers solutions and ideas that are relevant in multiple parts of the system. Thus, it should list domain models, architecture and design patterns, and implementation rules.
I used this section to list interface specifications for new services and components as well as for the 3rd party services to be used. The concepts page essentially became the most referenced and accessed page, as it provided detailed information about data exchange and security.
In future projects, I would use this section to provide general concepts for logging and monitoring patterns as well as the coding principles agreed by the team. In addition, this section should link to the enterprise-wide standards to be followed in engineering projects. I would put the interface definitions into the runtime view section.
Architectural Decisions
The standard suggests listing important, large-scale or risky architecture decisions here. The section is mainly meant to document why a certain alternative was chosen.
During the project, certain 3rd party services were not able to deliver necessary changes, which led to reduced functionality or changes to the feature. I used this section to document the different feature variants, the reasons for each change and the impact of each variant, not only on the business value but also on the architecture. In retrospect, I would consider this a mixture of project management and technical decisions. By staying high-level, this section helped in discussions with stakeholders and steering committees. Detailed technical decisions should live near the code, as architecture decision records (ADRs) kept directly in the version control system.
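To illustrate what an architecture decision record kept in version control could look like, here is a minimal sketch that renders one as a text file. It assumes the widely used record layout with Status, Context, Decision and Consequences headings; the class name `DecisionRecord`, the numbered file naming scheme and the Markdown output are assumptions for this example, not something prescribed by the standard or used in the project.

```python
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    """One architecture decision record (hypothetical helper, not part of the template)."""
    number: int
    title: str
    status: str
    context: str
    decision: str
    consequences: str

    def filename(self) -> str:
        # Assumed naming scheme: zero-padded number plus a slug of the title.
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.md"

    def render(self) -> str:
        # One heading per part, so reviewers can diff decisions like code.
        return "\n".join([
            f"# {self.number}. {self.title}",
            "",
            "## Status", self.status, "",
            "## Context", self.context, "",
            "## Decision", self.decision, "",
            "## Consequences", self.consequences, "",
        ])
```

Writing `record.render()` to `record.filename()` inside the repository keeps each decision reviewable alongside the code change it belongs to.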
Quality
This section contains quality objectives based on scenarios, such as system behaviour under consumer growth, expected response times, disaster recovery etc.
I did not maintain this section during my project, as most of the quality objectives had already been mentioned in other sections. The runtime view covered the error scenarios for missing 3rd party services. The concepts section described the expected maximum number of parallel users for public APIs. The deployment view covered the services needed to shield systems from denial-of-service attacks etc.
However, in my next projects I would place the different scenarios mainly in this section to provide a concise overview of the different aspects in one place. Confluence makes it easy to show the content on several pages without duplication.
Risks & Technical Debt
This section should list all identified risks and technical debt.
I split the section into two chapters: risks and technical debt. Both chapters included an assessment of each item and mitigation paths. Risks most often arose from technical and organisational decisions.
Technical debt was most often introduced simply due to resource constraints or missing commitment from 3rd party providers.
Glossary
The glossary should ensure a shared understanding of the expressions and terms used within the project documentation.
Development Team Info
This section mirrored certain necessary parts of the overall documentation via excerpt includes (Atlassian Confluence). This means relevant content from the previous chapters was “copied by reference” into one single page to give the engineering team a consistent and brief overview for its daily work.
In general, the arc42 approach provides a very nice 360° view of the architecture in the scope of the existing landscape and the necessary service extensions. Given the common pressure to deliver results in a timely manner, I found it hard to find the appropriate chapter for documenting relevant information without repeating myself. Reflecting on the project and reading the suggestions again, I would suggest the following high-level guidelines:
| Chapter | Guideline |
|---|---|
| Introduction/Goals | Zoom out of the immediate technical ideas and reflect on the business environment and objectives. |
| Constraints | Focus on challenges and constraints from both angles, business and technology; keep organisational processes in mind. |
| Context/Scope | What does the landscape look like, and where do services/components belong in this picture? |
| Solution Strategy | Decide on the kind of solution (mobile/web/client app, or just a service). Decide on the tech stack and operations (cloud/on-premise/both). |
| Building Block View | Decide on solution components and services and how they relate to each other. |
| Runtime View | Depict the information flow throughout your solution. |
| Deployment View | Taking the infrastructure view, think about needs, setup, restrictions and security. |
| Concepts | Think about cross-functional implications such as logging, auditing and monitoring guidelines and how to fulfill them. |
| Architectural Decisions | Assess different solution options, weigh them against each other, make a decision (ADRs). |
| Quality | Think about everything that could possibly go wrong when running the system (performance, breaking 3rd party services, reporting, auditing). |
| Risk & Technical Debt | List technical debt resulting from architectural decisions as well as risks. Provide mitigation approaches. |
| Glossary | List all technical terms product management does not know by heart, and all business terms development does not know by heart. |
In addition to the standard chapters, I suggest a cost planning section that covers operational costs in terms of hardware and service fees, but also 1st/2nd level support costs. Most often neglected are the costs of keeping the software up to date with infrastructure changes at cloud providers and library updates from 3rd parties. Changes in 3rd party APIs need to be factored in as well. The standard also does not include an operations section, which might be useful for the handover to the operations team. Such a section lists error scenarios, the corresponding system output and recovery strategies. In other projects and contexts, it is sometimes named Operations Cookbook.
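A cost planning section can start as a simple list of recurring cost items that is summed per year. The sketch below only illustrates the idea: the categories follow the ones named above, while the `CostItem` type, the `yearly_total` helper and all figures are made-up placeholders, not numbers from the project.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    monthly: float  # recurring cost per month, all items in the same currency


def yearly_total(items: list[CostItem]) -> float:
    """Sum all recurring monthly costs and project them over twelve months."""
    return 12 * sum(item.monthly for item in items)


# Placeholder figures for illustration only.
items = [
    CostItem("hardware / cloud infrastructure", 1200.0),
    CostItem("3rd party service fees", 300.0),
    CostItem("1st/2nd level support", 800.0),
    CostItem("maintenance (library and API updates)", 500.0),
]

print(f"Estimated yearly operating cost: {yearly_total(items):.2f}")
```

Listing maintenance as its own line item makes the often neglected update effort visible next to hardware and support.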
The standard can be quite useful even in agile environments, when it is updated and filled step by step. In the end, it provides a holistic view of the architecture.
Credits for cover image go to: heise.de