Orbiting from small to enterprise and back again
Looking back towards the future
Starting a new project feels like staring at a fresh piece of paper. For the perfectionists amongst us, embarking on something new is daunting because it feels like a call to “get it right this time round”. This is especially true when you’ve suffered the experience of herding aging codebases under the pressure of feature requests, all while trying to pay down technical debt and move away from some sticky architectural flaws. Depending on the politics of an organisation, there may be a lack of understanding of what technical debt truly is, or an aversion to change and refactoring altogether.
That’s just “the code”, but there are also processes around figuring out what should be developed, and those can be an adventure in and of themselves. Sticking to the convention-turned-rulebook often produces sub-optimal results. Did the capture of user stories cover all the use cases? How important are these requirements really? Are they well-investigated or are they presumptive? Can they be negotiated to ease implementation and maintainability? How valid are these tests, given the minority of cases we actually see in production? These are just some of the questions that need to be scrutinised before even one line of “the code” is materialised.
The development team at MWR has gone through an interesting journey. The earliest software product in our living memory was an Endpoint Detection and Response (EDR) solution developed from the ground up. This solution was for clients with very particular data segregation requirements (before GDPR legislation was even a thing), thus calling for a highly segregated environment. It was a sprint toward a solution that could be scaled and yet remain reasonably flexible and manageable, but most importantly be effective at a low cost. Cloud was not yet an option at this point, since clients didn’t fully trust it, and from our consulting experience this was rightfully so at the time.
Further along in our company’s timeline we were acquired by a global company, and integration of two vastly different tech stacks began. A lot is learnt about a codebase when fundamental adjustments must be made, and almost every assumption that was historically made is challenged in the process. Keeping the machine running is challenging when a full re-write is not a possibility, and when forking a codebase would incur significant overhead. Development under such circumstances leads to many learning opportunities, from the systems aspect of development, down to the way code is written and structured.
Alas, change seems to be the only constant in software development. A small wedge of the existing MWR business unit in South Africa saw an opportunity this year to take the branch in a different direction, one that in many ways is a better fit for our customers in the MEA region. Development continues, however, towards different purposes, having shed much of its enterprise-software nature.
Through all of this, we have learnt more than a few things, and have collectively experienced everything from software codebases written in 1990s-style C to AWS-native Lambdas at scale, and much of the in-between. Opinions will differ as to the “how” of software development, and one engineer’s assumed golden way may often be someone else’s pit of antipatterns. There are lessons to be gleaned if you step back and take stock, which is what we’ll be doing in this blog post. Taking what is good and also knowing when it’s good are both important. Coffee is good on a Monday morning, while often it isn’t so good on a Sunday evening.
Programmers are people, people are challenging
Working with people, both for the immediate and the long term, is vital to the success of an organisation, and vital for oneself as a matter of professional development. This is expressed in the daily interactions within the organisation and the decisions being made, large and small, all of which shape the nature of the software artifacts produced.
It is commonly thought that a development team’s effectiveness is related to how much work they can do and how much product they can ship in a unit of time. In a good part of our experience this is secondary, and more an artifact of local optimisation and the method of measurement than of reality. Software engineers need to understand where their activities fit in the overall scheme of the business, and by doing so, contribute to globally optimising business operations via a specialisation in development activities. If engineers can achieve this without writing a single line of code, they have simultaneously provided value and reduced the cost that would have been incurred by developing a solution that was not necessary. When engineers make decisions that avoid future problems, the development resources needed to fix those problems never need to be allocated in the first place.
This observation does, however, often run up against company culture, which may or may not foster collaborative problem solving. Excessive collaboration can also be a problem, since the logistics of getting everyone into a meeting on a regular basis are untenable, and there is an inefficiency in dedicating resources to collaboration that would otherwise be productively engaged.
One key thing to remember is that one cannot fight the physics of a situation: the physics always wins, and no amount of self-manipulation, on either an individual or an organisational level, can adjust those fundamentals. Having the right people, who are oriented towards professional teamwork and who put in the effort to be fact-based wherever possible rather than opinion-based, is important to giving the business the best chance at succeeding in the long term.
Building the culture in the right direction will improve the solutions being created or maintained. A significant part of such a culture is an understanding, cultivated at all levels of the organisation, that software is an asset to be built and maintained properly. As part of this process, software engineers should learn to describe the abstract artifacts and processes they work with in a manner that is concrete to other business stakeholders. There is a large amount of untapped product innovation and operational efficiency to be found when development activities are workshopped alongside higher-level planning and executive functions, and the various areas of the organisation have learnt to speak each other’s language.
More is different
The choice of programming language, frameworks, testing approach, architectural design, and other fundamental aspects underlying a software project will affect how the project develops in both the short and the long term. Unfortunately, both aspects matter; revenue generation is important in the short term to make the long term more probable. The real lifetime of a project is also often unclear well into the development cycle. There are many projects that should have been put out to pasture yet are still hauling coal, regardless of the original intent.
Drawing characteristic curves of how scaling the various parameters affects software development processes helps to give insight into what the impact of a choice will be. The implicit psychological assumption is one of linearity: if we keep up the current effort, we should progress at a constant rate. This is, however, rarely the situation. Complexity rears its ugly head (or its beautiful head, depending on what you are working on), and the direction of progress against the requirements is affected. It is important to realise that as a project progresses in one dimension there will be a gradual change in the rate of other variables, and possibly even abrupt turning points. Life, in general, is nonlinear. Projects most often become more difficult to progress as they increase in complexity. The Conant-Ashby theorem (the “good regulator” theorem: every good regulator of a system must be a model of that system) is an important read in this respect, and should be understood to help define a bounding limit on the complexity required.
To this end, it is important to use and create tools that help to tame development complexity, without creating significant added complexity in working around the restrictions they will necessarily involve. You may wish to foster a level of emergent functional complexity that is required of the system; developmental complexity, however, does not have to follow at the same rate. Emergent complexity can aid the adaptability of the system to new requirements if it is enabled in a controlled manner. It arises from the interaction of code tools that are well designed for a particular purpose, but that can be used in flexible ways with each other at multiple levels. Arguably, the emergence of many of the tools provided by the large cloud services shows the value of emergent complexity; they are defined enough to be reasoned about, while still being adaptable to the needs of specific clients.
Creating emergent complexity in these fundamental areas is what allows the system to be adapted by its users themselves, rather than having them request changes. This fast feedback loop leverages untapped development potential in the userbase, offloading work from developers. Often the requirements for a system are difficult to gather and even more difficult to define accurately, and such offloading may be highly beneficial.
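As a toy sketch of what we mean (the domain and names here are invented purely for illustration), small single-purpose pieces that compose freely let useful behaviour emerge at the point of use, without each combination having to be anticipated and developed up-front:

    from typing import Callable, Iterable

    Record = dict[str, object]
    Predicate = Callable[[Record], bool]

    def field_equals(field: str, value: object) -> Predicate:
        # A tiny, single-purpose building block: true when a field matches a value.
        return lambda record: record.get(field) == value

    def any_of(*predicates: Predicate) -> Predicate:
        # Combines any number of building blocks without knowing what they test.
        return lambda record: any(p(record) for p in predicates)

    def select(records: Iterable[Record], predicate: Predicate) -> list[Record]:
        # Applies whatever combination the caller has assembled.
        return [r for r in records if predicate(r)]

    # The useful combinations emerge at the point of use, assembled by whoever
    # needs them, rather than being anticipated and hard-coded by the developers.
    suspicious = select(
        [{"host": "a", "severity": "high"}, {"host": "b", "severity": "low"}],
        any_of(field_equals("severity", "high"), field_equals("severity", "critical")),
    )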
Simple is not so simple
Keep it simple, stupid (KISS) is often repeated, but how is that achieved? How does one arrive at optimal simplicity without relying on gut intuition and opinion-based preferences, especially in the highly abstract domains in which software may be created? These are not easy questions to answer: survey the literature on complexity and you’ll find that a substantial proportion of books talk about complexity and simplicity without ever attempting anything approaching a rigorous definition.
Given sufficient domain knowledge, one can often easily identify designs that are obviously overly complex; moving those designs towards optimal simplicity, however, is another matter (and a large part of engineering skill and experience). Gaining some level of understanding of what constitutes “optimally simple” is even more difficult. This is where the people problem arises in force: what is perceived as simple by one is not simple to another. Often the debate is a result of looking at the problem with differing measuring sticks. This disagreement is in essence good, since choosing the “right” stick is not a trivial problem.
Attempting a concrete analogy, let’s consider a little physical gadget. A widget is needed to do some task, such that it requires moving pieces held in tension by springs. It is possible to give a dedicated spring to each part, or to shape the parts so that springs are shared between them. Fewer springs seems “simpler” visually and when counting the parts, so it’s a no-brainer to many, and should be cheaper to manufacture. On further inspection, however, this design results in coupling between parts that may be undesired, a complex balance of forces, and assembly may have just become more complex to boot. This isn’t merely an academic analogy either; the “lockwork” of firearms is a microcosm in which one can visually see engineers grappling with balancing cost of manufacture against feature requirements throughout history, given the technological factors of the day.
In general, it’s important to KISS, but keep in mind that there will be both observed complexity and unobserved complexity. The unobserved complexity is the part that is often not accounted for. Unobserved complexity is often colloquially understood in development terms as coupling, but this is only one manifestation. Discovering the simple solution will take work and collaboration from the team. Looking at the problem from multiple perspectives is part of the process, all with a healthy understanding that what is considered simple today may turn out to be not-so-simple tomorrow.
A question of paradigm
There is much TODO about functional programming now. Having maintained codebases in that area, we have observed the tensions that inevitably arise when functional purity smacks into unavoidable real-world state. Depending on your educational background, the object-oriented manner of implementation may be more familiar, though it has its own issues. OOP is considered an implicit default by many due to this familiarity, and other paradigms are often unfairly compared against it. Our experience shows that attempting to be a purist in any paradigm is not recommended. Learning lessons from every discipline is a more mature approach.
Having clear state management that is reinforced by language constructs is important. Selected immutability and functional purity are your friends in reasoning about a codebase. Type systems can be used effectively to convey the assumptions that underlie the flow of code, though generic types tend to be the more broadly useful.
Avoiding classes or types around concepts for the sake of functional purity often leads to a lot of duplicate code, and to routines so specialised that tests around them tend to become very fragile. Creating classes that don’t need to be created is equally something to avoid, since every abstraction comes with its own impact on understandability and flexibility. Plain old pure utility functions, applied widely where it is proper to do so, are not to be underestimated.
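As a small illustrative sketch (the names are hypothetical, not drawn from any of our codebases), compare a plain, typed, pure utility against an unnecessary wrapper class around the same calculation:

    from dataclasses import dataclass
    from typing import Iterable

    # A frozen dataclass gives us selected immutability: the shape of an Event
    # is explicit, and instances cannot be mutated after construction.
    @dataclass(frozen=True)
    class Event:
        host: str
        severity: int

    # A plain, pure utility function: no hidden state, trivially testable,
    # and reusable anywhere an iterable of Events is available.
    def max_severity(events: Iterable[Event]) -> int:
        return max((e.severity for e in events), default=0)

    # The over-abstracted alternative: a class that exists only to wrap the
    # same calculation, adding indirection without adding understanding.
    class SeverityCalculator:
        def __init__(self, events: list[Event]) -> None:
            self._events = events

        def calculate(self) -> int:
            return max((e.severity for e in self._events), default=0)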
An excessive search for (often speculative) generality will most often either result in sub-optimal abstractions or require more time than is available for them to be designed effectively. A complete lack of any search for generality will most often result in a very WET codebase that is fragile and inefficient to work with, prone to bugs, and unable to serve as a base for other projects. The balance comes with experience and a willingness to learn both from the past and from the immediate realities in the code in front of you.
Proofs, testability, and tests
The fundamental model of how a system is going to operate is the first point of making software robust. You cannot optimally build what you don’t sufficiently understand. To some, the model should be mathematically sound and provably so. Most often this luxury of provability is not afforded to real-world problems; it is, however, still necessary to be able to conceptualise how the system operates, and the bounds on the states in which it may be.
Every program will conform to some sort of model. This may be implicit and poorly understood, or it may be explicit and better understood. In most practical applications an explicit model is created by the act of making parts of the system testable yet flexible and maintaining a set of tests that deal with both the whole and its parts.
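As a minimal sketch of what that looks like in practice (the names and threshold are hypothetical), the decision logic is kept pure and explicit so that the model of the behaviour can be exercised directly in a test, while I/O stays at the edges:

    from datetime import datetime, timedelta

    # Pure, explicit model of the decision: given the last check-in time and
    # the current time, decide whether a sensor should be flagged as stale.
    def is_stale(last_seen: datetime, now: datetime,
                 max_age: timedelta = timedelta(minutes=15)) -> bool:
        return (now - last_seen) > max_age

    # The model can be tested directly, without any database or clock mocking.
    def test_is_stale() -> None:
        now = datetime(2024, 1, 1, 12, 0)
        assert not is_stale(now - timedelta(minutes=5), now)
        assert is_stale(now - timedelta(hours=1), now)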
Note that there is intentionally no mention here of specific metrics and practices: code coverage, cyclomatic complexity, continuous integration testing, and so on. Do what is needed in your domain to understand the model of the software sufficiently, at a sufficiently regular interval, and to be able to drill down into further detail of the model. Everything beyond that tends toward fetishising the result of a measurement rather than being useful.
Optimising the feedback mechanisms used in your development and testing processes for a combination of frequency and actionable, informative content will enable you to improve almost all aspects of your product and development processes. The ratio between frequency and actionable content, along with the amount of each, will depend on what you are working on.
Technical debt and technical investments
In our experience, technical debt is one of the most controversial issues that arises not only from differing views of stakeholders within a business, but also from within software development teams themselves. To some it doesn’t exist and is simply aesthetic and time-wasting futzing about. To others it is plain and obvious to the point of making a development experience miserable, akin to having to work on a team with someone who is continually belligerent and stubborn.
For those who don’t know what technical debt is: it’s sub-optimal design and implementation arising mainly from a combination of local optimisation (as opposed to global optimisation), changing requirements, immediate time pressures, imperfect knowledge of the future, and developments in the ecosystem on which the system depends. It gets in the way when you’re trying to fix issues, add features, or improve tests and testability. It’s unpleasant, but there is no removing all of it; rather, one must manage it as part of development activities. If you don’t believe that it exists, we’re not going to be able to convince you, and you probably don’t believe in refactoring either, a concept about which you likely hold a similarly rigid view.
As a company with a heavy focus on security, it is our experience that technical debt tends to bleed into security issues. It also dampens team motivation, hampers the ability to reason about the problems at hand, and wastes a lot of energy in navigating around it. Paying down technical debt is an investment when you consider the effect it has on making other things easier, and effective software design is one form of technical debt reduction with a long-term impact.
On this topic, one of our team said it best: “I’d really like to move the couch before I have to sweep under it.” That is a bit of concrete language around an abstract concept that everyone understands.
Avoid dark corners
The design of software should involve meeting current stakeholder requirements, a statement with which every development team will have experience. To some this is also where it ends, perhaps with the caveats of good practice; anything else is surplus to requirements, to be avoided and de-prioritised. Is this realistic, though?
Let’s consider a thought experiment in which a choice must be made between two systems. The first fits the current requirements perfectly but is challenging to change. The second does not fit all requirements perfectly but is much easier to change. Given this choice, which do we choose? This is not a false dilemma: we already know that a third option, a system that fits the requirements perfectly and is also easy to change, would be the most desirable, so there is no use in mentioning it.
Given this choice, arguably most people with experience would agree that the second option, an imperfect yet more easily changeable system, is better in the real world. Requirements are almost always changing, and “moving with the cheese” is necessary. The ability to adapt to the future is thus an important first-class consideration in software design.
To achieve adaptability there are several strategies that may be employed. Modularity, layering, and re-usability of software components allow for adaptability when they can be wired into different combinations. Separation of concerns and the Law of Demeter allow generality to be extracted and tested, creating minimal coupling for re-use later. These are all well-known principles, but their application as both tools for reasoning and as future-oriented devices is not as well appreciated.
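As a hedged sketch of the Law of Demeter point (the domain objects here are invented for illustration), a caller that asks its immediate collaborator for an answer stays coupled to one narrow interface, while a caller that reaches through the object graph is coupled to every shape along the way:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        healthy: bool

    @dataclass(frozen=True)
    class Cluster:
        nodes: tuple[Node, ...]

    @dataclass(frozen=True)
    class Deployment:
        cluster: Cluster

        # The deployment answers the question itself, so callers depend only on
        # this one method rather than on the Cluster/Node internals.
        def unhealthy_node_count(self) -> int:
            return sum(1 for node in self.cluster.nodes if not node.healthy)

    # Violates the Law of Demeter: the caller reaches through deployment ->
    # cluster -> node, and is now coupled to all three shapes at once.
    def report_bad(deployment: Deployment) -> str:
        unhealthy = sum(1 for n in deployment.cluster.nodes if not n.healthy)
        return f"{unhealthy} unhealthy nodes"

    # Respects it: the caller asks its immediate collaborator, which can be
    # re-implemented, wrapped, or stubbed in tests without touching the caller.
    def report_good(deployment: Deployment) -> str:
        return f"{deployment.unhealthy_node_count()} unhealthy nodes"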
These strategies, however, take work, experience, and allocation of resources. The first solution to a problem often needs massaging into a more malleable form. Since the future is uncertain, these strategies will necessarily involve some degree of speculation, and thus speculative generality is in some degree necessary. The question is to what extent.
One approach is to avoid seeking generality until a similar case pops up, at which point the extraction of composability and generality is done prior to the main task. This results in work that is considered by many to be out of scope, and is therefore disincentivised, while the lessons of the original are forgotten due to distance in mind and time. As a result, this approach is not recommended.
Applying change-facilitating development up-front at every step is thus important as a matter of discipline. The degree to which it is applied depends on the ratio between up-front development cost and the technical debt whose costs are borne later. A guide to this is some relative measure of the uncertainty in the stability of the requirements, with uncertain development being, paradoxically, less tolerant of technical debt in the long term.
The development of development
Management of software development processes is in essence a distributed string of decisions throughout time. Decisions require information to be presented, reasoning methodologies applied, options evaluated, lessons learnt and conveyed, areas of investigation identified, and existing decisions re-evaluated. Decisions are made at every level of the development process, and for responsiveness they should be made at the level of concern. In our experience, the continuous decision-making processes typified by Python PEPs, IETF RFCs, and the like are highly effective at producing good decisions when used at multiple levels and amongst the team. Not only are better decisions made, but the context for those decisions is captured, and the rationale thus documented.
Management of software development is highly linked to the process of software design used. Design is in all disciplines necessarily an iterative process; a result of non-zero uncertainty, imperfect information, exploration, experimentation, and environmental changes. Some domains require more iterative steps than others. The success of a software development management methodology will depend both on the situation it is deployed to solve and the coverage of the aspects of design that the methodology achieves. In situations where the requirements are stable, but the feedback loop is very slow or non-existent, a lot of up-front planning and simulation is needed before development starts, and a lot of testing is needed after major development phases. An example of this would be firmware to be deployed on an orbiting satellite. In situations where requirements are very unstable, but the feedback loop is fast, much less up-front planning is needed, and sufficient testing should be performed for much smaller development increments.
The tools used within the methodology to manage the development process need to be tuned to the nature of the software being developed. Part of that management will be the measurement of progress against milestones of whatever resolution. Such measurement must be effective, yet should not be done at a greater rate than necessary, due to the costs incurred by measurement itself; this is simply the physics of multitasking, and it translates to every level of use. If the resolution or detail expected is significantly greater than the actual rate at which a change can be effected, too much of the effort is being spent on measurement. If the resolution or detail is significantly smaller than the rate of response that is possible and necessary, then insufficient measurement is being performed.
Above all, an understanding of the project management triangle is necessary during iterations of design. There is no such thing as a Formula One tank, no matter how much hype/persuasion/coercion is put around it.
Design via development
In business terms, the result of software development is a software artifact or service that clients may use. The nature and form of the user-facing components is often held to be the prerogative of product managers, user experience practitioners, and the users themselves. This is natural when it is easy to gather requirements from users, or to predict the requirements that would best suit them.
In the situation where users have little to no idea what they might need, such as when supplying bleeding-edge functionality, the process is somewhat different. Product managers are put in a very awkward position, needing to specify functionality with little to no context. Our experience has been that developers are better placed to propose functionality in that situation, since they are closer to the technical restrictions that influence product adaptability. Product management may then review the proposals and work with the developers to produce specifications for early versions. Being able to adapt in such a situation of uncertainty is important for providing user-minded functionality in the long term. User experience will be gathered from such a starting point, and iteration and refinement performed. A user-minded development team will be able to consider the user when guided by product management, and the first pass will be accurate enough to provide value up-front.
The challenge in most businesses will be the need to transition in the ways of working from one mode to another depending on the maturity of the area being worked on. Building a collaborative culture is important to be able to make these kinds of transitions when necessary.
Laborious or factorious
The processes around development are often as important as the target of development itself, and building them with repeatability and extensibility in mind is important. Unfortunately, in business terms, these processes are considered overhead and do not get sufficient attention until they become a drag on development. Such ancillary processes include automated testing on different architectures, deployment, integration tests, and documentation. In the context of a security-oriented company, the rigour to which these processes are held needs to equal that of the product itself.
Developers and development management will need to advocate for these aspects as a matter of professionalism. The feedback loop between either a catastrophe or naggingly sluggish development and its remedy is simply too slow for an effective response via reactive management channels. Quality concerns start at the point of development, with quality engineering coming later in the development cycle, when problems are harder to correct. Not everything can, nor should, be automated; the relative benefit should be weighed against the amount of effort involved and the expected lifetime of the automation solution.
Pilots and doctors are given checklists for important processes for a good reason: people are prone to mistakes. Software development has the advantage that its checklists and routine operations can be made to run themselves; without that, we would have a lot more to complain about!
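As a minimal sketch of a checklist that runs itself (the commands here are placeholders; substitute whatever your project actually uses), each routine step is encoded once and executed the same way every time, rather than remembered:

    import subprocess
    import sys

    # Each entry is one item of the pre-merge checklist, encoded as a command.
    # The commands below are placeholder examples, not a prescription.
    CHECKLIST: list[tuple[str, list[str]]] = [
        ("format check", ["ruff", "format", "--check", "."]),
        ("lint", ["ruff", "check", "."]),
        ("unit tests", ["pytest", "-q"]),
    ]

    def run_checklist() -> int:
        # Runs every item, reports failures, and returns how many failed.
        failures = 0
        for name, command in CHECKLIST:
            print(f"==> {name}")
            result = subprocess.run(command)
            if result.returncode != 0:
                print(f"    FAILED: {name}")
                failures += 1
        return failures

    if __name__ == "__main__":
        sys.exit(1 if run_checklist() else 0)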
Our development in retrospect
MWR CyberSec has been on quite a journey. Our development work in the cybersecurity space has presented some of the most interesting problems, ranging from kernel drivers and performance-sensitive algorithms, through embedded scripting, onward to highly scalable backend pipelines, and on to portals with high utilisation factors. Few sets of challenges allow one to achieve such well-rounded maturity in development techniques, provided you come with the willingness to make every day a school day, optimising not only code but all aspects of the process, always, with inherent security as a prime directive.
It is our collective experience in a relatively complex space that has given nuance to many of the topics discussed. While some may tend to ignore anything that does not offer concrete, hard-and-fast rules, it is much more important to understand the tools of thought that may be applied to your unique problems. To those tools we have only had time to give a brief introduction.