Source code management: protecting the process
Building software using a team of multiple programmers working from different time and geographical zones brings production efficiencies, but managing the source code effectively is essential.
Mention source-code management (SCM) to most developers, and there is a strong possibility that they will think of version control for individual source-code files. At its most basic, version control is a method of maintaining separate versions of files, rather than simply overwriting the old ones.
If SCM is regarded as nothing more than a version-based safety-net for over-enthusiastic edits, or a cumbersome method that over-complicates development projects, then managing your software configuration can become costly in time and effort. SCM can go a lot further, and a more comprehensive SCM system can lead to happier and more productive developers.
The most up-to-date SCM systems offer developers and project managers a way of implementing an efficient and productive workflow: they go beyond simple version control and can streamline project management. SCM should also be part of every professional programmer's skillset; however, the discipline is still not always featured as an integral part of academic software education programmes. Many, maybe most, programmers become familiar with SCM principles in their first jobs, where trainee developers are introduced to the basics by more seasoned software engineers. A solid working knowledge of SCM usually comes when programmers assume a more active role in project delivery, as that enables them to see the full reach of SCM in their organisation.
Like any computer science discipline, SCM has evolved over the last 50 years (see timeline, pages 52-53), but its objective has always been to enable software projects to be completed properly and on schedule, and to support contributions from multiple software specialists who, although working on the same assignment, may be geographically dispersed for the duration of the project. As such, the tenets of SCM have played a part in defining the dynamics of managing global project teams.
SCM change control
A modern SCM system also enhances traditional version control with capabilities such as workspaces, changelists, branching, 'lazy copying', and collaborative development. A workspace is the area on the developer's desktop where the SCM renders private copies of the specific versioned files the developer requests. This allows the developer to work at his or her own pace without inhibiting the work of colleagues. As the developer has private copies, he or she can also accomplish builds and unit tests before checking in files.
Changelists allow developers to check their changes into the SCM system as easily referenced, transactional units of work. If a changelist number is assigned to the transaction, then the developer knows that his or her work is stored safely within the confines of the SCM database. A changelist consists of the set of files that have been changed, their revision numbers, and a description of the work performed. Basic file operations common to all SCM systems (such as adding, editing and deleting files, and backing out changes) are recorded as changelists to help answer questions such as: Which other files were affected by this change? Which defects does this change fix? Which releases contain the fix for this defect? Which releases do not contain the fix for this defect?
Working without using changelists can be time-consuming and confusing, as developers have to keep manual records of their work, and cannot easily see how different file revisions are related. More advanced SCM systems ensure that if a changelist affects a number of files then either the changes for all the files are submitted, or none are. If the network connection is interrupted during changelist submission, the entire submit fails, and the files return to a 'pending' state to allow any issues to be resolved. This feature removes the worry of partial check-ins completely from developers' minds.
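The all-or-nothing behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any particular SCM product works: the `Repository` class, its `submit` method and the `fail_on` parameter (which simulates a dropped connection) are all hypothetical names invented for the example.

```python
class SubmitError(Exception):
    """Raised when a changelist cannot be submitted in full."""


class Repository:
    """Toy in-memory repository: maps file path -> list of revisions."""

    def __init__(self):
        self.files = {}
        self.next_change = 1

    def submit(self, description, edits, fail_on=None):
        """Submit all edits as one changelist, or none of them.

        edits   -- dict of path -> new content
        fail_on -- hypothetical hook: path at which to simulate a failure
        """
        # Stage every new revision on a copy so the live data is untouched.
        staged = {path: list(revs) for path, revs in self.files.items()}
        for path, content in edits.items():
            if path == fail_on:
                # Nothing has been written to self.files at this point.
                raise SubmitError(f"connection lost while sending {path}")
            staged.setdefault(path, []).append(content)

        # Commit point: swap in the staged state and assign the number.
        self.files = staged
        number = self.next_change
        self.next_change += 1
        return number


repo = Repository()
change = repo.submit("Add parser", {"parser.c": "v1", "parser.h": "v1"})

try:
    repo.submit("Fix parser", {"parser.c": "v2", "parser.h": "v2"},
                fail_on="parser.h")
except SubmitError:
    pass  # the edits stay pending in the developer's workspace
```

After the failed second submit, the repository still holds only the first revision of both files, and no changelist number has been consumed.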
One use case for a changelist is to create a 'branch' by taking a copy of entire areas of the code base at specific points in time. Branching allows these copies (or 'codelines') to diverge over time, while offering the option of merging selected changes between them. A common example is that of propagating a bug fix from a release codeline back into a development codeline so it can be included in a future release. The SCM system records which changes have been propagated and removes the need to repeat the same development effort in each branch.
Branches inherently have history, so changes to files, additions and deletions can all be managed within the context of the branch. Merging changes between branches is much easier: the SCM system uses the information it has about the history of the branch, and the point at which the files forked. In this way, many merges can be performed automatically and human intervention is only necessary when two changes have affected the same part of a file - a so-called 'conflict'.
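The way a common ancestor lets most merges happen automatically can be shown with a deliberately simplified three-way merge. The function name `three_way_merge` is hypothetical, and the sketch assumes edits only change lines in place, so all three versions have the same number of lines; real merge tools align inserted and deleted lines as well.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited versions of a file against their common ancestor.

    Simplifying assumption: edits change lines in place, so all three
    versions have the same number of lines.
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only 'theirs' changed this line
            merged.append(t)
        elif t == b:      # only 'ours' changed this line
            merged.append(o)
        else:             # both changed it differently: a conflict
            merged.append(o)          # keep 'ours' as a placeholder
            conflicts.append(number)  # and flag the line for a human
    return merged, conflicts


base = ["int a = 1;", "int b = 2;", "int c = 3;"]
release = ["int a = 1;", "int b = 20;", "int c = 3;"]
development = ["int a = 10;", "int b = 2;", "int c = 3;"]

merged, conflicts = three_way_merge(base, release, development)
```

Here the two codelines touched different lines, so both edits are accepted and the conflict list is empty. Had both changed the same line in different ways, that line number would be reported for manual resolution.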
'Lazy copying' stops the code repository growing unnecessarily when branches are created: the system stores a pointer to the source file(s) rather than creating a physical copy. As the variant is modified, all that is recorded are the differences between the modified version and the original.
If large code frameworks are used to support development and a branch is needed for the purposes of rapid prototyping (a cornerstone of agile development methods), then attempting to copy that framework in its entirety is a long-winded exercise. It is not unusual for even extensive modifications to a large code framework to result in a divergence of just a few per cent between variants. Because lazy copying only requires space for a tiny proportion of the overall code base, the storage savings can be immense.
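A rough sketch of the idea follows. The `Codeline` class and its methods are hypothetical, and two simplifications are assumed: a modified file is stored whole rather than as a set of differences, and the parent codeline does not change after the branch is taken.

```python
class Codeline:
    """Toy codeline that stores only the files that differ from its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.own_files = {}  # path -> content stored in this codeline

    def branch(self):
        # Creating a branch copies nothing: it only records a pointer.
        return Codeline(parent=self)

    def read(self, path):
        # Walk up the chain until some ancestor holds the file.
        line = self
        while line is not None:
            if path in line.own_files:
                return line.own_files[path]
            line = line.parent
        raise KeyError(path)

    def write(self, path, content):
        # Only modified files take up space in the branch.
        self.own_files[path] = content

    def stored_count(self):
        return len(self.own_files)


main = Codeline()
for i in range(1000):
    main.write(f"framework/file_{i}.c", f"original {i}")

prototype = main.branch()
prototype.write("framework/file_7.c", "patched")
```

The prototype branch sees all 1,000 framework files but stores only the one it has modified, while the mainline copy of that file is unaffected.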
Other advantages that branching provides become commercially significant when you consider collaborative development effort going beyond one office. There might be developers working on the same project in different countries, and also subcontractors who can be given access only to portions of the code base. Traditional ways of handling this involve locking files and waiting for others to communicate their status.
This can lead to delays, especially if engineers are working in different time-zones. SCM that goes beyond versioning files makes it possible to 'back out' of a change even when other modifications have been made to the source-code module. It works by taking the state of the system back to the point before the bad change was made. It is possible to skip that edit, but then roll forward, accepting later ones so the resulting file has all the correct changes in place.
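The roll-back-then-roll-forward idea can be illustrated as follows. This is a sketch only: the `replay` function and the change numbers are hypothetical, each change is modelled as a set of in-place line replacements, and later changes are assumed not to depend on the one being backed out.

```python
def replay(base_lines, changes, skip=()):
    """Rebuild a file from its base revision, skipping backed-out changes.

    changes -- ordered list of (change_number, {line_index: new_text})
    skip    -- change numbers to leave out when rolling forward
    """
    lines = list(base_lines)  # start from the state before any change
    for number, edits in changes:
        if number in skip:
            continue          # the bad change is passed over
        for index, text in edits.items():
            lines[index] = text
    return lines


base = ["timeout = 30", "retries = 3", "verbose = False"]
history = [
    (101, {0: "timeout = 60"}),
    (102, {1: "retries = 0"}),     # the bad change
    (103, {2: "verbose = True"}),  # a later, good change
]

fixed = replay(base, history, skip={102})
```

The result keeps changes 101 and 103 but restores the original `retries` line, which is the effect of backing out change 102 without losing the work that followed it.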
Sometimes a change is effective, but it is also far from pretty. Consider a product that is already shipping, but some users have encountered a bug. In these cases, the emphasis is often on speed rather than elegance. The expedient fix might be simply to repair the actions of the bug rather than seek out its true cause, which might be in some other, possibly quite distant module. This kind of fudge works but is not something that should be propagated to a new version of a product. Some SCM systems offer the ability to mark these 'ugly code' changes so they do not propagate to later releases. When copying the data across, these 'ignored' changes are passed over by the system, leaving 'clean' code ready to receive the correct remedy.
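One way to picture this is a record of which changes have already reached the target codeline, combined with a marker on the changes that should never travel. The sketch below assumes exactly that model; `Change`, `IntegrationRecord` and the `do_not_propagate` flag are hypothetical names, not the interface of any real SCM system.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    number: int
    description: str
    do_not_propagate: bool = False  # hypothetical 'ugly fix' marker


@dataclass
class IntegrationRecord:
    """Remembers which source changes have already reached the target."""
    integrated: set = field(default_factory=set)

    def pending(self, source_changes):
        # Skip anything already propagated or explicitly marked as ignored.
        return [c for c in source_changes
                if c.number not in self.integrated
                and not c.do_not_propagate]

    def propagate(self, source_changes):
        todo = self.pending(source_changes)
        self.integrated.update(c.number for c in todo)
        return [c.number for c in todo]


release_changes = [
    Change(201, "Fix crash on empty input"),
    Change(202, "Workaround: clamp bad value at caller",
           do_not_propagate=True),
    Change(203, "Update copyright year"),
]

record = IntegrationRecord()
first_pass = record.propagate(release_changes)
second_pass = record.propagate(release_changes)
```

The first pass carries changes 201 and 203 into the later release and passes over the workaround; the second pass finds nothing left to do, because the record shows those changes have already been propagated.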
It is possible to label sections of code manually to show developers which ugly fixes they need to roll back, although this involves a lot more intervention. The idea of having different codelines - for shipping products and those in development - is an important concept in SCM. Many SCM systems advocate the use of 'labels' - a named list of revisions of files - for tracking product releases. The big problem with labelling is that labels can be altered at any time; thus it's possible to 'lose' the records of an entire release, or worse still, to deliver bugs to customers because the records for a release have become inaccurate.
There are other issues with the labelling approach, as labels themselves have no history. Usually another label is needed to record every new state and so an explosion of labels results - often with a convoluted naming convention to help people make some sense of the enormous list. An intelligent branching model will help remove the issues that come from relying too heavily on labels.
SCM can also handle more complex situations, such as when some of the code to be incorporated is derived from open-source software. Changes made to code provided under a GNU licence, for example, may need to be supplied to other users of that source code, depending on the licence terms. Some commercial source-code licences also demand that modifications be provided to the vendor.
Traditional methods might require all the code to be kept together. Having third-party code in a separate branch in your SCM system is critical in ensuring all contractual obligations are met without confidential information being disclosed at the same time.
When developing on more than one platform, the differences in the line-ending conventions across platforms can lead to some annoying problems. SCM systems can resolve the differences between development platforms because, by understanding the conventions of the platform, the SCM system can automatically translate between the required formats. This reduces the chances of a build failing later if the platform used for compilation differs from that used for editing.
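A minimal sketch of such a translation is shown below. It assumes the repository stores text files with LF line endings and converts them on the way in and out of a workspace; the function names and the `platform` values are hypothetical, and binary files would need to be excluded.

```python
def to_repository(data: bytes) -> bytes:
    """Normalise CRLF and lone CR line endings to LF for storage."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_workspace(data: bytes, platform: str) -> bytes:
    """Translate LF-normalised text to a platform's line-ending convention."""
    endings = {"unix": b"\n", "windows": b"\r\n"}
    return data.replace(b"\n", endings[platform])


edited_on_windows = b"int main(void)\r\n{\r\n    return 0;\r\n}\r\n"
stored = to_repository(edited_on_windows)
for_unix_build = to_workspace(stored, "unix")
for_windows_edit = to_workspace(stored, "windows")
```

A file edited on Windows is stored once in normalised form, and each workspace receives it in the convention its own tools expect.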
SCM need not only be a way to maintain source code, but can also provide intelligence on the progress of entire projects. It is not a fancy name for version control, and it offers time and effort savings over more traditional methods. For advanced users, SCM forms the underpinning for successful project management because of the structure it provides. Software authors and project managers who find themselves adding comments or changing file names to add things like 'Version 2.1.2 fix' need to take another look at SCM and what it can do to improve the quality of their workflow.
Workspaces and codelines
Two fundamental concepts of source-code management are workspaces and codelines. A workspace is tailored to an individual or project, and is sometimes called a 'sandbox' or a 'view'. Instead of multiple engineers working with multiple folders, each workspace allows users to organise the individual files they need in a way that enables them to work most efficiently. Changes to managed SCM repository files begin as changes to files in a workspace, and it is therefore important to make sure the workspace environment is well structured. Sharing workspaces can cause confusion - like users trying to share a desk and all the papers on top of it.
The codeline is another integral part of a modern SCM system, and is used to group together the source files needed to produce your software. There is frequently not just one codeline - branches will evolve into separate codelines, each embodying a different release, variant, project, or version. Changes are propagated from one codeline to another as required. A fix made to a bug found in a release codeline, for example, can be 'pushed' to the different variant codelines to ensure that they also benefit from the fix. This helps reduce development effort by ensuring the bug need only be fixed once.
Five steps to effective software configuration management
Workspaces are important. Do not share them between developers and ensure work does not progress outside the workspace. If that happens, vital information is lost.
Avoid 'jello views', where files in one workspace are linked to another. It means developers are not in control of their own workspaces because files can change without their intervention.
Check in often. Integrating your work with other people's depends on checking in changes as soon as they are ready.
Give each codeline a policy and an owner. A development codeline policy might allow more extensive changes than one intended for release and can allow check-ins before full testing is completed. In contrast, a release codeline policy might limit check-ins to bug fixes, ensure software passes a regression test or source-code analysis, and confirm coding guidelines have been followed before a check-in is allowed. Ownership means, where the policy is ambiguous, a designated person can decide what to do.
Make sure there is a 'mainline': this is the branch of the codeline tree that evolves 'forever', and provides an ultimate 'destination' for almost all changes. Individual release codelines and development codelines branch out from this mainline. Ultimately, work from branches that might be propagated forward is merged back into the mainline.
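The codeline policies described in the fourth step can be made concrete with a small sketch. It is only one possible model: `CodelinePolicy`, `CheckIn`, `check_in_allowed` and the owner names are hypothetical, and a real policy would usually cover more conditions, such as source-code analysis and coding guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodelinePolicy:
    owner: str                     # who decides when the policy is ambiguous
    bug_fixes_only: bool
    require_regression_pass: bool


@dataclass(frozen=True)
class CheckIn:
    is_bug_fix: bool
    regression_passed: bool


def check_in_allowed(policy, check_in):
    """Return (allowed, reason) for a proposed check-in."""
    if policy.bug_fixes_only and not check_in.is_bug_fix:
        return False, f"only bug fixes are accepted; ask {policy.owner}"
    if policy.require_regression_pass and not check_in.regression_passed:
        return False, "regression tests must pass first"
    return True, "ok"


development = CodelinePolicy("dev-lead", bug_fixes_only=False,
                             require_regression_pass=False)
release = CodelinePolicy("release-manager", bug_fixes_only=True,
                         require_regression_pass=True)

new_feature = CheckIn(is_bug_fix=False, regression_passed=False)
tested_fix = CheckIn(is_bug_fix=True, regression_passed=True)
```

Under these two policies, the untested new feature is accepted on the development codeline but refused on the release codeline, while the tested bug fix is accepted on both.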