Professional Team Foundation Server 2013 (2013)

Part II

Version Control

· CHAPTER 5: Overview of Version Control

· CHAPTER 6: Using Centralized Team Foundation Version Control

· CHAPTER 7: Distributed Version Control with Git and Team Foundation Server

· CHAPTER 8: Version Control in Heterogeneous Teams

· CHAPTER 9: Migration from Legacy Version Control Systems

· CHAPTER 10: Branching and Merging

· CHAPTER 11: Common Version Control Scenarios

Chapter 5
Overview of Version Control

What's in this chapter?

· Understanding the purpose, core concepts, and benefits of version control

· Understanding the differences between Team Foundation Version Control and Git-based repositories

· Analyzing the strengths and weaknesses of common version control products

Version control is the single most important tool you can use when developing software, regardless of the particular provider you use to give you version control functionality. Most software developers use version control tools in their day-to-day jobs, and yet, the fundamental concepts and reasoning behind version control are rarely discussed.

This chapter begins by explaining, from a tool-independent viewpoint, the fundamental concepts of version control and the functionality you typically find in it. It then discusses the similarities and differences between centralized and distributed version control systems, and examines several important version control tools to analyze their strengths and weaknesses. The chapter concludes with a high-level look at the version control capabilities of Team Foundation Server, and at when Team Foundation Server is or is not the correct tool for your team.

What Is Version Control?

Version control is known by many names. “Source control” is frequently used, but the term “revision control” and even “software/source configuration management” (SCM) can be used to refer to the same broad set of functionality. Because a modern software project consists of much more than merely a set of source code, the term “version control” is used throughout this book, although the terms can be used interchangeably (and often are—even in the Team Foundation Server product).

Broadly speaking, version control provides the following capabilities:

· A place to store the source code, images, build scripts, and so on needed to build your software project

· The ability to track the history of changes to those files, and to view the state of the file at various points in the software life cycle

· Mechanisms and tooling to make it easy to work in parallel with a team of software developers on the same project

If you make a mistake when editing your code, a version control system lets you roll back time to the state before you accidentally deleted the past week's worth of work. A version control system also lets you know exactly which version of the code is running on a given machine. And version control allows multiple teams of developers to work on the same files at the same time without getting in each other's way.

Version control is so important that it is required for regulatory compliance in some industries, yet a remarkable number of organizations still do not use a version control system. Copying the folder containing your source code to another location is not a version control system; it is a backup. In the past, version control systems could be expensive, complex, and difficult to use. Today, version control is such a fundamental aspect of software development that, in its basic form, it is a commodity item and is increasingly easy to use. But even if you are a developer working on your own on a project, the safety net provided by a version control tool is worth the investment.

However, not all version control systems are the same. Even more confusing, some tools use the same words for very different activities. To thoroughly understand the nature of a version control system, you must be familiar with some core concepts.

Repository

In general, code is stored on a machine somewhere in a repository of some kind. The repository is usually represented as a tree of files, similar to the regular directory structures that everyone is familiar with in modern hierarchical file systems. However, the repository differs from a file system in one very important aspect: time. Whereas a file system is a collection of folders and files, a version control repository is a collection of folders and files and the changes made to those files over the repository's lifetime, thus allowing you to know the state of the files at any given point in time.

Additionally, a version control system must provide a number of features to make it useful. You need a way to share that version control repository with others on your team, make changes to the code, and share those changes with each other.

Note

To keep things simple, this chapter's discussion refers to the repository as if there is a single master repository on a central server somewhere allowing your team to work. This is the traditional centralized version control system that many developers are familiar with today. However, not all version control systems work that way. In a distributed version control system (DVCS) such as Git, the machines work in a peer-to-peer manner. In other words, each machine has a copy of the full repository in its own right. This has some advantages and disadvantages that will be examined later in this chapter within the larger distributed version control discussion.

Obviously, storing every version of every file can take up a lot of disk space. Version control systems frequently employ various tricks to ensure that the repository is efficient with storage. For example, Team Foundation Server will store the initial version of the file, and then store the changes between each version (known as deltas) for subsequent changes.

The “delta-fication” process in Team Foundation Version Control is actually much more complex than this. Optimizations are in place to ensure that this is done only for certain file types. It also ensures that recently accessed versions of files are cached in a ready-to-download state to avoid the computationally expensive task of rebuilding the file every time a developer requests a copy.
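The base-plus-deltas idea can be illustrated with a small Python sketch. It is a toy model, not Team Foundation Server's actual storage format: the first version of a file is kept in full, each later version is stored only as the line ranges that changed (computed with the standard library's difflib.SequenceMatcher), and any version is rebuilt by replaying deltas on top of the base. The class and function names are hypothetical.

```python
from difflib import SequenceMatcher

# A delta is a list of (start, end, replacement_lines) edits against the previous version.
Delta = list[tuple[int, int, list[str]]]


def make_delta(old: list[str], new: list[str]) -> Delta:
    """Record only the line ranges that differ between two versions."""
    opcodes = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]


def apply_delta(old: list[str], delta: Delta) -> list[str]:
    """Apply edits from the end of the file backward so earlier indices stay valid."""
    result = list(old)
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


class DeltaStore:
    """Hypothetical toy store: version 1 in full, then one delta per later version."""

    def __init__(self, initial: list[str]):
        self.base = list(initial)
        self.deltas: list[Delta] = []
        self._latest = list(initial)

    def commit(self, new: list[str]) -> int:
        """Store the new version as a delta against the latest one; return its number."""
        self.deltas.append(make_delta(self._latest, new))
        self._latest = list(new)
        return len(self.deltas) + 1

    def get(self, version: int) -> list[str]:
        """Rebuild a version by replaying deltas on top of the base."""
        content = list(self.base)
        for delta in self.deltas[: version - 1]:
            content = apply_delta(content, delta)
        return content
```

Rebuilding a late version means replaying every delta before it, which is the computational cost that caching recently accessed versions is meant to avoid.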

Working Copy

To work with the files in the repository, a developer needs a copy of them in a local area on his or her machine. This allows the developer to work with the files, make changes to them, and debug them before deciding to check in, or commit, those changes to the repository. In Team Foundation Server this local version is known as the workspace; in some other version control systems, it is called a sandbox or working copy.

Note

In Team Foundation Server, a workspace is actually more than just a working copy of the file system. Check out Chapter 6 for more information on workspaces.

Using a local working copy of the repository, a developer can work in parallel with others on the team and know that the working copy of the code will not change until the developer performs a specific action. The developer will either make a change to the code or update the code from the repository by getting all the files that have changed since the last time he or she got a copy.

Working Folder Mappings

A working folder mapping is the link between the place in the repository where the files are stored and the place in your local file system where you have a working copy of that part of the repository. The terms working directory, workspace mapping, or sandbox can also be used to mean the working folder mapping.
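To make the idea of a working folder mapping concrete, here is a minimal Python sketch that resolves a repository path to a local path. It assumes TFVC-style server paths beginning with $/ and assumes that, when several mappings match, the most specific (longest) one wins. The mapping table and paths are hypothetical examples, and real workspaces support options (such as cloaking) that are not modeled here.

```python
from pathlib import PureWindowsPath
from typing import Optional


def to_local_path(server_path: str, mappings: dict[str, str]) -> Optional[PureWindowsPath]:
    """Return the local path for a repository path, or None if no mapping covers it.

    Assumption of this sketch: the most specific (longest) matching mapping wins.
    """
    best_key: Optional[str] = None
    best_root = ""
    for server_root in mappings:
        root = server_root.rstrip("/")
        covers = server_path == root or server_path.startswith(root + "/")
        if covers and (best_key is None or len(root) > len(best_root)):
            best_key, best_root = server_root, root
    if best_key is None:
        return None
    # Whatever follows the mapped server folder is appended to the local folder.
    relative = server_path[len(best_root):].strip("/")
    local = PureWindowsPath(mappings[best_key])
    return local.joinpath(*relative.split("/")) if relative else local
```

For example, with "$/Fabrikam/Main" mapped to C:\src\Main, the file $/Fabrikam/Main/src/App.cs lands at C:\src\Main\src\App.cs.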

Note

Note that the terms “workspace,” “working folder,” and “sandbox” have been used in different version control systems to mean slightly different, but similar, things, which can be confusing when you are trying to understand a new version control system. This is one of the reasons it is important to understand the core concepts now so that this discussion can use a set of agreed-upon terms throughout the rest of the book. Each version control system you use is slightly different, and once you think about the problem in the way the tool allows you to work, it is often difficult to think about version control in other ways. This change in context is a problem that people encounter when moving from one version control system to another and, therefore, one that is addressed in Chapter 6 when discussing Team Foundation Server in more detail.

Get/Clone/Pull

Once you have set up a working folder mapping, you must download the files from the repository to your local machine. In Team Foundation Version Control (TFVC), as well as in some other version control tools, this process is known as Get. In Concurrent Versions System (CVS) and Subversion (SVN), the same process is known as check-out, a word that means something slightly different in version control systems such as TFVC, as will be described shortly.

In a DVCS tool such as Git, you do not get individual files from the repository; instead, you clone the repository in its entirety. To create a local repository from a remote one, you execute a Clone operation, which retrieves an exact copy of the remote repository, including its history, and places it on your local machine. While you are working in a local repository, you might need changes from other team members. In this case you would pull from your colleague's repository (or from a shared remote repository) to update yours; a pull fetches the commits you do not yet have and merges them into your current branch.
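The difference between a clone and a pull can be sketched with a toy commit graph in Python. This is an illustration under simplifying assumptions, not Git's implementation: each repository holds commits linked to their parents plus a single branch tip, a clone copies every commit, and pull only fast-forwards, raising an error where real Git would create a merge. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parent: Optional[str]
    message: str


@dataclass
class Repository:
    commits: dict[str, Commit] = field(default_factory=dict)
    head: Optional[str] = None  # tip of the single branch in this toy model

    def commit(self, commit_id: str, message: str) -> None:
        self.commits[commit_id] = Commit(commit_id, self.head, message)
        self.head = commit_id

    def ancestors(self, commit_id: Optional[str]) -> set[str]:
        """Every commit reachable from commit_id by following parent links."""
        seen: set[str] = set()
        while commit_id is not None:
            seen.add(commit_id)
            commit_id = self.commits[commit_id].parent
        return seen


def clone(remote: Repository) -> Repository:
    """A clone copies the whole history, not just the latest files."""
    return Repository(commits=dict(remote.commits), head=remote.head)


def pull(local: Repository, remote: Repository) -> list[str]:
    """Fetch commits the local repository lacks, then fast-forward its branch.

    Real Git would merge diverged branches; this sketch raises instead.
    """
    remote_history = remote.ancestors(remote.head)
    fetched = sorted(remote_history - set(local.commits))
    for commit_id in fetched:
        local.commits[commit_id] = remote.commits[commit_id]
    if local.head is None or local.head in remote_history:
        local.head = remote.head  # fast-forward
    elif remote.head is not None and remote.head not in local.ancestors(local.head):
        raise RuntimeError("branches have diverged; a merge is required")
    return fetched
```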

Add

What if you have no files in your repository? When you first start using a version control system, the repository is empty, so you need to add some files. You select which files and folders on your local machine to add to the repository so that you can share them with the rest of your team. When you do this, you seldom want to add all of the local files.

For example, you might have some compiled binaries derived from your source or files that are generated by your tools in the source directory each time your code is built. You typically do not need or, indeed, want to share these files. Therefore, Team Foundation Server provides tooling to help you select which files from a folder you are actually interested in adding, and filters out those you want to exclude such as DLLs, compiled classes, and object files that are typically part of the build process and not the source code that you want to store.
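The filtering step can be sketched with Python's fnmatch module. The pattern list below is a hypothetical example. In practice, TFVC local workspaces read exclusions from .tfignore files and Git reads them from .gitignore files; both formats have richer rules than this sketch, which simply excludes a path when its file name or any folder on it matches a pattern.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical exclusion patterns for build output.
EXCLUDE_PATTERNS = ("*.dll", "*.exe", "*.obj", "*.class", "*.pdb", "bin", "obj")


def is_excluded(path: str, patterns: tuple[str, ...] = EXCLUDE_PATTERNS) -> bool:
    """True if the file name, or any folder on its path, matches an exclusion pattern."""
    return any(fnmatch(part, pattern) for part in PurePosixPath(path).parts for pattern in patterns)


def files_to_add(candidates: list[str]) -> list[str]:
    """Keep only the files worth adding to version control, in their original order."""
    return [path for path in candidates if not is_excluded(path)]
```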

Check-Out

If you want to work on a file, some version control systems (such as Visual SourceSafe, or Team Foundation Version Control when working in a mode known as a Server Workspace, which is described in Chapter 6) require you to inform the server that you are working on the file so that others on the team can know (and so that the server can check that you still have permission to edit the file). This operation is known as a check-out in Team Foundation Version Control.

In Visual SourceSafe (VSS), a single file can be edited by only one user at a time. Therefore, when a file is checked out, it is locked for editing by other users. In Team Foundation Server, the default behavior is that multiple people can edit the same file simultaneously (which is generally considered a best practice, because it lets team members work without waiting on each other). However, you do have the option of locking the file as you check it out if you wish to prevent edits for some reason.
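The bookkeeping a server does for check-outs can be sketched as a small Python class. It is a hypothetical toy, not TFVC's API: the server records who has each file checked out, allows shared edits by default, and supports an optional lock that blocks other users. One rule is an assumption of this sketch only: a lock is refused while someone else has the file checked out.

```python
from collections import defaultdict


class CheckoutServer:
    """Hypothetical toy model of server-side check-out tracking with optional locks."""

    def __init__(self) -> None:
        self.checkouts: defaultdict[str, set[str]] = defaultdict(set)  # path -> users editing it
        self.locks: dict[str, str] = {}  # path -> user holding a lock

    def check_out(self, user: str, path: str, lock: bool = False) -> None:
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        if lock and self.checkouts[path] - {user}:
            # Assumption of this sketch: a lock is refused while others are editing.
            raise PermissionError(f"{path} is already checked out by someone else")
        self.checkouts[path].add(user)
        if lock:
            self.locks[path] = user

    def editors(self, path: str) -> set[str]:
        """Who on the team is currently working on this file."""
        return set(self.checkouts.get(path, set()))

    def check_in(self, user: str, path: str) -> None:
        """Checking in ends the user's check-out and releases any lock they hold."""
        self.checkouts[path].discard(user)
        if self.locks.get(path) == user:
            del self.locks[path]
```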

Note that the use of the term “check-out” is slightly different in this context as opposed to the use of the term in the context of CVS and SVN. In those systems, “check-out” means to download the files to your working copy locally. This is equivalent to a Get in Team Foundation Version Control.

A new mode of working was introduced in Team Foundation Server 2012 called a Local Workspace, and it's examined in Chapter 6. As with systems like SVN, with a Local Workspace in TFVC, you are not required to explicitly check out a file before working on it. Instead, you can edit any file that you can download. All files in your working copy are writable when you download them from version control and you just edit away. This has advantages because it means that there is much less friction when editing files locally, especially from tools outside of the main development environment. It also involves less frequent communication with the server, which makes working with the files offline much easier. However, you lose the ability to know exactly who on your team is currently working on which files.

Changeset/Commits

As you check out and edit files, you are building up a set of changes that will need to be applied to the repository when you wish to commit those changes. This set of changes to be done is called a changeset, commit, or changelist. The changeset consists of all the changes you have made to the files (for example, editing, adding, or renaming a file); in some version control systems, the changeset also contains metadata about the commit, such as which work items were associated with it or, in the case of Git, the original author of the changes that the committer is committing (which is important to track in many open source style workflows).

Check-in/Commit

At some point, you have a set of changes that you want to commit to the repository, either to share them with your team or to draw a line in the sand at a state you want to save. This action is called a check-in in Team Foundation Server, but other tools may call it by other names (such as “commit”).

In more modern version control systems such as Team Foundation Server, Subversion (SVN), or Git, the check-in is performed on the repository as a single atomic transaction. That is, if, for some reason, your changes could not be applied to the repository (for example, if someone else has edited a particular file with a conflicting change while you were working on it or an upload of a file failed due to network issues), then none of the changes in your changeset are checked in until you resolve any issues preventing the check-in. In addition, if someone were to get a copy of the repository while you were doing your check-in, he or she would not get your changes. Only after your check-in has completed successfully are your changes visible to the rest of the team.
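The all-or-nothing behavior can be sketched in Python as a check-in that validates every pending change before applying any of them. In this hypothetical toy, each pending change records the server version the developer started from, a mismatch counts as a conflict, and the new state is built on the side and published in a single assignment so that a reader never sees half a changeset. A real server relies on database transactions and concurrency control, which this single-threaded sketch does not attempt to model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingChange:
    path: str
    base_version: int  # server version the edit started from (0 for a new file)
    new_content: str


class ConflictError(Exception):
    """Raised when a pending change was made against an out-of-date version."""


class CentralRepository:
    def __init__(self) -> None:
        self.files: dict[str, tuple[int, str]] = {}  # path -> (version, content)
        self.changeset = 0
        self.log: list[tuple[int, str]] = []  # (changeset number, comment)

    def check_in(self, changes: list[PendingChange], comment: str) -> int:
        # Step 1: validate every change before modifying anything.
        for change in changes:
            current_version = self.files.get(change.path, (0, ""))[0]
            if current_version != change.base_version:
                raise ConflictError(f"{change.path} changed on the server; resolve it first")
        # Step 2: build the new state on the side...
        new_files = dict(self.files)
        for change in changes:
            version = new_files.get(change.path, (0, ""))[0] + 1
            new_files[change.path] = (version, change.new_content)
        # Step 3: ...and publish it in one assignment, so readers see all or nothing.
        self.files = new_files
        self.changeset += 1
        self.log.append((self.changeset, comment))
        return self.changeset
```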

When you check in, you can also provide some additional data. In most version control tools, you can provide a comment describing the reason why you made the changes (and it is best practice to leave a meaningful comment). Team Foundation Server also provides the ability to provide additional metadata about each check-in, which will be described in more detail in Chapter 6.

Push

In Git there will be times when you need to send one or more commits from your local repository to a remote repository. This action is performed by executing a Push command to that repository. This is how a developer using a Git-based team project would send his or her work to the Team Foundation Server so that team members and the automated build process can use those files. Automated builds are described in Chapter 18.
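A push can be sketched in Python as working out which commits the remote does not have yet and sending them oldest first. In this toy model, each repository is just a hypothetical dictionary mapping commit ids to parent ids. Like Git's default behavior, the sketch rejects a push that is not a fast-forward, that is, when the remote branch contains commits you have not pulled yet.

```python
from typing import Optional

Parents = dict[str, Optional[str]]  # commit id -> parent commit id (None for the root)


def history(parents: Parents, tip: Optional[str]) -> list[str]:
    """Commits reachable from tip, newest first."""
    result = []
    while tip is not None:
        result.append(tip)
        tip = parents[tip]
    return result


def push(local: Parents, local_tip: str, remote: Parents, remote_tip: Optional[str]) -> tuple[list[str], str]:
    """Copy missing commits to the remote; return (commits sent, new remote tip)."""
    local_history = history(local, local_tip)
    if remote_tip is not None and remote_tip not in local_history:
        raise RuntimeError("rejected: the remote has commits you do not have; pull first")
    to_send = [c for c in reversed(local_history) if c not in remote]  # oldest first
    for commit_id in to_send:
        remote[commit_id] = local[commit_id]
    return to_send, local_tip
```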

History

As mentioned previously, a version control repository is like a file system that stores all the changes made to it. This extra dimension on the file system is known as the history. For a particular file or folder, the version control system can tell you everything that has happened to that file or folder over time. The history is one of the features that makes a version control system useful by allowing you to know what code you actually shipped to a customer who received a release at a given time. But it is also useful for many other things—for example, being able to go back in time and understand why the code is as it is now, to understand who has worked on a particular area in the past, or to figure out what has changed between two particular versions when suddenly something stops working that used to work well before.

Most version control systems provide the ability to label, or tag, files with a text description. This is a way of marking the repository to give a meaningful name to a set of files at a particular version (for example, when you have done a build that you want to deploy). The label makes it easy to find that snapshot at a later time and to see what the repository contained at that instant.
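History and labels can be sketched together in Python. In this hypothetical toy repository, each changeset records its author, comment, and the new content of the files it touched (None marks a deletion); the state at any point is rebuilt by replaying changesets, and a label is simply a name attached to a changeset number. Real systems such as TFVC can label individual file versions rather than a whole changeset, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Changeset:
    number: int
    author: str
    comment: str
    changes: dict[str, Optional[str]]  # path -> new content, None means deleted


class VersionedRepo:
    def __init__(self) -> None:
        self.changesets: list[Changeset] = []
        self.labels: dict[str, int] = {}  # label name -> changeset number

    def check_in(self, author: str, comment: str, changes: dict[str, Optional[str]]) -> int:
        changeset = Changeset(len(self.changesets) + 1, author, comment, dict(changes))
        self.changesets.append(changeset)
        return changeset.number

    def state_at(self, number: int) -> dict[str, str]:
        """Rebuild the repository tree as it was after changeset `number`."""
        tree: dict[str, str] = {}
        for changeset in self.changesets[:number]:
            for path, content in changeset.changes.items():
                if content is None:
                    tree.pop(path, None)
                else:
                    tree[path] = content
        return tree

    def file_history(self, path: str) -> list[tuple[int, str, str]]:
        """Who changed a file, in which changeset, and why."""
        return [(c.number, c.author, c.comment) for c in self.changesets if path in c.changes]

    def label(self, name: str, number: Optional[int] = None) -> None:
        """Attach a meaningful name to a changeset (the latest one by default)."""
        self.labels[name] = len(self.changesets) if number is None else number

    def state_at_label(self, name: str) -> dict[str, str]:
        return self.state_at(self.labels[name])
```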

Team Foundation Version Control provides all of this history functionality, but in addition, it makes it very easy to see what changes occurred to a file before it was renamed. Similarly, changes that occurred before a file was branched or merged can be easily viewed from the file's history.

Branching and Merging

A branch is a copy of a set of files in a different part of the repository. This allows two or more teams of people to work on the same project at the same time, checking in changes as they go, but without interfering with the other teams. At some point in the future, you may want some or all of the code to join up again. This is when you need to merge the changes from one branch into the other branch. When merging changes from two files in separate branches, if the same bit of code has been edited differently in both places, then this is called a conflict, and the version control system will require someone to decide what the code should be in the merged version.
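The notion of a conflict can be made precise with a three-way comparison against the common ancestor (the base). The Python sketch below is deliberately simplified: it assumes that edits only change lines in place, so the base and both branch versions have the same number of lines, whereas real merge tools first align the versions with a diff. A line changed on only one side is taken automatically; a line changed differently on both sides is a conflict, marked here with Git-style markers.

```python
def merge3(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], list[int]]:
    """Line-by-line three-way merge; returns (merged lines, 1-based conflicting line numbers).

    Simplifying assumption: all three versions have the same number of lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: list[str] = []
    conflicts: list[int] = []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:  # same change, or no change, on both sides
            merged.append(o)
        elif o == b:  # only the other branch changed this line
            merged.append(t)
        elif t == b:  # only this branch changed this line
            merged.append(o)
        else:  # both changed it differently: someone must decide
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(number)
    return merged, conflicts
```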

In most centralized version control systems, a branch is simply another folder at a different path that contains a copy of data from elsewhere in the repository. In Team Foundation Server, a branch folder is decorated differently, and branches are first-class objects with metadata and behavior beyond those of a regular folder, but they still live inside the repository. In Git, a branch does not occupy a separate path in the repository at all: it is a named pointer to a commit, and switching branches updates the files in your working copy in place to the versions in that branch.

Note

While Team Foundation Version Control logically shows branches in a different folder inside the repository, the files are not actually copied. A branch is just a pointer to where the files are stored in the repository. This saves space in the repository and allows branches to be lightweight and quick to create. When a file is first edited, the branch will contain the delta between that version of the file and the prior version on the parent branch.
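The space saving from pointer-based branches can be sketched in Python with a content store shared by all branches. This is a toy under stated assumptions, not TFVC's storage model: contents are keyed by their SHA-1 hash, creating a branch copies only the path-to-content pointers, and editing a file on a branch stores the whole new content rather than a delta. Class and method names are hypothetical.

```python
import hashlib


class ContentStore:
    """Stores each distinct file content once, keyed by its SHA-1 hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self.blobs.setdefault(key, content)
        return key


class BranchingRepo:
    def __init__(self) -> None:
        self.store = ContentStore()
        self.branches: dict[str, dict[str, str]] = {}  # branch path -> {file: content key}

    def write_file(self, branch: str, path: str, content: bytes) -> None:
        self.branches.setdefault(branch, {})[path] = self.store.put(content)

    def create_branch(self, source: str, target: str) -> None:
        """Copy the pointers only; no file contents are duplicated."""
        self.branches[target] = dict(self.branches[source])

    def read(self, branch: str, path: str) -> bytes:
        return self.store.blobs[self.branches[branch][path]]
```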

There are many ways to approach branching, and you should carefully consider what branching strategy you want to adopt. Chapter 10 provides more information on this.

If all you want to do is save your work in progress so that you can store a copy of it on the server, or possibly share it with others in your team without checking in the code, then you may also want to consider the shelving features of Team Foundation Version Control. Chapter 6 provides more information on shelving and unshelving.

Centralized Versus Decentralized Version Control

In the world of version control systems, there are many options to choose from. To begin the selection process, it is important that you understand that version control systems fall into two basic categories, centralized and decentralized.

Centralized Version Control

Centralized version control systems (CVCS) are structured so that there is a single, canonical copy of source code with which all team members interact. The most widely used of these systems are Visual SourceSafe (VSS), Subversion (SVN), and Team Foundation Version Control (TFVC). These systems are described further later in this chapter.

The typical usage scenario for a CVCS is that the developer performs a Get from the repository to create a local copy in his or her workspace. The developer then makes changes to the code base—adding, editing, deleting, and renaming files. All of these changes are isolated on the developer's local machine and are not visible to colleagues. When the developer is done, he or she checks in or commits the changes back to the central repository to allow other team members to retrieve those changes and integrate them with their ongoing development efforts.

This model provides a single “source of truth” for the code base as well as a single location to back up for disaster-recovery purposes. It also allows for the implementation of a central security model to restrict access to individual files, folders, or entire sub-trees.

Distributed Version Control Systems

Distributed version control systems (DVCS) have been around for more than a decade, but only in recent years have they gained widespread adoption with the creation of systems such as BitKeeper, Git, Mercurial, Veracity, and Bazaar. Git and Mercurial are probably the most well-known of these types of tools and have seen the widest adoption of DVCS to date. At the time of this writing, Git is emerging as the most important of this generation of version control systems due in no small part to the rapid rise of GitHub (http://www.github.com) as a central location for the sharing of open source projects. Git interoperability is a requirement of most modern DVCS systems, and Git's fast-import file format is now the de facto standard file format for the import and export of DVCS repositories.

Differences between Centralized and Distributed Version Control Systems

There are some common (but fundamental) differences between the way a DVCS tool operates versus the more traditional, centralized version control system tools discussed previously. The key difference is that the local developer machines have the capability of acting as a DVCS repository themselves and are peers to each other. Changes are not serialized as a set of versions in a centralized repository. Rather, they are stored as changes in the local repository, which can then be pushed or pulled into other users' repositories. While a centralized repository is not required, in many environments, it is common for a repository to act as the central hub that contains the master copy of the code on which the team performs automated builds and that is considered by the team to be the definitive version. Table 5.1 shows the strengths and weaknesses of a DVCS when compared to a CVCS.

Table 5.1 Strengths and Weaknesses of a DVCS

Strengths

· It has full repository support when offline from others. It also has fast local repository access.

· You can easily have multiple repositories and highly flexible topologies. You can use repositories in circumstances where branches might be used in a centralized server approach, which can, therefore, help with scalability. Because all the effort required to work with the repository is performed on the client, DVCS solutions typically have more modest hardware requirements on the server.

· It encourages frequent check-ins to a local repository, thus providing the capability to track those changes and see the evolution of the code.

· It is well-suited to many open source project workflows. It allows participation in the project without any centralized server granting permission. It works well for large projects with many partly independent developers responsible for certain areas.

· Because of the way DVCS systems typically track changes, and because the nature of having distributed repositories means that merges happen more frequently, DVCS merges are usually less likely to produce conflicts, compared with similar changes merged from separate branches in a centralized version control system. However, merges can still conflict, and the more the code has changed between merges, the more effort the merge is likely to require.

· As each working copy of the repository is a copy of the entire repository, including history, backups of that repository are implicit in each client. This increases the disaster recovery options without requiring any centralized overhead.

Weaknesses

· Using developer repositories can reduce the frequency with which changes are synced with the rest of the team, leading to a loss of visibility of the progress of the teams overall.

· There is no centralized backup of progress for developers until changes are pushed to a central repository.

· Current DVCS solutions lack some security, auditing, and reporting capabilities common to enterprise requirements, such as the ability to control access by path in version control. Access permissions are controlled at the repository level, not at the path level.

· Most centralized systems (such as SVN and Team Foundation Server) allow for optional locking of files to prevent later merge conflicts. The nature of DVCS tools makes this impossible.

· Because the entire repository is cloned to every machine, moving large repositories across the network can be an issue. This is often avoided by having multiple smaller repositories rather than just a single global repository.

· At the time of this writing, DVCS tooling, whether integrated into development environments or available on Windows, is not as mature as the tooling for the most popular centralized version control systems, such as Team Foundation Version Control or Subversion.

DVCS systems provide a greater number of workflows when managing file versions. While this vast degree of freedom can be overwhelming to newcomers, once a basic workflow is established in the team it is quickly understandable.

Common Version Control Products

Many version control products have been created over time, and many are in use today. The most common tools used as of this writing are Visual SourceSafe (VSS), Subversion (SVN), Team Foundation Version Control (TFVC), and Git.

In Team Foundation Server 2013, Microsoft provided the ability to select a second version control repository engine in addition to TFVC during team project creation. You can now select either TFVC or Git as your version control engine. Distributed version control systems such as Git are becoming increasingly important players in the development ecosystem, especially in the open source community. This section also looks at distributed version control systems (DVCS).

Microsoft Visual SourceSafe

Visual SourceSafe (VSS) was originally created by One Tree Software and acquired by Microsoft in 1994. Microsoft Visual SourceSafe 2005 was the final release of the product, and it was scheduled for retirement from mainstream support in 2012. Despite its age, VSS, a pioneer in its day, is still a well-used version control product. It is very easy to install and set up, largely because it uses a file system–based repository and does not require a dedicated server. The early design did present some issues, however. Check-ins into the repository were not atomic and thus caused problems in team environments. Additionally, the file system–based approach could lead to instabilities in the repository, which gave VSS a reputation for sometimes corrupting the repository. Table 5.2 shows a contrast between the strengths and weaknesses of VSS.

Table 5.2 Strengths and Weaknesses of VSS

Strengths

· VSS is easy to install and use.

· VSS has broad support in developer tools.

· VSS has wide adoption in the industry.

Weaknesses

· It is an aging product that is no longer actively developed.

· It does not perform well over wide area networks (WANs).

· There are no atomic check-in transactions.

· It has very limited branch support (through sharing features).

Team Foundation Server is seen as Microsoft's replacement product for VSS. But Team Foundation Server also addresses far more scenarios (for example, work item tracking, reporting, team builds) for which VSS was never intended.

Apache Subversion

Subversion (SVN) is an open source version control project founded by CollabNet in 2000. SVN became a top-level project of the Apache Software Foundation in 2010. The system was originally designed to be a successor to the older open source CVS version control project. Since that time, it has surpassed CVS in market share and expanded beyond the original goal of replacing CVS. However, SVN is still heavily influenced by that design and should be familiar to CVS users.

While SVN has a large market share today, it is being challenged by distributed version control systems, most notably Git, in the open source space. But development of SVN is still continuing, and features continue to be added. Table 5.3 shows a contrast between the strengths and weaknesses of SVN.

Table 5.3 Strengths and Weaknesses of SVN

Strengths

· SVN works under an open source licensing model (free to use).

· SVN is in wide use by open source projects (but it is declining in favor of Git).

· The server works on a variety of operating systems.

· SVN provides broad support with developer tools on all platforms.

Weaknesses

· Like CVS, SVN stores the state of the local working copy in .svn directories inside the source tree to allow synchronization with the server (in every folder before SVN 1.7, and in a single directory at the root of the working copy since then). This metadata can pollute the local source tree and can cause performance issues with very large projects or files.

· Renames are handled as a copy-and-delete operation in the repository, which can cause problems when merging branches.

· Configuring authentication and performing certain administration functionality can be challenging in a Windows environment.

· There is no shelving functionality.

Team Foundation Version Control

First publicly released in 2006, Microsoft Visual Studio Team Foundation Server is the reason you are reading this book, and so, by the end of this book, you will be very familiar with its functionality. Chapter 6 provides more information on the version control capabilities. However, it is worth highlighting the strengths and weaknesses of Team Foundation Version Control in this context, as shown in Table 5.4.

Table 5.4 Strengths and Weaknesses of Team Foundation Version Control

Strengths

· It is more than just version control and provides tight integration with the work item tracking, build, and reporting capabilities of the product.

· It has first-class Visual Studio and Eclipse integration provided by the same vendor who provides the server.

· It has many features appealing to enterprise-class customers, such as centralized security administration, integration with Active Directory for authentication, and single-sign-on (SSO), as well as SharePoint integration.

· It is highly scalable.

· Shelveset support allows you to store changes on the server without committing to the main code repository.

· Check-in policies define rules that a change must satisfy before you are able to commit it to the repository.

· Gated check-in support requires an automated build to succeed before the code is committed to the main repository.

· All data is stored in a SQL Server database for security and ease of backup.

Weaknesses

· Offline support and support for occasionally connected developers are significantly improved over previous releases of Team Foundation Server, but centralized version control tools such as TFS and SVN will never be as strong at offline support as a distributed version control tool such as Git.

· A centralized server must be set up to allow check-in of code and collaboration among team members. However, you can have a centralized server set up quickly and easily for you at http://www.visualstudio.com.

· The server product runs only on Windows platforms, but a client is available cross-platform.

Git in TFS

The Git version control system is a free, open source DVCS that was designed and developed in 2005 by Linus Torvalds to support the development of the Linux kernel. Like all distributed version control systems, it allows for each developer to maintain a complete copy of the source repository on his or her local machine and makes it easy to share commits and entire branches between team members.

There is a large body of support for Git in modern development environments from native integration, such as in Apple's Xcode IDE, to support through plug-ins, like the EGit Eclipse plug-in, to hybrid integration such as Visual Studio 2013 provides, where you can either use command-line Git or Team Explorer integrated Git. For more information, see Chapter 7.

As stated earlier, Team Foundation Server 2013 now natively supports Git as a version control repository. This allows development teams to have the flexibility to work in a distributed fashion, with each team member managing local commits, while still allowing the TFS server to house the repository that is the “source of truth.” This integration opens up the ability to link commits in Git to work items in TFS. It also allows Git branches to participate in automated builds. Table 5.5 shows the strengths and weaknesses of Git in TFS.

Table 5.5 Strengths and Weaknesses of Git in TFS

Strengths

· It is more than just version control and provides tight integration with the work item tracking, build, and reporting capabilities of the product.

· Strong support for offline and occasionally connected development patterns with local repositories.

· Makes merging of changes between branches and repositories much easier.

· It has many features that appeal to enterprise-class customers, such as centralized security administration, integration with Active Directory for authentication, and single-sign-on (SSO), as well as SharePoint integration.

· It is highly scalable.

· All data is stored in a SQL Server database for security and ease of backup.

· Works with continuous integration automated builds.

Weaknesses

· The server product runs only on Windows platforms, but a client is available cross-platform.

· Does not have the ability to create shelvesets.

· Gated check-in support is not available.

· Does not have graphical support in Source Control Explorer, branch visualization, or changeset history tracking.

· “Source of truth” repository is defined only by convention.

· Security can only be set at the branch level on the server. No security control on local repositories.

Summary

This chapter introduced the basic concepts of version control and why it is needed. We then discussed the differences between centralized and distributed version control systems. You learned about some of the common version control tools in the market today and about their strengths and weaknesses.

Team Foundation Server is one of the leading tools in the market today. While it has some unique version control capabilities, and scales well from very small to very large teams, broadly speaking, when looking at the version control capabilities alone, it is comparable to most modern centralized version control systems in terms of feature sets, and with the addition of Git, it is a compelling alternative in the distributed version control area.

The key factor that makes many organizations choose to standardize on Team Foundation Server is the tight integration between work item tracking (which can include requirements, test cases, bugs, tasks, and so on), version control, build, and reporting features, all the way through the product. By closely binding your version control with your work item tracking, you get greater traceability. The intimate knowledge of the version control system by the build system gives rise to powerful build features, with no additional work by the administrators. The close link between builds and work items means that testers know which builds fix which bugs, what files in version control were affected, and which tests need to be re-run. It's the sum of the whole that really makes Team Foundation Server stand out from the competition.

As discussed, every version control system is different, and a developer's understanding of a version control system is key to effectively working with it. Chapter 6 delves deeper into the version control features offered by Team Foundation Server and how to work with them. The core concepts and tools will be discussed along with some help and advice in transitioning to Team Foundation Server from other version control systems.