Mastering C# and .NET Framework: .NET Under the hood

Chapter 1. Inside the CLR

Since CLR is just a generic name for different tools and software based on well-known and accepted principles in computing, we'll begin with a review of some of the most important concepts of software programming that we often take for granted. So, to put things in context, this chapter reviews the most important concepts around the motivations for the creation of .NET, how this framework integrates with the Windows operating system, and what makes the so-called CLR the excellent runtime it is.

In short, this chapter covers the following topics:

  • A brief, but carefully selected, dictionary of the common terms and concepts utilized in general and .NET programming
  • A rapid review of the goals behind the creation of .NET and the main architects behind its construction
  • Explanations of each of the main parts that compose the CLR, its tools, and how the tools work
  • A basic approach to the complexity of algorithms and how to measure it
  • A select list of the most outstanding characteristics related to the CLR that appeared in recent versions

An annotated reminder of some important computing terms

Let's check out some important concepts widely used in software construction that show up frequently in .NET programming.

Context

As Wikipedia states:

In computer science, a task context is the minimal set of data used by a task (which may be a process or thread) that must be saved to allow a task interruption at a given date, and a continuation of this task at the point it has been interrupted and at an arbitrary future date.

In other words, context is a term related to the data handled by a thread. Such data is conveniently stored and recovered by the system as required.

Practical approaches to this concept include HTTP request/response and database scenarios in which the context plays a very important role.

The OS multitask execution model

A CPU is able to manage multiple processes in a period of time. As we mentioned, this is achieved by saving and restoring (in an extremely fast manner) the context of execution with a technique called context switch.

When a thread ceases to execute, it is said to be in the Idle state. This categorization is useful when analyzing process execution with tools that are able to isolate threads in the Idle state:

[Figure: The OS multitask execution model]

Context types

In some languages, such as C#, we also find the concept of safe or secure context. In a way, this relates to the so-called thread safety.

Thread safety

A piece of code is said to be thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time. There are various strategies used in order to create thread-safe data structures, and the .NET framework is very careful about this concept and its implementations.

Actually, most pages of MSDN (the official documentation) include a note at the bottom indicating whether the type is thread-safe, for those types to which it is applicable (a vast majority).
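To make the idea of thread safety more tangible, here is a minimal sketch that increments a shared counter from several workers, first without synchronization and then with Interlocked.Increment. The class and method names (ThreadSafetyDemo, UnsafeCount, SafeCount) are hypothetical and chosen only for illustration; atomic increments are just one of several possible strategies (locks and concurrent collections are others).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the .NET framework.
public static class ThreadSafetyDemo
{
    // Increments a shared counter from several workers with no synchronization.
    public static int UnsafeCount(int workers, int iterations)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < iterations; i++)
            {
                counter++; // read-modify-write is not atomic: updates can be lost
            }
        });
        return counter;
    }

    // Same work, but every increment is performed as an atomic operation.
    public static int SafeCount(int workers, int iterations)
    {
        int counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < iterations; i++)
            {
                Interlocked.Increment(ref counter);
            }
        });
        return counter;
    }

    public static void Main()
    {
        // The unsafe result may or may not reach 800000, depending on timing.
        Console.WriteLine("Unsafe: " + UnsafeCount(8, 100000));
        // The safe result is always workers * iterations.
        Console.WriteLine("Safe:   " + SafeCount(8, 100000));
    }
}
```

The unsafe version can lose updates because two threads may read the same value before either writes it back; the safe version never does.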

State

The state of a computer program is a technical term for all the stored information, at a given instant in time, to which the program has access. The output of a computer program at any time is completely determined by its current inputs and its state. A very important variant of this concept is the program's state.

Program state

This concept is especially important, and it has several meanings. We know that a computer program stores data in variables, which are just labeled storage locations in the computer's memory. The contents of these memory locations, at any given point in the program's execution, are called the program's state.

In object-oriented languages, it is said that a class defines its state through fields, and the values that these fields have at a given moment of execution determine the state of that object. Although it's not mandatory, it's considered good practice in OOP for the methods of a class to have the sole purpose of preserving the coherence and logic of its state and nothing else.
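As a small illustration of this practice, the following sketch uses a hypothetical BankAccount class (an assumption made for this example only). Its state is a single private field, and its methods exist only to keep that state coherent.

```csharp
using System;

// Hypothetical class used only to illustrate object state.
public class BankAccount
{
    // The state of the object: the value of this field at a given moment.
    private decimal balance;

    public decimal Balance
    {
        get { return balance; }
    }

    // Methods only change the state in ways that keep it coherent.
    public void Deposit(decimal amount)
    {
        if (amount <= 0)
        {
            throw new ArgumentOutOfRangeException("amount");
        }
        balance += amount;
    }

    public bool TryWithdraw(decimal amount)
    {
        if (amount <= 0 || amount > balance)
        {
            return false; // rejected: the balance must never become negative
        }
        balance -= amount;
        return true;
    }
}

public static class StateDemo
{
    public static void Main()
    {
        var account = new BankAccount();
        account.Deposit(100m);
        Console.WriteLine(account.TryWithdraw(150m)); // False
        Console.WriteLine(account.Balance);           // 100
    }
}
```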

In addition, a common taxonomy of programming languages establishes two categories: imperative and declarative programming. C# and Java are examples of the former, and HTML is a typical declarative syntax (although it's not a programming language itself). In imperative programming, statements tend to change the state of the program, while in the declarative paradigm, languages indicate only the desired result, with no specification of how the engine will manage to obtain it.

Serialization

Serialization is the process of translating data structures or the object state into a format that can be stored (for example, in a file or a memory buffer) or transmitted across a network connection and reconstructed later in the same or another computer environment.

So, we usually say that serializing an object means converting its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object. Popular text formats, such as XML and JSON, emerged years ago and are now well known and accepted, independently of other previous formats (binary included):

[Figure: Serialization]
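As a minimal sketch of this round trip, the following code serializes an object to XML text and reconstructs a copy from it using the XmlSerializer class from System.Xml.Serialization. The Book class and the SerializationDemo helper are assumptions made for the example, not types from the framework.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type; XmlSerializer needs a public type with a parameterless constructor.
public class Book
{
    public string Title { get; set; }
    public int Pages { get; set; }
}

public static class SerializationDemo
{
    // Converts the state of a Book into XML text.
    public static string ToXml(Book book)
    {
        var serializer = new XmlSerializer(typeof(Book));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, book);
            return writer.ToString();
        }
    }

    // Rebuilds a copy of the Book from its XML representation.
    public static Book FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Book));
        using (var reader = new StringReader(xml))
        {
            return (Book)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = ToXml(new Book { Title = "Inside the CLR", Pages = 42 });
        Console.WriteLine(xml);

        Book copy = FromXml(xml);
        Console.WriteLine(copy.Title + ", " + copy.Pages + " pages");
    }
}
```

The copy is a different object in memory, but it carries the same state as the original.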

Process

The OS fragments operations among several functional units. This is done by allocating different memory areas for each unit in execution. It's important to distinguish between processes and threads.

Each process is given a set of resources by the OS, which, in Windows, means that a process will have its own virtual address space allocated and managed accordingly. When Windows initializes a process, it is actually establishing a context of execution, which includes a data structure known as the process environment block (PEB). However, let's make this clear: the OS doesn't execute processes; it only establishes the execution context.

Thread

A thread is the functional (or working) unit of a process, and it is what the OS actually executes. Thus, a single process might have several threads of execution, which is something that happens very often. Each thread has its own stack within the address space previously allocated by the creation of the process, while the rest of the process resources are shared by all threads linked to the process:

[Figure: Thread]

It's important to recall that a thread belongs to only one process, and thus has access only to the resources defined by that process. When using the tools that will be suggested now, we can look at multiple threads executing concurrently (which means that they work independently of each other) while sharing resources, such as memory and data.

Different processes do not share these resources. In particular, the threads of a process share its instructions (the executable code) and its context (the values of its variables at any given moment).
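The following sketch shows several threads of the same process writing into one shared array. The SharedMemoryDemo class is hypothetical; each thread writes only to its own slot, so no extra synchronization is needed beyond waiting for the threads with Join.

```csharp
using System;
using System.Threading;

// Hypothetical demo class showing threads sharing the memory of their process.
public static class SharedMemoryDemo
{
    // Data in the heap, visible to every thread of this process.
    private static readonly int[] results = new int[4];

    public static int[] RunWorkers()
    {
        var threads = new Thread[results.Length];
        for (int i = 0; i < threads.Length; i++)
        {
            int slot = i; // copy, so each thread captures its own value
            threads[i] = new Thread(() => results[slot] = slot * slot);
            threads[i].Start();
        }

        // Wait until every thread has finished before reading the results.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RunWorkers())); // 0, 1, 4, 9
    }
}
```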

Programming languages such as .NET languages, Java, or Python expose threading to the developer while abstracting the platform-specific differences in threading implementations at runtime.

Tip

Note that communication between threads is possible through the common set of resources initialized by the process creation.

Of course, there is much more written about these two concepts, which go far beyond the scope of this book (refer to Wikipedia, https://en.wikipedia.org/wiki/Thread_(computing), for more details), but the system provides us with mechanisms to check the execution of any process and also check what the threads in execution are.

If you are curious about it or just need to check whether something is going wrong, there are two main tools that I recommend: the Task Manager (included in the operating system, which you probably know) and, even better, the SysInternals suite designed by the distinguished engineer and technical fellow Mark Russinovich, available for free and composed of a set of more than 50 utilities.

Some have a Windows interface and others are console utilities, but all of them are highly optimized and configurable for monitoring and controlling the inner aspects of our operating system at any moment. They are available for free at https://technet.microsoft.com/en-us/sysinternals/bb545021.aspx.

If you don't want to install anything else, open Task Manager (just right-click on the task bar to access it) and select the Details tab. You will see a more detailed description of every process, the amount of CPU used by each process, the memory allocated for each process, and so on. You can even right-click on one of the processes and see how there is a context menu that offers a few possibilities, including launching a new dialog window that shows some properties related to it:

[Figure: Task Manager, Details tab and process properties]
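Part of this information is also available from code through the System.Diagnostics.Process class. The following sketch (ProcessInfoDemo is a hypothetical name) prints a few details about the current process; it deliberately avoids inspecting other processes in depth, since that may require elevated permissions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical demo class; reads details similar to those shown by Task Manager.
public static class ProcessInfoDemo
{
    public static string DescribeCurrentProcess()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return string.Format(
                "{0} (PID {1}): {2} threads, {3:N0} bytes in working set",
                current.ProcessName,
                current.Id,
                current.Threads.Count,
                current.WorkingSet64);
        }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeCurrentProcess());
        Console.WriteLine("Processes visible to this user: " + Process.GetProcesses().Length);
    }
}
```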

SysInternals

If you really want to know how a process behaves in its entirety, the SysInternals tools are the ones to use. If you go to the link indicated earlier, you'll see a menu item especially dedicated to process utilities. There, you have several choices to work with, but the most comprehensive are Process Explorer and Process Monitor.

Process Explorer and Process Monitor don't require installation (they're written in C++), so you can execute them directly on any Windows device.

For example, if you run Process Explorer, you'll see a fully detailed window showing every single detail of all the processes currently active in the system.

With Process Explorer, you can find out what files, registry keys, and other objects processes have opened, together with the DLLs they have loaded, who owns each process, and so on. Every thread is visible and the tool provides you with detailed information, available through a very intuitive user interface:

[Figure: Process Explorer main window]

It's also very useful for checking the system's general behavior in real time, since it charts CPU usage, I/O, and memory activity, among others, as shown in the following screenshot:

[Figure: Process Explorer system activity graphs]

In a similar way, Process Monitor focuses on monitoring the filesystem, the Registry, and all processes and threads with their activities in real time, since it is actually a combination of two earlier utilities, FileMon (File Monitor) and RegMon (Registry Monitor), which are no longer available.

If you try out Process Monitor, you'll see some of the information included in Process Explorer, plus the specific information provided by Process Monitor, just conveyed in a different manner.

Static versus dynamic memory

When a program starts execution, the OS assigns a process to it by means of scheduling: the method by which work specified by some means is assigned to resources that complete the work. This means that the resources for the process are assigned, and that implies memory allocation.

As we'll see, there are mainly two types of memory allocation:

  • Fixed memory (linked to the stack), determined at compile time. Local variables are declared and used in the stack. Note that it is a contiguous block of memory allocated when the process resources are initially assigned. The allocation mechanism is very fast (although the access not so much).
  • The other is dynamic memory (the heap), which can grow as the program requires it, and it's assigned at runtime. This is the place where instances of classes (objects) are allocated; the variables that refer to them hold only a reference.

Usually, the first type is calculated at compile time, since the compiler knows how much memory will be needed to allocate the declared variables depending on their type (int, double, and so on). They are declared inside functions with a syntax such as int x = 1;

The second type requires the new operator to be invoked. Let's say there is a class named Book in our code; we create an instance of Book with an expression of this type:

Book myBook = new Book();

This instructs the runtime to allocate enough space in the heap to hold an instance of that type along with its fields; the state of the object is allocated in the heap. This means that the whole state of a program is spread across different memory (and, optionally, disk) locations.
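One observable consequence of this distinction is the difference between copying a value and copying a reference. The following sketch uses two hypothetical types, PointValue (a struct) and PointRef (a class). Note that where exactly the runtime places a value type is an implementation detail; what the example shows is only the copy semantics.

```csharp
using System;

// Hypothetical value type: assignment copies the whole value.
public struct PointValue
{
    public int X;
}

// Hypothetical reference type: assignment copies only the reference to the heap object.
public class PointRef
{
    public int X;
}

public static class MemoryDemo
{
    public static int CopyValueType()
    {
        var a = new PointValue { X = 1 };
        var b = a;  // b is an independent copy of a
        b.X = 2;
        return a.X; // a is unchanged
    }

    public static int CopyReferenceType()
    {
        var a = new PointRef { X = 1 };
        var b = a;  // a and b refer to the same object in the heap
        b.X = 2;
        return a.X; // the change is visible through a
    }

    public static void Main()
    {
        Console.WriteLine(CopyValueType());     // 1
        Console.WriteLine(CopyReferenceType()); // 2
    }
}
```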

Of course, there are more aspects to account for, which we'll cover in The Stack and the Heap section in this chapter. Luckily, the IDE lets us watch and analyze all these aspects (and many more) at debug time, offering an extraordinary debugging experience.

Garbage collector

Garbage collection (GC) is a form of automatic memory management. The GC in .NET attempts to reclaim garbage, that is, the memory occupied by objects that are no longer in use by the program. Going back to the previous declaration of Book, when there are no remaining references to the Book object, the GC will reclaim that space and return it to the system, freeing memory (it's a bit more complex, in fact, and I'll get into further detail later in this chapter, when we talk about memory management, but let's put it that way for the moment).

It's important to note that garbage collectors are not exclusive to the .NET platform. Actually, you can find them in many platforms and programs, even when you're dealing with browsers. Current JavaScript engines, for instance, such as Chrome's V8 and Microsoft's Chakra, use a garbage collection mechanism as well.
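To watch the collector at work, the following sketch allocates an object, drops the only strong reference to it, and forces a collection. GcDemo and its methods are hypothetical names. Whether the object has actually been reclaimed at the moment we check depends on the runtime and on the build configuration, so the example prints the result instead of assuming it; forcing collections with GC.Collect is shown here for demonstration only.

```csharp
using System;

// Hypothetical demo class for observing the garbage collector.
public static class GcDemo
{
    // Allocates 1 MB and returns only a weak reference, which does not keep the object alive.
    public static WeakReference AllocateAndDrop()
    {
        var data = new byte[1024 * 1024];
        return new WeakReference(data);
    }

    // Forces a full collection and reports how many generation 0 collections have run so far.
    public static int CollectionsAfterForcedGc()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return GC.CollectionCount(0);
    }

    public static void Main()
    {
        WeakReference weak = AllocateAndDrop();
        int collections = CollectionsAfterForcedGc();

        Console.WriteLine("Generation 0 collections so far: " + collections);
        Console.WriteLine("Object still alive: " + weak.IsAlive);
    }
}
```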

Concurrent computing

Concurrency or concurrent computing is a very common concept nowadays, and we'll encounter it several times throughout this book. The definition in Wikipedia (https://en.wikipedia.org/wiki/Concurrent_computing) says:

"Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process"). A concurrent system is one where a computation can advance without waiting for all other computations to complete; where more than one computation can advance at the same time."
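A minimal sketch of overlapping computations in C# can be written with tasks. In the following example (ConcurrencyDemo and SimulateWorkAsync are hypothetical names, and Task.Delay stands in for real work), three operations are started one after another without waiting for the previous one to complete, so the total time is expected to be close to the longest delay rather than to the sum of all of them.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical demo class; Task.Delay simulates work such as I/O.
public static class ConcurrencyDemo
{
    private static async Task<int> SimulateWorkAsync(int id, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return id;
    }

    // Starts three operations whose lifetimes overlap and waits for all of them.
    public static async Task<int[]> RunAllAsync()
    {
        Task<int>[] tasks =
        {
            SimulateWorkAsync(1, 300),
            SimulateWorkAsync(2, 200),
            SimulateWorkAsync(3, 100)
        };
        // Results come back in the order of the tasks, not in order of completion.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        Stopwatch watch = Stopwatch.StartNew();
        int[] ids = RunAllAsync().GetAwaiter().GetResult();
        watch.Stop();

        Console.WriteLine("Completed: " + string.Join(", ", ids));
        Console.WriteLine("Elapsed ms: " + watch.ElapsedMilliseconds);
    }
}
```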

Parallel computing

Parallel computing is a type of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved at the same time. .NET offers several variants of this type of computing, which we'll cover over the next few chapters:

[Figure: Parallel computing]
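As a first taste of one of these variants, the following sketch computes the same sum sequentially and with PLINQ (the AsParallel extension from System.Linq), which splits the range into smaller pieces that can be processed at the same time. ParallelDemo and its methods are hypothetical names; for a workload this small the parallel version is not expected to be faster, since the point here is only the shape of the code.

```csharp
using System;
using System.Linq;

// Hypothetical demo class comparing a sequential loop with a PLINQ query.
public static class ParallelDemo
{
    public static long SumOfSquaresSequential(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += (long)i * i;
        }
        return total;
    }

    // The range is partitioned and the partial sums are combined by PLINQ.
    public static long SumOfSquaresParallel(int n)
    {
        return Enumerable.Range(1, n).AsParallel().Sum(i => (long)i * i);
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquaresSequential(1000)); // 333833500
        Console.WriteLine(SumOfSquaresParallel(1000));   // 333833500
    }
}
```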

Imperative programming

Imperative programming is a programming paradigm that describes computation in terms of the program's state. C#, JavaScript, Java, or C++ are typical examples of imperative languages.

Declarative programming

In contrast to imperative programming, languages considered declarative describe only the desired results without explicitly listing commands or steps that must be performed. Many markup languages, such as HTML, XAML, or XSLT, fall into this category.
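Even inside an imperative language such as C#, we can see the contrast between both styles. In the following sketch (ParadigmDemo is a hypothetical name), the first method spells out every step and mutates a list, whereas the second uses a LINQ query that only describes the desired result. LINQ is used here as a declarative-style notation; C# itself remains an imperative language.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class contrasting the two styles.
public static class ParadigmDemo
{
    // Imperative: explicit steps that change the state of 'result'.
    public static List<int> EvenSquaresImperative(int[] numbers)
    {
        var result = new List<int>();
        foreach (int n in numbers)
        {
            if (n % 2 == 0)
            {
                result.Add(n * n);
            }
        }
        return result;
    }

    // Declarative style: the query states what we want, not how to iterate.
    public static List<int> EvenSquaresDeclarative(int[] numbers)
    {
        return (from n in numbers
                where n % 2 == 0
                select n * n).ToList();
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(", ", EvenSquaresImperative(numbers)));  // 4, 16
        Console.WriteLine(string.Join(", ", EvenSquaresDeclarative(numbers))); // 4, 16
    }
}
```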

The evolution of .NET

Until the arrival of .NET, the Microsoft programming ecosystem had been ruled by a few classic languages, Visual Basic and C++ (with the Microsoft Foundation classes) being typical examples of this.

Note

Also known as MFC, Microsoft Foundation Classes is a library that wraps portions of the Windows API in C++ classes, including functionalities that enable them to use a default application framework. Classes are defined for many of the handle-managed Windows objects and also for predefined windows and common controls. It was introduced in 1992 with Microsoft's C/C++ 7.0 compiler for use with 16-bit versions of Windows as an extremely thin object-oriented C++ wrapper for the Windows API.

However, the big changes proposed by .NET started with a totally different component model approach. Up until 2002, when .NET officially appeared, that component model was COM (Component Object Model), introduced by the company in 1993. COM is the basis for several other Microsoft technologies and frameworks, including OLE, OLE automation, ActiveX, COM+, DCOM, the Windows shell, DirectX, UMDF (User-Mode Driver Framework), and the Windows Runtime.

Note

UMDF is a device-driver development platform (Windows Driver Development Kit) that was first introduced with Microsoft's Windows Vista operating system and is also available for Windows XP. It facilitates the creation of drivers for certain classes of devices.

At the time of writing this, COM competes with another specification named CORBA (Common Object Request Broker Architecture), a standard defined by the Object Management Group (OMG), designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA enables collaboration between systems on different operating systems, programming languages, and computing hardware. In its life cycle, CORBA has received a lot of criticism, mainly because of poor implementations of the standard.

.NET as a reaction to the Java World

In 1995, a new model was conceived to supersede COM and the unwanted effects related to it, especially versioning and the use of the Windows Registry, on which COM depends to define accessible interfaces or contracts; a corrupted or modified fragment of the Registry could mean that a component was not accessible at runtime. Also, in order to install applications, elevated permissions were required, since the Windows Registry is a sensitive part of the system.

A year later, various quarters of Microsoft started making contacts with some of the most distinguished software engineers, and these contacts remained active over the years. These included architects such as Anders Hejlsberg (who became the main author of C# and the principal architect of .NET framework), Jean Paoli (one of the signatures in the XML Standard and the former ideologist of AJAX technologies), Don Box (who participated in the creation of SOAP and XML Schemas), Stan Lippman (one of the fathers of C++, who was working at the time at Disney), Don Syme (the architect for generics and the principal author of the F# language), and so on.

The purpose of this project was to create a new execution platform, free from the caveats of COM and one that was able to hold a set of languages to execute in a secure and extensible manner. The new platform should be able to program and integrate the new world of web services, which had just appeared—based on XML—along with other technologies. The initial name of the new proposal was Next Generation Windows Services (NGWS).

By late 2000, the first betas of .NET framework were released, and the first version appeared on February 13, 2002. Since then, .NET has always been aligned with new versions of the IDE (Visual Studio). The current version of the classic .NET framework at the time of writing this is 4.6.1, but we will get into more detail on this later in the chapter.

An alternative .NET appeared in 2015 for the first time. In the //BUILD/ event, Microsoft announced the creation and availability of another version of .NET, called .NET Core.

The open source movement and .NET Core

Part of an idea for the open source movement and .NET Core comes from a deep change in the way software creation and availability is conceived in Redmond nowadays. When Satya Nadella took over as the CEO at Microsoft, they clearly shifted to a new mantra: mobile-first, cloud-first. They also redefined themselves as a company of software and services.

This meant embracing the open source idea with all its consequences. As a result, a lot of the .NET Framework has already been opened to the community, and some say that this movement will continue until the whole platform is opened. Besides, a second purpose (clearly stated several times at the //BUILD/ event) was to create a programming ecosystem powerful enough to allow anyone to program any type of application for any platform or device. So, they started to support Mac OS and Linux as well as several tools to build applications for Android and iOS.

However, the implications run deeper. If you want to build applications for Mac OS and Linux, you need a different Common Language Runtime (CLR) that is able to execute in these platforms without losing out on performance. This is where .NET Core comes into play.

At the time of writing this, Microsoft has published several (ambitious) improvements to the .NET ecosystem, mainly based on two different flavors of .NET:

[Figure: The two flavors of .NET]

The first one is the version that was last available, the classic .NET (.NET framework 4.6.x), and the second one is the new version, intended to allow compilations that are valid not only for Windows platforms, but also for Linux and Mac OS.

.NET Core is the generic name for a new open source version of the CLR made available in 2015 (updated last November to version 1.1) intended to support multiple flexible .NET implementations. In addition, the team is working on something called .NET Native, which compiles to native code in every destination platform.

However, let's keep on going with the main concepts behind the CLR, from a version-independent point of view.

Note

The whole project is available on GitHub at https://github.com/dotnet/coreclr.

Common Language Runtime

To address some of the problems of COM and introduce the bunch of new capabilities that were requested as part of the new platform, a team at Microsoft started to evolve prior ideas (and the names associated with the platform as well). So, the framework was soon renamed to Component Object Runtime (COR) prior to the first public beta, when it was finally given the name of Common Language Runtime in order to stress the fact that the new platform was not associated with a single language.

Actually, there are dozens of compilers available for use with the .NET framework, and all of them generate a type of intermediate code, which, in turn, is converted into native code at execution time, as shown in the following figure:

[Figure: Common Language Runtime]

The CLR, as well as COM, focuses on contracts between components, and these contracts are based on types, but that's where the similarities end. Unlike COM, the CLR establishes a well-defined form to specify contracts, which is generally known as metadata.

Also, the CLR includes the possibility of reading metadata without any knowledge of the underlying file format. Furthermore, such metadata is extensible by means of custom attributes, which are strongly typed themselves. Other interesting information included in the metadata includes the version information (remember, there should be no dependencies on the Registry) and component dependencies.

Besides, for any component (called an assembly), the presence of metadata is mandatory, which means that it's not possible to deploy or access a component without its metadata. In the initial versions, implementations of security were mainly based on some evidence included in the metadata. Furthermore, such metadata is available for any other program inside or outside the CLR through a process called Reflection.
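The following sketch shows both ideas at a small scale: a strongly typed custom attribute that extends the metadata of a type, and Reflection used to read that metadata back at runtime. AuthorAttribute, Invoice, and ReflectionDemo are hypothetical names created for this example.

```csharp
using System;
using System.Reflection;

// Hypothetical custom attribute: it adds strongly typed data to the metadata.
[AttributeUsage(AttributeTargets.Class)]
public sealed class AuthorAttribute : Attribute
{
    public string Name { get; private set; }

    public AuthorAttribute(string name)
    {
        Name = name;
    }
}

// Hypothetical type decorated with the attribute.
[Author("Jane Doe")]
public class Invoice
{
    public int Number { get; set; }

    public decimal Total()
    {
        return 0m;
    }
}

public static class ReflectionDemo
{
    // Reads the custom attribute from the metadata of the given type.
    public static string ReadAuthor(Type type)
    {
        var attribute = (AuthorAttribute)Attribute.GetCustomAttribute(type, typeof(AuthorAttribute));
        return attribute == null ? null : attribute.Name;
    }

    public static void Main()
    {
        Console.WriteLine("Author: " + ReadAuthor(typeof(Invoice)));

        // List the members that Invoice itself declares, as described by its metadata.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MemberInfo member in typeof(Invoice).GetMembers(flags))
        {
            Console.WriteLine(member.MemberType + " " + member.Name);
        }
    }
}
```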

Another important difference is that .NET contracts, above all, describe the logical structure of types. There are no in-memory representations, reading order sequences, alignment or parameter conventions, among other things, as Don Box explains in detail in his magnificent Essential .NET (http://www.amazon.com/Essential-NET-Volume-Language-Runtime/dp/0201734117).

Common Intermediate Language

The way these previous conventions and protocols are resolved in CLR is by means of a technique called contract virtualization. This implies that most of the code (if not all) written for the CLR doesn't contain the machine code but an intermediate language called Common Intermediate Language (CIL), or just Intermediate Language (IL).

CLR never executes CIL directly. Instead, CIL is always translated into native machine code prior to its execution by means of a technique called JIT (Just-In-Time) compilation. This is to say that the JIT process always adapts the resulting executable code to the destination machine, with no intervention from the developer. There are several modes of performing a JIT process, and we'll look at them in more detail later in this chapter.
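We can confirm that a compiled method is stored as CIL, and not as machine code, by asking Reflection for the raw bytes of its body. The following is a sketch under the assumption that the runtime in use exposes method bodies (IlDemo is a hypothetical name); the exact bytes depend on the compiler and its optimization settings, so the example simply prints them.

```csharp
using System;
using System.Reflection;

// Hypothetical demo class that inspects the CIL of one of its own methods.
public static class IlDemo
{
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw CIL bytes stored in the assembly for the Add method.
    public static byte[] GetAddIl()
    {
        MethodInfo method = typeof(IlDemo).GetMethod("Add");
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] il = GetAddIl();
        Console.WriteLine("CIL size: " + il.Length + " bytes");
        Console.WriteLine(BitConverter.ToString(il));
    }
}
```

These bytes are what the JIT compiler translates into native code the first time Add is called.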

Thus, CLR is what we might call a type-centered framework. For CLR, everything is a type, an object, or a value.

Managed execution

Another critical factor in the behavior of CLR is the fact that programmers are encouraged to forget about the explicit management of memory and the manual management of threads (especially associated with languages such as C and C++) to adopt the new way of execution that the CLR proposes: managed execution.

Under managed execution, CLR has complete knowledge of everything that happens in its execution context. This includes every variable, method, type, event, and so on. This encourages and fosters productivity and eases the path to debugging in many ways.

[Figure: Managed execution]

Additionally, CLR supports the creation of runtime code (or generative programming) by means of a utility called CodeDOM. With this feature, you can emit code in different languages and compile it (and execute it) directly in the memory.

All this drives us to the next logical questions: which languages are available to be used with this infrastructure, which are the common points among them, how is the resulting code assembled and prepared for execution, what are the units of stored information (as I said, they're called assemblies), and finally, how is all this information organized and structured into one of these assemblies?

Components and languages

Every execution environment has a notion of software components. For CLR, such components must be written in a CLI-compliant language and compiled accordingly. You can read a list of CLI languages on Wikipedia. But the question is what is a CLI-compliant language?

CLI stands for Common Language Infrastructure, and it's a software specification standardized by ISO and ECMA describing the executable code and a runtime environment that allows multiple high-level languages to be used on different computer platforms without being rewritten for specific architectures. The .NET framework and the free and open source Mono are implementations of CLI.

Note

Note that the official sites for these terms and entities are as follows:

ISO: http://www.iso.org/iso/home.html

ECMA: http://www.ecma-international.org/

MONO: http://www.mono-project.com/

CLI languages: https://en.wikipedia.org/wiki/List_of_CLI_languages

The most relevant points in the CLI would be as follows (according to Wikipedia):

  • First, to substitute COM, metadata is key and provides information on the architecture of assemblies, such as a menu or an index of what you can find inside. Since it doesn't depend on the language, any program can read this information.
  • That established, there should be a common set of rules to comply with in terms of data types and operations. This is the Common Type System (CTS). All languages that adhere to CTS can work with a set of rules.
  • For minimal interoperation between languages, there is another set of rules, the Common Language Specification (CLS), and this should be common to all programming languages in this group, so a DLL made with one language and then compiled can be used by another DLL compiled in a different CTS language, for example.
  • Finally, we have a Virtual Execution System, which is responsible for running this application and many other tasks, such as managing the memory requested by the program, organizing execution blocks, and so on.
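A quick way to see the Common Type System in action is to check that the C# keywords for basic types are only aliases of types defined by the platform, which every other CLI language also uses. The following short sketch (CtsDemo is a hypothetical name) makes that check.

```csharp
using System;

// Hypothetical demo class: C# keywords map to types shared by all CLI languages.
public static class CtsDemo
{
    public static bool AliasesMatch()
    {
        return typeof(int) == typeof(Int32)
            && typeof(string) == typeof(String)
            && typeof(double) == typeof(Double);
    }

    public static void Main()
    {
        Console.WriteLine(typeof(int).FullName);    // System.Int32
        Console.WriteLine(typeof(string).FullName); // System.String
        Console.WriteLine(AliasesMatch());          // True
    }
}
```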

With all this in mind, when we use a .NET compiler (from now on, compiler), we generate a byte stream, usually stored as a file in the local filesystem or on a web server.

Structure of an assembly file

Files generated by a compilation process are called assemblies, and any assembly follows the basic rules of any other executable file in Windows and adds a few extensions and information suitable and mandatory for the execution in a managed environment.

In short, we understand that an assembly is just a set of modules containing the IL code and metadata, which serve as the primary unit of a software component in CLI. Security, versioning, type resolution, processes (application domains), and so on, all work on a per-assembly basis.

These requirements imply changes in the structure of executable files. This leads to a new file architecture represented in the following figure:

[Figure: Structure of an assembly file]

Note that a PE file is one that conforms to the Portable Executable format: a file format for executables, object code, DLLs, FON (Font) files, and others used in 32-bit and 64-bit versions of Windows operating systems. It was first introduced by Microsoft in Windows NT 3.1, and all later versions of Windows support this file structure.

This is why we find a PE/COFF header in the format, which contains compatible information required by the system. However, from the point of view of a .NET programmer, what really matters is that an assembly holds three main areas: the CLR header, the IL code, and a section with resources (Native Image Section in the figure).
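To check that an assembly really is a PE file, we can read the two signatures defined by the format: the bytes "MZ" at the start of the file and the bytes "PE" followed by two zero bytes at the offset stored at position 0x3C. The following is a minimal sketch (PeHeaderDemo and LooksLikePeFile are hypothetical names); it assumes that the core library was loaded from a file on disk, which may not be the case in every hosting scenario.

```csharp
using System;
using System.IO;

// Hypothetical demo class that checks the PE signatures of a file.
public static class PeHeaderDemo
{
    public static bool LooksLikePeFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            if (stream.Length < 0x40)
            {
                return false; // too small to hold the DOS header
            }
            if (reader.ReadUInt16() != 0x5A4D) // "MZ", read as little-endian
            {
                return false;
            }

            // The offset of the PE header is stored at position 0x3C.
            stream.Seek(0x3C, SeekOrigin.Begin);
            int peOffset = reader.ReadInt32();
            if (peOffset < 0 || peOffset + 4 > stream.Length)
            {
                return false;
            }

            stream.Seek(peOffset, SeekOrigin.Begin);
            return reader.ReadUInt32() == 0x00004550; // "PE\0\0", read as little-endian
        }
    }

    public static void Main()
    {
        // Assumption: the assembly that defines System.Object has a file location.
        string path = typeof(object).Assembly.Location;
        Console.WriteLine(path);
        Console.WriteLine("PE file: " + LooksLikePeFile(path));
    }
}
```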

Tip

A detailed description of the PE format is available at http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx.

Program execution

Among the libraries linked with CLR, we find a few responsible for loading assemblies in the memory and starting and initializing the execution context. They're generally referred to as the CLR Loader. Together with some other utilities, they provide the following:

  • Automatic memory management
  • Use of garbage collector
  • Metadata access to find information on types
  • Loading modules
  • Analyzing managed libraries and programs
  • A robust exception management subsystem to enable programs to communicate and respond to failures in structured ways
  • Native and legacy code interoperability
  • A JIT compilation of managed code into native code
  • A sophisticated security infrastructure

This loader uses OS services to facilitate the loading, compilation, and execution of an assembly. As we've mentioned previously, CLR serves as an execution abstraction for .NET languages. To achieve this, it uses a set of DLLs, which acts as a middle layer between the OS and the application program. Remember that CLR itself is a collection of DLLs, and these DLLs work together to define the virtual execution environment. The most relevant ones are as follows:

  • mscoree.dll (sometimes called shim because it is simply a facade in front of the actual DLLs that the CLR comprises)
  • clr.dll
  • mscorsvr.dll (multiprocessor) or mscorwks.dll (uniprocessor)

In practice, one of the main roles of mscoree.dll is to select the appropriate build (uniprocessor or multiprocessor) based on any number of factors, including (but not limited to) the underlying hardware.

The clr.dll is the real manager, and the rest are utilities for different purposes. This library is the only one of the CLRs that is located at $System.Root$, as we can find through a simple search:

My system is showing two versions (there are some more), each one ready to launch programs compiled for 32-bit or 64-bit versions. The rest of the DLLs are located at another place: a secure set of directories generally called Global Assembly Cache (GAC).

Actually, the latest edition of Windows 10 installs files for all versions of such GAC, corresponding to versions 1.0, 1.1, 2.0, 3.0, 3.5, and 4.0, although several are just placeholders with minimum information, and we only find complete versions of .NET 2.0, .NET 3.5 (only partially), and .NET 4.0.

Also, note that these placeholders (for the versions not fully installed) admit further installations if some old software requires them. This is to say that the execution of a .NET program relies on the version indicated in its metadata and nothing else.

You can check which versions of .NET are installed in a system using the CLRver.exe utility, as shown in the following figure:

Internally, several operations take place before execution. When we launch a .NET program, we'll proceed just as usual, as if it were just another standard executable of Windows.

Behind the scenes, the system will read the header, which instructs it to launch mscoree.dll, which, in turn, will start the whole running process in a managed environment. Here, we'll omit all the intricacies inherent to this process since they go far beyond the scope of this book.

Metadata

We've mentioned that the key aspect of the new programming model is the heavy reliance on metadata. Furthermore, the ability to reflect against metadata enables programming techniques in which programs are generated by other programs, not humans, and this is where CodeDOM comes into play.

We'll cover some aspects of CodeDOM and its usages when dealing with the language, and we'll look at how the IDE itself uses this feature frequently every time it creates source code from a template.

In order to help the CLR find the various pieces of an assembly, every assembly has exactly one module whose metadata contains the assembly manifest: an additional piece of CLR metadata that acts as a directory of adjunct files that contain additional type definitions and code. Furthermore, CLR can directly load modules that contain an assembly manifest.

So, what does a manifest look like in a real program, and how can we examine its content? Fortunately, we have a bunch of .NET utilities (which, technically, don't belong to the CLR but to the .NET framework ecosystem) that allow us to visualize this information easily.
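
Part of this information can also be read from code through reflection. The following is only a minimal sketch, not the approach the tools themselves take: the class name ManifestPeek and its helper AssemblyNameOf are made up for illustration, and the output depends entirely on the assembly in which the code is compiled.

```csharp
using System;
using System.Reflection;

static class ManifestPeek
{
    // Hypothetical helper: returns the simple name recorded in the manifest
    // of the assembly that defines the given type.
    public static string AssemblyNameOf(Type type)
    {
        return type.Assembly.GetName().Name;
    }

    static void Main()
    {
        Assembly current = typeof(ManifestPeek).Assembly;
        AssemblyName identity = current.GetName();

        // Identity data stored in the manifest.
        Console.WriteLine("Assembly: " + identity.Name);
        Console.WriteLine("Version:  " + identity.Version);

        // Other assemblies that this one declares a dependency on.
        foreach (AssemblyName reference in current.GetReferencedAssemblies())
        {
            Console.WriteLine("References: " + reference.Name);
        }

        // Types described in the assembly's metadata.
        foreach (Type type in current.GetTypes())
        {
            Console.WriteLine("Type: " + type.FullName);
        }
    }
}
```

The names and versions printed here are the same ones that a disassembler lists in the manifest of the compiled file.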

Introducing metadata with a basic Hello World

Let's build a typical Hello World program and analyze its contents once it is compiled so that we can inspect how it's converted into Intermediate Language (IL) and where the meta-information that we're talking about is.

Throughout this book, I'll use Visual Studio 2015 Community Edition Update 1 (or higher if an updated version appears) for reasons that I'll explain later. You can install it for free; it's a fully capable version with tons of project types, utilities, and so on.

Note

Visual Studio 2015 CE update 1 is available at https://www.visualstudio.com/vs/community/.

The only requirement is to register for free in order to get a developer's license that Microsoft uses for statistical purposes—that's all.

After launching Visual Studio, in the main menu, select New Project and go to the Visual C# templates, where the IDE offers several project types, and select a console application, as shown in the following screenshot:

Visual Studio will create a basic code structure composed of several references to libraries (more about that later) as well as a namespace block that includes the program class. Inside that class, we will find an application entry point in a fashion similar to what we would find in C++ or Java languages.

To produce some kind of output, we're going to use two static methods of the Console class: WriteLine, which outputs a string followed by a line terminator, and ReadLine, which makes the program wait until the user presses the return key so that we can see the output that is produced.

After removing the using directives that we're not going to use, and adding the two statements mentioned previously, the code will look like this:

using System;
namespace ConsoleApplication1
{
  class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Hello! I'm executing in the CLR context.");
      Console.ReadLine();
    }
  }
}

To test it, we just have to press F5 or the Start button and we'll see the corresponding output (nothing amazing, so we're not including the capture).

While editing the code, you will have noticed several useful characteristics of the IDE's editor: syntax coloring (distinguishing classes, methods, arguments, literals, and so on); IntelliSense, which suggests what makes sense to write for every class member; tooltips, which indicate the return type of every method and the type of literals or constants; and the number of references made to every member of the program that could be found in your code.

Technically, there are hundreds of other useful features, but that's something we will have the chance to test starting from the next chapter, when we get into the C# aspects and discover how to try them out.

As for this little program, it's a bit more interesting to check what produced such output, which we'll find in the Bin/Debug folder of our project. (Remember to press the Show all files button at the head of Solution Explorer, by the way):

As we can see, two executables are generated. The first one is the standalone executable that you can launch directly from its folder. The other, with .vshost before the extension, is the one Visual Studio uses at debug time and that contains some extra information required by the IDE. Both produce the same results.

Once we have an executable, it is time to add to Visual Studio the .NET tool that will let us view the metadata that we're talking about.

To do this, we go to the Tools | External Tools option in the main menu, and we'll see a configuration dialog window, presenting several (and already tuned) external tools available; press the New button and change the title to IL Disassembler, as shown in the following screenshot:

Next, we need to configure the arguments that we're going to pass to the new entry: the name of the tool and the required parameters.

You'll notice that there are several versions of this tool. These depend on your machine.

For our purposes, it will suffice to include the following information:

  • The path of the tool (named ILDASM.exe, and located on my machine at C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6.1 Tools)
  • The path of the generated executable as the argument, for which I'm using the predefined macro $(TargetPath)

Given that our program is already compiled, we can go back to the Tools menu and find a new entry for IL Disassembler. Once launched, a window will appear, showing the IL code of our program, plus a reference called Manifest (which shows the metadata), and we can also double-click to show another window with this information, as shown in the following screenshot:

Note

Note that I've modified ILDASM's font size for clarity.

The information included in the manifest comes from two sources: the IDE itself, configured to prepare the assembly for execution (we can view most of the lines if we take a more detailed look at the window's content), and customizable information that we can embed in the executable's manifest, such as descriptions, the assembly title, the company information, trademark, culture, and so on. We'll explore how to configure that information in the next chapter.

In the same manner, we can keep on analyzing the contents of every single node shown in the main ILDASM window. For instance, if we want to see the IL code linked to our Main entry point, the tool will show us another window where we can see what the IL code looks like (note the presence of the text cil managed next to the declaration of Main):

As I pointed out in the screenshot, entries with the prefix IL_ will be converted to machine code at execution time. Note the resemblance of these instructions to assembly language.

Also, keep in mind that this concept has not changed since the first version of .NET: main concepts and procedures to generate CIL and machine code are, basically, the same as they used to be.

PreJIT, JIT, EconoJIT, and RyuJIT

I have already mentioned that the process of converting this IL code into machine code is undertaken by another piece of the .NET framework, generically known as Just-In-Time Compiler (JIT). However, since the very beginning of .NET, this process can be executed in at least three different ways, which is why there are three JIT-suffixed names.

To simplify the details of these processes, we'll say that the default method of compilation (and the preferred one in general terms) is the JIT compilation (let's call it Normal JIT):

  • In the Normal JIT mode, the code is compiled as required (on demand) and not thrown away but cached for a later use. In this fashion, as the application keeps on running, any code required for execution at a later time that is already compiled is just retrieved from the cached area. The process is highly optimized and the performance penalty is negligible.
  • In the PreJIT mode, .NET operates in a different manner. To operate using PreJIT, you need a utility called ngen.exe (the Native Image Generator) to produce native machine code prior to the first execution. The IL code is compiled ahead of time into a native image that is stored on disk and loaded instead of being JIT-compiled, which gives some optimization, especially at start time.
  • As for the EconoJIT mode, it's used mainly in applications deployed for low-memory devices, such as mobiles, and it's pretty similar to Normal JIT with the difference that the compiled code is not cached in order to save memory.

In 2015, Microsoft continued to develop a special project called Roslyn, which is a set of tools and services to provide extra functionalities to the process of code management, compilation, and deployment, among others. In connection with this project (which will be treated in depth in Chapter 4, Comparing Approaches for Programming), another JIT appeared, called RyuJIT, which has been available as an open source project from the beginning and is now included by default in the latest version of Visual Studio (remember, Visual Studio 2015 Update 1).

Now, let me quote what the .NET team says about their new compiler:

"RyuJIT is a new, next-generation x64 compiler twice as fast as the one before, meaning apps compiled with RyuJIT start up to 30% faster (Time spent in the JIT compiler is only one component of startup time, so the app doesn't start twice as fast just because the JIT is twice as fast.) Moreover, the new JIT still produces great code that runs efficiently throughout the long run of a server process.

This graph compares the compile time ("throughput") ratio of JIT64 to RyuJIT on a variety of code samples. Each line shows the multiple of how much faster RyuJIT is than JIT64, so higher numbers are better."

They finish by saying that RyuJIT will be the basis for all their JIT compilers in the future: x86, ARM, MDIL, and whatever else comes along.

Common Type System

In the .NET framework, the Common Type System (CTS) is the set of rules and specifications established to define, use, and manage the data types used by any .NET application in a language-independent manner.

We must understand that types are the building blocks of any CLR program. Programming languages such as C#, F#, and VB.NET have several constructs for expressing types (for example, classes, structs, enums, and so on), but ultimately, all of these constructs map down to a CLR type definition.

Also, note that a type can declare private and non-private members. The latter form, sometimes known as the contract of the type (since it exposes the usable part of that type), is what we can access by programming techniques. This is the reason why we highlighted the importance of metadata in the CLR.

The common type system is much broader than what most programming languages can handle. For this reason, the Common Language Infrastructure (CLI) specification selects a subset of the CTS that all CLI-compatible languages must support. This subset is called the Common Language Specification (CLS), and component writers are recommended to make their components' functionalities accessible through CLS-compatible types and members.

Naming conventions, rules, and type access modes

As for the naming rules for a type, this is what applies: any CLR type name has three parts: the assembly name, an optional namespace prefix, and a local name. In the previous example, ConsoleApplication1 was the assembly name, and it was the same as the namespace (but we could have changed it without problems). Program was the name of the only type available, which happened to be a class in this case. So, the namespace-qualified name of this class was ConsoleApplication1.Program, and it lived in the ConsoleApplication1 assembly.
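
These parts can be inspected at runtime through System.Type. The sketch below reuses the namespace and class names of the earlier demo; the assembly part of the last line depends on how the project is named and versioned, so it is left without an expected value.

```csharp
using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            Type t = typeof(Program);
            Console.WriteLine(t.Name);       // prints "Program" (local name)
            Console.WriteLine(t.Namespace);  // prints "ConsoleApplication1" (namespace prefix)
            Console.WriteLine(t.FullName);   // prints "ConsoleApplication1.Program"

            // Full name followed by the assembly's name, version, culture, and public key token.
            Console.WriteLine(t.AssemblyQualifiedName);
        }
    }
}
```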

Namespaces are optional prefixes that help us define logical divisions in our code. Their purpose is to avoid confusion and accidental name collisions between types, as well as to allow a more organized distribution of the application's code.

For example, in a typical application (not the demo shown earlier), a namespace would describe the whole solution, which might be separated into domains (different areas in which the application is divided, and they sometimes correspond to individual projects in the solution), and every domain would most likely contain several classes, and each class would contain several members. When you're dealing with solutions that hold, for instance, 50 projects, such logical divisions are very helpful in order to keep things under control.

As for the way that a member of a type can be accessed, each member has its own access modifier (for example, private, public, or protected) that controls how it can be reached and whether it is visible to other types. If we don't specify any access modifier for a class member, it is assumed to be private.

Besides, you can establish whether an instance of the type is required to reference a member, or you can just reference such a member by its whole name without having to call the constructor and get an instance of the type. In such cases, we prefix the declaration of these members with the static keyword.
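
As a small illustration of both ideas, consider the following sketch. The Counter class and its members are hypothetical names chosen only for this example.

```csharp
using System;

class Counter
{
    int count;                           // no modifier: private by default
    public static int InstancesCreated;  // belongs to the type, no instance needed

    public Counter() { InstancesCreated++; }

    public int Increment() { return ++count; }
}

static class AccessDemo
{
    static void Main()
    {
        var c = new Counter();
        Console.WriteLine(c.Increment());            // prints "1"
        // c.count = 5;                              // would not compile: count is private
        Console.WriteLine(Counter.InstancesCreated); // prints "1", reached through the type name
    }
}
```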

Members of a type

Basically, a type admits three kinds of members: fields, methods, and nested types. By nested type, we understand just another type that is included as part of the implementation of the declaring type. All other type members (for example, properties and events) are simply methods that have been extended with additional metadata.

I know, you might be thinking, so, properties are methods? Well, yes; once compiled, the resulting code turns into methods. A property becomes a pair of methods named set_PropertyName(value) and get_PropertyName(), in charge of assigning or reading the value linked to the property's name.

Let's review this with a very simple class that defines a couple of properties:

class SimpleClass
{
  public string data { get; set; }
  public int num { get; set; }
}

Well, once compiled, we can check out the resulting IL code using IL Disassembler as we did earlier, obtaining the following view:

As we can see, the compiler declares data and num as members of type string and int, respectively, and it defines the corresponding get_ and set_ methods to access these properties.
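
The same accessor methods can be listed from code with reflection. This is just a sketch to confirm the idea; AccessorList and HasMethod are illustrative names, and SimpleClass is repeated so that the example is self-contained.

```csharp
using System;
using System.Reflection;

class SimpleClass
{
    public string data { get; set; }
    public int num { get; set; }
}

static class AccessorList
{
    // Hypothetical helper: true when the type exposes a public instance method with that name.
    public static bool HasMethod(Type type, string name)
    {
        return type.GetMethod(name, BindingFlags.Public | BindingFlags.Instance) != null;
    }

    static void Main()
    {
        // DeclaredOnly leaves out the methods inherited from System.Object.
        var flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in typeof(SimpleClass).GetMethods(flags))
        {
            Console.WriteLine(method.ReturnType.Name + " " + method.Name);
        }
    }
}
```

The loop lists get_data, set_data, get_num, and set_num (reflection does not guarantee the order), even though the source code only declares two properties.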

How does the CLR manage the memory space occupied by a type at runtime? If you remember, we highlighted the importance of the concept of state at the beginning of this chapter. The significance is clear here: the kind of members defined in the type will determine the memory allocation required.

Also, the CLR will guarantee that these members are initialized to their default values unless we indicate otherwise in the declaration statements: for numeric types, the default value is zero; for Boolean types, it's false, and for object references, the value is null.

We can also categorize types depending on their memory allocation: as a first approximation, value types are stored in the stack, while reference types use the heap. A deeper explanation of this will be provided in the next chapter, since the new abilities of Visual Studio 2015 allow us to analyze everything that happens at runtime in great detail and from a bunch of different points of view.

A quick tip on the execution and memory analysis of an assembly in Visual Studio 2015

All the concepts reviewed up until here are directly available using the new debugging tools, as shown in the following screenshot, which displays the execution threads of the previous program stopped in a breakpoint:

Note the different icons and columns of the information provided by the tool. We can distinguish known and unknown threads, if they are named (or not), their location, and even ThreadID, which we can use in conjunction with SysInternals tools if we need some extra information that's not included here:

The same features are available for memory analysis. It even goes beyond the runtime periods, since the IDE is able to capture and categorize the usage of the memory required by the runtime in the application execution and keep it ready for us if we take a snapshot of the managed memory.

In this way, we can review it further and check out the possible bottlenecks and memory leaks. The preceding screenshot shows the managed memory used by the previous application at runtime.

The debugging capabilities found in Visual Studio 2015 will be covered in depth throughout the different chapters of this book, since there are many different scenarios in which an aid like this will be helpful and clear.

The stack and the heap

A quick reminder of these two concepts might be helpful since they transcend the .NET framework and are common to many languages and platforms.

To start with, let's remember a few concepts related to processes that we saw at the beginning of this chapter: when a program starts execution, it initializes resources based on the metadata that the CLR reads from the assembly's manifest (as shown in the figure given in the The structure of an assembly file section). These resources will be shared with all the threads that such a process launches.

When we declare a variable, a space in the stack is allocated. So, let's start with the following code:

class Program
{
  static void Main(string[] args)
  {
    Book b;
    b.Title = "The C# Programming Language";
    Console.WriteLine(b.Title);
    Console.ReadLine();
  }
}

class Book
{
  public string Title;
  public int Pages;
}

If we try to compile this, we'll obtain a compilation error message indicating the use of the unassigned local variable b. The reason is that we have only declared the variable: it doesn't reference any object yet, since we didn't instantiate b.

However, if we use the constructor of the class (the default one, since the class has no explicit constructor), changing the line to Book b = new Book();, then our code compiles and executes properly.

Therefore, the role of the new operator is crucial here. It indicates to the compiler that it has to allocate space for a new instance of the Book object, call the constructor, and, as we'll discover soon, initialize the object's fields to the default values of their types.
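
A compact version of the corrected program makes those defaults visible without the debugger. It is only a sketch: the DefaultsDemo class is an illustrative name, and the Book class is repeated to keep the example self-contained.

```csharp
using System;

class Book
{
    public string Title;
    public int Pages;
}

static class DefaultsDemo
{
    static void Main()
    {
        Book b = new Book();                 // the instance now exists; b holds a reference to it
        Console.WriteLine(b.Title == null);  // prints "True": reference fields default to null
        Console.WriteLine(b.Pages);          // prints "0": numeric fields default to zero

        b.Title = "The C# Programming Language";
        Console.WriteLine(b.Title);
    }
}
```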

So, what's in the stack memory at the moment? Well, we just have a declaration called b, whose value is a memory address: exactly the address where the StackAndHeap.Book instance is allocated in the Heap (which I anticipate will be 0x2525910).

However, how in the world will I know this address and what's going on inside the execution context? Let's take a look at the inner workings of this small application using the debugging windows that this version of the IDE offers. To do this, we'll mark a breakpoint in line 14, Console.ReadLine();, and relaunch the application so that it hits the breakpoint.

Once here, there's plenty of information available. In the Diagnostic Tools window (also new in this version of the IDE), we can watch the memory in use, the events, and the CPU usage. In the Memory Usage tab, we can take a snapshot of what's going on (actually, we can take several snapshots at different moments of execution and compare them).

Once the snapshot is ready, we'll look at the time elapsed, the size of objects, and the Heap size (along with some other options to improve the experience):

Note that we can choose to view the Heap sorted by the object size or the heap size. Also, if we choose one of these, a new window appears, showing every component actually in the execution context.

If we want to check exactly what our code is doing, we can filter by the name of the desired class (Book, in this case) in order to get an exclusive look at this object, its instances, the references to the object alive in the moment of execution, and a bunch of other details.

Of course, if we take a look at the Autos or Locals windows, we'll discover the actual values of these members as well:

As we can see in the Autos window, the object has initialized the remaining values (those not established by code) using the default value for that type (0 for integer values). This level of detail in the analysis of executables really helps in cases where bugs are fuzzy or only happen occasionally.

We can even see the actual memory location of every member by clicking on the StackAndHeap.Book entry:

Perhaps you're wondering, can we even see further? (I mean the actual assembly code produced by the execution context). The answer, again, is yes; we can right-click on the instance, select Add Watch, and we'll be adding an inspection point directly to that memory position, as shown in the following figure:

Of course, the assembly code is available as well, as long as we have enabled it by navigating to Tools | Options | Debugging in the IDE. Also, in this case, you should check Enable Address Level Debugging in the same dialog box. After this, just go to Debug | Windows | Disassembly, and you will be shown the window with the lowest level (executable) code marking the breakpoint, line numbers, and the translation of such code into the original C# statement:

What happens when the reference to the Book object is reassigned or nulled (and the program keeps going on)? The memory allocated for Book remains in memory as an orphan, and that's when the garbage collector comes into play.

Garbage collection

Basically, garbage collection is the process of reclaiming memory from the system. Of course, this memory shouldn't be in use; that is, the space occupied by the objects allocated in Heap should not have any variable pointing to them in order to be cleared.

Among the numerous classes included in .NET framework, there's one that's specially dedicated to this process. This means that the garbage collection of objects is not just an automatic process undertaken by CLR but a true, executable object that can even be used in our code (GC is the name, by the way, and we will deal with it in some cases when we try to optimize execution in the other chapters).

Actually, we can see this in action in a number of ways. For example, let's say that we create a method that concatenates strings in a loop and doesn't do anything else with them; it just notifies the user when the process is finished:

static void GenerateStrings()
{
  string initialString = "Initial Data-";
  for (int i = 0; i < 5000; i++)
  {
    initialString += "-More data-";
  }
  Console.WriteLine("Strings generated");
}

There's something to remember here. Since strings are immutable (which means that they cannot be changed, of course), the process has to create a new string in every iteration of the loop. This uses a lot of memory that can be reclaimed, since each new string makes the previous one useless.
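
As an aside, the usual way to avoid this pressure on the GC is System.Text.StringBuilder, which appends into a growable buffer instead of creating a new string each time. The following sketch compares both versions; StringGrowth and its methods are illustrative names, and the collection counts it prints vary from machine to machine and from run to run.

```csharp
using System;
using System.Text;

static class StringGrowth
{
    // The concatenation loop from the text, returning its result for comparison.
    public static string GenerateWithConcat(int iterations)
    {
        string s = "Initial Data-";
        for (int i = 0; i < iterations; i++)
        {
            s += "-More data-";   // allocates a brand new string each time
        }
        return s;
    }

    // Same result, but built inside a single growable buffer.
    public static string GenerateWithBuilder(int iterations)
    {
        var builder = new StringBuilder("Initial Data-");
        for (int i = 0; i < iterations; i++)
        {
            builder.Append("-More data-");
        }
        return builder.ToString();
    }

    static void Main()
    {
        int start = GC.CollectionCount(0);
        string a = GenerateWithConcat(5000);
        int afterConcat = GC.CollectionCount(0);
        string b = GenerateWithBuilder(5000);
        int afterBuilder = GC.CollectionCount(0);

        Console.WriteLine("Same content: " + (a == b));
        Console.WriteLine("Gen 0 collections (concatenation): " + (afterConcat - start));
        Console.WriteLine("Gen 0 collections (StringBuilder): " + (afterBuilder - afterConcat));
    }
}
```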

We can use CLR Profiler to see what happens in the CLR when running this application. You can download CLR Profiler from http://clrprofiler.codeplex.com/, and once unzipped, you'll see two versions (32 and 64 bits) of the tool. This tool shows us a more detailed set of statistics, which include GC interventions. Once launched, you'll see a window like this:

Ensure that you check the allocations and calls checkboxes before launching the application using Start Desktop App. Once the application has run to completion without breaks, you'll be shown a new statistical window pointing to various summaries of execution.

Each of these summaries leads to a different window in which you can analyze (even with statistical graphics) what happened at runtime in more detail as well as how the garbage collector intervened when required.

The following figure shows the main statistical window (note the two sections dedicated to GC statistics and garbage collection handle statistics):

The screenshot shows two GC-related areas. The first one indicates three kinds of collections, named Gen 0, Gen 1, and Gen 2. These names are simply short names for generations.

This is because the GC classifies objects by the number of collections they have survived. Every new object starts in Gen 0. When the GC runs, the objects with no references are cleaned up, and those still referenced are promoted to Gen 1. The second review of the GC is initially similar, but if it discovers that there are objects marked Gen 1 that still hold references, they're promoted to Gen 2, and those from Gen 0 with any references are promoted to Gen 1. The process goes on while the application is under execution.

This is the reason we can often read that the following principles apply to objects that are subject to recollection:

  • Newest objects are usually collected soon (they're normally created in a function call and are out of the scope when the function finishes)
  • The oldest objects commonly last longer (often because they are referenced from global or static members)
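
The GC class lets us watch this promotion from code. The sketch below forces collections only for demonstration purposes (GenerationDemo is an illustrative name); on the desktop CLR, the three values printed are typically 0, 1, and 2, but the runtime does not guarantee exactly when an object is promoted.

```csharp
using System;

static class GenerationDemo
{
    static void Main()
    {
        var survivor = new object();
        Console.WriteLine("Max generation: " + GC.MaxGeneration);
        Console.WriteLine("New object:     Gen " + GC.GetGeneration(survivor));

        GC.Collect();   // forced here only to make the promotion visible
        Console.WriteLine("After 1st GC:   Gen " + GC.GetGeneration(survivor));

        GC.Collect();
        Console.WriteLine("After 2nd GC:   Gen " + GC.GetGeneration(survivor));

        GC.KeepAlive(survivor);   // keeps the reference alive up to this point
    }
}
```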

The second area shows the number of handles created, destroyed, and surviving (surviving due to garbage collector, of course).

The first one (Time Line) will, in turn, show statistics including the precise execution times in which GC operated, as well as the .NET types implied:

As you can see, the figure shows a bunch of objects collected and/or promoted to other generations as the program goes on.

This is, of course, much more complex than that. The GC has rules to operate with different frequencies depending on the generation. So, Gen 0 is visited more frequently than Gen 1, and Gen 1 more frequently than Gen 2.

Furthermore, in the second window, we see all the mechanisms implicit in the execution, allowing us different levels of details so that we can have the whole picture with distinct points of view:

This is a proof of some of the characteristics of GC. First of all, a de-referenced object is not immediately collected, since the process happens periodically, and there are many factors that influence this frequency. On the other hand, not all orphans are collected at the same time.

One of the reasons for this is that the collection mechanism itself is computationally expensive, and it affects performance, so the recommendation, for most cases, is to just let GC do its work the way it is optimized to do.

Are there exceptions to this rule? Yes; the exceptions are in those cases where you have reserved a lot of resources and you want to make sure that you clean them up before you exit the method or sequence in which your program operates. This doesn't mean that you call GC in every turn of a loop execution (due to the performance reasons we mentioned).

One of the possible solutions in these cases is implementing the IDisposable interface. Let's remember that you can see any member of the CLR by pressing Ctrl + Alt + J or selecting Object Browser in the main menu.

We'll be presented with a window containing a search box in order to filter our member, and we'll see all places where such a member appears:

Note

Note that this interface is not available for .NET Core Runtime.

So, we would redefine our class to implement IDisposable (which means that we should write a Dispose() method that releases the resources the class holds). Or, even better, we can follow the recommendations of the IDE and implement Dispose Pattern, which is offered to us as an option as soon as we indicate that our program implements this interface, as shown in the following screenshot:

Also, remember that, in cases where we have to explicitly dispose a resource, another common and more recommended way is the using block within the context of a method. A typical scenario is when you open a file using some of the classes in the System.IO namespace, such as File. Let's quickly look at it as a reminder.

Imagine that you have a simple text file named Data.txt and you want to open it, read its content, and present it in the console. A possible way to do this rapidly would be by using the following code:

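using System;
using System.IO;

class Program2
{
  static void Main(string[] args)
  {
    // OpenText returns a StreamReader that is never closed in this version
    var reader = File.OpenText("Data.txt");
    var text = reader.ReadToEnd();
    Console.WriteLine(text);
    Console.Read();
  }
}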

What's the problem with this code? It works, but it's using an external resource: the OpenText method returns a StreamReader object, which we later use to read the contents, and it's never explicitly closed. We should always remember to close the objects that we open once we have finished working with them.

One of the possible side effects consists of preventing other processes from accessing the file we opened.

So, the best and suggested solution for these cases is to include the declaration of the conflicting object within a using block, as follows:

string text;
using (var reader = File.OpenText("Data.txt"))
{
  text = reader.ReadToEnd();
}
Console.WriteLine(text);
Console.Read();

In this way, the Dispose() method of StreamReader is called automatically when the block ends, liberating the resources it manages, and there's no need to close it explicitly.
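
It may help to know that a using block is compiled into a try/finally construction that calls Dispose() on the way out. The following sketch writes both forms side by side; TrackedResource is a hypothetical class, created only to record whether Dispose() has run.

```csharp
using System;

// Hypothetical resource that just remembers whether it was disposed.
class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

static class UsingDemo
{
    // The using form.
    public static TrackedResource UseWithUsing()
    {
        var resource = new TrackedResource();
        using (resource)
        {
            // work with the resource here
        }
        return resource;
    }

    // Roughly what the compiler generates for the block above.
    public static TrackedResource UseWithTryFinally()
    {
        var resource = new TrackedResource();
        try
        {
            // work with the resource here
        }
        finally
        {
            if (resource != null) resource.Dispose();
        }
        return resource;
    }

    static void Main()
    {
        Console.WriteLine(UseWithUsing().Disposed);       // prints "True"
        Console.WriteLine(UseWithTryFinally().Disposed);  // prints "True"
    }
}
```

Because the call sits in a finally block, Dispose() also runs when the code inside the block throws an exception.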

Finally, there's always another way of cleaning up when an object dies, that is, using the corresponding finalizer (a method named after the class and preceded by the ~ sign, the exact opposite of a constructor). It's not a recommended way to destroy objects, but it has been there since the very beginning (let's remember that Hejlsberg took inspiration from C++ for many features of the language). And, by the way, the advanced pattern of implementing IDisposable includes this option for more advanced collectable scenarios.
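
For reference, that pattern usually takes a shape similar to the following sketch. LogWriter is a hypothetical class invented for this example; it only wraps a managed StreamWriter, so the finalizer is included here just to show where it fits (it is really justified only when a class owns unmanaged resources directly).

```csharp
using System;
using System.IO;

// Hypothetical class that owns a disposable resource.
class LogWriter : IDisposable
{
    private readonly StreamWriter writer;   // managed resource
    private bool disposed;

    public LogWriter(Stream target) { writer = new StreamWriter(target); }

    public bool IsDisposed { get { return disposed; } }

    public void Write(string message)
    {
        if (disposed) throw new ObjectDisposedException("LogWriter");
        writer.WriteLine(message);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // cleanup is done, so the finalizer can be skipped
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;        // calling Dispose() twice is harmless
        if (disposing)
        {
            writer.Dispose();        // release managed resources
        }
        // unmanaged resources (none in this sketch) would be released here
        disposed = true;
    }

    ~LogWriter() { Dispose(false); } // safety net if Dispose() was never called
}

static class DisposePatternDemo
{
    static void Main()
    {
        using (var log = new LogWriter(new MemoryStream()))
        {
            log.Write("Hello! I'm about to be disposed.");
        }
    }
}
```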

Implementing algorithms with the CLR

So far, we've seen some of the more important concepts, techniques, and tools available and related to the CLR. In other words, we've seen how the engine works and how the IDE and other tools give us support to control and monitor what's going on behind the scenes.

Let's dig into some of the more typical structures and algorithms that we'll find in everyday programming so that we can understand the resources that .NET framework puts in our hands to solve common problems a bit better.

We've mentioned that .NET framework installs a repository of DLLs that offer a large number of functionalities. These DLLs are organized by namespaces, so they can be used individually or in conjunction with others.

As it happens with other frameworks such as J2EE, in .NET, we will use the object-oriented programming paradigm as a suitable approach to programming problems.

Data structures, algorithms, and complexity

In the initial versions of .NET (1.0, 1.1), we could use several types of constructions to deal with collections of elements. All modern languages include these constructs as typical resources, and some of these you should know for sure: arrays, stacks, and queues are typical examples.

Of course, the evolution of .NET has produced many novelties, starting with generics, in version 2.0, and other types of similar constructions, such as dictionaries, ObservableCollections, and others in a long list.

But the question is, are we using these algorithms properly? What happens when you have to use one of these constructions and push it to the limits? And to cope with these limits, do we have a way to find out and measure these implementations so that we can use the most appropriate one in every situation?

These questions take us to the measure of complexity. The most common approach to the problem nowadays relies on a technique called Big O Notation or Asymptotic Analysis.

Big O Notation

Big O Notation (Big Omicron Notation) is a mathematical notation that describes the limiting behavior of a function when its argument tends toward a particular value or toward infinity. When you apply it to computer science, it's used to classify algorithms by how they respond to changes in the input size.

We understand "how they respond" in two ways: response in time (often the most important) as well as response in space, that is, memory consumption, which could lead to memory exhaustion and other types of problems (eventually including DoS attacks and other threats).

Tip

One of the most exhaustive lists of links to explanations of the thousands of algorithms cataloged up to date is published by NIST (National Institute of Standards and Technology) at https://xlinux.nist.gov/dads/.

The way to express the response in relation to the input (the O notation) consists of a formula such as O([formula]), where formula is a mathematical expression that indicates the growth, that is, the number of steps the algorithm executes as the input grows. Many algorithms are of type O(n), and they are called linear because the growth is proportional to the number of inputs. In other words, such growth would be represented by a straight line (although it is never exact).

A typical example is the analysis of sorting algorithms, and NIST mentions a canonical case: quicksort is O(n log n) on average, and bubble sort is O(n²). This means that, once the quantity of numbers to be sorted grows beyond a certain point, a quicksort implementation running on a desktop computer can beat a bubble sort running on a supercomputer.

Note

As an example, in order to sort 1,000,000 numbers, the quicksort takes 20,000,000 steps on average, while the bubble sort takes 1,000,000,000,000 steps!
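The arithmetic behind these figures can be checked with a short sketch. It assumes a base-2 logarithm and ignores constant factors, and the StepEstimates class is a hypothetical helper written for this illustration:

```csharp
using System;

public static class StepEstimates
{
    // Approximate step count for an O(n log n) algorithm (base-2 logarithm).
    public static double LinearithmicSteps(long n)
    {
        return n * Math.Log(n, 2);
    }

    // Approximate step count for an O(n^2) algorithm.
    public static double QuadraticSteps(long n)
    {
        return (double)n * n;
    }
}
```

For n = 1,000,000, LinearithmicSteps returns a value just under 20 million (log₂ of one million is roughly 19.93), while QuadraticSteps returns 10^12, which matches the rounded figures quoted above.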

The following graphic shows the growth in time of four classical sorting algorithms (bubble, insertion, selection, and shell). As you can see in the graph, the behavior looks roughly linear until the number of elements passes 25,000, after which the algorithms differ noticeably. The shell algorithm wins, with a worst-case complexity of O(n^1.5). Note that quicksort grows even more slowly (O(n log n) on average).

Unfortunately, there's no mechanical procedure to calculate the Big O of an arbitrary algorithm, and the procedures that can be found rely on a more or less empirical approach.
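One simple empirical approach is to count the steps an algorithm performs for an input of size n and again for size 2n, and then derive the growth exponent from the ratio. The sketch below does this for a plain bubble sort; the GrowthEstimate class and its method names are hypothetical, and reverse-ordered input is an assumption chosen to make the result deterministic:

```csharp
using System;

public static class GrowthEstimate
{
    // Bubble sort that returns the number of comparisons performed.
    public static long CountBubbleComparisons(int[] data)
    {
        long comparisons = 0;
        for (int i = 0; i < data.Length - 1; i++)
        {
            for (int j = 0; j < data.Length - 1 - i; j++)
            {
                comparisons++;
                if (data[j] > data[j + 1])
                {
                    int tmp = data[j];
                    data[j] = data[j + 1];
                    data[j + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    // Estimates the exponent k in "steps ~ n^k" by doubling the input size.
    public static double EstimateExponent(int n)
    {
        long small = CountBubbleComparisons(Descending(n));
        long large = CountBubbleComparisons(Descending(2 * n));
        return Math.Log((double)large / small, 2);
    }

    // Reverse-ordered input: a simple, deterministic worst case.
    private static int[] Descending(int n)
    {
        var data = new int[n];
        for (int i = 0; i < n; i++) data[i] = n - i;
        return data;
    }
}
```

Calling EstimateExponent(1000) yields a value very close to 2, which is consistent with the O(n²) classification of bubble sort. The same doubling technique works with elapsed time instead of step counts, although timings are noisier.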

However, we can use some well-defined tables that categorize the algorithms and give us the O(formula) to get an idea of what we can obtain out of its usage, such as the one published by Wikipedia, which is accessible at http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions:

(Table: orders of common functions, from the Wikipedia article on Big O notation)

From the point of view of .NET Framework, we can use the collections in the System.Collections.Generic namespace, which offer optimized performance for the vast majority of situations.

An approach to performance in the most common sorting algorithms

You will find in DEMO01-04 a .NET program that compares three classical algorithms (bubble, merge, and heap) to the one implemented in List<T> collections using integers. Of course, this is a practical, everyday approach and not a scientific one, for which the generated numbers should be uniformly distributed (refer to Rasmus Faber's answer to this question at http://stackoverflow.com/questions/609501/generating-a-random-decimal-in-c/610228#610228).

Besides that, another consideration should be made for the generators themselves. For practical purposes such as testing these algorithms, the generators included in .NET Framework do their job pretty well. However, if you need, or are curious about, a serious approach, perhaps the most documented and tested one is Donald Knuth's spectral test, described in his world-famous The Art of Computer Programming, Volume 2: Seminumerical Algorithms (2nd Edition), published by Addison-Wesley.

That said, the random generator class included in .NET can give us good enough results for our purposes. As for the sorting methods targeted here, I've chosen the most commonly recommended ones and compared them with two extremes: the slowest one (bubble, with O(n²) performance) and the one included in the System.Collections.Generic namespace for the List<T> class (which is, internally, based on quicksort). In the middle, a comparison is made between the heap and merge methods, both of them considered O(n log n) in performance.

The previously mentioned demo follows recommended implementations with some updates and improvements for the user interface, which is a simple Windows Forms application, so you can test these algorithms thoroughly.

Also, note that you should execute these tests several times with different input sizes to get a real glimpse of these methods' performance. Keep in mind as well that .NET Framework is built with optimized sorting methods for integers, strings, and other built-in types, which avoid the cost of calling delegates for comparisons, and so on. So, for built-in types, hand-written implementations of typical sorting algorithms will normally be much slower than the built-in ones.
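To give an idea of how such a comparison can be set up, here is a minimal console-style sketch. It is not the code of the demo mentioned above; the SortTiming class, its method names, and the fixed seed are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SortTiming
{
    // Bubble sort with early exit, working on a copy of the input.
    public static int[] BubbleSort(int[] source)
    {
        var data = (int[])source.Clone();
        for (int i = 0; i < data.Length - 1; i++)
        {
            bool swapped = false;
            for (int j = 0; j < data.Length - 1 - i; j++)
            {
                if (data[j] > data[j + 1])
                {
                    int tmp = data[j];
                    data[j] = data[j + 1];
                    data[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break; // no swaps in this pass: already sorted
        }
        return data;
    }

    // Returns the elapsed time of the given action.
    public static TimeSpan Measure(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Pseudo-random integers; a fixed seed makes runs repeatable.
    public static int[] RandomIntegers(int count, int seed)
    {
        var rnd = new Random(seed);
        var data = new int[count];
        for (int i = 0; i < count; i++) data[i] = rnd.Next();
        return data;
    }

    // Times both approaches on the same input and prints the results.
    public static void Report(int count)
    {
        int[] input = RandomIntegers(count, 42);
        TimeSpan bubble = Measure(() => BubbleSort(input));
        TimeSpan builtIn = Measure(() => new List<int>(input).Sort());
        Console.WriteLine("Bubble: {0} ms, List<T>.Sort: {1} ms",
            bubble.TotalMilliseconds, builtIn.TotalMilliseconds);
    }
}
```

Calling SortTiming.Report with several sizes (for instance 1,000, 10,000, and 30,000) shows how the gap widens as the input grows; the absolute numbers depend on the machine and on whether the code runs in a debug or release build.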

For example, for 30,000 integers, we obtain the following results:

(Figure: results of the sorting comparison for 30,000 integers)

As you can see, the results of bubble (even being an optimized bubble method) are far worse when the total numbers go beyond 10,000. Of course, for smaller numbers, the difference decreases, and if the input does not exceed 1,000 elements, it's negligible for most practical purposes.

As an optional exercise for you, we leave the implementation of these algorithms for string sorting.

Tip

You can use some of these routines to quickly generate strings:

int rndStringLength = 14; // max: 32 -> length of a Guid in "N" format
string rndString = Guid.NewGuid().ToString("N").Substring(0, rndStringLength);

This one is suggested by Ranvir at http://stackoverflow.com/questions/1122483/random-string-generator-returning-same-string:

public string RandomStr()
{
  string rStr = Path.GetRandomFileName();
  rStr = rStr.Replace(".", ""); // Removing the "."
  return rStr;
}

Remember that, for such situations, you should use generic versions of the merge and heap algorithms so that the same algorithm can be invoked independently of the type of the input values.
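As an illustration of what a generic version might look like, the following is a minimal sketch of a top-down merge sort constrained to IComparable<T>. It is not taken from the demo; the GenericSort class and its method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class GenericSort
{
    // Top-down merge sort for any type that implements IComparable<T>.
    public static List<T> MergeSort<T>(IList<T> items) where T : IComparable<T>
    {
        if (items.Count <= 1) return new List<T>(items);

        int middle = items.Count / 2;
        var left = new List<T>();
        var right = new List<T>();
        for (int i = 0; i < items.Count; i++)
        {
            if (i < middle) left.Add(items[i]); else right.Add(items[i]);
        }
        return Merge(MergeSort<T>(left), MergeSort<T>(right));
    }

    // Merges two sorted lists into a single sorted list.
    private static List<T> Merge<T>(List<T> left, List<T> right) where T : IComparable<T>
    {
        var result = new List<T>(left.Count + right.Count);
        int l = 0, r = 0;
        while (l < left.Count && r < right.Count)
        {
            // <= keeps equal elements in their original order (stable sort).
            if (left[l].CompareTo(right[r]) <= 0) result.Add(left[l++]);
            else result.Add(right[r++]);
        }
        while (l < left.Count) result.Add(left[l++]);
        while (r < right.Count) result.Add(right[r++]);
        return result;
    }
}
```

The same method can then be called as GenericSort.MergeSort<int>(numbers) or GenericSort.MergeSort<string>(words). An overload that accepts an IComparer<T> would be a natural extension when the caller needs a custom ordering.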

Relevant features appearing in versions 4.5x, 4.6, and .NET Core 1.0 and 1.1

Among the new features in the latest versions of .NET Framework that we have not mentioned yet, some relate to the CLR (many others will be covered in the following chapters). The ones that relate to the core of .NET are listed in the next few sections.

.NET 4.5.x

We can summarize the main improvements and new features that appeared in .NET 4.5 in the following points:

  • Reduction of system restarts
  • Arrays larger than 2 gigabytes (GB) on 64-bit platforms
  • An improvement of background garbage collection for servers (with implications in performance and memory management)
  • JIT compilation in the background, optionally available on multicore processors (to improve the application performance, obviously)
  • New console (System.Console) support for Unicode (UTF-16) encoding
  • An improvement in performance when retrieving resources (especially useful for desktop applications)
  • The possibility of customizing the reflection context so that it overrides the default behavior
  • New asynchronous features were added to C# and Visual Basic languages in order to add a task-based model to perform asynchronous operations
  • Improved support for parallel computing (performance analysis, control, and debugging)
  • The ability to explicitly compact the large object heap (LOH) during garbage collection
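As an example of the last point in the list, the following sketch requests a one-time compaction of the large object heap through the GCSettings class. This API is available from .NET Framework 4.5.1 onward; the LohCompaction wrapper class and its method name are hypothetical:

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Requests a one-time compaction of the large object heap (LOH).
    // Requires .NET Framework 4.5.1 or later.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // The compaction takes place during the next blocking
        // generation 2 collection; here we trigger one explicitly.
        GC.Collect();
    }
}
```

Compacting the LOH can be expensive because large objects have to be moved in memory, so it is usually reserved for situations where fragmentation has actually been measured as a problem.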

.NET 4.6 (aligned with Visual Studio 2015)

In .NET 4.6, new features and improvements are not many, but they're important:

  • 64-bit JIT compiler for managed code (formerly called RyuJIT in beta versions).
  • Assembly loader improvements (working in conjunction with NGEN images; decreases the virtual memory and saves the physical memory).
  • Many changes in Base Class Libraries (BCLs):
  • .NET Native, a new technology that compiles apps to native code rather than IL. It produces apps characterized by faster startup and execution times, among other advantages.

    Note

    .NET Native has major improvements at runtime, but it has a few drawbacks as well, among some other considerations that may affect the way applications behave and should be coded. We'll talk about this in greater depth in other chapters.

  • Open source .NET framework packages (such as Immutable Collections, SIMD APIs and networking APIs, which are now available on GitHub)

.NET Core 1.0

.NET Core is a new version of .NET intended to run on any operating system (Windows, Linux, macOS) and that can be used in device, cloud, and embedded/IoT scenarios.

It uses a new set of libraries and, as Rich Lander mentions in the official documentation guide (https://docs.microsoft.com/en-us/dotnet/articles/core/), the set of characteristics that best define this version are:

.NET Core 1.1

  • Added support for Linux Mint 18, openSUSE 42.1, macOS 10.12, and Windows Server 2016, with side-by-side installation
  • New APIs (more than 1,000) and bug fixes
  • New documentation available at https://docs.microsoft.com/en-us/dotnet/
  • A new version of ASP.NET Core, 1.1

At the end of this book, we'll cover .NET Core so you can have an idea of its behavior and its advantages, especially in the cross-platform area.

Summary

CLR is the heart of .NET framework, and we have reviewed some of the most important concepts behind its architecture, design, and implementation in order to better understand how our code works and how it can be analyzed in the search for possible problems.

So, overall, in this chapter, we saw an annotated reminder (with commentaries, graphics, and diagrams) of some important terms and concepts of computing that we will find throughout the book. With this foundation, we went through a brief introduction to the motivations behind the creation of .NET Framework, along with its creators.

Next, we covered what's inside the CLR and how we can view it in action using tools provided by the CLR itself and others available in Visual Studio 2015 from Update 1 onward.

The third point was a basic review of the complexity of algorithms, Big O Notation, and the way in which we can measure it in practice by testing some sorting methods implemented in C#. We finished with a short list of the most relevant features that the latest versions of .NET offer and that we will cover in different chapters of this book.

In the next chapter, we will dig into the substance of the C# language from the very beginning (don't miss Hejlsberg's true reasons for the creation of delegates) and how it has evolved to simplify and consolidate programming techniques with generics, lambda expressions, anonymous types, and the LINQ syntax.


Key benefits

  • Uniquely structured content to help you understand what goes on under the hood of .NET’s managed code platform to master .NET programming
  • Deep dive into C# programming and how the code executes via the CLR
  • Packed with hands-on practical examples, you’ll understand how to write applications to make full use of the new features of .NET 4.6, .NET Core and C# 6/7

Description

Mastering C# and .NET Framework will take you into the depths of C# 6.0/7.0 and .NET 4.6, so you can understand how the platform works when it runs your code, and how you can use this knowledge to write efficient applications. Take full advantage of the new revolution in .NET development, including open source status and cross-platform capability, and get to grips with the architectural changes of CoreCLR. Start with how the CLR executes code, and discover the niche and advanced aspects of C# programming – from delegates and generics, through to asynchronous programming. Run through new forms of type declarations and assignments, source code callers, static using syntax, auto-property initializers, dictionary initializers, null conditional operators, and many others. Then unlock the true potential of the .NET platform. Learn how to write OWASP-compliant applications, how to properly implement design patterns in C#, and how to follow the general SOLID principles and their implementations in C# code. We finish by focusing on tips and tricks that you'll need to get the most from C# and .NET. This book also covers .NET Core 1.1 concepts as per the latest RTM release in the last chapter.

Who is this book for?

This book was written exclusively for .NET developers. If you’ve been creating C# applications for your clients, at work or at home, this book will help you develop the skills you need to create modern, powerful, and efficient applications in C#. No knowledge of C# 6/7 or .NET 4.6 is needed to follow along—all the latest features are included to help you start writing cross-platform applications immediately. You will need to be familiar with Visual Studio, though all the new features in Visual Studio 2015 will also be covered.

What you will learn

  • Understand C# core concepts in depth, from sorting algorithms to the Big O notation
  • Get up to speed with the latest changes in C# 6/7
  • Interface SQL Server and NoSQL databases with .NET
  • Learn SOLID principles and the most relevant GoF Patterns with practical examples in C# 6.0
  • Defend C# applications against attacks
  • Use Roslyn, a self-hosted framework for compilation and advanced editing in both the C# and Visual Basic .NET languages
  • Discern LINQ and associated Lambda expressions, generics, and delegates
  • Design a .NET application from the ground up
  • Understand the internals of a .NET assembly
  • Grasp some useful advanced features in optimization and parallelism

Product Details

Publication date : Dec 15, 2016
Length: 560 pages
Edition : 1st
Language : English
ISBN-13 : 9781785884375

Table of Contents

14 Chapters
1. Inside the CLR
2. Core Concepts of C# and .NET
3. Advanced Concepts of C# and .NET
4. Comparing Approaches for Programming
5. Reflection and Dynamic Programming
6. SQL Database Programming
7. NoSQL Database Programming
8. Open Source Programming
9. Architecture
10. Design Patterns
11. Security
12. Performance
13. Advanced Topics
Index

Customer reviews

Rating distribution
2.7 out of 5 (3 Ratings)
5 star 33.3%
4 star 0%
3 star 0%
2 star 33.3%
1 star 33.3%

Jeffrey L. Armbruster, Oct 24, 2017, 5 out of 5 (Amazon verified review)
Covers a lot of C# .Net framework topics. Not in excruciating detail. But enough to get you using the topic code. Posadas, the author, is brave to cover so much material. Well-worth the price and your time. Excellent for self-taught developers and as a classroom reference. Great book! Highly recommended.

Håvard, Oct 07, 2018, 2 out of 5 (Amazon verified review)
I haven’t read this Book from the first to the last page, but dived in to the chapters that seemed most interesting. The Book starts with an suprisingly deep coverage of the CLR. That chapters and others with it represents what are irrelevant details for me. The Book says that no previous knowledge of C# is needed. I do have some knowledge, but I’m often having trouble understanding what the author is trying to tell me. Two stars may be harsh, but that represents the value this book has had for me. The pedagogical techniques didn’t come through to me.

screenjockey, Mar 09, 2018, 1 out of 5 (Amazon verified review)
As of March 9, 2019, the Kindle version is awful - the table of contents is horked, hyperlinks are limited, etc Now to figure out how to return it ...
