An annotated reminder of some important computing terms
Let's review some important concepts of software construction that show up frequently in .NET programming.
Context
As Wikipedia states:
In computer science, a task context is the minimal set of data used by a task (which may be a process or thread) that must be saved to allow a task interruption at a given date, and a continuation of this task at the point it has been interrupted and at an arbitrary future date.
In other words, context is a term related to the data handled by a thread. Such data is conveniently stored and recovered by the system as required.
Practical applications of this concept include HTTP request/response and database scenarios, in which the context plays a very important role.
The OS multitask execution model
A CPU can manage multiple processes over a period of time. As we mentioned, this is achieved by saving and restoring (extremely quickly) the context of execution with a technique called context switching.
When a thread ceases to execute, it is said to be in the Idle state. This categorization is useful when analyzing process execution with tools that can isolate threads in the Idle state:
Context types
In some languages, such as C#, we also find the concept of safe or secure context. In a way, this relates to the so-called thread safety.
Thread safety
A piece of code is said to be thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time. There are various strategies for creating thread-safe data structures, and the .NET Framework is very careful about this concept and its implementations.
Actually, most pages in MSDN (the official documentation) include a note at the bottom indicating whether the type is thread-safe, wherever that is applicable (a vast majority of cases).
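To see what is at stake, here is a minimal sketch (the counter names are illustrative) that increments one counter without synchronization and another one atomically through the Interlocked class:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static int unsafeCounter;   // incremented without synchronization
    static int safeCounter;     // incremented atomically

    static void Main()
    {
        var tasks = new Task[10];
        for (int t = 0; t < tasks.Length; t++)
        {
            tasks[t] = Task.Run(() =>
            {
                for (int i = 0; i < 100000; i++)
                {
                    unsafeCounter++;                        // read-modify-write: threads can interleave
                    Interlocked.Increment(ref safeCounter); // atomic: thread-safe
                }
            });
        }
        Task.WaitAll(tasks);

        Console.WriteLine($"Unsafe: {unsafeCounter}"); // very likely less than 1,000,000
        Console.WriteLine($"Safe:   {safeCounter}");   // always 1,000,000
    }
}
```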
State
The state of a computer program is a technical term for all the stored information, at a given instant in time, to which the program has access. The output of a computer program at any time is completely determined by its current inputs and its state. A very important variant of this concept is the program's state.
Program state
This concept is especially important, and it has several meanings. We know that a computer program stores data in variables, which are just labeled storage locations in the computer's memory. The contents of these memory locations, at any given point in the program's execution, are called the program's state.
In object-oriented languages, it is said that a class defines its state through fields, and the values these fields hold at a given moment of execution determine the state of that object. Although it's not mandatory, it's considered good practice in OOP for the methods of a class to have the sole purpose of preserving the coherence and logic of its state and nothing else.
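As a minimal illustration (the BankAccount class is hypothetical), the following sketch shows a class whose state is a single field and whose methods exist only to keep that state coherent:

```csharp
using System;

// The state of each BankAccount instance is the value of its balance field.
public class BankAccount
{
    private decimal balance;          // field: the object's state

    public decimal Balance => balance;

    // The methods only preserve the coherence of the state:
    // the balance can never become negative.
    public void Deposit(decimal amount)
    {
        if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount));
        balance += amount;
    }

    public void Withdraw(decimal amount)
    {
        if (amount <= 0 || amount > balance)
            throw new InvalidOperationException("Invalid withdrawal");
        balance -= amount;
    }
}
```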
In addition, a common taxonomy of programming languages establishes two categories: imperative and declarative programming. C# and Java are examples of the former, and HTML is a typical declarative syntax (a markup syntax rather than a language itself). Well, in imperative programming, statements tend to change the state of the program, while in the declarative paradigm, languages indicate only the desired result, with no specification of how the engine will manage to obtain it.
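A short sketch makes the contrast tangible: the imperative version spells out how to reach the result by mutating state, while the declarative version (LINQ, C#'s declarative flavor) only states what is wanted:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Imperative: we describe how, step by step, mutating state.
        int sumOfEvens = 0;
        foreach (int n in numbers)
        {
            if (n % 2 == 0) sumOfEvens += n;
        }

        // Declarative (LINQ): we describe only the desired result;
        // the engine decides how to obtain it.
        int declarativeSum = numbers.Where(n => n % 2 == 0).Sum();

        Console.WriteLine(sumOfEvens == declarativeSum); // True
    }
}
```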
Serialization
Serialization is the process of translating data structures or the object state into a format that can be stored (for example, in a file or a memory buffer) or transmitted across a network connection and reconstructed later in the same or another computer environment.
So, we usually say that serializing an object means converting its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object. Popular text formats, such as XML and JSON, emerged years ago and are now well known and accepted, independently of other, earlier formats (binary included):
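As a minimal sketch (the Book class here is illustrative), the classic XmlSerializer shows the round trip: object state to XML text and back to a copy of the object:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Book
{
    public string Title { get; set; }
    public int Pages { get; set; }
}

class Program
{
    static void Main()
    {
        var book = new Book { Title = "Moby-Dick", Pages = 635 };
        var serializer = new XmlSerializer(typeof(Book));

        // Serialize: object state -> XML text.
        string xml;
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, book);
            xml = writer.ToString();
        }

        // Deserialize: XML text -> a copy of the original object.
        using (var reader = new StringReader(xml))
        {
            var copy = (Book)serializer.Deserialize(reader);
            Console.WriteLine(copy.Title); // Moby-Dick
        }
    }
}
```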
Process
The OS divides its work among several functional units, allocating a different memory area for each unit in execution. It's important to distinguish between processes and threads.
Each process is given a set of resources by the OS, which, in Windows, means that a process will have its own virtual address space allocated and managed accordingly. When Windows initializes a process, it actually establishes a context of execution, which implies a data structure called the Process Environment Block (PEB). However, let's make this clear: the OS doesn't execute processes; it only establishes the execution context.
Thread
A thread is the functional (or working) unit of a process, and it is what the OS actually executes. Thus, a single process might have several threads of execution, which is something that happens very often. Each thread has its own stack within the resources allocated at the creation of the process, and those resources are shared by all threads linked to the process:
It's important to recall that a thread belongs to a single process and therefore has access only to the resources defined by that process. Using the tools suggested later in this section, we can watch multiple threads executing concurrently (that is, working independently of one another) while sharing resources, such as memory and data.
Different processes do not share these resources. In particular, the threads of a process share its instructions (the executable code) and its context (the values of its variables at any given moment).
Programming languages such as .NET languages, Java, or Python expose threading to the developer while abstracting the platform-specific differences in threading implementations at runtime.
Tip
Note that communication between threads is possible through the common set of resources initialized when the process was created.
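Here is a minimal sketch of that idea: two threads of the same process communicate through a shared field, with no pipes or sockets involved, because they share the process memory:

```csharp
using System;
using System.Threading;

class Program
{
    // Shared memory: visible to every thread in the process.
    static long result;

    static void Main()
    {
        var worker = new Thread(() =>
        {
            long sum = 0;
            for (int i = 1; i <= 100000; i++) sum += i;
            Interlocked.Exchange(ref result, sum); // publish the result safely
        });

        worker.Start();
        worker.Join(); // wait for the worker to finish

        // The main thread simply reads what the worker wrote.
        Console.WriteLine(result); // 5000050000
    }
}
```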
Of course, much more has been written about these two concepts, going far beyond the scope of this book (refer to Wikipedia at https://en.wikipedia.org/wiki/Thread_(computing) for more details), but the system provides us with mechanisms to check the execution of any process and inspect which threads are running.
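In fact, part of this is exposed programmatically through the System.Diagnostics namespace; here is a minimal sketch that enumerates the threads of the current process:

```csharp
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        // Inspect the current process and its threads.
        using (var process = Process.GetCurrentProcess())
        {
            Console.WriteLine($"Process: {process.ProcessName} (PID {process.Id})");
            foreach (ProcessThread thread in process.Threads)
            {
                Console.WriteLine($"  Thread {thread.Id}: {thread.ThreadState}");
            }
        }
    }
}
```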
If you are curious about it or just need to check whether something is going wrong, there are two main tools that I recommend: the Task Manager (included in the operating system, which you probably know), and, even better, the tools designed by the distinguished engineer and technical fellow Mark Russinovich, available for free and composed of a set of more than 50 utilities.
Some have a Windows interface and others are console utilities, but all of them are highly optimized and configurable for monitoring and controlling the inner workings of our operating system at any moment. They are available for free at https://technet.microsoft.com/en-us/sysinternals/bb545021.aspx.
If you don't want to install anything else, open Task Manager (just right-click on the taskbar to access it) and select the Details tab. You will see a more detailed description of every process, the amount of CPU used by each process, the memory allocated for each process, and so on. You can even right-click on one of the processes and see a context menu that offers a few options, including launching a new dialog window that shows some properties related to it:
Sysinternals
If you really want to know how a process behaves in its entirety, the tools to use are the Sysinternals utilities. If you go to the link indicated earlier, you'll see a menu item especially dedicated to process utilities. There, you have several choices to work with, but the most comprehensive are Process Explorer and Process Monitor.
Process Explorer and Process Monitor don't require installation (they're self-contained executables written in C++), so you can run them directly on any Windows device.
For example, if you run Process Explorer, you'll see a detailed window showing every aspect of all the processes currently active in the system.
With Process Explorer, you can find out what files, registry keys, and other objects processes have opened, together with the DLLs they have loaded, who owns each process, and so on. Every thread is visible and the tool provides you with detailed information, available through a very intuitive user interface:
It's also very useful for checking the system's general behavior in real time, since it creates graphs of CPU usage, I/O, and memory activity, among others, as shown in the following screenshot:
In a similar way, Process Monitor focuses on monitoring the filesystem, the Registry, and all processes and threads with their activities in real time. It is actually a merger of two earlier utilities, FileMon (File Monitor) and RegMon (Registry Monitor), which are no longer available.
If you try out Process Monitor, you'll see some of the information included in Process Explorer, plus the specific information Process Monitor provides, just conveyed in a different manner.
Static versus dynamic memory
When a program starts execution, the OS assigns a process to it by means of scheduling: the method by which work is assigned to the resources that complete it. This means that the resources for the process are assigned, and that implies memory allocation.
As we'll see, there are mainly two types of memory allocation:
- Fixed memory (linked to the stack), determined at compile time. Local variables are declared and used in the stack. Note that it is a contiguous block of memory allocated when the process resources are initially assigned. The allocation mechanism is very fast (although access to it is not quite as fast).
- The other is dynamic memory (the heap), which can grow as the program requires it and is assigned at runtime. This is the place where instance variables are allocated (those that point to an instance of a class or object).
Usually, the first type is calculated at compile time, since the compiler knows how much memory will be needed to allocate each declared variable depending on its type (int, double, and so on). These variables are declared inside functions with a syntax such as int x = 1;.
The second type requires the new operator to be invoked. Let's say there is a class named Book in our code; we create an instance of such a Book with an expression of this type:
Book myBook = new Book();
This instructs the runtime to allocate enough space in the heap to hold an instance of that type along with its fields; the state of the object is allocated in the heap. This means that a program stores its state in different memory (and, optionally, disk) locations.
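The following sketch (with an illustrative Book class) contrasts the two behaviors: copying a value type duplicates the data held in the stack, while copying a reference type duplicates only the reference to the single instance living in the heap:

```csharp
using System;

class Book { public string Title; }

class Program
{
    static void Main()
    {
        // Value type: the int itself lives in the stack frame of Main.
        int x = 1;
        int y = x;  // y gets an independent copy
        y++;
        Console.WriteLine(x); // 1: x is unaffected

        // Reference type: the variables live in the stack, but the
        // Book instance (its state) lives in the heap.
        Book a = new Book { Title = "First" };
        Book b = a; // b copies the reference, not the object
        b.Title = "Changed";
        Console.WriteLine(a.Title); // Changed: both point to the same heap object
    }
}
```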
Of course, there are more aspects to account for, which we'll cover in The Stack and the Heap section of this chapter. Luckily, the IDE lets us watch and analyze all these aspects (and many more) at debug time, offering an extraordinary debugging experience.
Garbage collector
Garbage collection (GC) is a form of automatic memory management. The GC in .NET attempts to reclaim garbage, that is, the memory occupied by objects that are no longer in use by the program. Going back to the previous declaration of Book, when there are no references left to the Book object in the stack, the GC will return that space to the system, freeing memory (it's a bit more complex, in fact, and I'll get into further detail later in this chapter, when we talk about memory management, but let's put it that way for the moment).
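A minimal sketch can make this observable: a WeakReference tracks an object without keeping it alive, so we can ask whether the GC has reclaimed it (note that in Debug builds the JIT may keep the local alive longer, so this behaves as described in Release builds):

```csharp
using System;

class Book { public string Title; }

class Program
{
    static void Main()
    {
        var book = new Book { Title = "Temporary" };
        var weak = new WeakReference(book);

        book = null;  // no references to the instance are left
        GC.Collect(); // ask the GC to run (normally, it decides by itself)
        GC.WaitForPendingFinalizers();

        // The weak reference lets us observe whether the instance was reclaimed.
        Console.WriteLine(weak.IsAlive ? "Still alive" : "Collected");
    }
}
```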
It's important to note that garbage collectors are not exclusive to the .NET platform. Actually, you can find them in many platforms and programs, even in browsers: current JavaScript engines, such as Chrome's V8 and Microsoft's Chakra, use a garbage collection mechanism as well.
Concurrent computing
Concurrency, or concurrent computing, is a very common concept nowadays, and we'll encounter it at several points throughout this book. The official definition in Wikipedia (https://en.wikipedia.org/wiki/Concurrent_computing) says:
"Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially (one completing before the next starts). This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process"). A concurrent system is one where a computation can advance without waiting for all other computations to complete; where more than one computation can advance at the same time."
Parallel computing
Parallel computing is a type of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved at the same time. .NET offers several variants of this type of computing, which we'll cover over the next few chapters:
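As a small taste of what's coming (here using PLINQ, one of those variants), a large sum is divided into smaller chunks that several cores compute at the same time:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 1000000).ToArray();

        // PLINQ splits the large problem (one big sum) into smaller
        // ones that are solved at the same time on several cores.
        long total = numbers.AsParallel().Sum(n => (long)n);

        Console.WriteLine(total); // 500000500000
    }
}
```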
Imperative programming
Imperative programming is a programming paradigm that describes computation in terms of statements that change the program's state. C#, JavaScript, Java, and C++ are typical examples of imperative languages.
Declarative programming
In contrast to imperative programming, languages considered declarative describe only the desired results without explicitly listing commands or steps that must be performed. Many markup languages, such as HTML, XAML, or XSLT, fall into this category.