Building pipelines
Before constructing pipelines, let’s briefly recap two key concepts from the previous chapter: currying and partial application. These techniques are fundamental to creating flexible, reusable function components that serve as excellent pipeline building blocks.
Currying, as we learned, transforms a function that takes multiple arguments into a sequence of functions, each accepting a single argument. Here’s an example:
Func<int, int, int> add = (a, b) => a + b;
Func<int, Func<int, int>> curriedAdd = a => b => a + b;
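To make the link between the two forms explicit, here is a minimal sketch of a generic Curry extension method that derives the curried version from any two-argument Func. CurryExtensions, Curry, and CurryDemo are illustrative names; they are not part of .NET or the previous chapter:

```csharp
using System;

// Hypothetical helper: turns any two-argument Func into its curried form.
public static class CurryExtensions
{
    public static Func<A, Func<B, R>> Curry<A, B, R>(this Func<A, B, R> f) =>
        a => b => f(a, b);
}

public static class CurryDemo
{
    public static int Run()
    {
        Func<int, int, int> add = (a, b) => a + b;

        // Behaves like the hand-written curriedAdd above.
        Func<int, Func<int, int>> curriedAdd = add.Curry();

        // Supply one argument at a time: curriedAdd(2) returns a Func<int, int>.
        Func<int, int> addTwo = curriedAdd(2);
        return addTwo(3); // 5
    }
}
```

Calling curriedAdd(2) does no arithmetic yet; it returns a function that remembers the 2 and waits for the second argument.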
Partial application, on the other hand, involves fixing a number of arguments to a function, producing another function with fewer parameters:
Func<int, int, int> multiply = (a, b) => a * b;
Func<int, int> triple = x => multiply(3, x);
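Writing the wrapping lambda by hand works, but the same idea can be captured once in a helper. The sketch below assumes a hypothetical Partial extension method (PartialExtensions and PartialDemo are likewise illustrative names) that fixes the first argument of any two-argument Func:

```csharp
using System;

// Hypothetical helper: fixes the first argument of a two-argument Func,
// returning a function that waits for the second one.
public static class PartialExtensions
{
    public static Func<B, R> Partial<A, B, R>(this Func<A, B, R> f, A first) =>
        second => f(first, second);
}

public static class PartialDemo
{
    public static int Run()
    {
        Func<int, int, int> multiply = (a, b) => a * b;

        // Same behavior as the hand-written triple above.
        Func<int, int> triple = multiply.Partial(3);

        return triple(5); // 15
    }
}
```

The difference from currying is that Partial fixes one argument and returns the remaining function directly, rather than restructuring the function into a chain of single-argument calls.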
These concepts lead naturally to pipeline construction. By currying or partially applying functions, we create specialized, single-purpose functions that can be easily composed into pipelines. This approach allows us to do the following:
- Break down complex operations into simpler, more manageable pieces
- Reuse these pieces across different pipelines or contexts
- Create more expressive and readable code by chaining these specialized functions
For instance, consider a pipeline for processing game data, written with an illustrative Then method that hands the result of each step to the next:
var processGameData = LoadData()
    .Then(ValidateData)
    .Then(TransformData)
    .Then(SaveData);
Each step in this pipeline could be a curried or partially applied function, allowing for easy customization and reuse. As we explore pipeline construction further, keep in mind how currying and partial application can make pipelines more flexible and powerful.
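Then is not a built-in .NET method. The following minimal sketch assumes one plausible definition: an extension method that passes a value into the next step, where each step may change the type. The GameData class and its members are hypothetical stand-ins for LoadData, ValidateData, and the other steps, and KeepAtLeast shows a curried step being partially applied inside the chain:

```csharp
using System;
using System.Linq;

// Hypothetical helper: passes a value to the next step. Each step may
// change the type (here string -> int[] -> int[] -> int).
public static class ThenExtensions
{
    public static R Then<T, R>(this T value, Func<T, R> step) => step(value);
}

public static class GameData
{
    // Illustrative stand-in for loading raw data.
    public static string LoadData() => "5, 12, 30, 8";

    public static int[] ParseScores(string raw) =>
        raw.Split(',').Select(s => int.Parse(s.Trim())).ToArray();

    // Curried: the threshold comes first, the scores later.
    public static Func<int[], int[]> KeepAtLeast(int min) =>
        scores => scores.Where(s => s >= min).ToArray();

    public static int Total(int[] scores) => scores.Sum();

    public static int Process() =>
        LoadData()
            .Then(ParseScores)
            .Then(KeepAtLeast(10)) // fixing min = 10 yields a one-argument step
            .Then(Total);          // 12 + 30 = 42
}
```

Because this Then simply applies a function to a value, the variable on the left of such a chain holds the final result rather than a reusable pipeline.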
Now, let’s move on to building pipelines.
Pipelines process data through a sequence of steps, each represented by a function. This approach is particularly useful for tasks that require multiple transformations, validations, or computations. If you have used LINQ to manipulate collections, you have most likely already worked with pipelines: each chained operator, such as Where or Select, passes its output to the next.
Let’s consider a real-world scenario: an Extract, Transform, Load (ETL) process for publishing manuscripts. This process involves several steps:
- Extracting (querying) the manuscript from a database
- Validating its content
- Transforming it into the required format
- Loading (submitting) it for publication
Each step can be represented as a function, and we can use a pipeline to streamline this process. To do this, let’s create an extension method named Pipe that applies a sequence of functions to an initial value, passing the result of each function to the next:
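using System;
using System.Linq;

// Extension methods must live in a static class (the class name is illustrative).
public static class PipelineExtensions
{
    // Applies each function in turn, feeding the result of one into the next.
    public static T Pipe<T>(this T source, params Func<T, T>[] funcs)
    {
        return funcs.Aggregate(source, (current, func) => func(current));
    }
}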
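Aggregate starts from source and folds the functions over it from left to right, so the order of the arguments is the order of execution. This small sketch (PipeDemo and its members are illustrative) shows that ordering, plus the edge case of calling Pipe with no functions:

```csharp
using System;
using System.Linq;

public static class PipeExtensions
{
    // Same Pipe as above: folds the functions over the source, left to right.
    public static T Pipe<T>(this T source, params Func<T, T>[] funcs) =>
        funcs.Aggregate(source, (current, func) => func(current));
}

public static class PipeDemo
{
    public static int Run()
    {
        Func<int, int> addOne = x => x + 1;
        Func<int, int> doubleIt = x => x * 2;

        int start = 5;
        // Order matters: (5 + 1) * 2 = 12, whereas doubling first would give 11.
        return start.Pipe(addOne, doubleIt);
    }

    public static int RunNoSteps()
    {
        int start = 5;
        // With no functions, Aggregate simply returns the seed unchanged.
        return start.Pipe();
    }
}
```

Since Pipe takes a params array, steps can be passed either in a single call or one per chained call; both produce the same result.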
Here is a simplified model of the manuscript, along with the four steps of the publishing flow: querying, validating, transforming, and submitting:
public class Manuscript
{
    public string Content { get; set; }
    public bool IsValid { get; set; }
    public string FormattedContent { get; set; }
}

public Manuscript Query(Manuscript manuscript)
{
    // Simulate querying the manuscript from a database
    manuscript.Content = "Original manuscript content.";
    return manuscript;
}

public Manuscript Validate(Manuscript manuscript)
{
    // Simulate validating the manuscript
    manuscript.IsValid = !string.IsNullOrWhiteSpace(manuscript.Content);
    return manuscript;
}

public Manuscript Transform(Manuscript manuscript)
{
    // Simulate transforming the manuscript content
    if (manuscript.IsValid)
    {
        manuscript.FormattedContent = manuscript.Content.ToUpper();
    }
    return manuscript;
}

public Manuscript Submit(Manuscript manuscript)
{
    // Simulate submitting the manuscript for publication
    if (manuscript.IsValid)
    {
        Console.WriteLine($"Manuscript submitted: {manuscript.FormattedContent}");
    }
    else
    {
        Console.WriteLine("Manuscript validation failed. Submission aborted.");
    }
    return manuscript;
}
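These steps mutate the Manuscript they receive. Since Pipe only requires a Func<T, T>, an alternative worth considering is an immutable model in which each step returns a modified copy. The following sketch assumes C# 9 or later for records and with expressions; ImmutableManuscript and PureSteps are hypothetical names, Submit is omitted because it only prints, and ToUpperInvariant stands in for ToUpper so the result does not depend on the current culture:

```csharp
using System;

// Hypothetical immutable variant: each step returns a new record
// via a `with` expression instead of mutating its input.
public record ImmutableManuscript(string Content = "", bool IsValid = false, string FormattedContent = "");

public static class PureSteps
{
    // Simulate querying: returns a copy with Content filled in.
    public static ImmutableManuscript Query(ImmutableManuscript m) =>
        m with { Content = "Original manuscript content." };

    // Simulate validating: returns a copy with IsValid set.
    public static ImmutableManuscript Validate(ImmutableManuscript m) =>
        m with { IsValid = !string.IsNullOrWhiteSpace(m.Content) };

    // Simulate transforming: only valid manuscripts get formatted content.
    public static ImmutableManuscript Transform(ImmutableManuscript m) =>
        m.IsValid ? m with { FormattedContent = m.Content.ToUpperInvariant() } : m;
}
```

With this design the starting manuscript is never modified, so the same input can safely be run through several different pipelines.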
Here’s how we might execute this flow without using the Pipe method:
public void ExecutePublishingFlow(Manuscript manuscript)
{
    manuscript = Submit(
        Transform(
            Validate(
                Query(manuscript))));
}
Now, using the Pipe method, the steps read in the order in which they execute, from top to bottom, instead of from the innermost call outward:
public void ExecutePublishingFlow(Manuscript manuscript)
{
    manuscript
        .Pipe(Query)
        .Pipe(Validate)
        .Pipe(Transform)
        .Pipe(Submit);
}
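Pipe needs a value to start from. A variation worth considering is to compose the steps into a single reusable function first and apply it to as many manuscripts as needed. The sketch below assumes a hypothetical Compose helper and a hypothetical TransformWith step that takes its formatting rule as a curried argument, tying back to the currying and partial application recap; Submit is left out to keep it short:

```csharp
using System;
using System.Linq;

public class Manuscript
{
    public string Content { get; set; } = "";
    public bool IsValid { get; set; }
    public string FormattedContent { get; set; } = "";
}

public static class Publishing
{
    // Hypothetical: combines steps into one function, without a seed value yet.
    public static Func<T, T> Compose<T>(params Func<T, T>[] steps) =>
        input => steps.Aggregate(input, (current, step) => step(current));

    public static Manuscript Query(Manuscript m)
    {
        m.Content = "Original manuscript content.";
        return m;
    }

    public static Manuscript Validate(Manuscript m)
    {
        m.IsValid = !string.IsNullOrWhiteSpace(m.Content);
        return m;
    }

    // Curried variant of Transform: choose the formatting rule first, the manuscript later.
    public static Func<Manuscript, Manuscript> TransformWith(Func<string, string> format) =>
        m =>
        {
            if (m.IsValid) m.FormattedContent = format(m.Content);
            return m;
        };

    // Supplying the formatting rule produces an ordinary Func<Manuscript, Manuscript> step.
    public static readonly Func<Manuscript, Manuscript> PrepareUpperCase =
        Compose<Manuscript>(Query, Validate, TransformWith(s => s.ToUpperInvariant()));
}
```

A different formatting rule only requires another call to TransformWith; Query and Validate are reused unchanged.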
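Of course, this adds a little overhead: each call allocates a params array and invokes the steps through delegates. In most applications that cost is negligible next to real work such as database queries, and in return the flow is much easier to read and to change.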