Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
The Definitive Guide to Power Query (M)

You're reading from   The Definitive Guide to Power Query (M) Mastering complex data transformation with Power Query

Arrow left icon
Product type Paperback
Published in Mar 2024
Publisher Packt
ISBN-13 9781835089729
Length 758 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (3):
Arrow left icon
Greg Deckler Greg Deckler
Author Profile Icon Greg Deckler
Greg Deckler
Melissa de Korte Melissa de Korte
Author Profile Icon Melissa de Korte
Melissa de Korte
Rick de Groot Rick de Groot
Author Profile Icon Rick de Groot
Rick de Groot
Arrow right icon
View More author details
Toc

Table of Contents (19) Chapters Close

Preface 1. Introducing M 2. Working with Power Query/M FREE CHAPTER 3. Accessing and Combining Data 4. Understanding Values and Expressions 5. Understanding Data Types 6. Structured Values 7. Conceptualizing M 8. Working with Nested Structures 9. Parameters and Custom Functions 10. Dealing with Dates, Times, and Durations 11. Comparers, Replacers, Combiners, and Splitters 12. Handling Errors and Debugging 13. Iteration and Recursion 14. Troublesome Data Patterns 15. Optimizing Performance 16. Enabling Extensions 17. Other Books You May Enjoy
18. Index

The characteristics of M

M is a programming language that serves as the backbone of Power Query, enabling users to extract, clean, and reshape data from various sources, such as databases, spreadsheets, web pages, and more. However, unlike general-purpose programming languages such as C, C#, Java, and Python, which are designed for a wide variety of applications, M is a domain-specific language, specifically designed for data ingest and manipulation. As such, M provides a rich set of functions, operators, and expressions that allow you to perform complex data transformations, calculations, and aggregations. Let’s understand this better by taking a look at the characteristics of M from both formal and informal perspectives.

Formal classification

Programming languages are classified according to a number of properties, such as pure/impure, lower-order/higher order, statically/dynamically typed, strongly/weakly typed, eager/lazy evaluation, and imperative/functional.

Microsoft has described M as:

  • Mostly pure: A programming language is said to be pure if it provides referential integrity. In other words, any expression can be replaced with that expression’s value without changing the program’s behavior or meaning.

    An impure programming language allows side effects, which are actions that cause changes outside the scope of a function’s return value. In the case of M, it is commonly used for data transformation and retrieval tasks, which often involve interacting with external data sources, performing operations on data, and producing output. These actions constitute side effects because they affect the state of the data source or produce output beyond the function’s return value.

    While M provides functional programming constructs and supports immutability, allowing for the creation of pure functions, the language is not purely functional due to its impure nature. It embraces a combination of functional and imperative programming paradigms to facilitate efficient and practical data manipulation and retrieval.

  • Higher-order: For lower-order languages, such as machine code or assembly language, each programming statement corresponds to a single instruction for the computer, while in higher-order languages, each statement corresponds to multiple instructions for the computer.

    Higher-order languages typically allow such things as functions, objects, and modules to be used as values within a program. Higher-order languages often treat functions as first-class citizens. Specifically, this means that functions can be assigned to variables, passed as arguments to other functions, and returned as values from functions.

    Power Query M supports higher-order programming by allowing you to define and manipulate functions as values. You can assign functions to variables, pass functions as arguments to other functions, and return functions as results. This enables you to create more modular and flexible code by abstracting and reusing function logic.

    With higher-order programming capabilities, M allows you to apply transformations and computations dynamically based on input parameters, control flow, and data characteristics. You can write functions that operate on other functions, enabling powerful data manipulation and transformation scenarios.

    For example, you can use higher-order functions in M to dynamically apply a series of transformations to a dataset, based on user-defined criteria, or to create reusable function pipelines for data processing.

    By providing higher-order programming features, M empowers developers to write expressive and modular code, making it easier to work with complex data transformations and customize the behavior of functions to suit specific requirements.

  • Dynamically typed: Dynamically typed languages perform type checking at runtime instead of at compile time, as is the case with statically typed languages. Type checking is simply the process of ensuring that things such as parameters passed to a function are of the correct type, such as text, a number, or a date.
  • Weakly typed: While closely related to the property of statically/dynamically typed, strongly typed and weakly typed refer to something quite different. Strongly typed languages are extremely sensitive to type compatibility and require explicit type definitions for variables before being used. Conversely, weakly typed languages like M do not require explicit type definitions, and some even perform automatic type conversion.

    M is not as weakly typed as programming languages such as Python, since variables are immutable once calculated. Thus, the weak type definition for M generally refers to the ability to use variables whose data types have not been explicitly specified.

    Consider the following Python code:

    a = 42
    a = "Hello World"
    print(a)
    

    This code would not generate an error in Python, even though two different data types are assigned to the variable a. However, similar code is not possible in M, since variables, once calculated, are immutable (i.e., cannot be changed).

    The flexibility in data typing allows Power Query M to handle a wide range of data sources and perform various data transformations effectively. It simplifies the process of working with heterogeneous datasets that may contain different data types and structures.

    It is important to note that even though Power Query M is weakly typed, it still performs type checking during execution (runtime) to ensure the consistency of operations. If a particular operation is not compatible with the inferred type of a value, an error may occur at runtime.

    Overall, the weakly typed nature of Power Query M strikes a balance between flexibility and data integrity, providing users with a versatile language for data transformation tasks.

  • Partially lazy: In general, M follows an eager evaluation strategy, meaning that when you define transformations or computations, those transformations and computations are performed immediately as you apply them to the data. This eager evaluation approach ensures that data transformations occur promptly and that the results are readily available for further processing or analysis. Power Query M is designed to efficiently handle data manipulation and retrieval tasks, focusing on immediate evaluation to provide real-time feedback on transformations.

    The term partially lazy refers to a specific feature within M called lazy evaluation, which is distinct from the overall evaluation strategy of the language. In M, lazy evaluation is applied to expressions within some specific constructs, specifically List, Record, and Table expressions as well as the let expression. These constructs allow you to define expressions that are evaluated only when needed, providing a form of on-demand or lazy evaluation within those contexts.

    M also allows you to define optional arguments for functions. These optional arguments are evaluated lazily, meaning they are not computed unless explicitly used within the function body. Lazy evaluation of optional arguments helps optimize performance by avoiding unnecessary computations for optional values that are not actually used within the function. It ensures that the computations for optional arguments are deferred until their values are required within the function’s execution.

    M also supports conditional branching constructs like if-then-else statements. Only the branch that matches a condition is evaluated, while the other branch is not computed, resulting in lazy evaluation. This is in contrast to an eager evaluation strategy where both branches are evaluated regardless of the condition’s outcome. This form of lazy evaluation within the if-then-else construct allows for efficient computation by avoiding unnecessary evaluations of expressions in the non-matching branch.

    It is important to note that while M has these partially lazy features, the overall evaluation strategy of the language remains predominantly eager. Most expressions in Power Query M are eagerly evaluated, ensuring that data transformations occur promptly, and results are immediately available for further processing. Thus, M is primarily an eager programming language, but it incorporates partial lazy evaluation in specific constructs, such as for the List, Record, Table, and let expressions, as well as optional function arguments and conditional branching. These partially lazy evaluations offer flexibility and optimize performance within those contexts.

  • Functional: M incorporates many functional programming concepts and features, making it a functional programming language. These functional concepts and features include immutability, higher-order functions, function composition, pure functions, and recursion.

    M encourages immutability, meaning that data values are not modified in place but transformed into new values. This promotes the functional programming principle of avoiding side effects.

    M supports higher-order functions, allowing functions to be treated as first-class values. You can pass functions as arguments to other functions, return functions from functions, and store functions in variables.

    M facilitates function composition, enabling you to combine multiple functions to create more complex transformations. This composability is a characteristic of functional programming.

    M promotes the use of pure functions, which have no side effects and produce the same output for the same input. Pure functions make code more predictable and easier to troubleshoot.

    M supports recursion. While recursion is not as extensively supported as in some other functional languages, M does offer limited support for recursive functions, allowing developers to solve problems through recursive techniques.

In terms of comparison with other languages, M is perhaps most similar to F#, a programming language developed and implemented by Don Syme of Microsoft Research, Cambridge, UK.

Now that we have covered the formal classification of the M language, let’s next take a look at some more informal characteristics of M.

Informal characteristics of M

More informally, here are some key characteristics and features of M:

  • Functional language: M is a functional language, meaning it is based on the concept of functions as the primary building blocks for data transformations. Functions in M can be combined, nested, and composed to perform intricate data manipulations. M provides over 700 built-in functions for common operations, as well as the ability to create custom functions tailored to your specific needs. Many of these functions, as well as custom functions, are covered in later chapters.

    The extensive collection of built-in functions in M provides users with powerful tools to handle diverse data transformation scenarios. These functions are designed to simplify common data manipulation tasks and enable users to efficiently transform and shape their data.

  • Expressive and readable syntax: The syntax of M is designed to be intuitive and easy to read, making it accessible to both beginners and experienced programmers. M expressions are written in a clear and concise manner, facilitating the creation of complex data transformations without sacrificing readability. The syntax follows a step-by-step approach, allowing you to define a series of sequential transformations to be applied to your data. Later chapters demonstrate the expressive and readable syntax of M with specific examples.
  • Data types and values: M supports various data types, including primitive data types such as text, numbers, dates, and duration, as well as structured data types such as lists, tables, and records. It provides powerful functions to work with these data types, helping you to manipulate and transform data at a granular level. M also allows you to define and work with variables, constants, and parameters to store and reuse intermediate results during the data transformation process. Data types and values are covered in depth in Chapter 4, Understanding Values and Expressions, and Chapter 5, Understanding Data Types.
  • Integration with Power Query Editor: M seamlessly integrates with the Power Query Editor, providing a user-friendly interface for interacting with and developing M code. The Power Query Editor allows you to visually build data transformation steps, preview the results, and generate M code automatically. It provides a robust development environment where you can write, debug, and refine your M expressions. See Chapter 2, Working with Power Query/M, for more on this subject.
  • Extensibility and customization: One of the standout features of M is its extensibility. While M offers a wide range of built-in transformations, M also allows you to go beyond these capabilities and create custom transformations to suit your very own specific needs. You can define your own functions, write reusable code snippets, and create advanced data manipulation logic using M. This level of customization empowers you to handle complex data scenarios that are not covered by standard transformations accessible through the user interface. Chapter 16, Enabling Extensions, demonstrates M’s flexibility and extensibility.
  • Performance optimization: M is optimized for performance, enabling efficient data processing, even with large datasets. The Power Query engine intelligently evaluates and optimizes M expressions to minimize data loads and transformations, resulting in faster and more efficient data processing.

One specific performance optimization technique is called streaming semantically and is a property of List and Table expressions. Streaming semantically involves the repeated enumeration of table rows or list items. Instead of iterating through the table or list for each data transformation, streaming semantically each row of the table or item in the list is evaluated for all of the data transformations, and the results are collected as part of the output for the expression. Streaming semantically enable the transformation of datasets that do not fit within available memory.

Another performance optimization technique is called query folding. However, query folding is not a property of the M language itself. Instead, query folding is used within Power Query to push or fold data transformations back to source data systems. In essence, the transformation expressions within M are translated to equivalent transformation statements available within the source systems, such as SQL Server. This pushes the processing of transformations back to the source systems instead of the client system executing the M query. This can improve performance and efficiency by minimizing the data transfer and reducing the amount of data processed by Power Query.

By understanding the underlying principles of M and its performance considerations, you can write optimized code and improve the overall performance of your data workflows. Chapter 15, Optimizing Performance, provides more information on performance optimization, with specific examples.

In summary, M is a versatile and expressive language specifically designed for data transformation and manipulation within Power Query. Its functional nature, extensive set of functions, and integration with the Power Query editor make it a powerful tool for extracting, cleaning, and reshaping data from diverse sources.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image