You're reading from Mastering Julia Enhance your analytical and programming skills for data modeling and processing with Julia

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781805129790

Length 506 pages

Edition 2nd Edition

Languages

Julia

Concepts

Programming Language

Author (1):

Malcolm Sherrington

Characters and strings

The simplest character-based variables consist of ASCII and Unicode characters.

A single character is delimited by single quotes, whereas a string uses double quotes or, in some cases, triple double quotes ("""), which are discussed later in this section.

A string can be viewed as a one-dimensional array of characters and can be indexed and manipulated in a similar fashion as an array of numeric values:

julia> s = "Hi there, Blue Eyes!"
"Hi there, Blue Eyes!"
julia> length(s)
20
julia> s[11]
'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
julia> s[end]
'!': ASCII/Unicode U+0021 (category Po: Punctuation, other)

Hint—Try evaluating the following list comprehension: [s[i] for i = length(s):-1:1].
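The positions used above line up with character counts only because s is pure ASCII. As a minimal sketch (the string t and the variable names are illustrative and not taken from the text), the following shows how indexing behaves once a string contains multi-byte characters:

```julia
# Indices into a String are byte offsets in its UTF-8 encoding,
# so they coincide with character positions only for ASCII text.
t = "αβγ!"                        # three 2-byte Greek letters plus one ASCII character

nchars = length(t)                # number of characters
nbytes = ncodeunits(t)            # number of UTF-8 code units (bytes)
idx    = collect(eachindex(t))    # the valid character start indices

first_char  = t[1]                # 'α'
second_char = t[nextind(t, 1)]    # 'β', reached by stepping to the next valid index

# Reversing character by character, in the spirit of the hint above,
# but walking only over valid indices.
rev = join([t[i] for i in reverse(idx)])
```

Here t[2] would raise a StringIndexError because byte 2 lies inside the encoding of 'α'; iterating with eachindex (or simply calling reverse(t)) avoids that.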

Characters

Observe that Julia has a built-in Char type to represent a character.

A character occupies 32 bits, not 8, which is why it can hold a Unicode character. Have a look at the following example:

# All the following represent the ASCII character capital-A
julia> c = 'A';
julia> c = Char(65);
julia> c = '\U0041'
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

Julia supports the full range of Unicode code points, as we see here:

julia> c = '\Uc041'
'': Unicode U+c041 (category Lo: Letter, other)

As such, we can output characters from a variety of different alphabets—for example, Chinese:

julia> '\U7537'
'男': Unicode U+7537 (category Lo: Letter, other)

It is possible to specify a character code such as '\Uffff', because Char conversion does not check that every value is a valid code point. However, Julia provides an isvalid() function that can be applied to characters:

julia> c = '\Udff3'; isvalid(c)
false
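The conversions between characters and their integer codes can be summarized in a short sketch. The variable names are illustrative; the calls used (Int, Char, isvalid, and arithmetic on Char values) are standard Base functions:

```julia
c = 'A'

code     = Int(c)         # the code point as an integer: 65
restored = Char(code)     # back to the character 'A'

# Adding an integer to a Char gives a Char; subtracting two Chars gives an Int.
next_letter = c + 1       # 'B'
gap         = 'a' - 'A'   # 32

# Char() accepts these values without complaint; isvalid() reports
# whether the result is a valid Unicode code point.
ok        = isvalid(Char(0x7537))     # an ordinary CJK character
surrogate = isvalid(Char(0xdff3))     # a surrogate code point
too_high  = isvalid(Char(0x110000))   # beyond the Unicode range
```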

Julia uses C-like escape syntax for certain ASCII control characters: '\b', '\t', '\n', '\r', and '\f' for backspace, tab, newline, carriage return, and form feed, respectively.

The backslash acts as an escape character, so Int('t') => 116 (the letter t), whereas Int('\t') => 9 (the tab character). An unrecognized escape such as '\s' is rejected as a syntax error.

If more than one character is supplied between the single quotes, this raises an error:

julia> 'Hello'
ERROR: syntax: character literal contains multiple characters

Strings

The type of string we are most familiar with comprises a list of ASCII characters that, as we have observed, are normally delimited with double quotes, as in the following example:

julia> s = "Hello there, Blue Eyes";
julia> typeof(s)
String

The following points are worth noting:

The built-in concrete type used for strings (and string literals) is String
This supports the full range of Unicode characters via UTF-8 encoding
All string types are subtypes of the AbstractString abstract type, so when defining a function expecting a string argument, you should declare the type as AbstractString in order to accept any string type
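To illustrate the last point, here is a minimal sketch of a function declared on AbstractString. The helper count_vowels is hypothetical and exists only for this example; SubString is a standard Base type that is a string but not a String:

```julia
# Hypothetical helper: counts the vowels in any kind of string.
count_vowels(s::AbstractString) = count(c -> lowercase(c) in "aeiou", s)

s   = "Hello there, Blue Eyes"
sub = SubString(s, 14, 17)          # a view onto "Blue", of type SubString{String}

n_full = count_vowels(s)            # accepts a String
n_sub  = count_vowels(sub)          # also accepts a SubString{String}
```

Had the argument been declared as s::String, the second call would fail with a MethodError, because SubString{String} is a subtype of AbstractString but not of String.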

A transcode() function can be used to convert to/from other Unicode encodings:

julia> s = "αβγ";
julia> transcode(UInt16, s)
3-element Vector{UInt16}:
 0x03b1
 0x03b2
 0x03b3
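As a sketch of the round trip (variable names are illustrative), transcode() can also convert the UTF-16 code units back to UTF-8 bytes or directly to a String:

```julia
s = "αβγ"

utf16 = transcode(UInt16, s)        # UTF-16 code units, one per Greek letter here
utf8  = transcode(UInt8, utf16)     # back to UTF-8 bytes, two per Greek letter
back  = transcode(String, utf16)    # straight back to a String
```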

In Julia (as in Java), strings are immutable—that is, the value of a String object cannot be changed. To construct a different string value, you construct a new string from parts of other strings. Let’s look at this in more detail:

ASCII strings are indexable, so from s as defined previously: s[14:17] # => "Blue".
The values in the range are inclusive, and if we wish, we can change the increment to s[14:2:17] => "Bu" or reverse the slice to s[17:-1:14] => "eulB".
To run to the end of the string, use the end keyword as the upper bound: s[14:end] => "Blue Eyes". Unlike Python, Julia does not allow the bound to be omitted, so s[14:] is a syntax error.
However, s[:14] is somewhat unexpected and gives the character 'B', not the string up to and including B. This is because a leading ':' quotes what follows it, and quoting a numeric literal returns the literal itself, so :14 is equivalent to 14 and s[:14] is the same as s[14], not s[1:14].
The final character in a string can be indexed using the notation end, so in this case, s[end] is equal to the 's' character.
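The slicing rules above can be collected into one short sketch. The variable names are illustrative, and the last line shows one way of building a new string, since an existing one cannot be modified in place:

```julia
s = "Hello there, Blue Eyes"

word   = s[14:17]       # inclusive range
every2 = s[14:2:17]     # every second character in the range
rev    = s[17:-1:14]    # negative step reverses the slice
tail   = s[14:end]      # from index 14 to the end of the string
head   = s[1:14]        # the string up to and including 'B'
last_c = s[end]         # a single Char, not a String

# Strings are immutable, so s[1] = 'J' is an error.
# A changed value is constructed from parts of the old one instead.
jello = "J" * s[2:end]
```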

Strings allow for special characters such as \n, \t, and so on.

If we wish to include double quotes in a string, we can escape them with a backslash, but Julia also provides a """ delimiter inside which they need no escaping.

So, s = "This is the double quote \" character" and s = """This is the double quote " character""" are equivalent:

julia> s = "This is a double quote \" character."; println(s);
This is a double quote " character.
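A small sketch comparing the two forms follows. The multi-line example, including the text of poem, is an illustration added here; it relies on the documented behavior that a triple-quoted string drops the newline after the opening delimiter and removes the common leading indentation:

```julia
a = "This is the double quote \" character"
b = """This is the double quote " character"""
same = (a == b)          # both literals produce the same String

# Triple-quoted strings may span lines; the common indentation is stripped.
poem = """
    Twinkle, twinkle,
      little bat!
    """
```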

Strings also support the "$" convention for interpolating the value of a variable:

julia> age = 21; s = "I've been $age for many years now!"
"I've been 21 for many years now!"

Concatenation of strings can be done using the $ convention, but Julia also uses the '*' operator (rather than '+' or some other symbol):

julia> s = "Who are you?";
julia> t = " said the Caterpillar.";
julia> s * t   # "$s$t" gives the same result
"Who are you? said the Caterpillar."

Note

Here’s how a Unicode string can be formed by concatenating a series of characters:

julia> '\U7537'*'\U4EBA'
"男人"
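Beyond the two forms shown above, a few other standard Base tools for building strings are worth sketching. The variable names and sample values here are illustrative:

```julia
age = 21

# $(...) interpolates the value of an arbitrary expression, not just a variable.
msg = "I've been $age for $(age + 10) years now!"

# string() concatenates its arguments, converting non-strings as needed.
label = string("age = ", age)

# join() concatenates a collection with a separator; ^ repeats a string.
sentence = join(["Who", "are", "you?"], " ")
echo     = "ab" ^ 3

# A literal dollar sign must be escaped to suppress interpolation.
price = "Cost: \$5"
```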

Regular expressions

Regular expressions (regexes) came to prominence with their inclusion in Perl programming.

There is an old Perl programmer’s adage: “I had a problem and decided to solve it using regular expressions; now, I have two problems.”

Regexes are used for pattern matching; numerous books have been written on them, and support is available in a variety of programming languages post-Perl, notably Java and Python. Julia supports regexes via a special form of string prefixed with r.

Suppose we define an empat pattern as follows:

julia> empat = r"^\S+@\S+\.\S+$"
julia> typeof(empat)
Regex

The following example will give a clue to what the pattern is associated with:

julia> occursin(empat, "fred.flintstone@bedrock.net")
true
julia> occursin(empat, "Fredrick Flintstone@bedrock.net")
false

The pattern is for a valid (simple) email address. In the second case, the name Fredrick Flintstone contains a space, which \S does not match, so the match fails.

Since we may wish to know not only whether a string matches a certain pattern but also how it is matched, Julia has a match() function:

julia> m = match(r"@bedrock", "barney.rubble@bedrock.net")
RegexMatch("@bedrock")

If this matches, the function returns a RegexMatch object; otherwise, it returns nothing (the single value of type Nothing):

julia> m.match
"@bedrock"
julia> m.offset
14
julia> m.captures
0-element Array{Union{Nothing,SubString{String}},1}
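The captures array above is empty because the pattern contains no groups. As a hedged sketch, the patterns below are illustrative additions (not the book's) that add parenthesized and named groups to show what captures, offsets, and a failed match look like:

```julia
addr = "barney.rubble@bedrock.net"

# Parenthesized groups populate m.captures and m.offsets.
m = match(r"(\w+)\.(\w+)@(\w+)\.(\w+)", addr)
first_name = m[1]            # same as m.captures[1]
starts     = m.offsets       # start index of each captured group

# Named groups can be looked up by name.
m2     = match(r"(?<user>[^@\s]+)@(?<domain>\S+)", addr)
user   = m2["user"]
domain = m2["domain"]

# A failed match returns nothing, so test for it before using the result.
miss  = match(r"@slate", addr)
found = miss !== nothing
```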

A detailed discussion of regexes is beyond the scope of this book.

The following link provides a good online source for all things regex, including an excellent cheat sheet via the Quick Reference page: https://www.rexegg.com.

In addition, there are a number of books on the subject, and a free PDF can be downloaded from the following link:

https://www.academia.edu/22080976/Regular_expressions_cookbook_2nd_edition.

Version strings

Version numbers can be expressed with non-standard string literals of the form v"...".

These literals create VersionNumber objects that follow the specifications of “semantic versioning” and therefore are composed of major, minor, and patch numeric values, followed by pre-release and build alpha-numeric annotations.

So, a full specification typically would be v"1.9.1-rc1", where the major version is 1, the minor version is 9, the patch level is 1, and the pre-release annotation is rc1 (release candidate 1).

Only the major version needs to be provided, and the others will assume default values; for example, v"1" is equivalent to v"1.0.0".

(The release candidate has no default, so needs to be explicitly defined.)
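A brief sketch of working with VersionNumber objects follows. The variable names are illustrative; the fields major, minor, patch, and prerelease, and the comparison operators, are part of Base:

```julia
v = v"1.9.1-rc1"

parts = (v.major, v.minor, v.patch)   # numeric components
pre   = v.prerelease                  # tuple of pre-release annotations

defaults_fill_in = (v"1" == v"1.0.0") # omitted components default to zero

# Versions compare semantically: a release candidate sorts before its release.
rc_is_older = v < v"1.9.1"

# A common use is guarding code on the running Julia version.
modern = VERSION >= v"1.0"
```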

Byte array literals

Another special form is the b"..." byte array literal, which permits string notation to express arrays of UInt8 values.

These are the rules for byte array literals:

ASCII characters and ASCII escape sequences produce a single byte
\x and octal escape sequences produce a byte corresponding to the escape value
Unicode escape sequences produce a sequence of bytes encoding that code point in UTF-8

Consider the following two examples:

julia> A = b"HEX:\xefcc"
7-element Base.CodeUnits{UInt8,String}:
[0x48,0x45,0x58,0x3a,0xef,0x63,0x63]
julia> B = b"\u2200 x \u2203 y"
11-element Base.CodeUnits{UInt8,String}:
0xe2
0x88
0x80
0x20
0x78
0x20
0xe2
0x88
0x83
0x20
0x79

Here, the first three elements represent the \u2200 code, then 0x20,0x78,0x20 correspond to <space>x<space>, followed by three more elements for the \u2203 code, and finally, 0x20, 0x79, which represents <space>y.
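The relationship between a byte array literal and the corresponding string can be checked with a short sketch. The variable names are illustrative; codeunits, collect, and the String constructor are standard Base functions:

```julia
B = b"\u2200 x \u2203 y"

n     = length(B)              # total number of bytes
forall_bytes = B[1:3]          # the three UTF-8 bytes encoding \u2200

# The same bytes are obtained from the code units of the equivalent string.
same = (B == codeunits("∀ x ∃ y"))

# Copy the bytes into an ordinary Vector{UInt8} and decode them as UTF-8.
text = String(collect(B))
```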