
Tcl/Tk: Handling String Expressions

  • 02 Mar 2011


Tcl/Tk 8.5 Programming Cookbook



Over 100 great recipes to effectively learn Tcl/Tk 8.5

  • The quickest way to solve your problems with Tcl/Tk 8.5
  • Understand the basics and fundamentals of the Tcl/Tk 8.5 programming language
  • Learn graphical User Interface development with the Tcl/Tk 8.5 Widget set
  • Get a thorough and detailed understanding of the concepts with a real-world address book application
  • Each recipe is a carefully organized sequence of instructions to efficiently learn the features and capabilities of the Tcl/Tk 8.5 language


When I first started using Tcl, everything I read or researched stressed the mantra "Everything is a string". Coming from a strongly typed coding environment, I was used to declaring variable types, and in Tcl this was not needed. A set command could, and still does, create the variable and assign the type on the fly. For example, set variable "7" and set variable 7 will both create a variable containing 7. With Tcl, you can print the variable that was set with the numeric 7 and, equally, add 1 to the variable that was set with the string representation of 7.
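To make the point concrete, here is a minimal sketch; the variable names a and b are only illustrative. It shows that the quoted and unquoted forms produce the same value and that either can be used in arithmetic or in string operations.

```tcl
set a "7"   ;# set with a quoted string
set b 7     ;# set with a bare number

# Both variables hold the same value
puts [string equal $a $b]   ;# 1

# Arithmetic works on the value that was set as a string
incr a
puts $a                     ;# 8

# String commands work on the value that was set as a number
puts [string length $b]     ;# 1
puts [expr {$b + 1}]        ;# 8
```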

It still holds true today that everything in Tcl is a string. When we explore the Tk toolkit and widget creation, you will rapidly see that widgets themselves have a set of string values that determine their appearance and/or behavior.

As a pre-requisite for the recipes in this article, launch the Tcl shell as appropriate for your operating system. You can access Tcl from the command line to execute the commands.

As with everything else we have seen, Tcl provides a full suite of commands to assist in handling string expressions. However, due to the sheer number of commands and subcommands, I won't list every item individually. Instead, we will work through recipes and examples in the following sections. A general list of the commands is as follows:

Command   Description
string    The string command contains multiple keywords allowing for manipulation and data gathering functions.
append    Appends to a string variable.
format    Formats a string in the same manner as C sprintf.
regexp    Regular Expression matching.
regsub    Performs substitution, based on Regular Expression matching.
scan      Parses a string using conversion specifiers in the same manner as C sscanf.
subst     Performs backslash, command, and variable substitution on a string.

Using the commands listed in the table, a developer can address most string-handling needs. In the following sections, we will explore these commands as well as many subcommands of the string command.
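The scan and subst commands from the table do not get a recipe of their own in this article, so here is a brief, illustrative sketch of each. The variable names (count, o1 to o4, name, template, greeting) are assumptions made for the example.

```tcl
# scan: pull typed fields out of a string, much like C sscanf.
# It returns the number of conversions performed.
set count [scan "192.168.1.65" "%d.%d.%d.%d" o1 o2 o3 o4]
puts "$count fields: $o1 $o2 $o3 $o4"   ;# 4 fields: 192 168 1 65

# subst: perform variable and command substitution on a string
# that was stored literally (the braces suppress substitution).
set name Tcl
set template {Hello from $name, 2 + 2 = [expr {2 + 2}]}
set greeting [subst $template]
puts $greeting                          ;# Hello from Tcl, 2 + 2 = 4
```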

Appending to a string


Creating a string in Tcl using the set command is the starting point for all string commands. This will be the first command for most, if not all, of the following recipes. As we have seen previously, entering set variable value on the command line does this. However, to fully implement strings within a Tcl script, we need to build them up from time to time, for example, from data arriving on an open channel to a file or an HTTP connection. To accomplish this, we will need to read from the channel and append to the original string.

To accomplish appending to a string, Tcl provides the append command. The append command is as follows:

append variable value value value...

How to do it…


In the following example, we will create a string of comma-delimited numbers using the for control construct. Return values from the commands are provided for clarity. Enter the following command:

% set var 0
0
% for {set x 1} {$x <= 10} {incr x} {
    append var , $x
}
% puts $var
0,1,2,3,4,5,6,7,8,9,10

How it works…


The append command accepts a named variable to contain the resulting string and a space-delimited list of strings to append. As you can see, the append command accepted our variable argument followed by two values: a string containing the comma and the current value of x. On each pass of the loop, these values were appended to the original variable (containing a starting value of 0). The resulting string output with the puts command displays our newly appended variable complete with commas.
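As a further sketch (the variable names line and csv are illustrative), append can take several values in one call, and it creates the variable if it does not exist yet, which is handy when building a delimited string from a list:

```tcl
# Several values appended in a single call
set line "id"
append line "=" 42 ";" "name" "=" "tcl"
puts $line   ;# id=42;name=tcl

# append creates the variable on first use
unset -nocomplain csv
foreach item {alpha beta gamma} {
    # add a separator before every item except the first
    if {[info exists csv]} {
        append csv ","
    }
    append csv $item
}
puts $csv    ;# alpha,beta,gamma
```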

Formatting a string


Strings, as we all know, are our primary way of interacting with the end user. Whether presented in a message box or simply directed to the Tcl shell, they need to be as flexible as possible in the values they present. To accomplish this, Tcl provides the format command. This command allows us to format a string with variable substitution in the same manner as the ANSI C sprintf procedure. The format command is as follows:

format string argument argument argument...


The format command accepts a string containing the text to be formatted as well as % conversion specifiers. The arguments contain the values to be substituted into the final string. Each conversion specifier may contain up to six (6) sections: an XPG3 position specifier, a set of flags, a minimum field width, a numeric precision specifier, a size modifier, and a conversion character. The conversion characters are as follows:

Specifier   Description
d or i      Converts an integer to a signed decimal string.
u           Converts an integer to an unsigned decimal string.
o           Converts an integer to an unsigned octal string.
x or X      Converts an integer to an unsigned hexadecimal string. The lowercase x produces lowercase hexadecimal digits; the uppercase X produces uppercase hexadecimal digits.
c           Converts an integer to the Unicode character it represents.
s           No conversion is performed; the argument is inserted as a string.
f           Converts a number to a signed decimal string of the form xxx.yyy, where the number of y's is determined by the precision (6 decimal places by default).
e or E      Converts a number to scientific notation of the form x.yyye±zz. If the uppercase E is used, it appears in the string in place of the lowercase e.
g or G      If the exponent is less than -4 or greater than or equal to the precision, converts the number in the same manner as %e or %E; otherwise converts it in the same manner as %f.
%           Performs no conversion; it merely inserts a % character into the string.
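The following lines are a small sketch of several of these conversion characters, combined with flags, field widths, a precision, and a position specifier. The values are arbitrary and chosen only for illustration.

```tcl
# Field width, left-justify flag (-), and zero-padding flag (0)
puts [format "%5d|%-5d|%05d" 42 42 42]   ;# prints "   42|42   |00042"

# Hexadecimal (lower and upper case) and octal
puts [format "%x %X %o" 255 255 8]       ;# ff FF 10

# Precision on a floating-point value, and scientific notation
puts [format "%.2f" 3.14159]             ;# 3.14
puts [format "%e" 12345.678]             ;# 1.234568e+04

# Integer to character, and a literal percent sign
puts [format "%c" 65]                    ;# A
puts [format "%d%%" 50]                  ;# 50%

# Position specifiers pick arguments by number; braces keep Tcl
# from treating the $ as a variable reference
puts [format {%2$s %1$s} world hello]    ;# hello world
```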

There are three differences between the Tcl format and the ANSI C sprintf procedure:

  • The %p and %n conversion switches are not supported.
  • The % conversion for %c only accepts an integer value.
  • Size modifiers are ignored for formatting of floating-point values.

How to do it…


In the following example, we format a long date string for output on the command line. Return values from the commands are provided for clarity. Enter the following command:

% set month May
May
% set weekday Friday
Friday
% set day 5
5
% set extension th
th
% set year 2010
2010
% puts [format "Today is %s, %s %d%s %d" $weekday $month $day $extension $year]
Today is Friday, May 5th 2010

How it works…


The format command replaced each conversion specifier in the string, in order, with the value of the corresponding variable: %s with the string values and %d with the integer values.

Matching a regular expression within a string


Regular expressions provide us with a powerful method to locate an arbitrarily complex pattern within a string. The regexp command is similar to a Find function in a text editor: you search a given string for the character or pattern of characters you are looking for. By default, it returns a Boolean value that indicates success or failure and populates a list of optional variables with any matched strings; the -indices and -inline switches modify this behavior. But it doesn't stop there; by providing further switches, you can control how regexp performs the match. The switches are as follows:

Switch        Behavior
-about        No actual matching is made. Instead, regexp returns a list containing information about the regular expression, where the first element is a subexpression count and the second is a list of property names describing various attributes of the expression.
-expanded     Allows the use of expanded regular expression syntax, wherein whitespace and comments are ignored.
-indices      Instead of the matched text, stores in each match variable a list of two decimal strings containing the indices in the string of the first and last characters of the matching range.
-line         Enables newline-sensitive matching, equivalent to passing both the -linestop and -lineanchor switches.
-linestop     Changes the behavior of [^] bracket expressions and the "." character so that they stop at newline characters.
-lineanchor   Changes the behavior of ^ and $ (anchors) so that they match at the beginning and end of a line, respectively.
-nocase       Treats uppercase characters in the string as lowercase during matching.
-all          Causes the command to match as many times as possible and returns the count of the matches found.
-inline       Causes regexp to return a list of the data that would otherwise have been placed in match variables. Match variables may NOT be used if -inline is specified.
-start        Allows us to specify a character index from which searching should start.
--            Denotes the end of switches being passed to regexp. Any argument following this switch will be treated as an expression, even if it starts with a "-".

Now that we have a background in switches, let's look at the command itself:

regexp switches expression string submatchvar submatchvar...


The regexp command determines whether the expression matches part or all of the string and returns 1 if a match exists or 0 if it is not found. If variables (for example, myNumber or myData) are passed after the string, the first receives the text that matched the whole expression and each subsequent variable receives the text matched by the corresponding parenthesized subexpression. Keep in mind that if the -inline switch has been passed, no match variables should be included in the command.
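Before moving to the recipe, here is a short sketch of how a few of the switches change what regexp returns. The sample string and the variable names text and where are made up for illustration.

```tcl
set text "Error at line 42, then error at line 108"

# -inline returns the match and submatch as a list instead of
# filling match variables
puts [regexp -inline {line ([0-9]+)} $text]   ;# {line 42} 42

# -all -inline returns every match found in the string
puts [regexp -all -inline {[0-9]+} $text]     ;# 42 108

# -all on its own returns the number of matches; -nocase ignores case
puts [regexp -all -nocase {error} $text]      ;# 2

# -indices stores the first and last character positions of the
# match rather than the matched text
regexp -indices {[0-9]+} $text where
puts $where                                   ;# 14 15
```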

Getting ready


To complete the following example, we will need to create a Tcl script file in your working directory. Open the text editor of your choice and follow the next set of instructions.

How to do it…


A common use for regexp is to accept a string containing multiple words and to split it into its constituent parts. In the following example, we will create a string containing an IP address and assign the values to the named variables. Enter the following command:

% set ip 192.168.1.65
192.168.1.65
% regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} $ip all first second third fourth
1
% puts "$all\n$first\n$second\n$third\n$fourth"
192.168.1.65
192
168
1
65

How it works…


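As you can see, the IP address has been split into its individual octet values. What regexp has done is match the groupings of decimal characters [0-9] of a varying length of 1 to 3 characters {1,3}, delimited by a literal "." character (escaped as \. because an unescaped dot matches any character). The expression is enclosed in curly braces so that the Tcl interpreter does not attempt command substitution on the square brackets. The original IP address is assigned to the first variable (all), while the octet values are assigned to the remaining variables (first, second, third, and fourth).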
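The expression in the recipe will also match strings such as 999.1.1.1, or an address embedded in longer text. As a hedged extension of the recipe, the sketch below anchors the expression with ^ and $ and checks each octet's range. The procedure name isIPv4 is hypothetical and is not part of Tcl.

```tcl
# Hypothetical helper: returns 1 if addr is a dotted-quad IPv4
# address with every octet in the range 0-255, otherwise 0
proc isIPv4 {addr} {
    set pattern {^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$}
    # -> is simply a variable name that receives the full match
    if {![regexp $pattern $addr -> a b c d]} {
        return 0
    }
    foreach octet [list $a $b $c $d] {
        # scan with %d reads the digits as decimal, so a leading
        # zero is not interpreted as octal
        scan $octet %d value
        if {$value > 255} {
            return 0
        }
    }
    return 1
}

puts [isIPv4 192.168.1.65]    ;# 1
puts [isIPv4 192.168.1.999]   ;# 0
puts [isIPv4 192x168x1x65]    ;# 0
```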

Performing character substitution on a string


If regexp is a Find function, then regsub is equivalent to Find and Replace. The regsub command accepts a string and using Regular Expression pattern matching, it locates and, if desired, replaces the pattern with the desired value. The syntax of regsub is similar to regexp as are the switches. However, additional control over the substitution is added. The switches are as listed next:

Switch        Description
-all          Causes the command to perform substitution for each match found. The & and \n sequences are handled for each substitution.
-expanded     Allows use of expanded regular expression syntax, wherein whitespace and comments are ignored.
-line         Enables newline-sensitive matching, equivalent to passing both the -linestop and -lineanchor switches.
-linestop     Changes the behavior of [^] bracket expressions and the "." character so that they stop at newline characters.
-lineanchor   Changes the behavior of ^ and $ (anchors) so that they match at the beginning and end of a line, respectively.
-nocase       Treats uppercase characters in the string as lowercase during matching.
-start        Allows specification of a character offset in the string from which to start matching.

Now that we have a background in switches as they apply to the regsub command, let's look at the command:

regsub switches expression string substitution variable


The regsub command matches the expression against the string provided. If a variable is supplied, the resulting string is copied into it and the command returns the number of substitutions made; if no variable is supplied, the resulting string itself is returned. If a match is located, the portion of the string that matched is replaced by substitution. Whenever the substitution contains an & or a \0, it is replaced with the portion of the string that matched the expression. If the substitution contains the sequence \n (where n represents a digit between 1 and 9), it is replaced with the portion of the string that matched the nth subexpression of the expression. Additional backslashes may be used in the substitution to prevent interpretation of the &, \0, \n, and the backslashes themselves. As both the regsub command and the Tcl interpreter perform backslash substitution, you should enclose the substitution string in curly braces to prevent unintended substitution.
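The & and \n sequences described above can be sketched as follows. The sample strings and the variable names (date, swapped, tagged, literal) are illustrative only.

```tcl
set date "2010-05-21"

# \1, \2, and \3 refer to the first, second, and third
# parenthesized subexpressions
regsub {([0-9]+)-([0-9]+)-([0-9]+)} $date {\3/\2/\1} swapped
puts $swapped   ;# 21/05/2010

# & stands for the whole of the matched text
regsub -all {[0-9]+} $date {<&>} tagged
puts $tagged    ;# <2010>-<05>-<21>

# A backslash before & makes it a literal ampersand
regsub {and} "salt and pepper" {\&} literal
puts $literal   ;# salt & pepper
```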

How to do it…


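In the following example, we will substitute every instance of the word one with the word three. In this string, every occurrence of one stands alone as a word; note that the plain expression one would also match inside a longer word such as money. Return values from the commands are provided for clarity. Enter the following command: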

% set original "one two one two one two"
one two one two one two


% regsub -all {one} $original three new
3

% puts $new
three two three two three two

How it works…


As you can see, the value returned from the regsub command is the number of substitutions made. The string original has been copied into the string new, with the substitutions completed. By adding further switches, you can easily parse a lengthy string variable and perform bulk updates. I have used this to rapidly parse a large text file prior to importing data into a database.
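As a variation on this recipe, the sketch below restricts the substitution to whole words using the \m and \M constraints of Tcl's regular expression syntax, which match at the beginning and end of a word. The sample string is made up, and no result variable is passed, so regsub returns the new string directly.

```tcl
set original "one money one done"

# Plain expression: "one" is also replaced inside longer words
set plain [regsub -all {one} $original three]
puts $plain    ;# three mthreey three dthree

# \m and \M limit the match to "one" as a word by itself
set whole [regsub -all {\mone\M} $original three]
puts $whole    ;# three money three done

# -nocase combines with the word constraints
set anycase [regsub -all -nocase {\mone\M} "One one ONE" three]
puts $anycase  ;# three three three
```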