Replacing the content of a string using regular expressions
In the previous two recipes, we looked at how to match a regular expression against a string or a part of a string and how to iterate through matches and submatches. The regular expression library also supports text replacement based on regular expressions. In this recipe, we will learn how to use std::regex_replace() to perform such text transformations.
Getting ready
For general information about regular expressions support in C++11, refer to the Verifying the format of a string using regular expressions recipe, earlier in this chapter.
How to do it...
In order to perform text transformations using regular expressions, you should perform the following:
- Include <regex> and <string>, and bring in the std::string_literals namespace for the C++14 standard user-defined literals for strings:
#include <regex>
#include <string>
using namespace std::string_literals;
- Use the std::regex_replace() algorithm with a replacement string as the third argument. Consider this example, which replaces all words composed of exactly three characters that are each a, b, or c with three hyphens (inside a character class, | is a literal character rather than an alternation, so the class is written as [abc]):
auto text{"abc aa bca ca bbbb"s};
auto rx = std::regex{ R"(\b[abc]{3}\b)"s };
auto newtext = std::regex_replace(text, rx, "---"s);
- Use the std::regex_replace() algorithm with match identifiers prefixed with a $ for the third argument. For example, replace names in the format "lastname, firstname" with names in the format "firstname lastname", as follows:
auto text{ "bancila, marius"s };
auto rx = std::regex{ R"((\w+),\s*(\w+))"s };
auto newtext = std::regex_replace(text, rx, "$2 $1"s);
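The two replacements above can be wrapped in small functions to see them work end to end. This is only a minimal sketch: the helper names mask_abc_words, swap_names, and demo_replace are hypothetical and do not come from the recipe, and the character class is written as [abc].

```cpp
#include <cassert>
#include <regex>
#include <string>

using namespace std::string_literals;

// Hypothetical helper: replaces every three-letter word made only of
// 'a', 'b', or 'c' with three hyphens.
std::string mask_abc_words(std::string const& text)
{
    static const std::regex rx{ R"(\b[abc]{3}\b)" };
    return std::regex_replace(text, rx, "---"s);
}

// Hypothetical helper: turns "lastname, firstname" into "firstname lastname".
std::string swap_names(std::string const& text)
{
    static const std::regex rx{ R"((\w+),\s*(\w+))" };
    return std::regex_replace(text, rx, "$2 $1"s);
}

// Usage sketch with the inputs from the steps above.
void demo_replace()
{
    assert(mask_abc_words("abc aa bca ca bbbb") == "--- aa --- ca bbbb");
    assert(swap_names("bancila, marius") == "marius bancila");
}
```

Both helpers return a new string; the input passed to them is not modified.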
How it works...
The std::regex_replace() algorithm has several overloads with different types of parameters, but the meaning of the parameters is as follows:
- The input string on which the replacement is performed.
- An std::basic_regex object that encapsulates the regular expression used to identify the parts of the string to be replaced.
- The string format used for replacement.
- Optional matching flags.
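The examples in this recipe use the overload that takes and returns a std::string. As a hedged illustration of one of the other overloads, the following sketch writes the result through an output iterator and takes the input as an iterator range. The names append_replaced and demo_output_iterator are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: appends the transformed text to an existing buffer
// through the output-iterator overload of std::regex_replace().
void append_replaced(std::string& out, std::string const& text,
                     std::regex const& rx, std::string const& fmt)
{
    // This overload writes to the iterator and returns a copy of it,
    // which is ignored here.
    std::regex_replace(std::back_inserter(out),
                       text.begin(), text.end(), rx, fmt);
}

// Usage sketch: the replaced text is appended after the existing content.
void demo_output_iterator()
{
    std::string out{ "result: " };
    append_replaced(out, "abc aa bca", std::regex{ R"(\b[abc]{3}\b)" }, "---");
    assert(out == "result: --- aa ---");
}
```

This form can be convenient when the output should go into an existing buffer or stream rather than into a freshly allocated string.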
The return value is, depending on the overload used, either a string or a copy of the output iterator provided as an argument. The string format used for replacement can be a simple string or can contain match identifiers, indicated with a $ prefix:
- $& indicates the entire match.
- $1, $2, $3, and so on indicate the first, second, and third submatches, and so on.
- $` indicates the part of the string that precedes the match.
- $' indicates the part of the string that follows the match.
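To make the identifiers concrete, here is a small sketch that applies several format strings to an input containing a single match. The helper format_pairs, the key=value pattern, and the sample text are assumptions made for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: applies the format string `fmt` to every
// key=value pair found in `text`.
std::string format_pairs(std::string const& text, std::string const& fmt)
{
    static const std::regex rx{ R"((\w+)=(\w+))" };
    return std::regex_replace(text, rx, fmt);
}

// Usage sketch: the only match in the text is "x=42".
void demo_identifiers()
{
    std::string const text{ "set x=42 now" };
    assert(format_pairs(text, "$2=$1") == "set 42=x now");    // submatches
    assert(format_pairs(text, "<$&>")  == "set <x=42> now");  // entire match
    assert(format_pairs(text, "[$`]")  == "set [set ] now");  // text before the match
    assert(format_pairs(text, "[$']")  == "set [ now] now");  // text after the match
    assert(format_pairs(text, "$$1")   == "set $1 now");      // $$ is a literal $
}
```

In every case only the matched substring x=42 is replaced; the text around it is copied to the result unchanged.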
In the first example shown in the How to do it... section, the initial text contains two words made of exactly three a, b, and c characters: abc and bca. The regular expression describes a sequence of exactly three such characters between word boundaries. This means that a subtext such as bbbb will not match the expression. The result of the replacement is that newtext will be --- aa --- ca bbbb; the original text string is left unchanged.
Additional flags can be specified for the std::regex_replace() algorithm. By default, the flag is std::regex_constants::match_default, which applies the default matching behavior and interprets the replacement string according to the ECMAScript rules. If we want, for instance, to replace only the first occurrence, then we can specify std::regex_constants::format_first_only. In the following example, the result is --- aa bca ca bbbb, as the replacement stops after the first match is found:
auto text{ "abc aa bca ca bbbb"s };
auto rx = std::regex{ R"(\b[abc]{3}\b)"s };
auto newtext = std::regex_replace(text, rx, "---"s,
                                  std::regex_constants::format_first_only);
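The standard library defines a second formatting flag, std::regex_constants::format_no_copy, which omits the unmatched parts of the input from the result. The following sketch contrasts the two flags. The helper names replace_first, keep_matches, and demo_flags are hypothetical, and the pattern is written with the [abc] character class.

```cpp
#include <cassert>
#include <regex>
#include <string>

using namespace std::string_literals;
namespace rc = std::regex_constants;

// Hypothetical helper: replaces only the first three-letter abc-word.
std::string replace_first(std::string const& text)
{
    static const std::regex rx{ R"(\b[abc]{3}\b)" };
    return std::regex_replace(text, rx, "---"s, rc::format_first_only);
}

// Hypothetical helper: keeps only the formatted matches and drops
// everything that did not match.
std::string keep_matches(std::string const& text)
{
    static const std::regex rx{ R"(\b[abc]{3}\b)" };
    return std::regex_replace(text, rx, "[$&]"s, rc::format_no_copy);
}

// Usage sketch with the input from the example above.
void demo_flags()
{
    auto const text = "abc aa bca ca bbbb"s;
    assert(replace_first(text) == "--- aa bca ca bbbb");
    assert(keep_matches(text) == "[abc][bca]");
}
```

The flags are bitmask values, so they can be combined with the | operator when both behaviors are wanted.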
As explained earlier, the replacement string can also contain special indicators for the whole match, a particular submatch, or the parts that were not matched. In the second example shown in the How to do it... section, the regular expression identifies a word of at least one character, followed by a comma and optional whitespace, and then another word of at least one character. The first word is supposed to be the last name, while the second word is supposed to be the first name. The replacement string is in the $2 $1 format. This is an instruction to replace the matched expression (in this example, the entire original string) with another string formed of the second submatch, followed by a space and then the first submatch.
In this case, the entire string was a match. In the following example, there will be multiple matches inside the string, and they will all be replaced with the indicated string. In this example, we are replacing the indefinite article a when preceding a word that starts with a vowel (this, of course, does not cover words that start with a vowel sound) with the indefinite article an:
auto text{"this is a example with a error"s};
auto rx = std::regex{R"(\ba ((a|e|i|u|o)\w+))"s};
auto newtext = std::regex_replace(text, rx, "an $1");
The regular expression identifies the letter a as a single word (\b indicates a word boundary, so \ba followed by a space means a word consisting of the single letter a), followed by a space and a word of at least two characters starting with a vowel. When such a match is identified, it is replaced with a string formed of the fixed string an, followed by a space and the first subexpression of the match, which is the word itself. In this example, the newtext string will be this is an example with an error.
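As a possible variation on this example, the article itself can be captured so that a capitalized A at the start of a sentence becomes An. This is only a sketch under that assumption: the helper fix_articles and the modified pattern are not part of the recipe, and the pattern still only recognizes words that begin with a lowercase vowel letter.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: changes "a"/"A" to "an"/"An" before a word that
// starts with a lowercase vowel letter.
std::string fix_articles(std::string const& text)
{
    // $1 is the article as written, $2 is the word that follows it.
    static const std::regex rx{ R"(\b([aA]) ([aeiou]\w+))" };
    return std::regex_replace(text, rx, "$1n $2");
}

// Usage sketch: the article keeps its original capitalization.
void demo_articles()
{
    assert(fix_articles("this is a example with a error")
           == "this is an example with an error");
    assert(fix_articles("A error in a banana") == "An error in a banana");
}
```

Because the article is now the first capture group, the following word moves to $2 in the replacement string.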
Apart from the identifiers of the subexpressions ($1, $2, and so on), there are other identifiers for the entire match ($&), the part of the string before the match ($`), and the part of the string after the match ($'). In the last example, we change the format of a date from dd.mm.yyyy to yyyy.mm.dd, but also show the matched parts:
auto text{"today is 1.06.2016!!"s};
auto rx =
  std::regex{R"((\d{1,2})(\.|-|/)(\d{1,2})(\.|-|/)(\d{4}))"s};
// today is 2016.06.1!!
auto newtext1 = std::regex_replace(text, rx, R"($5$4$3$2$1)");
// today is [today is ][1.06.2016][!!]!!
auto newtext2 = std::regex_replace(text, rx, R"([$`][$&][$'])");
The regular expression matches a one- or two-digit number followed by a dot, hyphen, or slash; followed by another one- or two-digit number; then a dot, hyphen, or slash; and lastly a four-digit number.
For newtext1, the replacement string is $5$4$3$2$1; this means the year, followed by the second separator, then the month, the first separator, and finally the day. Therefore, for the input string today is 1.06.2016!!, the result is today is 2016.06.1!!.
For newtext2, the replacement string is [$`][$&][$']; this means the part before the match, followed by the entire match, and finally the part after the match, each in square brackets. However, the result is not [today is ][1.06.2016][!!], as you perhaps might expect at first glance, but today is [today is ][1.06.2016][!!]!!. The reason for this is that what is replaced is only the matched expression, and, in this case, that is only the date (1.06.2016). This substring is replaced with another string formed of all the parts of the initial string, while the text around the match is copied to the output unchanged.
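The date pattern above accepts a different separator in each position, for example 1.06/2016. If that is not desired, one possible refinement is to use a back-reference so that the second separator must repeat the first. The sketch below assumes this stricter rule; the helper to_year_first and the modified pattern are illustrative and not part of the recipe.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: rewrites day/month/year dates as year/month/day,
// keeping whichever separator the date used.
std::string to_year_first(std::string const& text)
{
    // \2 is a back-reference: the second separator must equal the first,
    // so the pattern has four capture groups instead of five.
    static const std::regex rx{ R"((\d{1,2})(\.|-|/)(\d{1,2})\2(\d{4}))" };
    return std::regex_replace(text, rx, "$4$2$3$2$1");
}

// Usage sketch: every date in the text is rewritten; mixed separators
// are left alone.
void demo_dates()
{
    assert(to_year_first("today is 1.06.2016!!") == "today is 2016.06.1!!");
    assert(to_year_first("from 1-06-2016 to 15/07/2016")
           == "from 2016-06-1 to 2016/07/15");
    assert(to_year_first("1.06/2016") == "1.06/2016");
}
```

Since the separator is captured only once, the replacement string reuses $2 in both positions.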
See also
- Verifying the format of a string using regular expressions to familiarize yourself with the C++ library support for working with regular expressions
- Parsing the content of a string using regular expressions to learn how to perform multiple matches of a pattern in a text