Creating a library of string helpers
The string types from the standard library are a general-purpose implementation that lacks many helpful methods, such as changing the case, trimming, splitting, and others that may address different developer needs. Third-party libraries that provide rich sets of string functionalities exist. However, in this recipe, we will look at implementing several simple, yet helpful, methods you may often need in practice. The purpose is rather to see how string methods and standard general algorithms can be used for manipulating strings, but also to have a reference to reusable code that can be used in your applications.
In this recipe, we will implement a small library of string utilities that will provide functions for the following:
- Changing a string into lowercase or uppercase
- Reversing a string
- Trimming white spaces from the beginning and/or the end of the string
- Trimming a specific set of characters from the beginning and/or the end of the string
- Removing occurrences of a character anywhere in the string
- Tokenizing a string using a specific delimiter
Before we start with the implementation, let's look at some prerequisites.
Getting ready
The string library we will be implementing should work with all the standard string types; that is, std::string, std::wstring, std::u16string, and std::u32string.
To avoid specifying long names such as std::basic_string<CharT, std::char_traits<CharT>, std::allocator<CharT>>, we will use the following alias templates for strings and string streams:
template <typename CharT>
using tstring =
std::basic_string<CharT, std::char_traits<CharT>,
std::allocator<CharT>>;
template <typename CharT>
using tstringstream =
std::basic_stringstream<CharT, std::char_traits<CharT>,
std::allocator<CharT>>;
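As a quick sanity check of what these aliases mean, the following sketch repeats the two alias templates and verifies at compile time that each instantiation names the corresponding standard type. Nothing here goes beyond the aliases already shown; the static_assert messages are illustrative only.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

template <typename CharT>
using tstring =
   std::basic_string<CharT, std::char_traits<CharT>, std::allocator<CharT>>;

template <typename CharT>
using tstringstream =
   std::basic_stringstream<CharT, std::char_traits<CharT>, std::allocator<CharT>>;

// Each alias instantiation is exactly the familiar standard typedef.
static_assert(std::is_same<tstring<char>, std::string>::value, "char");
static_assert(std::is_same<tstring<wchar_t>, std::wstring>::value, "wchar_t");
static_assert(std::is_same<tstring<char16_t>, std::u16string>::value, "char16_t");
static_assert(std::is_same<tstring<char32_t>, std::u32string>::value, "char32_t");
static_assert(std::is_same<tstringstream<char>, std::stringstream>::value, "stream");
static_assert(std::is_same<tstringstream<wchar_t>, std::wstringstream>::value, "wstream");
```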
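To implement these string helper functions, we need to include the header <string> for strings and <algorithm> for the general standard algorithms we will use. The case-conversion functions also rely on toupper() and tolower() from <cctype>, and the split() function needs <sstream> and <vector>.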
In all the examples in this recipe, we will use the standard user-defined literal operators for strings from C++14, for which we need to explicitly use the std::string_literals namespace.
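The literal suffix matters for a library of function templates: template argument deduction does not consider implicit conversions, so a plain character array such as "text" cannot be deduced as a std::basic_string<CharT>, whereas "text"s already is one. The sketch below only checks the types the s suffix produces.

```cpp
#include <string>
#include <type_traits>

using namespace std::string_literals;

// The s suffix selects the string type from the literal's character type.
static_assert(std::is_same<decltype("text"s), std::string>::value, "char");
static_assert(std::is_same<decltype(L"text"s), std::wstring>::value, "wchar_t");
static_assert(std::is_same<decltype(u"text"s), std::u16string>::value, "char16_t");
static_assert(std::is_same<decltype(U"text"s), std::u32string>::value, "char32_t");

// Without the suffix, the literal is a character array, not a string object.
static_assert(std::is_same<decltype("text"), char const (&)[5]>::value, "array");
```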
How to do it...
- To convert a string to lowercase or uppercase, apply the tolower() or toupper() functions to the characters of a string using the general-purpose algorithm std::transform():
  template<typename CharT>
  inline tstring<CharT> to_upper(tstring<CharT> text)
  {
      std::transform(std::begin(text), std::end(text),
                     std::begin(text), toupper);
      return text;
  }
  template<typename CharT>
  inline tstring<CharT> to_lower(tstring<CharT> text)
  {
      std::transform(std::begin(text), std::end(text),
                     std::begin(text), tolower);
      return text;
  }
- To reverse a string, use the general-purpose algorithm std::reverse():
  template<typename CharT>
  inline tstring<CharT> reverse(tstring<CharT> text)
  {
      std::reverse(std::begin(text), std::end(text));
      return text;
  }
- To trim a string at the beginning, end, or both, use the std::basic_string methods find_first_not_of() and find_last_not_of():
  template<typename CharT>
  inline tstring<CharT> trim(tstring<CharT> const & text)
  {
      auto first{ text.find_first_not_of(' ') };
      // an empty or all-space string trims to an empty string;
      // substr() would throw if called with npos as the position
      if (first == tstring<CharT>::npos) return tstring<CharT>{};
      auto last{ text.find_last_not_of(' ') };
      return text.substr(first, (last - first + 1));
  }
  template<typename CharT>
  inline tstring<CharT> trimleft(tstring<CharT> const & text)
  {
      auto first{ text.find_first_not_of(' ') };
      if (first == tstring<CharT>::npos) return tstring<CharT>{};
      return text.substr(first, text.size() - first);
  }
  template<typename CharT>
  inline tstring<CharT> trimright(tstring<CharT> const & text)
  {
      // if nothing is found, last is npos and last + 1 wraps to 0
      auto last{ text.find_last_not_of(' ') };
      return text.substr(0, last + 1);
  }
- To trim characters in a given set from a string, use overloads of the std::basic_string methods find_first_not_of() and find_last_not_of(), which take a string parameter that defines the set of characters to look for:
  template<typename CharT>
  inline tstring<CharT> trim(tstring<CharT> const & text,
                             tstring<CharT> const & chars)
  {
      auto first{ text.find_first_not_of(chars) };
      // nothing but characters from the set: the result is empty
      if (first == tstring<CharT>::npos) return tstring<CharT>{};
      auto last{ text.find_last_not_of(chars) };
      return text.substr(first, (last - first + 1));
  }
  template<typename CharT>
  inline tstring<CharT> trimleft(tstring<CharT> const & text,
                                 tstring<CharT> const & chars)
  {
      auto first{ text.find_first_not_of(chars) };
      if (first == tstring<CharT>::npos) return tstring<CharT>{};
      return text.substr(first, text.size() - first);
  }
  template<typename CharT>
  inline tstring<CharT> trimright(tstring<CharT> const & text,
                                  tstring<CharT> const & chars)
  {
      auto last{ text.find_last_not_of(chars) };
      return text.substr(0, last + 1);
  }
- To remove characters from a string, use std::remove_if() and std::basic_string::erase():
  template<typename CharT>
  inline tstring<CharT> remove(tstring<CharT> text, CharT const ch)
  {
      auto start = std::remove_if(
          std::begin(text), std::end(text),
          [=](CharT const c) { return c == ch; });
      text.erase(start, std::end(text));
      return text;
  }
- To split a string based on a specified delimiter, use std::getline() to read from an std::basic_stringstream initialized with the content of the string. The tokens extracted from the stream are pushed into a vector of strings:
  template<typename CharT>
  inline std::vector<tstring<CharT>> split(tstring<CharT> text,
                                           CharT const delimiter)
  {
      auto sstr = tstringstream<CharT>{ text };
      auto tokens = std::vector<tstring<CharT>>{};
      auto token = tstring<CharT>{};
      while (std::getline(sstr, token, delimiter))
      {
          if (!token.empty()) tokens.push_back(token);
      }
      return tokens;
  }
How it works...
To implement the utility functions from the library, we have two options:
- Functions would modify a string passed by a reference
- Functions would not alter the original string but return a new string
The second option has the advantage that it preserves the original string, which may be helpful in many cases. Otherwise, in those cases, you would first have to make a copy of the string and alter the copy. The implementation provided in this recipe takes the second approach.
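To make the trade-off concrete, here is a minimal sketch of both options applied to string reversal. The names reverse_inplace() and reversed_copy() are hypothetical and used only for this comparison; they are not part of the recipe's library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Option 1 (hypothetical name): modifies the caller's string through a reference.
template <typename CharT>
void reverse_inplace(std::basic_string<CharT>& text)
{
   std::reverse(text.begin(), text.end());
}

// Option 2 (hypothetical name): takes the argument by value, so the
// caller's string is left untouched and the modified copy is returned.
template <typename CharT>
std::basic_string<CharT> reversed_copy(std::basic_string<CharT> text)
{
   std::reverse(text.begin(), text.end());
   return text;
}

void design_options_demo()
{
   std::string original{"cookbook"};

   auto copy = reversed_copy(original);
   assert(original == "cookbook");   // the original is preserved
   assert(copy == "koobkooc");

   reverse_inplace(original);
   assert(original == "koobkooc");   // the original is now changed
}
```

Taking the parameter by value, as the second option does, also lets callers pass a temporary without paying for an extra copy.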
The first functions we implemented in the How to do it... section were to_upper() and to_lower(). These functions change the content of a string either to uppercase or lowercase. The simplest way to implement this is using the std::transform() standard algorithm. This is a general-purpose algorithm that applies a function to every element of a range (defined by a begin and end iterator) and stores the result in another range for which only the begin iterator needs to be specified. The output range can be the same as the input range, which is exactly what we did to transform the string. The applied function is toupper() or tolower():
auto ut{ string_library::to_upper("this is not UPPERCASE"s) };
// ut = "THIS IS NOT UPPERCASE"
auto lt{ string_library::to_lower("THIS IS NOT lowercase"s) };
// lt = "this is not lowercase"
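One detail worth knowing about the <cctype> functions is that their argument must be representable as unsigned char (or be EOF); passing a negative char value is undefined behavior. The following is a sketch of a narrow-string variant that guards against this by converting each character to unsigned char first. The name to_upper_narrow() is hypothetical, and the sketch deliberately covers only std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical narrow-string variant: each char is converted to
// unsigned char before being handed to std::toupper.
inline std::string to_upper_narrow(std::string text)
{
   std::transform(text.begin(), text.end(), text.begin(),
      [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
   return text;
}

void to_upper_narrow_demo()
{
   // The input and output ranges are the same, so the copy is changed in place.
   assert(to_upper_narrow("this is not UPPERCASE") == "THIS IS NOT UPPERCASE");
   assert(to_upper_narrow("").empty());
}
```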
The next function we considered was reverse(), which, as the name implies, reverses the content of a string. For this, we used the std::reverse() standard algorithm. This general-purpose algorithm reverses the elements of a range defined by a begin and end iterator:
auto rt{string_library::reverse("cookbook"s)}; // rt = "koobkooc"
When it comes to trimming, a string can be trimmed at the beginning, end, or both sides. Because of that, we implemented three different functions: trim() for trimming at both ends, trimleft() for trimming at the beginning of a string, and trimright() for trimming at the end of a string. The first version of each function trims only spaces. In order to find the right part to trim, we use the find_first_not_of() and find_last_not_of() methods of std::basic_string. These return the indexes of the first and last characters in the string that are different from the specified character, or npos if there is no such character. Subsequently, a call to the substr() method of std::basic_string returns a new string. The substr() method takes an index in the string and a number of elements to copy to the new string:
auto text1{" this is an example "s};
auto t1{ string_library::trim(text1) };
// t1 = "this is an example"
auto t2{ string_library::trimleft(text1) };
// t2 = "this is an example "
auto t3{ string_library::trimright(text1) };
// t3 = " this is an example"
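The sketch below looks at the two search methods in isolation, using only std::string members. It shows the positions they return for a padded string and the npos result for a string that contains nothing but spaces. Callers of substr() need to account for that case, because substr() throws std::out_of_range when the position is greater than the size of the string.

```cpp
#include <cassert>
#include <string>

void find_not_of_demo()
{
   std::string const text{"  abc "};

   auto first = text.find_first_not_of(' ');   // index of 'a'
   auto last = text.find_last_not_of(' ');     // index of 'c'
   assert(first == 2);
   assert(last == 4);

   // substr() takes a start index and a count of characters to copy.
   assert(text.substr(first, last - first + 1) == "abc");

   // No character other than a space exists, so both searches report npos.
   std::string const blank{"   "};
   assert(blank.find_first_not_of(' ') == std::string::npos);
   assert(blank.find_last_not_of(' ') == std::string::npos);
}
```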
Sometimes, it can be useful to trim characters other than spaces from a string. In order to do that, we provided overloads for the trimming functions that specify a set of characters to be removed. That set is also specified as a string. The implementation is very similar to the previous one because both find_first_not_of() and find_last_not_of() have overloads that take a string containing the characters to be excluded from the search:
auto chars1{" !%\n\r"s};
auto text3{"!! this % needs a lot\rof trimming !\n"s};
auto t7{ string_library::trim(text3, chars1) };
// t7 = "this % needs a lot\rof trimming"
auto t8{ string_library::trimleft(text3, chars1) };
// t8 = "this % needs a lot\rof trimming !\n"
auto t9{ string_library::trimright(text3, chars1) };
// t9 = "!! this % needs a lot\rof trimming"
If removing characters from any part of the string is necessary, the trimming functions are not helpful because they only treat a contiguous sequence of characters at the start and end of a string. For that, however, we implemented a simple remove() function. This uses the std::remove_if() standard algorithm.
Both std::remove() and std::remove_if() work in a way that may not be very intuitive at first. They remove elements that satisfy the criteria from a range defined by a first and last iterator by rearranging the content of the range (using move assignment). The elements to keep are shifted toward the beginning of the range, preserving their relative order, and the function returns an iterator one past the last kept element. The elements from that iterator to the end of the range are left in a valid but unspecified state, and the size of the container does not change. This iterator basically defines the new end of the range that was modified. If no element was removed, the returned iterator is the end iterator of the original range. The value of this returned iterator is then used to call the std::basic_string::erase() method, which actually erases the content of the string defined by two iterators. The two iterators in our case are the iterator returned by std::remove_if() and the end of the string:
auto text4{"must remove all * from text**"s};
auto t10{ string_library::remove(text4, '*') };
// t10 = "must remove all  from text" (the spaces around the removed * remain)
auto t11{ string_library::remove(text4, '!') };
// t11 = "must remove all * from text**"
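To see the two steps separately, the following sketch stops between the call to std::remove() and the call to erase() and inspects the string. It uses std::remove() rather than std::remove_if() only to keep the example short; the returned iterator plays the same role in both.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

void erase_remove_demo()
{
   std::string text{"a*b**c"};

   // Step 1: the kept characters are shifted to the front of the range.
   auto new_end = std::remove(text.begin(), text.end(), '*');

   // The string still has its original length at this point...
   assert(text.size() == 6);
   // ...but the kept characters occupy [begin, new_end).
   assert(std::string(text.begin(), new_end) == "abc");

   // Step 2: erase the leftover tail to actually shrink the string.
   text.erase(new_end, text.end());
   assert(text == "abc");
}
```

Since C++20, the standard library also offers std::erase() and std::erase_if() for std::basic_string, which perform both steps in a single call.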
The last function we implemented, split(), splits the content of a string based on a specified delimiter. There are various ways to implement this. In this implementation, we used std::getline(). This function reads characters from an input stream until a specified delimiter is found and places the characters in a string. Before starting to read from the input buffer, it calls erase() on the output string to clear its content. Calling this function in a loop produces tokens that are placed in a vector. In our implementation, empty tokens were skipped from the result set:
auto text5{"this text will be split "s};
auto tokens1{ string_library::split(text5, ' ') };
// tokens1 = {"this", "text", "will", "be", "split"}
auto tokens2{ string_library::split(""s, ' ') };
// tokens2 = {}
Two examples of text splitting are shown here. In the first example, the text from the text5 variable is split into words and, as mentioned earlier, empty tokens are ignored. In the second example, splitting an empty string produces an empty vector of tokens.
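Skipping empty tokens is a design choice, and some formats (comma-separated fields, for instance) need them preserved. As one of the other possible implementations, here is a sketch based on std::basic_string::find() that keeps empty tokens. The name split_keep_empty() is hypothetical, and this is an alternative technique, not the recipe's std::getline() approach.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alternative to split(): every delimiter produces a boundary,
// so consecutive delimiters yield empty tokens.
template <typename CharT>
std::vector<std::basic_string<CharT>> split_keep_empty(
   std::basic_string<CharT> const& text, CharT const delimiter)
{
   std::vector<std::basic_string<CharT>> tokens;
   std::size_t start = 0;
   while (true)
   {
      auto pos = text.find(delimiter, start);
      if (pos == std::basic_string<CharT>::npos)
      {
         // No more delimiters: the rest of the text is the last token.
         tokens.push_back(text.substr(start));
         break;
      }
      tokens.push_back(text.substr(start, pos - start));
      start = pos + 1;
   }
   return tokens;
}

void split_keep_empty_demo()
{
   auto fields = split_keep_empty(std::string{"a,,b"}, ',');
   assert(fields.size() == 3);
   assert(fields[0] == "a" && fields[1].empty() && fields[2] == "b");
}
```

Note that, unlike split(), this variant returns a vector with one empty token when given an empty string.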
See also
- Creating cooked user-defined literals to learn how to create literals of user-defined types
- Creating type aliases and alias templates in Chapter 1, Learning Modern Core Language Features, to learn about aliases for types