You're reading from Polished Ruby Programming Build better software with more intuitive, maintainable, scalable, and high-performance Ruby code

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781801072724

Length 434 pages

Edition 1st Edition

Languages

Ruby

Concepts

Programming Language

Author (1):

Jeremy Evans


Table of Contents (23) Chapters

Preface

1. Section 1: Fundamental Ruby Programming Principles

2. Chapter 1: Getting the Most out of Core Classes

3. Chapter 2: Designing Useful Custom Classes

4. Chapter 3: Proper Variable Usage

5. Chapter 4: Methods and Their Arguments

6. Chapter 5: Handling Errors

7. Chapter 6: Formatting Code for Easy Reading

8. Section 2: Ruby Library Programming Principles

9. Chapter 7: Designing Your Library

10. Chapter 8: Designing for Extensibility

11. Chapter 9: Metaprogramming and When to Use It

12. Chapter 10: Designing Useful Domain-Specific Languages

13. Chapter 11: Testing to Ensure Your Code Works

14. Chapter 12: Handling Change

15. Chapter 13: Using Common Design Patterns

16. Chapter 14: Optimizing Your Library

17. Section 3: Ruby Web Programming Principles

18. Chapter 15: The Database Is Key

19. Chapter 16: Web Application Design Principles

20. Chapter 17: Robust Web Application Security

21. Assessments

22. Other Books You May Enjoy

Understanding how symbols differ from strings

One of the most useful but misunderstood aspects of Ruby is the difference between symbols and strings. One reason is that certain Ruby methods deal with symbols but will still accept strings, or perform string-like operations on a symbol. Another reason is the popularity of Rails and its pervasive use of ActiveSupport::HashWithIndifferentAccess, which allows you to use either a string or a symbol for accessing the same data. Even so, symbols and strings are very different internally, and serve completely different purposes. The confusion arises because Ruby is focused on programmer happiness and productivity, so it will often automatically convert a string to a symbol if it needs a symbol, or a symbol to a string if it needs a string.
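As a minimal sketch of how distinct the two types are once no automatic conversion is involved, consider a plain Hash. The SETTINGS constant and its values are hypothetical and exist only for illustration:

```ruby
# A plain Hash treats :name and "name" as two unrelated keys.
SETTINGS = { name: "from symbol key", "name" => "from string key" }.freeze

puts SETTINGS[:name]    # value stored under the symbol key
puts SETTINGS["name"]   # value stored under the string key
puts :name == "name"    # false: a symbol never equals a string

# Without automatic conversion, the conversion has to be explicit.
puts "name".to_sym == :name   # true
puts :name.to_s == "name"     # true
```

Unlike ActiveSupport::HashWithIndifferentAccess, the core Hash class performs no conversion between the two key types.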

A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you can append to it, modify existing characters in it, or replace it with a different string.
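The three kinds of modification can be sketched as follows. The method names mutate_demo and frozen_demo are hypothetical helpers, not part of Ruby:

```ruby
def mutate_demo
  text = +"ruby"                 # unary plus yields an unfrozen string
  original_id = text.object_id
  text << " code"                # append in place
  text[0] = "R"                  # modify an existing character
  snapshot = text.dup            # copy of the string after the first two changes
  text.replace("symbols")        # replace the contents, still the same object
  [snapshot, text, text.object_id == original_id]
end

def frozen_demo
  "ruby".dup.freeze << "!"       # modifying a frozen string raises
rescue FrozenError => e
  e.class
end

p mutate_demo
p frozen_demo
```

All three operations change the same String object, which is why a frozen string rejects each of them.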

A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names.
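Ruby does not expose the internal ID directly, but object identity gives a rough view of the same idea. The following sketch assumes only core methods, and the constant names are hypothetical:

```ruby
# equal? compares object identity rather than content.
SAME_SYMBOL  = :add.equal?(:add)                            # one object per identifier
SAME_STRING  = String.new("add").equal?(String.new("add"))  # two separate objects
SAME_CONTENT = String.new("add") == String.new("add")       # equal text, though
INTERNED     = "add".to_sym.equal?(:add)                    # conversion finds the existing symbol

puts SAME_SYMBOL, SAME_STRING, SAME_CONTENT, INTERNED
```

Every occurrence of :add refers to the same object, whereas each new string with the same text is a separate object that must be compared character by character.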

Say you run Ruby code as follows:

foo.add(bar)

Ruby will parse this code, and for foo, add, and bar, it will look up whether it already has an ID associated with the identifier. If it already has an ID, it will use it; otherwise, it will create a new ID value and associate it with the identifier. This happens during parsing and the ID values are hardcoded into the VM instructions.

Say you run Ruby code as follows:

method = :add
foo.send(method, bar)

Ruby will parse this code, and for method, add, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, or create a new ID value to associate with the identifier if it does not exist. This approach is slightly slower as Ruby will create a local variable and there is additional indirection as send has to look up the method to call dynamically. However, there are no calls at runtime to look up an ID value.

Say you run Ruby code as follows:

method = "add"
foo.send(method, bar)

Ruby will parse this code, and for method, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, creating the ID if it doesn't exist. During parsing, however, Ruby does not create an ID value for add because it is a string and not a symbol. When send is called at runtime, method is a string value, and send needs a symbol. So, Ruby will dynamically look up whether there is an ID associated with the add identifier, raising a NoMethodError if it does not exist. This ID lookup will happen every time the send method is called, making this code even slower.

So, while it looks like symbols and strings are interchangeable as the method argument to send, this is only because Ruby tries to be friendly to the programmer and accept either. The send method needs to work with an ID, and it is better for performance to use a symbol, which is Ruby's representation of an ID, as opposed to a string, which Ruby must perform substantial work on to convert to an ID.
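The send variants discussed above can be sketched with a small class. Accumulator and call_dynamically are hypothetical names standing in for foo and the surrounding code:

```ruby
# Hypothetical receiver with an add method, standing in for foo.
class Accumulator
  attr_reader :total

  def initialize
    @total = 0
  end

  def add(amount)
    @total += amount
    self
  end
end

# name may be a symbol (preferred) or a string (converted at runtime).
def call_dynamically(name, amount)
  Accumulator.new.send(name, amount).total
end

puts call_dynamically(:add, 2)    # symbol: no conversion needed
puts call_dynamically("add", 2)   # string: same result after conversion

begin
  call_dynamically("subtract", 1) # no such method exists
rescue NoMethodError => e
  puts e.class
end
```

Both calls produce the same result, so the difference is only visible in performance, which you can measure on your own Ruby version with the standard benchmark library.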

This not only affects Kernel#send but also affects most similar methods where identifiers are passed dynamically, such as Module#define_method, Kernel#instance_variable_get, and Module#const_get. The general principle when using these methods in Ruby code is always to pass symbols to them, since it results in better performance.
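As a short illustration of passing symbols to these methods, consider the following sketch. The Point class and its ORIGIN_LABEL constant are hypothetical:

```ruby
class Point
  ORIGIN_LABEL = "origin"

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Module#define_method takes the method name as a symbol.
  define_method(:sum) { @x + @y }
end

point = Point.new(1, 2)
puts point.sum                          # method defined dynamically
puts point.instance_variable_get(:@x)   # symbol naming the instance variable
puts Point.const_get(:ORIGIN_LABEL)     # symbol naming the constant

# Strings are accepted too, at the cost of a conversion on each call.
puts point.instance_variable_get("@y")
puts Point.const_get("ORIGIN_LABEL")
```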

The previous examples show that when Ruby needs a symbol, it will often accept a string and convert it for the programmer's convenience. This allows strings to be treated as symbols in certain cases. There are opposite cases, where Ruby allows symbols to be treated as strings for the programmer's convenience.

For example, while symbols represent integers attached to a series of characters or bytes, Ruby allows you to perform operations on symbols such as <, >, and <=>, as if they were strings, where the result does not depend on the symbol's integer value, but on the string value of the name attached to the symbol. Again, this is Ruby doing so for the programmer's convenience. For example, consider the following line of code:

object.methods.sort

This results in a list sorted by the name of the method, since that is the most useful for the programmer. In this case, Ruby needs to operate on the string value of the symbol, which carries a performance cost similar to the one incurred when Ruby needs to convert a string to a symbol internally.
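A minimal sketch of these comparisons, using a hypothetical list of symbols in place of object.methods:

```ruby
NAMES  = %i[pear apple fig].freeze
SORTED = NAMES.sort                 # ordered by the symbols' names

p SORTED
p :apple < :banana                  # compares "apple" with "banana"
p :pear <=> :apple                  # 1, because "pear" sorts after "apple"

# Sorting the symbols matches sorting by their string values.
p SORTED == NAMES.sort_by(&:to_s)
```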

There are many other methods on Symbol that operate on the internal string associated with the symbol. Some methods, such as downcase, upcase, and capitalize, return a symbol by internally operating on the string associated with the symbol, and then converting the resulting value back to a symbol. For example, symbol.downcase basically does symbol.to_s.downcase.to_sym. Other methods, such as [], size, and match, operate on the string associated with the symbol, such as symbol.size being shorthand for symbol.to_s.size.
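These equivalences are easy to check directly. The following sketch uses only core Symbol and String methods:

```ruby
symbol = :Add

# Methods that return a symbol behave like a round trip through String.
p symbol.downcase                                    # :add
p symbol.downcase == symbol.to_s.downcase.to_sym     # true
p symbol.upcase                                      # :ADD
p :add.capitalize                                    # :Add

# Methods that return other values operate on the symbol's name.
p symbol.size == symbol.to_s.size                    # true
p symbol[0]                                          # "A", a string and not a symbol
p symbol.match(/d+/)[0]                              # "dd"
```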

In all of these cases, it is possible to determine what Ruby natively wants. If Ruby needs an internal identifier, it will natively want a symbol, and only accept a string by converting it. If Ruby needs to operate on text, it will natively want a string, and only accept a symbol by converting it.

So, how does the difference between a symbol and string affect your code? The general principle is to be like Ruby, and use symbols when you need an identifier in your code, and strings when you need text or data. For example, if you need to accept a configuration value that can only be one of three options, it's probably best to use a symbol:

def switch(value)
  case value
  when :foo
    # foo
  when :bar
    # bar
  when :baz
    # baz
  end
end

However, if you are dealing with text or data, you should accept a string and not a symbol:

def append2(value)
  value.gsub(/foo/, "bar")
end

You should consider whether you want to be as flexible as many Ruby core methods, and automatically convert a string to a symbol or vice versa. If you are internally treating symbols and strings differently, you should definitely not perform automatic conversion. However, if you are only dealing with one of the types, then you have to decide how to handle the other. Automatically converting the type is worse for performance, and results in less flexible internals, since you need to keep supporting both types for backward compatibility. Not automatically converting the type is better for performance, and results in more flexible internals, since you are not obligated to support both types. However, it means that users of your code will probably get errors if they pass in a type that is not expected. Therefore, it is important to understand the trade-off inherent in the decision of whether to accept both types and convert between them. If you aren't sure which trade-off is better, start by not automatically converting, since you can always add automatic conversion later if needed.
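The two choices can be sketched side by side. The names MODES, strict_mode, and lenient_mode are hypothetical, and the sketch assumes the lenient version only needs to handle strings and symbols:

```ruby
MODES = %i[foo bar baz].freeze

# No automatic conversion: anything other than a known symbol is an error.
def strict_mode(value)
  unless MODES.include?(value)
    raise ArgumentError, "expected one of #{MODES.inspect}, got #{value.inspect}"
  end
  value
end

# Automatic conversion: strings are converted before validation.
def lenient_mode(value)
  strict_mode(value.to_sym)
end

p strict_mode(:foo)      # :foo
p lenient_mode("bar")    # :bar

begin
  strict_mode("foo")     # a string is rejected by the strict version
rescue ArgumentError => e
  puts e.message
end
```

Starting with strict_mode keeps the internals simple, and wrapping it later, as lenient_mode does, adds conversion without breaking existing callers.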

In this section, you learned the important difference between symbols and strings, and when it is best to use each. In the next section, you'll learn how best to use Ruby's core collection classes.
