Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Polished Ruby Programming

You're reading from   Polished Ruby Programming Build better software with more intuitive, maintainable, scalable, and high-performance Ruby code

Arrow left icon
Product type Paperback
Published in Jul 2021
Publisher Packt
ISBN-13 9781801072724
Length 434 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Jeremy Evans Jeremy Evans
Author Profile Icon Jeremy Evans
Jeremy Evans
Arrow right icon
View More author details
Toc

Table of Contents (23) Chapters Close

Preface 1. Section 1: Fundamental Ruby Programming Principles
2. Chapter 1: Getting the Most out of Core Classes FREE CHAPTER 3. Chapter 2: Designing Useful Custom Classes 4. Chapter 3: Proper Variable Usage 5. Chapter 4: Methods and Their Arguments 6. Chapter 5: Handling Errors 7. Chapter 6: Formatting Code for Easy Reading 8. Section 2: Ruby Library Programming Principles
9. Chapter 7: Designing Your Library 10. Chapter 8: Designing for Extensibility 11. Chapter 9: Metaprogramming and When to Use It 12. Chapter 10: Designing Useful Domain-Specific Languages 13. Chapter 11: Testing to Ensure Your Code Works 14. Chapter 12: Handling Change 15. Chapter 13: Using Common Design Patterns 16. Chapter 14: Optimizing Your Library 17. Section 3: Ruby Web Programming Principles
18. Chapter 15: The Database Is Key 19. Chapter 16: Web Application Design Principles 20. Chapter 17: Robust Web Application Security 21. Assessments 22. Other Books You May Enjoy

Learning how best to use arrays, hashes, and sets

Ruby's collection classes are one of the reasons why it is such a joy to program in Ruby. In most cases, the choice of collection class to use is fairly straightforward. If you need a simple list of values that you are iterating over, or using the collection as a queue or a stack, you generally use an array. If you need a mapping of one or more objects to one or more objects, then you generally use a hash. If you have a large list of objects and want to see whether a given object is contained in it, you generally use a set.

In some cases, it's fine to use either an array or a hash. Often, when iterating over a small list, you could use the array approach:

[[:foo, 1], [:bar, 3], [:baz, 7]].each do |sym, i|
  # ...
end

Or, you could use the hash approach:

{foo: 1, bar: 3, baz: 7}.each do |sym, i|
  # ...
end

Since you are not indexing into the collection, the simpler approach from a design perspective is to use an array. However, because the hash approach is syntactically simpler, the idiomatic way to handle this in Ruby is to use a hash.

For more complex mapping cases, you often want to use a hash, but you may need to decide how to structure the hash. This is especially true when you are using complex keys. Let's take a deeper look at the differences between arrays, hashes, and sets by working through an example that implements an in-memory database.

Implementing an in-memory database

While many programmers often use a SQL database for data storage, there are many cases when you need to build a small, in-memory database using arrays, hashes, and sets. Often, even when you have the main data stored in a SQL database, it is faster to query the SQL database to retrieve the information, and use that to build an in-memory database for the specific class or method you are designing. This allows you to query the in-memory database with similar speed as a hash or array lookup, orders of magnitude faster than a SQL database query.

Let's say you have a list of album names, track numbers, and artist names, where you can have multiple artists for the same album and track. You want to design a simple lookup system so that given an album name, you can find all artists who worked on any track of the album, and given an album name and track number, you can find the artists who worked on that particular track.

In the following examples, you should assume that album_infos is an arbitrary object that has each method that yields the album name, track number, and artist. However, if you would like to have some sample data to work with:

album_infos = 100.times.flat_map do |i|
  10.times.map do |j|
    ["Album #{i}", j, "Artist #{j}"]
  end
end

One approach for handling this is to populate two hashes, one keyed by album name, and one keyed by an array of the album name and track number. Populating these two hashes is straightforward, by setting the value for the key to an empty array if the key doesn't exist, and then appending the artist name. Then you need to make sure the artist values are unique for the hash keyed just by album name:

album_artists = {}
album_track_artists = {}
album_infos.each do |album, track, artist|
  (album_artists[album] ||= []) << artist
  (album_track_artists[[album, track]] ||= []) << artist
end
album_artists.each_value(&:uniq!)

With this approach, looking up values is fairly straightforward, and just involves looking in the appropriate hash with the appropriate key:

lookup = ->(album, track=nil) do
  if track
    album_track_artists[[album, track]]
  else
    album_artists[album]
  end
end

An alternative approach would be to use a nested hash approach, with each album having a hash of tracks:

albums = {}
album_infos.each do |album, track, artist|
  ((albums[album] ||= {})[track] ||= []) << artist
end

With this approach, looking up values is more complex, especially in the case where a track number is not provided, and you have to dynamically create the list:

lookup = ->(album, track=nil) do
  if track
    albums.dig(album, track)
  else
    a = albums[album].each_value.to_a
    a.flatten!
    a.uniq!
    a
  end
end

In general, the first approach using multiple hashes is going to take significantly more memory than the second approach if there is a large number of albums, but it will have a much better lookup performance for albums. The first approach will also take much more time to populate the data structure. The second approach is much lighter on memory and has better lookup performance for albums with tracks as it avoids an array allocation, but will exhibit a far more inferior performance for albums.

Each of these approaches does not depend on the types of objects that album_infos.each yields. You probably made the reasonable assumption that album and artist would be strings, and track would be a number. Let's say you knew in advance that the track number was an integer between 1 and 99. You could use that information to design a different approach. You could still have a single of hash keyed by album name, with a value being an array containing arrays of artist names for each track. Since tracks only go from 1 to 99, you could use the 0 index in the array to store all artist names for the album. Populating this combination of hash and array of arrays isn't too difficult:

albums = {}
album_infos.each do |album, track, artist|
   album_array = albums[album] ||= [[]]
   album_array[0] << artist
   (album_array[track] ||= []) << artist  
end
albums.each_value do |array|
  array[0].uniq!
end

This approach is more memory-efficient than either of the previous approaches, and looking up values is very simple and never allocates an object:

lookup = ->(album, track=0) do
  albums.dig(album, track)
end

Compared to the previous two approaches, this approach uses about the same amount of memory as the nested hash approach. It takes slightly more time to populate compared to the nested hash approach. It is almost as fast as the two hash approach in terms of lookup performance for albums, and is the fastest approach for lookup performance by albums with tracks.

Maybe the needs of your application change, and now you need a feature that allows users to enter a list of artist names, and will return an array with only the artist names that the application knows are on one of the albums. One way to handle this is to store the artists in an array:

album_artists = album_infos.flat_map(&:last)
album_artists.uniq!

The lookup can use an array intersection to determine the values:

lookup = ->(artists) do
  album_artists & artists 
end

The problem with this approach is that Array#& uses a linear search of the array, so this approach is very slow for a large number of artists.

A better performing approach would use a hash, keyed by the artist name:

album_artists = {}
album_infos.each do |_, _, artist|
  album_artists[artist] ||= true
end

The lookup can use the hash to filter the values in the submitted array:

lookup = ->(artists) do
  artists.select do |artist|
    album_artists[artist]
  end
end

This approach performs much better. The code isn't as simple, though it isn't too bad. However, it would be nicer to have simpler code that performed as well. Thankfully, the Ruby Set class can meet this need. Like BigDecimal, Set is not currently a core Ruby class. Set is in the standard library, and you can load it via require 'set'. However, Set may be moved from the standard library to a core class in a future version of Ruby. Using a set is pretty much as simple as using an array in terms of populating the data structure:

album_artists = Set.new(album_infos.flat_map(&:last))

You don't need to manually make the array unique, because the set automatically ignores duplicate values. The lookup code can stay exactly the same as the array case:

lookup = ->(artists) do
  album_artists & artists 
end

Of the three approaches, the hash approach is the fastest to populate and the fastest to look up. The Set approach is much faster to look up than the array approach, but still significantly slower than hash. Set is actually implemented using a hash internally, so in general, it will perform worse than using a hash directly. As a general rule, you should only use a set for code that isn't performance-sensitive and you would like to use a nicer API. For any performance-sensitive code, you should prefer using a hash directly.

In this section, you learned about Ruby's core collection of classes, arrays, hashes, and sets. In the next section, you'll learn about Struct, one of Ruby's underappreciated core classes.

You have been reading a chapter from
Polished Ruby Programming
Published in: Jul 2021
Publisher: Packt
ISBN-13: 9781801072724
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image