A Common-Sense Guide to Data Structures and Algorithms

A Common-Sense Guide to Data Structures and Algorithms

Level Up Your Core Programming Skills

Jay Wengrow

Although a hash table with one hundred cells is great for avoiding collisions, we’d be using up one hundred cells to store just five pieces of data, and that’s a poor use of memory. This is the balancing act that a hash table must perform. A good hash table strikes a balance of avoiding collisions while not consuming lots of memory. To accomplish this, computer scientists have developed the following rule of thumb: for every seven data elements stored in a hash table, it should have ten cells. So, if you’re planning on storing fourteen elements, you’d want to have twenty available cells, and so on and so forth. This ratio of data to cells is called the load factor. Using this terminology, we’d say that the ideal load factor is 0.7 (7 elements / 10 cells). If you initially stored seven pieces of data in a hash, the computer might allocate a hash with ten cells. When you begin to add more data, though, the computer will expand the hash by adding more cells and changing the hash function so that the new data will be distributed evenly across the new cells. Luckily, most of the internals of a hash table are managed by the computer language you’re using. It decides how big the hash table needs to be, what hash function to use, and when it’s time to expand the hash table.
1136