String
Tips
Ask about input character set and case sensitivity. Usually the characters are limited to lowercase Latin characters, for example a to z.
When you need to compare strings where the order isn’t important (like anagram), you may consider using a HashMap as a counter. If your language has a built-in Counter class like Python, ask to use that instead.
If you need to keep a counter of characters, a common mistake is to say that the space complexity required for the counter is O(n). The space required for a counter is O(1) not O(n). This is because the upper bound is the range of characters, which is usually a fixed constant of 26. The input set is just lowercase Latin characters.
Common data structures for looking up strings efficiently are
Common string algorithms are
- Rabin Karp for efficient searching of substring using a rolling hash
- KMP for efficient searching of substring
Strings are sequences
A string is a sequence of characters. Many tips that apply to arrays also apply to strings.
Are there duplicate characters in the string, would it affect the answer?
When using an index to iterate through characters, be careful not to go out of bounds.
Be mindful about slicing or concatenating strings in your code. Typically, slicing and concatenating strings require O(n) time. Use start and end indices to demarcate a substring where possible.
Sometimes you can traverse the string from the right rather than from the left.
Master the sliding window technique that applies to many substring problems.
When you are given two strings to process, it is common to have one index per string (pointer) to traverse/compare the both of them. For example, we use the same approach to merge two sorted arrays.
Common question topics
Many string questions fall into one of these buckets.
Non-repeating Characters
- Use a 26-bit bitmask to indicate which lower case latin characters are inside the string.
To determine if two strings have common characters, perform & on the two bitmasks. If the result is non-zero, mask_a & mask_b > 0
, then the two strings have common characters.
Anagram
An anagram is word switch or word play. It is the result of re-arranging the letters of a word or phrase to produce a new word or phrase, while using all the original letters only once. In interviews, usually we are only bothered with words without spaces in them.
To determine if two strings are anagrams, there are a few plausible approaches:
- Sorting both strings should produce the same resulting string. This takes O(nlgn) time and O(lgn) space.
- If we map each character to a prime number and we multiply each mapped number together, anagrams should have the same multiple (prime factor decomposition). This takes O(n) time and O(1) space.
- Frequency counting of characters will help to determine if two strings are anagrams. This also takes O(n) time and O(1) space.
Palindrome
A palindrome is a word, phrase, number, or other sequence of characters which reads the same backward as forward, such as madam or racecar.
Here are ways to determine if a string is a palindrome:
- Reverse the string and it should be equal to itself.
- Have two pointers at the start and end of the string. Move the pointers inward till they meet. At any point in time, the characters at both pointers should match.
The order of characters within the string matters, so HashMaps are usually not helpful.
When a question is about counting the number of palindromes, a common trick is to have two pointers that move outward, away from the middle. Note that palindromes can be even or odd length. For each middle pivot position, you need to check it twice: Once that includes the character and once without the character.
- For substrings, you can terminate early once there is no match.
- For subsequences, use dynamic programming as there are overlapping subproblems. Check out this question.
Corner cases
- Empty string
- String with 1 or 2 characters
- String with repeated characters
- Strings with only one distinct character