Solving String Challenges in Python
Python String Minatures¶
I was reading a blog post at http://beyondloom.com/blog/strings.html, where a number of string manipulation challenges were posted. They were supposed to be solved by APL
style one-liners in K, but I deciced to redo them in Python
Define the Environment¶
from collections import Counter, defaultdict
from typing import List
from itertools import permutations
from itertools import islice
from itertools import compress
from itertools import cycle
Given a string x and a character y, how many times does y occur in x?¶
def str_count(string: str)->Counter:
return Counter(string)
#end str_count
c = str_count('Australia')
c['A'], c['a'], c['b']
(1, 2, 0)
Solved by using the Counter object from the collections
package
Given a string x, is it identical when read forwards and backwards?¶
def is_palindrome(string:str)->bool:
return string == ''.join(reversed(string))
#end is_palindrome
is_palindrome('racecar'), is_palindrome('palindrame')
(True, False)
In fact the concept of reversing a string gets very complicated in dealing with general Unicode strings (some languages have different shapes for the same letter at the end, middle, or front of words, for example)
Given a string x, produce a list of characters which appear more than once in x.¶
def non_uniques2(string:str):
cnt = Counter(string)
res = [k for k in cnt if cnt[k]>1]
return res
#end non_unique
non_uniques2('applause')
['a', 'p']
Create a Counter dictionary, and iterate over all keys
Given strings x and y, do both strings contain the same letters, possibly in a different order?¶
def same_chars(str1:str, str2:str)->bool:
return sorted(str1)==sorted(str2)
#end same_chars
same_chars('tab', 'bat'), same_chars('cat', 'dog')
(True, False)
Given a string x, find all the characters which occur exactly once, in the order they appear.¶
def uniques(string:str)->List[str]:
cnt = Counter(string)
single = [s for s in string if cnt[s] == 1]
return single
#end non_unique
uniques('applause')
['l', 'u', 's', 'e']
Given strings x and y, is x a rotation of the characters in y?¶
def is_rotation(str1:str, str2:str)->bool:
if( len(str1) != len(str2)):
return False
elif( str1 == str2):
return True
elif( sorted(str1) != sorted(str2)):
return False
else:
for i in range(1, len(str1)):
if str1 ==str2[-i:] + str2[:-i] :
return True
#end if
#end for
return False
#end if
#end is_rotation
str1 = 'foobar'
str2 = 'barfoo'
is_rotation('foobar', 'barfoo'),is_rotation('cat', 'bat')
(True, False)
I use some cheap heuristic checks to short-circuit the test loop for strings that are certain to fail
Given a list of strings x, sort the strings by length, ascending.¶
def sort_by_len(str_list:List[str])->List[str]:
return sorted(str_list, key=len)
#end sort_by_len
sort_by_len(['a', 'bb', 'ccccccc', 'xxx', ''])
['', 'a', 'bb', 'xxx', 'ccccccc']
Given a string x, identify the character which occurs most frequently¶
def most_pop_char(string:str)->str:
return Counter(string).most_common(1)[0][0]
#end most_pop_char
most_pop_char('a bb ccc dddd xxxx, yyyyyyy')
'y'
Given a string x consisting of words (one or more non-space characters) which are separated by spaces, reverse the order of the characters in each word.¶
def reverse_words(string:str)->str:
words = string.split()
rev_words = [''.join(reversed(w)) for w in words]
rev_string =' '.join(rev_words)
return rev_string
#end reverse_words
reverse_words('this is a very short sentance Hurray'), reverse_words('zoop')
('siht si a yrev trohs ecnatnes yarruH', 'pooz')
Given a string x and a boolean vector y of the same length, extract the characters of x corresponding to a 1 in y.¶
def extract(string:str, mask:List[bool])->str:
res = [c for c, m in zip(string, mask) if m]
return ''.join(res)
#end extract
print(extract('abcdef', [1,1,0,0,1,0]))
print(''.join(list(compress('abcdef', [1,1,0,0,1,0]))))
abe abe
Python Collections
has method to do just this, as shown above
Given a string x and a boolean vector y, spread the characters of x to the positions of 1s in y, filling intervening characters with underscores¶
def spread2(string:str, mask:List[bool])->str:
chars = islice(string, None)
res = [ next(chars) if m else '_' for m in mask]
return ''.join(res)
#end spread2
spread2('abc', [1,0,0,1,0,0,1]), spread2('abcdef', [1,0,1,0,0,1,1, 0,0,0,0,1,0,1]),
('a__b__c', 'a_b__cd____e_f')
def spread3(string:str, mask:List[bool])->str:
chars = cycle(string)
res = [ next(chars) if m else '_' for m in mask]
return ''.join(res)
#end spread3
spread3('abc', [1,0,1,0,0, 1,1,1,1,1,0]),
('a_b__cabca_',)
We have two solutions, the second one catering for the case where the input string is shorter than the mask. We create iterators from the input string, to allow easier access from inside the list comprehension
Given a string x, remove all the vowels entirely.¶
def de_vowel(string:str)->str:
res = [c for c in string if not c in 'aeiou']
return ''.join(res)
#end de_vowel
de_vowel('cat'), de_vowel('A very short sentence')
('ct', 'A vry shrt sntnc')
Given a string x, replace all the vowels (a, e, i, o, u, or y) with underscores.¶
def de_vowel2(string:str)->str:
res = [c if not c in 'aeiouy' else '_' for c in string]
return ''.join(res)
#end de_vowel2
de_vowel2('cat'), de_vowel2('This is a short sentence')
('c_t', 'Th_s _s _ sh_rt s_nt_nc_')
Given a string x consisting of words separated by spaces (as above), and a string y, replace all words in x which are the same as y with a series of xes.¶
def hide_words(string:str, secret:str)->str:
words = string.split()
new_words = [w if not w==secret else '*'*len(w) for w in words]
new_string =' '.join(new_words)
return new_string
#end hide_words
hide_words('this is a very short sentance, this is Hurray', 'this'), hide_words('zoop', 'cat')
('**** is a very short sentance, **** is Hurray', 'zoop')
Given a string x, generate a list of all possible reorderings of the characters in x¶
def all_perms(word:str)->List[str]:
ps = permutations(word)
return [''.join(p) for p in ps]
#end all_perms
all_perms('xyz')
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
Conclusions¶
Most of the above could be compressed even more, with the loss of some readibility. I would claim that most are very close to one-liners