Textonyms: Difference between revisions
Revision as of 02:24, 8 February 2015
When entering text on a phone's numeric keypad, a particular combination of digits may correspond to more than one word. Such words are called textonyms.
Assuming the keys are as follows:
2 -> ABC
3 -> DEF
4 -> GHI
5 -> JKL
6 -> MNO
7 -> PQRS
8 -> TUV
9 -> WXYZ
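As a rough illustration of the mapping above, the following Python sketch inverts the keypad table and encodes a single word. The names `KEYPAD`, `LETTER_TO_DIGIT` and `word_to_digits` are made up for this example and do not come from any of the solutions on this page.

```python
# Keypad layout from the task description: digit -> letters on that key.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the layout so that each letter looks up its digit.
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def word_to_digits(word):
    """Return the digit string for word, or None if a character has no key."""
    try:
        return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())
    except KeyError:
        # Characters such as '1' or '-' are not on any key in this layout.
        return None


print(word_to_digits("Briticisms"))  # 2748424767
print(word_to_digits("criticisms"))  # 2748424767
print(word_to_digits("10th"))        # None
```

Lowercasing before the lookup is a choice made here so that "Briticisms" and "criticisms" land on the same digit string.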
The task is to write a program that finds textonyms in a list of words such as Textonyms/wordlist or [1].
The task should produce a report:
There are #{0} words in #{1} which can be represented by the Textonyms mapping.
They require #{2} digit combinations to represent them.
#{3} digit combinations represent Textonyms.
Where:
#{0} is the number of words in the list which can be represented by the Textonyms mapping.
#{1} is the URL of the wordlist being used.
#{2} is the number of digit combinations required to represent the words in #{0}.
#{3} is the number of #{2} which represent more than one word.
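The three counts can be sketched in Python as follows. This is only one possible reading of the definitions: it assumes words are lowercased and that duplicates after lowercasing are counted once. `textonym_counts` and `SAMPLE` are hypothetical names, and the tiny word list stands in for a real one.

```python
from collections import defaultdict

# letter -> digit, built from the keypad rows starting at key 2
LETTER_TO_DIGIT = {
    ch: str(digit)
    for digit, letters in enumerate("abc def ghi jkl mno pqrs tuv wxyz".split(), 2)
    for ch in letters
}


def textonym_counts(words):
    """Return (#{0}, #{2}, #{3}) for an iterable of words."""
    groups = defaultdict(set)  # digit string -> distinct words that map to it
    for word in words:
        word = word.lower()
        # Skip empty entries and words containing characters without a key.
        if word and all(ch in LETTER_TO_DIGIT for ch in word):
            key = "".join(LETTER_TO_DIGIT[ch] for ch in word)
            groups[key].add(word)
    usable = sum(len(g) for g in groups.values())               # #{0}
    textonyms = sum(1 for g in groups.values() if len(g) > 1)   # #{3}
    return usable, len(groups), textonyms                       # len(groups) is #{2}


SAMPLE = ["bode", "code", "coed", "rosetta", "10th"]  # hypothetical mini word list
usable, combos, textonyms = textonym_counts(SAMPLE)
print("There are {} words in {} which can be represented by the Textonyms mapping.".format(usable, "SAMPLE"))
print("They require {} digit combinations to represent them.".format(combos))
print("{} digit combinations represent Textonyms.".format(textonyms))
```

For this sample, "bode", "code" and "coed" share 2633, "rosetta" stands alone, and "10th" is rejected, so the counts are 4, 2 and 1.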
At your discretion, show a couple of examples of your solution displaying textonyms, e.g.:
2748424767 -> "Briticisms", "criticisms"
Extra credit:
Use a word list and keypad mapping other than English.
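One way to approach the extra credit is to keep the keypad layout as data rather than hard-coding it. The Python sketch below parses a layout given as "digit letters" lines. The German-style layout shown is an assumption made for illustration only (in particular the placement of ä, ö, ü and ß), and `parse_keypad` and `encode` are hypothetical names.

```python
def parse_keypad(spec):
    """Parse lines of the form '<digit> <letters>' into a letter -> digit dict."""
    table = {}
    for line in spec.strip().splitlines():
        digit, letters = line.split()
        for ch in letters:
            table[ch] = digit
    return table


def encode(word, table):
    """Return the digit string for word under table, or None if it cannot be typed."""
    word = word.lower()
    if all(ch in table for ch in word):
        return "".join(table[ch] for ch in word)
    return None


# Illustrative layout only: where the non-ASCII letters sit is an assumption.
GERMAN_KEYS = """
2 abcä
3 def
4 ghi
5 jkl
6 mnoö
7 pqrsß
8 tuvü
9 wxyz
"""

table = parse_keypad(GERMAN_KEYS)
print(encode("Grüße", table))  # 47873
print(encode("über", table))   # 8237
```

With the layout held in a string, the same grouping and counting logic can be reused for any language once a matching word list is supplied.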
J
<lang J>require'regex strings web/gethttp'

strip=:dyad define
  (('(?s)',x);'') rxrplc y
)

fetch=:monad define
  txt=. '.*<pre>' strip '</pre>.*' strip gethttp y
  cutopen tolower txt-.' '
)

keys=:noun define
 2 abc
 3 def
 4 ghi
 5 jkl
 6 mno
 7 pqrs
 8 tuv
 9 wxyz
)

reporttext=:noun define
There are #{0} words in #{1} which can be represented by the Textonyms mapping.
They require #{2} digit combinations to represent them.
#{3} digit combinations represent Textonyms.
)

report=:dyad define
  x rplc (":&.>y),.~('#{',":,'}'"_)&.>i.#y
)

textonymrpt=:dyad define
  'digits letters'=. |:>;,&.>,&.>/&.>/"1 <;._1;._2 x
  valid=. (#~ */@e.&letters&>) fetch y NB. ignore illegals
  reps=. {&digits@(letters&i.)&.> valid NB. reps is digit seq
  reporttext report (#valid);y;(#~.reps);+/(1<#)/.~reps
)</lang>
Required example:
<lang J>   keys textonymrpt 'http://rosettacode.org/wiki/Textonyms/wordlist'
There are 13085 words in http://rosettacode.org/wiki/Textonyms/wordlist which can be represented by the Textonyms mapping.
They require 11932 digit combinations to represent them.
661 digit combinations represent Textonyms.</lang>
In this example, the intermediate results in textonymrpt would look like this (showing only the first 5 elements of the really big values):
<lang J>   digits
22233344455566677778889999
   letters
abcdefghijklmnopqrstuvwxyz
   5{.valid
┌─┬──┬───┬───┬──┐
│a│aa│aaa│aam│ab│
└─┴──┴───┴───┴──┘
   5{.reps
┌─┬──┬───┬───┬──┐
│2│22│222│226│22│
└─┴──┴───┴───┴──┘</lang>
Perl
This uses a file named "words.txt" as the dictionary.
<lang perl>sub find {
	# index = digit; 0 and 1 carry no letters
	my @m = qw/$ $ abc def ghi jkl mno pqrs tuv wxyz/;
	# turn each digit into a character class of its letters
	(my $r = shift) =~ s{(\d)}{[$m[$1]]}g;
	grep /^$r$/i, split ' ', `cat words.txt`; # cats don't run on windows
}

print join("\n", $_, find($_)), "\n\n" for @ARGV</lang>
Usage:
./textnym.pl 7353284667 736672
7353284667
rejections
selections

736672
senora
Incidentally, it sort of supports wildcards:
./textnym.pl '5432.*'
5432.*
liechtenstein
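The wildcard behaviour falls out of the digit-to-character-class substitution: anything that is not a digit passes through unchanged as regex syntax. A Python sketch of the same idea follows. It is not a translation of the Perl above: it only rewrites the digits 2 to 9, and it uses a small in-memory list (`WORDS`, hypothetical) instead of reading words.txt.

```python
import re

KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def digits_to_regex(pattern):
    """Replace each digit 2-9 with a character class; leave everything else as regex."""
    body = re.sub(r"[2-9]", lambda m: "[" + KEYS[m.group()] + "]", pattern)
    return re.compile(body, re.IGNORECASE)


def find(pattern, words):
    """Return the words that match the whole digit pattern."""
    rx = digits_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]


WORDS = ["rejections", "selections", "senora", "liechtenstein"]  # hypothetical list
print(find("7353284667", WORDS))  # ['rejections', 'selections']
print(find("5432.*", WORDS))      # ['liechtenstein']
```

Because the pattern is compiled as a regular expression, a malformed wildcard raises `re.error` rather than silently matching nothing.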
Python
<lang python>from collections import defaultdict
import urllib.request

CH2NUM = {ch: str(num) for num, chars in enumerate('abc def ghi jkl mno pqrs tuv wxyz'.split(), 2) for ch in chars}
URL = 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'


def getwords(url):
    return urllib.request.urlopen(url).read().decode("utf-8").lower().split()

def mapnum2words(words):
    number2words = defaultdict(list)
    reject = 0
    for word in words:
        try:
            number2words[''.join(CH2NUM[ch] for ch in word)].append(word)
        except KeyError:
            # Reject words with non a-z e.g. '10th'
            reject += 1
    return dict(number2words), reject

def interactiveconversions():
    # Reads the module-level num2words and wordset set up in the main block.
    while True:
        inp = input("\nType a number or a word to get the translation and textonyms: ").strip().lower()
        if inp:
            if all(ch in '23456789' for ch in inp):
                if inp in num2words:
                    print("  Number {0} has the following textonyms in the dictionary: {1}".format(inp, ', '.join(
                        num2words[inp])))
                else:
                    print("  Number {0} has no textonyms in the dictionary.".format(inp))
            elif all(ch in CH2NUM for ch in inp):
                num = ''.join(CH2NUM[ch] for ch in inp)
                # .get() avoids a KeyError when no dictionary word maps to this number
                print("  Word {0} is{1} in the dictionary and is number {2} with textonyms: {3}".format(
                    inp, ('' if inp in wordset else "n't"), num, ', '.join(num2words.get(num, []))))
            else:
                print("  I don't understand %r" % inp)
        else:
            print("Thank you")
            break


if __name__ == '__main__':
    words = getwords(URL)
    print("Read %i words from %r" % (len(words), URL))
    wordset = set(words)
    num2words, reject = mapnum2words(words)
    # Count digit combinations that map to more than one word
    morethan1word = sum(1 for w in num2words.values() if len(w) > 1)
    maxwordpernum = max(len(values) for values in num2words.values())
    print("""
There are {0} words in {1} which can be represented by the Textonyms mapping.
They require {2} digit combinations to represent them.
{3} digit combinations represent Textonyms.\
""".format(len(words) - reject, URL, len(num2words), morethan1word))

    print("\nThe numbers mapping to the most words map to %i words each:" % maxwordpernum)
    maxwpn = sorted((key, val) for key, val in num2words.items() if len(val) == maxwordpernum)
    for num, wrds in maxwpn:
        print("  %s maps to: %s" % (num, ', '.join(wrds)))

    interactiveconversions()</lang>
- Output:
Read 25104 words from 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'

There are 24978 words in http://www.puzzlers.org/pub/wordlists/unixdict.txt which can be represented by the Textonyms mapping.
They require 22903 digit combinations to represent them.
22895 digit combinations represent Textonyms.

The numbers mapping to the most words map to 9 words each:
  269 maps to: amy, any, bmw, bow, box, boy, cow, cox, coy
  729 maps to: paw, pax, pay, paz, raw, ray, saw, sax, say

Type a number or a word to get the translation and textonyms: rosetta
  Word rosetta is in the dictionary and is number 7673882 with textonyms: rosetta

Type a number or a word to get the translation and textonyms: code
  Word code is in the dictionary and is number 2633 with textonyms: bode, code, coed

Type a number or a word to get the translation and textonyms: 2468
  Number 2468 has the following textonyms in the dictionary: ainu, chou

Type a number or a word to get the translation and textonyms: 3579
  Number 3579 has no textonyms in the dictionary.

Type a number or a word to get the translation and textonyms: 
Thank you
Ruby
<lang ruby>
Textonyms = Hash.new {|n, g| n[g] = []}
File.open("Textonyms.txt") do |file|
  file.each_line {|line|
    Textonyms[(n=line.chomp).gsub(/a|b|c|A|B|C/, '2').gsub(/d|e|f|D|E|F/, '3').
                             gsub(/g|h|i|G|H|I/, '4').gsub(/p|q|r|s|P|Q|R|S/, '7').
                             gsub(/j|k|l|J|K|L/, '5').gsub(/m|n|o|M|N|O/, '6').
                             gsub(/t|u|v|T|U|V/, '8').gsub(/w|x|y|z|W|X|Y|Z/, '9')] += [n]
  }
end
</lang>
- Output:
puts "There are #{Textonyms.inject(0){|n,g| n+g[1].length}} words in #{"http://rosettacode.org/wiki/Textonyms/wordlist"} which can be represented by the Textonyms mapping."
puts "They require #{Textonyms.length} digit combinations to represent them."
There are 132916 words in http://rosettacode.org/wiki/Textonyms/wordlist which can be represented by the Textonyms mapping.
They require 117868 digit combinations to represent them.
puts Textonyms["7353284667"]
rejections
selections
puts Textonyms["736672"]
remora
senora