Textonyms

From Rosetta Code
Revision as of 02:24, 8 February 2015

Textonyms is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

When entering text on a phone's digital keypad, a particular combination of digits may correspond to more than one word. Such words are called textonyms.

Assuming the keys are as follows:

    2 -> ABC
    3 -> DEF
    4 -> GHI
    5 -> JKL
    6 -> MNO
    7 -> PQRS
    8 -> TUV
    9 -> WXYZ  
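A minimal Python sketch of this mapping, assuming words are compared case-insensitively: it inverts the key table into a letter-to-digit translation table and converts a word to its digit sequence, returning `None` for words with characters outside a-z. The name `word_to_digits` is hypothetical, not part of the task.

```python
# Keypad layout from the task: digit -> letters on that key
KEYS = {'2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
        '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz'}

# Invert the layout into a letter -> digit translation table
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYS.items() for ch in letters}
TABLE = str.maketrans(LETTER_TO_DIGIT)


def word_to_digits(word):
    """Return the key sequence for word, or None if it contains anything but a-z."""
    word = word.lower()
    if not word or any(ch not in LETTER_TO_DIGIT for ch in word):
        return None
    return word.translate(TABLE)


print(word_to_digits("Briticisms"))  # 2748424767
print(word_to_digits("criticisms"))  # 2748424767
print(word_to_digits("10th"))        # None
```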

The task is to write a program that finds textonyms in a list of words such as Textonyms/wordlist or [1].

The program should produce a report:

There are #{0} words in #{1} which can be represented by the Textonyms mapping.
They require #{2} digit combinations to represent them.
#{3} digit combinations represent Textonyms.

Where:

#{0} is the number of words in the list which can be represented by the Textonyms mapping.
#{1} is the URL of the wordlist being used.
#{2} is the number of digit combinations required to represent the words in #{0}.
#{3} is the number of the digit combinations counted in #{2} which represent more than one word.
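The four report values fall out of grouping the mappable words by digit sequence. A minimal sketch, assuming the word list is already in memory as a list of strings; the function name `textonym_report` and the sample words are illustrative, and each `#{n}` placeholder is filled with a plain string replacement.

```python
from collections import defaultdict

# Letter -> digit table for the keypad in the task
LETTER_TO_DIGIT = {ch: str(d) for d, letters in
                   enumerate('abc def ghi jkl mno pqrs tuv wxyz'.split(), 2)
                   for ch in letters}

TEMPLATE = """There are #{0} words in #{1} which can be represented by the Textonyms mapping.
They require #{2} digit combinations to represent them.
#{3} digit combinations represent Textonyms."""


def textonym_report(words, source):
    """Fill the report template for an in-memory word list."""
    groups = defaultdict(list)  # digit sequence -> words that share it
    for word in words:
        w = word.lower()
        if w and all(ch in LETTER_TO_DIGIT for ch in w):  # skip words like '10th'
            groups[''.join(LETTER_TO_DIGIT[ch] for ch in w)].append(word)
    values = [
        sum(len(ws) for ws in groups.values()),           # #{0}: mappable words
        source,                                           # #{1}: where the list came from
        len(groups),                                      # #{2}: distinct digit sequences
        sum(1 for ws in groups.values() if len(ws) > 1),  # #{3}: sequences shared by 2+ words
    ]
    report = TEMPLATE
    for i, value in enumerate(values):
        report = report.replace('#{%d}' % i, str(value))
    return report


print(textonym_report(["code", "bode", "coed", "rosetta", "10th"], "a sample list"))
```

Here "code", "bode" and "coed" all map to 2633, so the sample yields 4 mappable words, 2 digit combinations and 1 textonym combination.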

At your discretion, show a couple of examples of your solution displaying textonyms, e.g.:

 2748424767 -> "Briticisms", "criticisms"
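One way to produce lines in that format is to sort the mappable words by digit sequence and group neighbours with `itertools.groupby`, keeping only groups with more than one word. A sketch, assuming a small in-memory sample list in place of the full word list; `show_textonyms` is a hypothetical name.

```python
from itertools import groupby

KEYPAD = 'abc def ghi jkl mno pqrs tuv wxyz'.split()
LETTER_TO_DIGIT = {ch: str(d) for d, letters in enumerate(KEYPAD, 2) for ch in letters}


def to_digits(word):
    return ''.join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def show_textonyms(words):
    """Yield 'digits -> "w1", "w2"' lines for digit sequences shared by 2+ words."""
    mappable = [w for w in words if w and all(ch in LETTER_TO_DIGIT for ch in w.lower())]
    # groupby only merges adjacent items, so sort by digit sequence first
    for digits, group in groupby(sorted(mappable, key=to_digits), key=to_digits):
        group = list(group)
        if len(group) > 1:
            yield '%s -> %s' % (digits, ', '.join('"%s"' % w for w in group))


for line in show_textonyms(["Briticisms", "criticisms", "rosetta", "10th"]):
    print(line)  # 2748424767 -> "Briticisms", "criticisms"
```

Because `sorted` is stable, words with the same digit sequence keep their original order within a group.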

Extra credit:

Use a word list and keypad mapping other than English.
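For the extra credit, the keypad can be passed in as a parameter instead of being hard-coded. The layout below is a hypothetical German-style keypad that puts ä, ö, ü and ß on the keys of their base letters; it is an illustrative assumption, not a published standard, and the three sample words are only for demonstration.

```python
from collections import defaultdict


def make_mapping(layout):
    """Build a letter -> digit dict from a {digit: letters} keypad layout."""
    return {ch: digit for digit, letters in layout.items() for ch in letters}


# Hypothetical layout: umlauts and sharp s share keys with their base letters
GERMAN_LAYOUT = {'2': 'abcä', '3': 'def', '4': 'ghi', '5': 'jkl',
                 '6': 'mnoö', '7': 'pqrsß', '8': 'tuvü', '9': 'wxyz'}


def group_words(words, layout):
    """Map each digit sequence to the words it spells under the given layout."""
    mapping = make_mapping(layout)
    groups = defaultdict(list)
    for word in words:
        w = word.lower()
        if w and all(ch in mapping for ch in w):  # skip words the keypad cannot type
            groups[''.join(mapping[ch] for ch in w)].append(word)
    return dict(groups)


print(group_words(["Bär", "bar", "Öl"], GERMAN_LAYOUT))
# {'227': ['Bär', 'bar'], '65': ['Öl']}
```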

J

<lang J>require'regex strings web/gethttp'

strip=:dyad define
 NB. delete every match of regex x from y; (?s) lets . match newlines
 (('(?s)',x);'') rxrplc y
)

fetch=:monad define
 NB. keep only the <pre> block of the page, then split it into lowercase words, one per line
 txt=. '.*<pre>' strip '</pre>.*' strip gethttp y
 cutopen tolower txt-.' '
)

NB. each keys line starts with a space: <;._1 uses that first character as the delimiter
keys=:noun define
 2 abc
 3 def
 4 ghi
 5 jkl
 6 mno
 7 pqrs
 8 tuv
 9 wxyz
)

reporttext=:noun define
There are #{0} words in #{1} which can be represented by the Textonyms mapping.
They require #{2} digit combinations to represent them.
#{3} digit combinations represent Textonyms.
)

report=:dyad define
 NB. replace each #{i} placeholder in x by the formatted i-th box of y
 x rplc (":&.>y),.~('#{',":,'}'"_)&.>i.#y
)

textonymrpt=:dyad define
 'digits letters'=. |:>;,&.>,&.>/&.>/"1 <;._1;._2 x
 valid=. (#~ */@e.&letters&>) fetch y NB. ignore illegals
 reps=. {&digits@(letters&i.)&.> valid NB. reps is digit seq
 reporttext report (#valid);y;(#~.reps);+/(1<#)/.~reps
)</lang>

Required example:

<lang J>   keys textonymrpt 'http://rosettacode.org/wiki/Textonyms/wordlist'
There are 13085 words in http://rosettacode.org/wiki/Textonyms/wordlist which can be represented by the Textonyms mapping.
They require 11932 digit combinations to represent them.
661 digit combinations represent Textonyms.</lang>

In this example, the intermediate results in textonymrpt would look like this (looking only at the first 5 elements of the larger values):

<lang J>   digits
22233344455566677778889999
   letters
abcdefghijklmnopqrstuvwxyz
   5{.valid
┌─┬──┬───┬───┬──┐
│a│aa│aaa│aam│ab│
└─┴──┴───┴───┴──┘
   5{.reps
┌─┬──┬───┬───┬──┐
│2│22│222│226│22│
└─┴──┴───┴───┴──┘</lang>
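The J values `digits` and `letters` are two parallel strings: every letter of a key line is paired with that line's digit, so a letter's index in `letters` picks its digit out of `digits`, which is how `reps` is built. A rough Python analogue of that step, for readers less familiar with J; the helper name `parallel_strings` is illustrative and not taken from the J code.

```python
KEYS = """2 abc
3 def
4 ghi
5 jkl
6 mno
7 pqrs
8 tuv
9 wxyz"""


def parallel_strings(keys):
    """Expand each 'digit letters' line into one digit per letter, like J's digits/letters."""
    digits, letters = [], []
    for line in keys.splitlines():
        digit, chars = line.split()
        digits.append(digit * len(chars))
        letters.append(chars)
    return ''.join(digits), ''.join(letters)


digits, letters = parallel_strings(KEYS)
print(digits)   # 22233344455566677778889999
print(letters)  # abcdefghijklmnopqrstuvwxyz

# Like J's reps: find each letter's index in `letters`, then take the digit at that index
valid = ['a', 'aa', 'aaa', 'aam', 'ab']
reps = [''.join(digits[letters.index(ch)] for ch in word) for word in valid]
print(reps)  # ['2', '22', '222', '226', '22']
```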

Perl

This uses a file named "words.txt" as the dictionary.

<lang perl>sub find {
    my @m = qw/$ $ abc def ghi jkl mno pqrs tuv wxyz/;  # letters per digit; 0 and 1 are placeholders
    (my $r = shift) =~ s{(\d)}{[$m[$1]]}g;              # turn each digit into a character class
    grep /^$r$/i, split ' ', `cat words.txt`;           # cats don't run on windows
}

print join("\n", $_, find($_)), "\n\n" for @ARGV</lang>

Usage:

./textnym.pl 7353284667 736672
7353284667
rejections
selections

736672
senora

Incidentally, it sort of supports wildcards, since characters other than digits are passed into the regular expression unchanged:

./textnym.pl '5432.*'
5432.*
liechtenstein
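The same regex idea can be sketched in Python: each digit becomes a character class of its letters, the whole word must match, and anything that is not a digit passes through as regex syntax, so `.*` still works as a wildcard. This assumes the dictionary is already loaded as a list; the five sample words stand in for words.txt. Digits 0 and 1, which carry no letters, are turned into a pattern that can never match.

```python
import re

# Index = digit; 0 and 1 have no letters
KEYPAD = ['', '', 'abc', 'def', 'ghi', 'jkl', 'mno', 'pqrs', 'tuv', 'wxyz']


def digit_class(match):
    letters = KEYPAD[int(match.group())]
    # (?!) is an empty negative lookahead, so digits 0 and 1 never match
    return '[%s]' % letters if letters else '(?!)'


def find(pattern, words):
    """Return the words whose letters match a digit pattern (non-digits pass through as regex)."""
    regex = re.compile(re.sub(r'\d', digit_class, pattern), re.IGNORECASE)
    return [w for w in words if regex.fullmatch(w)]


words = ['rejections', 'selections', 'senora', 'remora', 'liechtenstein']
print(find('7353284667', words))  # ['rejections', 'selections']
print(find('5432.*', words))      # ['liechtenstein']
```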

Python

<lang python>from collections import defaultdict
import urllib.request

# Map each letter to its keypad digit: 'a' -> '2', ..., 'z' -> '9'
CH2NUM = {ch: str(num) for num, chars in enumerate('abc def ghi jkl mno pqrs tuv wxyz'.split(), 2)
          for ch in chars}
URL = 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'


def getwords(url):
    return urllib.request.urlopen(url).read().decode("utf-8").lower().split()


def mapnum2words(words):
    number2words = defaultdict(list)
    reject = 0
    for word in words:
        try:
            number2words[''.join(CH2NUM[ch] for ch in word)].append(word)
        except KeyError:
            # Reject words with non a-z e.g. '10th'
            reject += 1
    return dict(number2words), reject


def interactiveconversions():
    # Reads the module-level num2words and wordset built under __main__
    while True:
        inp = input("\nType a number or a word to get the translation and textonyms: ").strip().lower()
        if inp:
            if all(ch in '23456789' for ch in inp):
                if inp in num2words:
                    print("  Number {0} has the following textonyms in the dictionary: {1}".format(inp, ', '.join(
                        num2words[inp])))
                else:
                    print("  Number {0} has no textonyms in the dictionary.".format(inp))
            elif all(ch in CH2NUM for ch in inp):
                num = ''.join(CH2NUM[ch] for ch in inp)
                # .get avoids a KeyError for words whose number is not in the dictionary
                print("  Word {0} is{1} in the dictionary and is number {2} with textonyms: {3}".format(
                    inp, ("" if inp in wordset else "n't"), num, ', '.join(num2words.get(num, []))))
            else:
                print("  I don't understand %r" % inp)
        else:
            print("Thank you")
            break


if __name__ == '__main__':
    words = getwords(URL)
    print("Read %i words from %r" % (len(words), URL))
    wordset = set(words)
    num2words, reject = mapnum2words(words)
    # Count the digit combinations whose word lists hold more than one word
    morethan1word = sum(1 for w in num2words.values() if len(w) > 1)
    maxwordpernum = max(len(values) for values in num2words.values())
    print("""
There are {0} words in {1} which can be represented by the Textonyms mapping.
They require {2} digit combinations to represent them.
{3} digit combinations represent Textonyms.\
""".format(len(words) - reject, URL, len(num2words), morethan1word))

    print("\nThe numbers mapping to the most words map to %i words each:" % maxwordpernum)
    maxwpn = sorted((key, val) for key, val in num2words.items() if len(val) == maxwordpernum)
    for num, wrds in maxwpn:
        print("  %s maps to: %s" % (num, ', '.join(wrds)))
    interactiveconversions()</lang>
Output:
Read 25104 words from 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'

There are 24978 words in http://www.puzzlers.org/pub/wordlists/unixdict.txt which can be represented by the Textonyms mapping.
They require 22903 digit combinations to represent them.
1473 digit combinations represent Textonyms.

The numbers mapping to the most words map to 9 words each:
  269 maps to: amy, any, bmw, bow, box, boy, cow, cox, coy
  729 maps to: paw, pax, pay, paz, raw, ray, saw, sax, say

Type a number or a word to get the translation and textonyms: rosetta
  Word rosetta is in the dictionary and is number 7673882 with textonyms: rosetta

Type a number or a word to get the translation and textonyms: code
  Word code is in the dictionary and is number 2633 with textonyms: bode, code, coed

Type a number or a word to get the translation and textonyms: 2468
  Number 2468 has the following textonyms in the dictionary: ainu, chou

Type a number or a word to get the translation and textonyms: 3579
  Number 3579 has no textonyms in the dictionary.

Type a number or a word to get the translation and textonyms: 
Thank you

Ruby

<lang ruby>Textonyms = Hash.new {|n, g| n[g] = []}
File.open("Textonyms.txt") do |file|
  # key: the word's digit sequence; value: every word with that sequence
  file.each_line {|line|
    Textonyms[(n=line.chomp).gsub(/a|b|c|A|B|C/, '2').gsub(/d|e|f|D|E|F/, '3').gsub(/g|h|i|G|H|I/, '4').gsub(/p|q|r|s|P|Q|R|S/, '7')
                 .gsub(/j|k|l|J|K|L/, '5').gsub(/m|n|o|M|N|O/, '6').gsub(/t|u|v|T|U|V/, '8').gsub(/w|x|y|z|W|X|Y|Z/, '9')] += [n]
  }
end</lang>

Output:
puts "There are #{Textonyms.inject(0){|n,g| n+g[1].length}} words in #{"http://rosettacode.org/wiki/Textonyms/wordlist"} which can be represented by the Textonyms mapping."
puts "They require #{Textonyms.length} digit combinations to represent them."

There are 132916 words in http://rosettacode.org/wiki/Textonyms/wordlist which can be represented by the Textonyms mapping.
They require 117868 digit combinations to represent them.

puts Textonyms["7353284667"]

rejections
selections

puts Textonyms["736672"]

remora
senora