Count occurrences of a substring: Difference between revisions

From Rosetta Code
Content added Content deleted
(→‎{{header|C++}}: Added C++ implementation)
m (Alphabetized)
Line 11: Line 11:
The matching should yield the highest number of non-overlapping matches. In general, this essentially means matching from left-to-right or right-to-left (see proof on talk page).
The matching should yield the highest number of non-overlapping matches. In general, this essentially means matching from left-to-right or right-to-left (see proof on talk page).


=={{header|ALGOL 68}}==
{{works with|ALGOL 68|Revision 1 - no extensions to language used.}}
{{works with|ALGOL 68G|Any - tested with release [http://sourceforge.net/projects/algol68/files/algol68g/algol68g-1.18.0/algol68g-1.18.0-9h.tiny.el5.centos.fc11.i386.rpm/download 1.18.0-9h.tiny].}}
{{wont work with|ELLA ALGOL 68|Any (with appropriate job cards) - tested with release [http://sourceforge.net/projects/algol68/files/algol68toc/algol68toc-1.8.8d/algol68toc-1.8-8d.fc9.i386.rpm/download 1.8-8d] - due to extensive use of '''format'''[ted] ''transput''.}}
Algol68 has no build in function to do this task, hence the next to create a ''count string in string'' routine.
<lang algol68>#!/usr/local/bin/a68g --script #

PROC count string in string = (STRING needle, haystack)INT: (
INT start:=LWB haystack, next, out:=0;
FOR count WHILE string in string(needle, next, haystack[start:]) DO
start+:=next+UPB needle-LWB needle;
out:=count
OD;
out
);

printf(($d" "$,
count string in string("th", "the three truths"), # expect 3 #
count string in string("abab", "ababababab"), # expect 2 #
count string in string("a*b", "abaabba*bbaba*bbab"), # expect 2 #
$l$
))</lang>
Output:
<pre>
3 2 2
</pre>
=={{header|C++}}==
=={{header|C++}}==
<lang cpp>#include <iostream>
<lang cpp>#include <iostream>
Line 42: Line 68:
</pre>
</pre>


=={{header|ALGOL 68}}==
{{works with|ALGOL 68|Revision 1 - no extensions to language used.}}
{{works with|ALGOL 68G|Any - tested with release [http://sourceforge.net/projects/algol68/files/algol68g/algol68g-1.18.0/algol68g-1.18.0-9h.tiny.el5.centos.fc11.i386.rpm/download 1.18.0-9h.tiny].}}
{{wont work with|ELLA ALGOL 68|Any (with appropriate job cards) - tested with release [http://sourceforge.net/projects/algol68/files/algol68toc/algol68toc-1.8.8d/algol68toc-1.8-8d.fc9.i386.rpm/download 1.8-8d] - due to extensive use of '''format'''[ted] ''transput''.}}
Algol68 has no build in function to do this task, hence the next to create a ''count string in string'' routine.
<lang algol68>#!/usr/local/bin/a68g --script #

PROC count string in string = (STRING needle, haystack)INT: (
INT start:=LWB haystack, next, out:=0;
FOR count WHILE string in string(needle, next, haystack[start:]) DO
start+:=next+UPB needle-LWB needle;
out:=count
OD;
out
);

printf(($d" "$,
count string in string("th", "the three truths"), # expect 3 #
count string in string("abab", "ababababab"), # expect 2 #
count string in string("a*b", "abaabba*bbaba*bbab"), # expect 2 #
$l$
))</lang>
Output:
<pre>
3 2 2
</pre>
=={{header|Java}}==
=={{header|Java}}==
This method removes the first occurrence of the substring, checks for changes, and continues if something changed. It counts each removal attempt (even if nothing is replaced), so the count starts at -1 to offset the last removal attempt where nothing changed.
This method removes the first occurrence of the substring, checks for changes, and continues if something changed. It counts each removal attempt (even if nothing is replaced), so the count starts at -1 to offset the last removal attempt where nothing changed.

Revision as of 19:43, 16 June 2011

Task
Count occurrences of a substring
You are encouraged to solve this task according to the task description, using any language you may know.

The task is to either create a function, or show a built-in function, to count the number of non-overlapping occurrences of a substring inside a string. The function should take two arguments: the first argument being the string to search and the second a substring to be search for. It should return an integer count.

<lang pseudocode>print countSubstring("the three truths","th") 3

// do not count substrings that overlap with previously-counted substrings: print countSubstring("ababababab","abab") 2</lang>

The matching should yield the highest number of non-overlapping matches. In general, this essentially means matching from left-to-right or right-to-left (see proof on talk page).

ALGOL 68

Works with: ALGOL 68 version Revision 1 - no extensions to language used.
Works with: ALGOL 68G version Any - tested with release 1.18.0-9h.tiny.

Algol68 has no build in function to do this task, hence the next to create a count string in string routine. <lang algol68>#!/usr/local/bin/a68g --script #

PROC count string in string = (STRING needle, haystack)INT: (

 INT start:=LWB haystack, next, out:=0;
 FOR count WHILE string in string(needle, next, haystack[start:]) DO
   start+:=next+UPB needle-LWB needle;
   out:=count
 OD;
 out

);

printf(($d" "$,

 count string in string("th", "the three truths"),    # expect 3 #
 count string in string("abab", "ababababab"),        # expect 2 #
 count string in string("a*b", "abaabba*bbaba*bbab"), # expect 2 #
 $l$

))</lang> Output:

3 2 2 

C++

<lang cpp>#include <iostream>

  1. include <string>

// returns count of non-overlapping occurrences of 'sub' in 'str' int countSubstring(const std::string& str, const std::string& sub) {

   if (sub.length() == 0) return 0;
   int count = 0;
   for (size_t offset = 0;
       (offset < str.length()) && ((offset = str.find(sub, offset)) != std::string::npos);
       offset += sub.length())
   {
       ++count;
   }
   return count;

}

int main() {

   std::cout << countSubstring("the three truths", "th")    << '\n';
   std::cout << countSubstring("ababababab", "abab")        << '\n';
   std::cout << countSubstring("abaabba*bbaba*bbab", "a*b") << '\n';

}</lang> Output:

3
2
2

Java

This method removes the first occurrence of the substring, checks for changes, and continues if something changed. It counts each removal attempt (even if nothing is replaced), so the count starts at -1 to offset the last removal attempt where nothing changed. <lang java>import java.util.regex.Pattern;

public class CountSubstring { public static int countSubstring(String subStr, String str){ int ans = -1;//start at -1 since we always count at least 1 replacement String newStr = str; do{ str = newStr; newStr = str.replaceFirst(Pattern.quote(subStr), "");//attempt a removal ans++;//count it }while(!str.equals(newStr)); return ans; }

public static void main(String[] args){ System.out.println(countSubstring("th", "the three truths")); System.out.println(countSubstring("abab", "ababababab")); System.out.println(countSubstring("a*b", "abaabba*bbaba*bbab")); } }</lang> Output:

3
2
2

Perl

<lang perl>sub countSubstring {

 my $str = shift;
 my $sub = quotemeta(shift);
 my $count = () = $str =~ /$sub/g;
 return $count;
  1. or return scalar( () = $str =~ /$sub/g );

}

print countSubstring("the three truths","th"), "\n"; print countSubstring("ababababab","abab"), "\n";</lang>

Perl 6

<lang perl6>sub count-substring($big,$little) { +$big.comb: /$little/ }

say count-substring("the three truths","th"); say count-substring("ababababab","abab");</lang> Note that in Perl 6 the /$little/ matches the variable literally, so there's no need to quote regex metacharacters. Also, prefix + forces numeric context in Perl 6 (it's a no-op in Perl 5). One other style point: we now tend to prefer hyphenated names over camelCase.

PureBasic

<lang PureBasic>a = CountString("the three truths","th") b = CountString("ababababab","abab")

a = 3
b = 2</lang>

Python

<lang python>>>> "the three truths".count("th") 3 >>> "ababababab".count("abab") 2</lang>

Tcl

The regular expression engine is ideal for this task, especially as the ***= prefix makes it interpret the rest of the argument as a literal string to match: <lang tcl>proc countSubstrings {haystack needle} {

   regexp -all ***=$needle $haystack

} puts [countSubstrings "the three truths" "th"] puts [countSubstrings "ababababab" "abab"] puts [countSubstrings "abaabba*bbaba*bbab" "a*b"]</lang>

Output:

3
2
2