Substring

From Rosetta Code
Revision as of 00:52, 10 August 2009 by 76.173.203.32 (talk) (→‎{{header|C}}: there is no substr() in c; re-implemented to use only one pass of O(n+m))
You are encouraged to solve this task according to the task description, using any language you may know.

Basic Data Operation
This is a basic data operation. It represents a fundamental action on a basic data type.

In this task display a substring:

  • starting from n characters in and of m length;
  • starting from n characters in, up to the end of the string;
  • whole string minus last character;
  • starting from a known character within the string and of m length;
  • starting from a known substring within the string and of m length.

Ada

<lang Ada>with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Test_Slices is
   Str : constant String := "abcdefgh";
   N : constant := 2;
   M : constant := 3;
begin
   -- Str is indexed from 1, so a slice of M characters ends at N + M - 1
   Put_Line (Str (N .. N + M - 1));
   Put_Line (Str (N .. Str'Last));
   Put_Line (Str (Str'First .. Str'Last - 1));
   -- the + 1 keeps the matched position itself in the tail
   Put_Line (Head (Tail (Str, Str'Last - Index (Str, "d", 1) + 1), M));
   Put_Line (Head (Tail (Str, Str'Last - Index (Str, "de", 1) + 1), M));
end Test_Slices;</lang>
Sample output:

bcd
bcdefgh
abcdefg
def
def

C

<lang c>#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *substring(const char *s, int n, int m) {
  char *result;
  /* n < 0 or m < 0 is invalid */
  if (n < 0 || m < 0)
    return NULL;
  /* make sure string does not end before n
   * and advance the "s" pointer to beginning of substring */
  for ( ; n > 0; s++, n--)
    if (*s == '\0')
      /* string ends before n: invalid */
      return NULL;
  result = malloc(m+1);
  if (result == NULL)
    /* allocation failed */
    return NULL;
  result[0] = '\0';
  strncat(result, s, m); /* strncat() will automatically add null terminator
                          * if string ends early or after reading m characters */
  return result;
}

char *str_wholeless1(const char *s) {
  int slen = strlen(s);
  return substring(s, 0, slen-1);
}

char *str_fromch(const char *s, int ch, int m) {
  const char *p = strchr(s, ch);
  /* character not found: invalid */
  return p ? substring(s, p - s, m) : NULL;
}

char *str_fromstr(const char *s, const char *in, int m) {
  const char *p = strstr(s, in);
  /* substring not found: invalid */
  return p ? substring(s, p - s, m) : NULL;
}</lang>


<lang c>#define TEST(A) do {          \
    char *r = (A);            \
    if (r != NULL) {          \
      printf("%s\n", r);      \
      free(r);                \
    }                         \
  } while(0)

int main(void) {
  const char *s = "hello world shortest program";
  TEST( substring(s, 12, 5) );      // get "short"
  /* a negative m is rejected by substring(); an m at least as long as
   * the rest of the string yields everything up to the end */
  TEST( substring(s, 6, (int)strlen(s)) ); // get "world shortest program"
  TEST( str_wholeless1(s) );        // "... progra"
  TEST( str_fromch(s, 'w', 5) );    // "world"
  TEST( str_fromstr(s, "ro", 3) );  // "rog"
  return 0;
}</lang>

Common Lisp

<lang lisp>(let ((string "0123456789")
      (n 2)
      (m 3)
      (start #\5)
      (substring "34"))
  (list (subseq string n (+ n m))
        (subseq string n)
        (subseq string 0 (1- (length string)))
        (let ((pos (position start string)))
          (subseq string pos (+ pos m)))
        (let ((pos (search substring string)))
          (subseq string pos (+ pos m)))))</lang>

E

<lang e>def string := "aardvarks"
def n := 4
def m := 4
println(string(n, n + m))
println(string(n))
println(string(0, string.size() - 1))
println({string(def i := string.indexOf1('d'), i + m)})
println({string(def i := string.startOf("ard"), i + m)})</lang>
Output:

vark
varks
aardvark
dvar
ardv

Forth

<lang forth>2 constant Pos
3 constant Len

: substrings
  s" abcdefgh"  ( addr len )
  over Pos + Len   cr type       \ cde
  2dup Pos /string cr type       \ cdefgh
  2dup 1-          cr type       \ abcdefg
  2dup 'd scan     Len min cr type       \ def
  s" de" search if Len min cr type then  \ def
;</lang>

Java

<lang java>String x = "testing123";
int n = 2; // example offset
int m = 3; // example length
System.out.println(x.substring(n, n + m));
System.out.println(x.substring(n));
System.out.println(x.substring(0, x.length() - 1));
int index1 = x.indexOf('i');
System.out.println(x.substring(index1, index1 + m));
int index2 = x.indexOf("ing");
System.out.println(x.substring(index2, index2 + m));
//indexOf methods also have an optional "from index" argument which will
//make indexOf ignore characters before that index</lang>

Perl

<lang perl>my $str = 'abcdefgh';
my $n = 2;
my $m = 3;
print substr($str, $n, $m), "\n";
print substr($str, $n), "\n";
print substr($str, 0, -1), "\n";
print substr($str, index($str, 'd'), $m), "\n";
print substr($str, index($str, 'de'), $m), "\n";</lang>

PHP

<lang php><?php
$str = 'abcdefgh';
$n = 2;
$m = 3;
echo substr($str, $n, $m), "\n";
echo substr($str, $n), "\n";
echo substr($str, 0, -1), "\n";
echo substr($str, strpos($str, 'd'), $m), "\n";
echo substr($str, strpos($str, 'de'), $m), "\n";
?></lang>

Python

Python uses zero-based indexing, so the n'th character is at index n-1.

<lang python>>>> s = 'abcdefgh'
>>> n, m, char, chars = 2, 3, 'd', 'cd'
>>> # starting from n=2 characters in and m=3 in length;
>>> s[n-1:n+m-1]
'bcd'
>>> # starting from n characters in, up to the end of the string;
>>> s[n-1:]
'bcdefgh'
>>> # whole string minus last character;
>>> s[:-1]
'abcdefg'
>>> # starting from a known character char="d" within the string and of m length;
>>> indx = s.index(char)
>>> s[indx:indx+m]
'def'
>>> # starting from a known substring chars="cd" within the string and of m length.
>>> indx = s.index(chars)
>>> s[indx:indx+m]
'cde'
>>> </lang>
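The slicing shown in this section can also be wrapped in small helper functions. The following is a minimal sketch, not part of the original example: the names substring_from and substring_from_match are hypothetical, and the choice to return None when the character or substring is absent is an assumption (str.index would raise ValueError instead, so str.find is used here).

```python
def substring_from(s, start, m=None):
    """Return up to m characters of s beginning at zero-based index start.

    With m omitted, return everything from start to the end of s.
    (Hypothetical helper, named for illustration only.)
    """
    if start < 0 or (m is not None and m < 0):
        raise ValueError("start and m must be non-negative")
    return s[start:] if m is None else s[start:start + m]


def substring_from_match(s, needle, m):
    """Return m characters of s starting at the first occurrence of needle.

    Returns None when needle does not occur; str.find reports that as -1.
    (Hypothetical helper, named for illustration only.)
    """
    pos = s.find(needle)
    if pos == -1:
        return None
    return substring_from(s, pos, m)


s = 'abcdefgh'
print(substring_from(s, 1, 3))            # bcd
print(substring_from(s, 1))               # bcdefgh
print(s[:-1])                             # abcdefg
print(substring_from_match(s, 'd', 3))    # def
print(substring_from_match(s, 'cd', 3))   # cde
print(substring_from_match(s, 'xyz', 3))  # None
```

Because a single character is just a string of length one in Python, the same helper covers both the "known character" and the "known substring" parts of the task.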

Ruby

<lang ruby>str = 'abcdefgh'
n = 2
m = 3
puts str[n, m]
puts str[n..-1]
puts str[0..-2]
puts str[str.index('d'), m]
puts str[str.index('de'), m]</lang>

Smalltalk

The distinction between searching for a single character and searching for a string within another string is rather blurred. In the following code we could use $w (a character) instead of 'w' (a string), and it would make no difference.

<lang smalltalk>|s|
s := 'hello world shortest program'.

(s copyFrom: 13 to: (13+4)) displayNl.
"4 is the length (5) - 1, since we need the index of the
 last char we want, which is included"

(s copyFrom: 7) displayNl.
(s allButLast) displayNl.

(s copyFrom: ((s indexOfRegex: 'w') first)
   to: ( ((s indexOfRegex: 'w') first) + 4) ) displayNl.

(s copyFrom: ((s indexOfRegex: 'ro') first)
   to: ( ((s indexOfRegex: 'ro') first) + 2) ) displayNl.</lang>

These last two examples in particular seem rather complex, so we can extend the string class.

Works with: GNU Smalltalk

<lang smalltalk>String extend [
  copyFrom: index length: nChar [
    ^ self copyFrom: index to: ( index + nChar - 1 )
  ]
  copyFromRegex: regEx length: nChar [
    |i|
    i := self indexOfRegex: regEx.
    ^ self copyFrom: (i first) length: nChar
  ]
].

"and show it simpler..."

(s copyFrom: 13 length: 5) displayNl.
(s copyFromRegex: 'w' length: 5) displayNl.
(s copyFromRegex: 'ro' length: 3) displayNl.</lang>

Tcl

<lang tcl>set str "abcdefgh"
set n 2
set m 3

puts [string range $str $n [expr {$n+$m-1}]]
puts [string range $str $n end]
puts [string range $str 0 end-1]

# Because Tcl does substrings with a pair of indices, it is easier to express
# the last two parts of the task as a chained pair of [string range] operations.
puts [string range [string range $str [string first "d" $str] end] 0 [expr {$m-1}]]
puts [string range [string range $str [string first "de" $str] end] 0 [expr {$m-1}]]</lang>
Of course, if you were doing 'position-plus-length' a lot, it would be easier to add another subcommand to string, like this:

Works with: Tcl version 8.5

<lang tcl># Define the substring operation
proc ::substring {string start length} {
    string range [string range $string $start end] 0 $length-1
}

# Plumb it into the language
set ops [namespace ensemble configure string -map]
dict set ops substr ::substring
namespace ensemble configure string -map $ops

# Now show off by repeating the challenge!
set str "abcdefgh"
set n 2
set m 3

puts [string substr $str $n $m]
puts [string range $str $n end]
puts [string range $str 0 end-1]
puts [string substr $str [string first "d" $str] $m]
puts [string substr $str [string first "de" $str] $m]</lang>