Text between

From Rosetta Code
Revision as of 00:30, 7 January 2018 by PureFox (talk | contribs) (→‎{{header|Kotlin}}: Amended code and output following clarification of task requirements.)
Text between is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task

Get the text in a string that occurs between a start and end delimiter. Programs will be given a search string, a start delimiter string, and an end delimiter string. The delimiters will not be unset, and will not be the empty string.

The value returned should be the text in the search string that occurs between the first occurrence of the start delimiter (starting after the text of the start delimiter) and the first occurrence of the end delimiter after that.

If the start delimiter is not present in the search string, a blank string should be returned.

If the end delimiter is not present after the end of the first occurrence of the start delimiter in the search string, the remainder of the search string after that point should be returned.

There are two special values for the delimiters. If the value of the start delimiter is "start", the beginning of the search string will be matched. If the value of the end delimiter is "end", the end of the search string will be matched.
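The rules above can be sketched in a few lines. This is only an illustrative reading of the task, not a reference solution; the function name `text_between` and its parameter names are chosen here for the example and are not part of the task.

```python
def text_between(text, start, end):
    """Return the part of text between the start and end delimiters."""
    if start == "start":
        # Special value: match the beginning of the search string
        begin = 0
    else:
        found = text.find(start)
        if found == -1:
            # Start delimiter not present: blank string
            return ""
        # Begin after the text of the start delimiter
        begin = found + len(start)

    if end == "end":
        # Special value: match the end of the search string
        return text[begin:]

    # Only look for the end delimiter after the start delimiter
    stop = text.find(end, begin)
    if stop == -1:
        # End delimiter not present: return the remainder
        return text[begin:]
    return text[begin:stop]


print(text_between("Hello Rosetta Code world", "Hello ", " world"))
```

Searching for the end delimiter from `begin` rather than from index 0 is what keeps an earlier occurrence of the end delimiter from being matched.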

Example 1. Both delimiters set

Text: "Hello Rosetta Code world"
Start delimiter: "Hello "
End delimiter: " world"
Output: "Rosetta Code"

Example 2. Start delimiter is the start of the string

Text: "Hello Rosetta Code world"
Start delimiter: "start"
End delimiter: " world"
Output: "Hello Rosetta Code"

Example 3. End delimiter is the end of the string

Text: "Hello Rosetta Code world"
Start delimiter: "Hello"
End delimiter: "end"
Output: "Rosetta Code world"

Example 4. End delimiter appears before and after start delimiter

Text: "</div><div style=\"chinese\">你好嗎</div>"
Start delimiter: "<div style=\"chinese\">"
End delimiter: "</div>"
Output: "你好嗎"

Example 5. End delimiter not present

Text: "<text>Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">"
Start delimiter: "<text>"
End delimiter: "<table>"
Output: "Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">"

Example 6. Start delimiter not present

Text: "<table style=\"myTable\"><tr><td>hello world</td></tr></table>"
Start delimiter: "<table>"
End delimiter: "</table>"
Output: ""

Example 7. Multiple instances of end delimiter after start delimiter (match until the first one)

Text: "The quick brown fox jumps over the lazy other fox"
Start delimiter: "quick "
End delimiter: " fox"
Output: "brown"

Example 8. Multiple instances of the start delimiter (start matching at the first one)

Text: "One fish two fish red fish blue fish"
Start delimiter: "fish "
End delimiter: " red"
Output: "two fish"

Example 9. Start delimiter is end delimiter

Text: "FooBarBazFooBuxQuux"
Start delimiter: "Foo"
End delimiter: "Foo"
Output: "BarBaz"
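As a rough check of the edge cases, the sketch below runs examples 1, 2 and 4 to 9 through a small function built on Python's `str.partition`. The function and the `EXAMPLES` table are illustrative names introduced here. Example 3 is left out because, read literally, a start delimiter of "Hello" would leave a leading space in the result.

```python
def text_between(text, start, end):
    """Text between delimiters, using str.partition (first occurrence only)."""
    if start != "start":
        # partition returns (before, separator, after); the separator is
        # empty when the delimiter was not found
        _, matched, text = text.partition(start)
        if not matched:
            return ""
    if end != "end":
        # If end is absent, partition leaves the whole remainder in place
        text, _, _ = text.partition(end)
    return text


# (example number, text, start delimiter, end delimiter, expected output)
EXAMPLES = [
    (1, "Hello Rosetta Code world", "Hello ", " world", "Rosetta Code"),
    (2, "Hello Rosetta Code world", "start", " world", "Hello Rosetta Code"),
    (4, "</div><div style=\"chinese\">你好嗎</div>",
     "<div style=\"chinese\">", "</div>", "你好嗎"),
    (5, "<text>Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">",
     "<text>", "<table>",
     "Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">"),
    (6, "<table style=\"myTable\"><tr><td>hello world</td></tr></table>",
     "<table>", "</table>", ""),
    (7, "The quick brown fox jumps over the lazy other fox",
     "quick ", " fox", "brown"),
    (8, "One fish two fish red fish blue fish", "fish ", " red", "two fish"),
    (9, "FooBarBazFooBuxQuux", "Foo", "Foo", "BarBaz"),
]

for number, text, start, end, expected in EXAMPLES:
    result = text_between(text, start, end)
    status = "ok" if result == expected else "MISMATCH"
    print("Example %d: %r %s" % (number, result, status))
```

Because the text is cut down to the part after the start delimiter before the end delimiter is searched for, examples 4 and 9 need no special handling.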


AppleScript

<lang applescript>
my text_between("Hello Rosetta Code world", "Hello ", " world")

on text_between(this_text, start_text, end_text)
	set return_text to ""
	try
		if (start_text is not "start") then
			set AppleScript's text item delimiters to start_text
			set return_text to text items 2 thru end of this_text as string
		else
			set return_text to this_text
		end if
		if (end_text is not "end") then
			set AppleScript's text item delimiters to end_text
			set return_text to text item 1 of return_text as string
			set AppleScript's text item delimiters to ""
		end if
	end try
	set AppleScript's text item delimiters to ""

	return return_text
end text_between
</lang>

C

<lang c>#include <string.h>

/*
 * textBetween: Gets text between two delimiters.
 * The result is copied into returnText, which the caller must size to hold
 * at least strlen(thisText) + 1 bytes. Returns a pointer to the start of the
 * match inside thisText, or NULL if the start delimiter is not found.
 */
char* textBetween(char* thisText, char* startText, char* endText, char* returnText)
{
    char* startPointer = NULL;
    char* endPointer = NULL;
    size_t stringLength = 0;

    if (strcmp(startText, "start") == 0) {
        // Set the beginning of the string
        startPointer = thisText;
    } else {
        startPointer = strstr(thisText, startText);

        if (startPointer != NULL) {
            // Skip past the start delimiter
            startPointer = startPointer + strlen(startText);
        }
    } // end if the start delimiter is "start"

    if (startPointer == NULL) {
        // Start delimiter not found: return a blank string
        returnText[0] = '\0';
        return NULL;
    } // end if the start pointer is not found

    if (strcmp(endText, "end") != 0) {
        // Look for the end delimiter only after the start delimiter
        endPointer = strstr(startPointer, endText);
    } // end if the end delimiter is not "end"

    if (endPointer == NULL) {
        // "end" was requested or the end delimiter is absent: take the remainder
        stringLength = strlen(startPointer);
    } else {
        stringLength = (size_t)(endPointer - startPointer);
    }

    // Copy characters between the start and end delimiters
    memcpy(returnText, startPointer, stringLength);
    returnText[stringLength] = '\0';

    return startPointer;
} // end textBetween method</lang>

Haskell

<lang Haskell>import Data.Text (Text, breakOn, pack, stripPrefix, unpack)
import Data.List (intercalate)
import Control.Arrow ((***))

-- TEXT BETWEEN -----------------------------------------------------------
textBetween :: (Either String Text, Either String Text) -> Text -> Text
textBetween (s, e) txt =
  let stet t = (Just . const t)
      prune f t = f . flip breakOn t
      -- Text up to and including any start delimiter dropped
      mb =
        either (stet txt) (stripPrefix <*> prune snd txt) s >>=
        -- Residue up to any end delimiter (or if none found, to end)
        (\lp -> either (stet lp) (Just . prune fst lp) e)
  in case mb of
       Just x -> x
       _ -> pack []

-- TESTS ------------------------------------------------------------------
samples :: [Text]
samples =
  pack <$>
  [ "Hello Rosetta Code world"
  , "</div><div style=\"chinese\">你好吗</div>"
  , "<text>Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">"
  , "<table style=\"myTable\"><tr><td>hello world</td></tr></table>"
  ]

delims :: [(Either String Text, Either String Text)]
delims =
  (wrap *** wrap) <$>
  [ ("Hello ", " world")
  , ("start", " world")
  , ("Hello", "end")
  , ("<div style=\"chinese\">", "</div>")
  , ("<text>", "<table>")
  , ("<text>", "</table>")
  ]

wrap :: String -> Either String Text
wrap x =
  pack <$>
  if x `elem` ["start", "end"]
    then Left x
    else Right x

main :: IO ()
main = do
  mapM_ print $ flip textBetween (head samples) <$> take 3 delims
  (putStrLn . unlines) $
    zipWith
      (\d t -> intercalate (unpack $ textBetween d t) ["\"", "\""])
      (drop 3 delims)
      (tail samples)</lang>
Output:
"Rosetta Code"
"Hello Rosetta Code"
" Rosetta Code world"
"你好吗"
"Hello <span>Rosetta Code</span> world</text><table style="myTable">"
""

Java

javac textBetween.java
java -cp . textBetween "hello Rosetta Code world" "hello " " world"

<lang java> public class textBetween {

   /*
    * textBetween: Get the text between two delimiters
    */
   static String textBetween(String thisText, String startString, String endString)
   {
   	String returnText = "";
   	int startIndex = 0;
   	int endIndex = 0;
   	
   	if (startString.equals("start"))
   	{
   		startIndex = 0;
   	} else {

   		startIndex = thisText.indexOf(startString);

   		if (startIndex < 0)
   		{
   			return "";
   		} else {
   			startIndex = startIndex + startString.length();
   		}

   	}
       
   	if (endString.equals("end"))
   	{
   		endIndex = thisText.length();
   	} else {
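   		// Search for the end delimiter only after the start delimiter
   		endIndex = thisText.indexOf(endString, startIndex);
           
           if (endIndex < 0) 
           {
           	// End delimiter not found: return the remainder of the text
           	endIndex = thisText.length();
           }	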
   	}
   	
   	returnText = thisText.substring(startIndex,endIndex);
   	
   	return returnText;
   } // end method textBetween
   /**
    * Main method
    */
   public static void main(String[] args)
   {
   	String thisText = args[0];
   	String startDelimiter = args[1];
   	String endDelimiter = args[2];
   	
   	String returnText = "";
   	returnText = textBetween(thisText, startDelimiter, endDelimiter);
   	
       System.out.println(returnText);
   } // end method main
   

} // end class textBetween </lang>

JavaScript

<lang javascript>
function textBetween(thisText, startString, endString) {
    if (thisText == undefined) {
        return "";
    }

    var start_pos = 0;
    if (startString != 'start') {
        start_pos = thisText.indexOf(startString);

        // If the text does not contain the start string, return a blank string
        if (start_pos < 0) {
            return "";
        }

        // Skip the first startString characters
        start_pos = start_pos + startString.length;
    }

    var end_pos = thisText.length;
    if (endString != 'end') {
        end_pos = thisText.indexOf(endString, start_pos);
    }

    // If the text does not have the end string after the start string, return the whole string after the start
    if (end_pos < start_pos) {
        end_pos = thisText.length;
    }

    var newText = thisText.substring(start_pos, end_pos);

    return newText;
} // end textBetween
</lang>

Kotlin

In the third example, I've assumed that the start delimiter should be "Hello " (not "Hello") to match the required output.
<lang scala>// version 1.2.10

fun String.textBetween(start: String, end: String): String {

   require(!start.isEmpty() && !end.isEmpty())
   if (this.isEmpty()) return this
   val s = if (start == "start") 0 else this.indexOf(start)
   if (s == -1) return ""
   val si = if (start == "start") 0 else s + start.length
   val e = if (end == "end") this.length else this.indexOf(end, si)
   if (e == -1) return this.substring(si)
   return this.substring(si, e)

}

fun main(args: Array<String>) {

   val texts = listOf(
       "Hello Rosetta Code world",
       "Hello Rosetta Code world",
       "Hello Rosetta Code world",
       "</div><div style=\"chinese\">你好嗎</div>",
       "<text>Hello <span>Rosetta Code</span> world</text><table style=\"myTable\">",
       "<table style=\"myTable\"><tr><td>hello world</td></tr></table>",
       "The quick brown fox jumps over the lazy other fox",
       "One fish two fish red fish blue fish",
       "FooBarBazFooBuxQuux"
   )
   val startEnds = listOf(
       "Hello " to " world",
       "start" to " world",
       "Hello " to "end",
       "<div style=\"chinese\">" to "</div>",
       "<text>" to "<table>",
       "<table>" to "</table>",
       "quick " to " fox",
       "fish " to " red",
       "Foo" to "Foo"
   )
   for ((i, text) in texts.withIndex()) {
       println("Text: \"$text\"")
       val (s, e) = startEnds[i]
       println("Start delimiter: \"$s\"")
       println("End delimiter: \"$e\"")
       val b = text.textBetween(s, e)
       println("Output: \"$b\"\n")
   }

}</lang>

Output:
Text: "Hello Rosetta Code world"
Start delimiter: "Hello "
End delimiter: " world"
Output: "Rosetta Code"

Text: "Hello Rosetta Code world"
Start delimiter: "start"
End delimiter: " world"
Output: "Hello Rosetta Code"

Text: "Hello Rosetta Code world"
Start delimiter: "Hello "
End delimiter: "end"
Output: "Rosetta Code world"

Text: "</div><div style="chinese">你好嗎</div>"
Start delimiter: "<div style="chinese">"
End delimiter: "</div>"
Output: "你好嗎"

Text: "<text>Hello <span>Rosetta Code</span> world</text><table style="myTable">"
Start delimiter: "<text>"
End delimiter: "<table>"
Output: "Hello <span>Rosetta Code</span> world</text><table style="myTable">"

Text: "<table style="myTable"><tr><td>hello world</td></tr></table>"
Start delimiter: "<table>"
End delimiter: "</table>"
Output: ""

Text: "The quick brown fox jumps over the lazy other fox"
Start delimiter: "quick "
End delimiter: " fox"
Output: "brown"

Text: "One fish two fish red fish blue fish"
Start delimiter: "fish "
End delimiter: " red"
Output: "two fish"

Text: "FooBarBazFooBuxQuux"
Start delimiter: "Foo"
End delimiter: "Foo"
Output: "BarBaz"

Perl 6

Works with: Rakudo version 2017.12

It seems somewhat pointless to write a general purpose routine to do text matching as built-in primitives can do so more flexibly and concisely, but whatever.

<lang perl6>sub text-between ( $text, $start, $end ) {
    return $/»[0]».Str if $text ~~ m:g/ $start (.*?) $end /;
    []
}

# Testing

my $text = 'Hello Rosetta Code world';

# String start and end delimiter
put '1> ', $text.&text-between( 'Hello ', ' world' );

# Regex string start delimiter
put '2> ', $text.&text-between( rx/^/, ' world' );

# Regex string end delimiter
put '3> ', $text.&text-between( 'Hello', rx/$/ );

# Return all matching strings when multiple matches are possible
put '4> ', join ',', $text.&text-between( 'e', 'o' );

# End delimiter only valid after start delimiter
put '5> ', '</div><div style="chinese">你好嗎</div>'\
    .&text-between( '<div style="chinese">', '</div>' );

# End delimiter or string end if not found
put '6> ', '<text>Hello <span>Rosetta Code</span> world</text><table style="myTable">'\
    .&text-between( '<text>', rx/'<table>' | $/ );

# Start delimiter not found, return blank string
put '7> ', '<table style="myTable"><tr><td>hello world</td></tr></table>'\
    .&text-between( '<table>', '</table>' );</lang>

Output:
1> Rosetta Code
2> Hello Rosetta Code
3>  Rosetta Code world
4> ll,tta C, w
5> 你好嗎
6> Hello <span>Rosetta Code</span> world</text><table style="myTable">
7> 

PHP

http://localhost/textBetween.php?thisText=hello%20Rosetta%20Code%20world&start=hello%20&end=%20world

<lang php><?php
function text_between($string, $start, $end)
{
    if ($start == "start")
    {
        $startIndex = 0;
    } else {
        $startIndex = strpos($string, $start);

        // strpos returns false (not 0) when the text is not found
        if ($startIndex === false)
        {
            return "";
        }

        // Skip past the start delimiter
        $startIndex += strlen($start);
    }

    if ($end == "end")
    {
        $endIndex = strlen($string);
    } else {
        // Search for the end delimiter only after the start delimiter
        $endIndex = strpos($string, $end, $startIndex);

        // End delimiter not found: return the remainder of the string
        if ($endIndex === false)
        {
            $endIndex = strlen($string);
        }
    }

    return substr($string, $startIndex, $endIndex - $startIndex);
}

$thisText = $_GET["thisText"];
$startDelimiter = $_GET["start"];
$endDelimiter = $_GET["end"];

$returnText = text_between($thisText, $startDelimiter, $endDelimiter);

print_r($returnText);
?></lang>

Python

<lang python>
#!/usr/bin/env python

from sys import argv

# textBetween in python
# Get the text between two delimiters
# Usage:
# python textBetween.py "hello Rosetta Code world" "hello " " world"

def textBetween( thisText, startString, endString ):
    if startString == 'start':
        startIndex = 0
    else:
        startIndex = thisText.find( startString )

        # Start delimiter not found: return a blank string
        if startIndex < 0:
            return ''

        startIndex = startIndex + len( startString )

    returnText = thisText[startIndex:]

    if endString == 'end':
        return returnText

    endIndex = returnText.find( endString )

    # End delimiter not found: return the remainder of the text
    if endIndex < 0:
        return returnText

    return returnText[:endIndex]

script, first, second, third = argv

thisText = first
startString = second
endString = third

print( textBetween( thisText, startString, endString ) )
</lang>

REXX

Translation of: Kotlin

<lang rexx>Say "Using the string 'Hello Rosetta Code world':"
Call test 'Hello Rosetta Code world','Hello ',' world'
Call test 'Hello Rosetta Code world','<start>',' world'
Call test 'Hello Rosetta Code world','Hello','<end>'
Call test 'Hello Rosetta Code world','Hello Rosetta ','Code world'
Call test 'Hello Rosetta Code world','Hello Rosetta','Code world'
Call test 'Hello Rosetta Code world','Code','Hello'
Call test 'Hello Rosetta Code world','Hello Rosetta Code','Code world'
Call test 'Hello Rosetta Code world','Goodbye','Code world'
Exit

test: Procedure

 Parse Arg t,s,e
 res=text_between(t,s,e)
 Say '  text between' "'"s"'" 'and' "'"e"'" 'is' "'"res"'"
 Return

text_between: Procedure

 Parse Arg this_text, start_text, end_text
 If start_text='<start>' Then
   rest=this_text
 Else Do
   s=pos(start_text,this_text)
   If s>0 Then
     rest=substr(this_text,s+length(start_text))
   Else
     Return this_text
   End
 If end_text='<end>' Then
   Return rest
 Else Do
   e=pos(end_text,rest)
   If e=0 Then
     Return this_text
   Return left(rest,e-1)
   End</lang>
Output:
Using the string 'Hello Rosetta Code world':
  text between 'Hello ' and ' world' is 'Rosetta Code'
  text between '<start>' and ' world' is 'Hello Rosetta Code'
  text between 'Hello' and '<end>' is ' Rosetta Code world'
  text between 'Hello Rosetta ' and 'Code world' is ''
  text between 'Hello Rosetta' and 'Code world' is ' '
  text between 'Code' and 'Hello' is 'Hello Rosetta Code world'
  text between 'Hello Rosetta Code' and 'Code world' is 'Hello Rosetta Code world'
  text between 'Goodbye' and 'Code world' is 'Hello Rosetta Code world'

Ruby

Test
<lang ruby>
class String

  def textBetween startDelimiter, endDelimiter

    if (startDelimiter == "start") then
      startIndex = 0
    else
      startIndex = self.index(startDelimiter)

      # Start delimiter not found: return a blank string
      if (startIndex == nil) then
        return ""
      end

      startIndex += startDelimiter.length
    end

    thisLength = self.length

    returnText = self[startIndex, thisLength]

    if (endDelimiter == "end") then
      endIndex = thisLength
    else
      endIndex = returnText.index(endDelimiter)
    end

    # End delimiter not found: return the remainder of the text
    if (endIndex == nil) then
      return returnText
    end

    returnText = returnText[0, endIndex]

    return returnText

  end

end

thisText = ARGV[0]
startDelimiter = ARGV[1]
endDelimiter = ARGV[2]

#puts thisText
#puts startDelimiter
#puts endDelimiter

returnText = thisText.textBetween(startDelimiter, endDelimiter)

puts returnText
</lang>

UNIX Shell

Works with: Bash
Works with: Dash
Works with: Zsh

This implementation creates no processes/subshells in modern shells (e.g. shells in which 'echo' and '[' are builtins). It modifies/leaks no global state other than the "text_between" function's name. Its behavior is not changed by the presence or absence of common shell options (e.g. "-e", "-u", "pipefail", or POSIX compatibility mode) or settings (e.g. "IFS").

This can be made to work with ksh (93) by removing all uses of the "local" keyword, though this will cause it to modify global variables.

The "hard" assertions when unpacking the arguments to the "text_between" function reflect the assumptions in the requirements for this problem: that null/empty arguments will never be provided. If any empty arguments are given, the interpreter running this function will exit after printing an error. If this function is invoked without a subshell, that will crash the invoking program as well. In practical use, that may not be desirable, in which case the ":?" assertions should be replaced with less harsh conditional-unpack code (e.g. if [ -z "${1:-}" ]; then echo "Invalid input!" && return 127; else local var="$1"; fi).

<lang bash>text_between() {
	local search="${1:?Search text not provided}"
	local start_str="${2:?Start text not provided}"
	local end_str="${3:?End text not provided}"
	local temp=

	if [ "$start_str" != "start" ]; then
		# $temp will be $search with everything before the first occurrence of
		# $start_str (inclusive) removed, searching from the beginning.
		temp="${search#*$start_str}"
		# If the start delimiter wasn't found, return an empty string.
		# Comparing length rather than string equality because character
		# comparison is not necessary here.
		if [ "${#temp}" -eq "${#search}" ]; then
			search=
		else
			search="$temp"
		fi
	fi

	if [ "$end_str" = "end" ]; then
		echo "$search"
	else
		# Output will be $search with everything from the first occurrence of
		# $end_str (inclusive) onward removed; "%%" strips the longest
		# matching suffix.
		echo "${search%%$end_str*}"
	fi
	return 0
}

text_between "Hello Rosetta Code world" "Hello " " world"
text_between "Hello Rosetta Code world" "start" " world"
text_between "Hello Rosetta Code world" "Hello " "end"</lang>

zkl

<lang zkl>fcn getText(text,start,end){
   s = (if((s:=text.find(start))==Void) 0 else s + start.len());
   e = (if((e:=text.find(end,s))==Void) text.len() else e);
   text[s,e - s]
}
getText("Hello Rosetta Code world","Hello "," world").println();
getText("Hello Rosetta Code world","start", " world").println();
getText("Hello Rosetta Code world","Hello", "end" ).println();</lang>

Output:
Rosetta Code
Hello Rosetta Code
 Rosetta Code world