String Character Length

From Rosetta Code
Revision as of 02:15, 20 October 2007 by rosettacode>Nirs (Handling encoded strings)
Task
String Character Length
You are encouraged to solve this task according to the task description, using any language you may know.
This task has has been split off from another task. Its programming examples are in need of review to ensure that they fit the requirements of the new task.

In this task, the goal is to find the character length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.

For byte length, see String Byte Length.

ActionScript

myStrVar.length()

Ada

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Length;

AppleScript

count of "Hello World"

Or:

count "Hello World"

AWK

From within any code block:

w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example

Ad hoc program from command line:

echo "Hello, world!" | awk '{print length($0)}'

From executable script: (prints for every line arriving on stdin)

#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

C

Standard: ANSI C (AKA C89):

Compiler: GCC 3.3.3

 #include <string.h>

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = strlen(string);
          
   return 0;
 }

or by hand:

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = 0;
   
   char *p = (char *) string;
   while (*p++ != '\0') length++;                                         
   
   return 0;
 }

or (for arrays of char only)

 #include <stdlib.h>
 
 int main(void)
 {
   char const s[] = "Hello, world!";
   size_t length = sizeof s - 1;
   
   return 0;
 }

For wide character strings (usually Unicode):

 #include <stdio.h>
 #include <wchar.h>
 
 int main(void) 
 {
    wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length;
 
    length = wcslen(s);
    printf("Length in characters = %d\n", length);
    printf("Length in bytes      = %d\n", sizeof(s) * sizeof(wchar_t));
    
    return 0;
 }

Objective-C

// Return the length in unicode characters
unsigned length = [@"Hello Word!" length];

C++

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

 #include <string> // note: not <string.h>
 
 int main()
 {
   std::string s = "Hello, world!";
   // Always in characters == bytes since sizeof(char) == 1
   std::string::size_type length = s.length(); // option 1: In Characters/Bytes
   std::string::size_type size = s.size();     // option 2: In Characters/Bytes
 }

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length();
}

C#

Platform: .NET Language Version: 1.0+

string s = "Hello, world!";
int clength = s.Length;  // In characters
int blength = System.Text.Encoding.GetBytes(s).length; // In Bytes.

Clean

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv

strlen :: String -> Int
strlen string = size string 

Start = strlen "Hello, world!"

ColdFusion

  #len("Hello World")#

Common Lisp

  (length "Hello World")

Component Pascal

  LEN("Hello, World!")

E

"Hello World".size()

Forth

The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

Interpreter: ANS Forth

The following code will count the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the the binary bit pattern "10xxxxxx".

binary
: utf8+ ( str -- str )
  begin
    char+
    dup c@
    11000000 and
    10000000 <>
  until ;
decimal

: count-utf8 ( zstr -- n )
  0
  begin
    swap dup c@
  while
    utf8+
    swap 1+
  repeat drop ;

Haskell

Interpreter: GHCi 6.6, Hugs

Compiler: GHC 6.6

strlen = length "Hello, world!"

IDL

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Java

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects gives the number of 16-bit values used to encode a string.

String s = "Hello, world!";
int length = s.length();

Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.

String str = "\uD834\uDD2A"; //U+1D12A
int length1 = str.length(); //2
int length2 = str.codePointCount(0, str.length()); //1

JavaScript

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.

var str1 = "Hello, world!";
var len1 = str1.length; //13

var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2

JudoScript

 //Store length of hello world in length and print it
 . length = "Hello World".length();

Lua

Interpreter: Lua 5.0 or later.

 string="Hello world"
 length=#string

MAXScript

"Hello world".count

mIRC Scripting Language

Interpreter: mIRC

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

OCaml

Interpreter/Compiler: Ocaml 3.09

String.length "Hello world";;


Perl

Interpreter: Perl any 5.X

 my $length = length "Hello, world!";

PHP

 $length = strlen('Hello, world!');

PL/SQL

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Python

Interpreter: Python 2.4

len() returns the length of a unicode string or plain ascii string. To get the length of encoded string, you have to decode it first:

>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1

Ruby

Library: active_support

 require 'active_support'
 puts "Hello World".chars.length

Scheme

 (string-length "Hello world")

Seed7

 length("Hello, world!")

Smalltalk

 string := 'Hello, world!".
 string size.

Standard ML

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Tcl

Basic version:

 string length "Hello, world!"

or more elaborately, needs Interpreter any 8.X. Tested on 8.4.12.

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in characters is %d"  $s1 [string length $s1]]
 puts [format "length of \"%s\" in characters is %d"  $s2 [string length $s2]]

UNIX Shell

With external utilities:

Interpreter: any bourne shell

 string='Hello, world!'
 length=`echo -n "$string" | wc -c | tr -dc '0-9'`
 echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell

 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal


VBScript

Len(string|varname) 

Returns the length of the string|varname Returns null if string|varname is null

xTalk

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"