String Character Length: Difference between revisions

Content deleted Content added

Inline

Revision as of 23:09, 7 December 2007

This task has has been split off from another task. Its programming examples are in need of review to ensure that they fit the requirements of the new task.

In this task, the goal is to find the character length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.

For byte length, see String Byte Length.

ActionScript

myStrVar.length()

Ada

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Length;

ALGOL 68

STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)$,str,length))

Result:

Length of "hello, world" is +12

AppleScript

count of "Hello World"

Or:

count "Hello World"

C

Compiler: ???

For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):

 #include <stdio.h>
 #include <wchar.h>
 
 int main(void) 
 {
    wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length;
 
    length = wcslen(s);
    printf("Length in characters = %d\n", length);
    printf("Length in bytes      = %d\n", sizeof(s) * sizeof(wchar_t));
    
    return 0;
 }

TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()

Objective-C

// Return the length in unicode characters
unsigned length = [@"Hello Word!" length];

C++

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

 #include <string> // note: not <string.h>
 
 int main()
 {
   std::string s = "Hello, world!";
   // Always in characters == bytes since sizeof(char) == 1
   std::string::size_type length = s.length(); // option 1: In Characters/Bytes
   std::string::size_type size = s.size();     // option 2: In Characters/Bytes
 }

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length();
}

C#

Platform: .NET Language Version: 1.0+

string s = "Hello, world!";
int clength = s.Length;  // In characters
int blength = System.Text.Encoding.GetBytes(s).length; // In Bytes.

Clean

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv

strlen :: String -> Int
strlen string = size string 

Start = strlen "Hello, world!"

ColdFusion

  #len("Hello World")#

Common Lisp

  (length "Hello World")

Component Pascal

  LEN("Hello, World!")

E

"Hello World".size()

Forth

The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)

Interpreter: ANS Forth

The following code will count the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the the binary bit pattern "10xxxxxx".

2 base !
: utf8+ ( str -- str )
  begin
    char+
    dup c@
    11000000 and
    10000000 <>
  until ;
decimal

: count-utf8 ( zstr -- n )
  0
  begin
    swap dup c@
  while
    utf8+
    swap 1+
  repeat drop ;

Haskell

Interpreter: GHCi 6.6, Hugs

Compiler: GHC 6.6

The base type Char defined by the standard is already intended for (plain) Unicode characters.

strlen = length "Hello, world!"

IDL

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Java

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

The length method of String objects gives the number of 16-bit values used to encode a string.

String s = "Hello, world!";
int length = s.length();

Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.

String str = "\uD834\uDD2A"; //U+1D12A
int length1 = str.length(); //2
int length2 = str.codePointCount(0, str.length()); //1

JavaScript

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.

JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.

var str1 = "Hello, world!";
var len1 = str1.length; //13

var str2 = "\uD834\uDD2A"; //U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; //2

JudoScript

 //Store length of hello world in length and print it
 . length = "Hello World".length();

LSE64

LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.

" Hello world" @ ,   # 11

Lua

Interpreter: Lua 5.0 or later.

 string="Hello world"
 length=#string

MAXScript

"Hello world".count

mIRC Scripting Language

Interpreter: mIRC

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

OCaml

Interpreter/Compiler: Ocaml 3.09

String.length "Hello world";;

Perl

Interpreter: Perl any 5.X

 my $length = length "Hello, world!";

PHP

 $length = strlen('Hello, world!');

PL/SQL

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Python

Interpreter: Python 2.4

len() returns the number of characters in a unicode string or plain ascii string. To get the length of encoded string, you have to decode it first:

>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # Same encoded as utf-8 string
1

Ruby

Library: active_support

 require 'active_support'
 puts "Hello World".chars.length

Scheme

 (string-length "Hello world")

Seed7

 length("Hello, world!")

Smalltalk

 string := 'Hello, world!".
 string size.

Standard ML

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Tcl

Basic version:

 string length "Hello, world!"

or more elaborately, needs Interpreter any 8.X. Tested on 8.4.12.

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in characters is %d"  $s1 [string length $s1]]
 puts [format "length of \"%s\" in characters is %d"  $s2 [string length $s2]]

UNIX Shell

With external utilities:

Interpreter: any Bourne Shell

 string='Hello, world!'
 length=`echo -n "$string" | wc -c | tr -dc '0-9'`
 echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell

 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal

VBScript

Len(string|varname)

Returns the length of the string|varname Returns null if string|varname is null

XSLT

<?xml version="1.0" encoding="UTF-8"?>
...
<xsl:value-of select="string-length('møøse')" />

xTalk

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"

Revision as of 23:03, 7 December 2007 view source rosettacode>IanOsgood verified that AWK example only does byte length ← Older edit		Revision as of 23:09, 7 December 2007 view source rosettacode>IanOsgood →‎{{header\|C}}: remove incorrect examples Newer edit →
Line 27: =={{header\|C}}== '''Compiler:''' ~~GCC 3.3.3~~???▼ ~~'''Standard:''' [[ANSI C]] (AKA [[C89]]):~~ For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):▼ ▲'''Compiler:''' GCC 3.3.3 ~~#include <string.h>~~ ~~int main(void)~~ { ~~const char string = "Hello, world!";~~ ~~size_t length = strlen(string);~~ ~~return 0;~~ } ~~or by hand:~~ ~~int main(void)~~ { ~~const char string = "Hello, world!";~~ ~~size_t length = 0;~~ ~~char p = (char ) string;~~ ~~while (*p++ != '\0') length++;~~ ~~return 0;~~ } ~~or (for arrays of char only)~~ ~~#include <stdlib.h>~~ ~~int main(void)~~ { ~~char const s[] = "Hello, world!";~~ ~~size_t length = sizeof s - 1;~~ ~~return 0;~~ } ▲For wide character strings (usually Unicode): #include <stdio.h> Line 82 ⟶ 45: return 0; } ''TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()'' =={{header\|Objective-C}}==