String Byte Length

From Rosetta Code
Revision as of 17:11, 20 February 2007 by Nick (add 4D)
You are encouraged to solve this task according to the task description, using any language you may know.
This task has been split off from another task. Its programming examples need review to ensure that they fit the requirements of the new task.

In this task, the goal is to find the byte length of a string. This means encodings like UTF-8 may need to be handled specially, as there is not necessarily a one-to-one relationship between bytes and characters, and some languages recognize this.
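To make the gap between characters and bytes concrete, here is a minimal Python 3 sketch that counts UTF-8 bytes by hand, one code point at a time. The function name utf8_byte_length and the sample strings are illustrative choices, not part of the task description, and the sketch assumes the input contains no unpaired surrogates.

```python
def utf8_byte_length(text: str) -> int:
    """Count the bytes UTF-8 needs for text, one code point at a time."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1   # ASCII range: one byte
        elif cp < 0x800:
            total += 2   # e.g. accented Latin letters
        elif cp < 0x10000:
            total += 3   # rest of the Basic Multilingual Plane
        else:
            total += 4   # supplementary planes
    return total


# Hypothetical sample strings: ASCII, Latin with an accent, Japanese hiragana.
samples = ["Hello, world!", "caf\u00e9", "\u304a\u306f\u3088\u3046"]
for s in samples:
    print(len(s), "characters,", utf8_byte_length(s), "bytes in UTF-8")
```

For the ASCII string the two counts agree; for the other two the byte count is larger than the character count.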

For character length, see String Character Length.

4D

$length:=Length("Hello, world!")

ActionScript

myStrVar.length

Ada

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Length;

AppleScript

count of "Hello World"

AWK

From within any code block:

w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example

Ad hoc program from command line:

echo "Hello, world!" | awk '{print length($0)}'

From executable script: (prints for every line arriving on stdin)

#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

C

Standard: ANSI C (AKA C89):

Compiler: GCC 3.3.3

 #include <string.h>

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = strlen(string);
          
   return 0;
 }

or by hand:

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = 0;
   
   const char *p = string;
   while (*p++ != '\0') length++;
   
   return 0;
 }

or (for arrays of char only)

 #include <stdlib.h>
 
 int main(void)
 {
   char const s[] = "Hello, world!";
   size_t length = sizeof s - 1;
   
   return 0;
 }

For wide character strings (usually Unicode):

 #include <stdio.h>
 #include <wchar.h>
 
 int main(void) 
 {
    const wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length;
 
    length = wcslen(s);
    printf("Length in characters = %lu\n", (unsigned long) length);
    printf("Length in bytes      = %lu\n", (unsigned long) (length * sizeof(wchar_t)));
    
    return 0;
 }
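For a wide string, the byte length (excluding the terminator) is the number of wide characters multiplied by the width of wchar_t, which differs between platforms. A minimal Python 3 sketch of that arithmetic follows; it assumes that ctypes reports the same wchar_t width as the local C compiler, and it relies on all four characters lying in the Basic Multilingual Plane.

```python
import ctypes

s = "\u304a\u306f\u3088\u3046"  # the same four hiragana as in the C example

# Width of the platform's wchar_t as seen by ctypes (commonly 2 or 4 bytes).
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# Each of these characters fits in a single wchar_t whether it is 2 or 4
# bytes wide, so the element count equals the character count.
byte_length = len(s) * wchar_size

print("wchar_t size         =", wchar_size)
print("Length in characters =", len(s))
print("Length in bytes      =", byte_length)
```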

C++

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

 #include <string> // note: not <string.h>
 
 int main()
 {
   std::string s = "Hello, world!";
   std::string::size_type length = s.length(); // option 1
   std::string::size_type size = s.size();     // option 2
 }

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length();
 }

C#

Platform: .NET

Language Version: 1.0+

string s = "Hello, world!";
int length = s.Length;

ColdFusion

  #len("Hello World")#

Common Lisp

  (length "Hello World")

Component Pascal

  LEN("Hello, World!")

Forth

Interpreter: ANS Forth

 CREATE s ," Hello world" \ create string "s"
 s C@ ( -- length )

Haskell

Interpreter: GHCi 6.6, Hugs

Compiler: GHC 6.6

strlen = length "Hello, world!"

IDL

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Java

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

String s = "Hello, world!";
int byteCount = s.length() * 2;

Another way to find the byte length of a string is to specify the desired charset explicitly.

String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length; // includes a 2-byte byte-order mark
int byteCountUTF8  = s.getBytes("UTF-8").length;
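The effect of choosing an encoding explicitly can be checked with a short Python 3 sketch. It uses Python's own codecs, so whether the numbers match a particular Java runtime is an assumption; the point is only that the same string has different byte lengths under different encodings, and that a generic UTF-16 encoder may add a byte-order mark.

```python
s = "Hello, world!"

utf8_bytes = len(s.encode("utf-8"))
# Python's plain "utf-16" codec prepends a 2-byte byte-order mark (BOM).
utf16_with_bom = len(s.encode("utf-16"))
# Naming the byte order explicitly suppresses the BOM.
utf16_no_bom = len(s.encode("utf-16-be"))

print("UTF-8:            ", utf8_bytes)
print("UTF-16 with BOM:  ", utf16_with_bom)
print("UTF-16 without BOM:", utf16_no_bom)
```

Only the BOM-free figure equals twice the number of 16-bit values in the string.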

JavaScript

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

var s = "Hello, world!";
var byteCount = s.length * 2; //26
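The phrase "one or two 16-bit values" matters for characters outside the Basic Multilingual Plane. The following Python 3 sketch counts UTF-16 code units by encoding without a byte-order mark; the helper name utf16_code_units and the sample character are illustrative assumptions, not taken from the JavaScript example.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units UTF-16 needs for text."""
    return len(text.encode("utf-16-le")) // 2


bmp = "Hello, world!"
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

for s in (bmp, clef):
    units = utf16_code_units(s)
    print(len(s), "code points,", units, "code units,", units * 2, "bytes")
```

The single clef character needs two code units (a surrogate pair), so doubling the code-unit count gives 4 bytes for it.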

JudoScript

 //Store length of hello world in length and print it
 . length = "Hello World".length();

Lua

Interpreter: Lua 5.1 or later (the # length operator is not available in Lua 5.0).

 string="Hello world"
 length=#string

mIRC Scripting Language

Interpreter: mIRC

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

OCaml

Interpreter/Compiler: OCaml 3.09

String.length "Hello world";;


Perl

Interpreter: perl 5.8.6

Perl strings are in either the platform's native single-byte encoding (usually ISO 8859-1) or UTF-8. utf8::upgrade (a translation function) has the side effect of returning the resulting byte length, and does nothing else if the string is already UTF-8.

$str = "Hello, world!";
$length = utf8::is_utf8($str) ? utf8::upgrade($str) : length $str;

Note: Do not add the "use utf8;" pragma in this case. It has other side effects, and the utf8:: functions are available without it.

PHP

 $length = strlen('Hello, world!');

PL/SQL

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Python

Interpreter: Python 2.4

length = len("The length of this string will be determined")
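In Python 2.4 a plain string literal is a byte string, so len counts bytes. In Python 3 a str holds Unicode code points, and the byte length depends on the encoding chosen. A minimal Python 3 sketch, assuming UTF-8 is the encoding of interest and using a hypothetical second sample with one accented letter:

```python
ascii_text = "The length of this string will be determined"
accented = "na\u00efve"  # five characters, one of them outside ASCII

for text in (ascii_text, accented):
    data = text.encode("utf-8")  # a bytes object; len() on it counts bytes
    print(len(text), "characters,", len(data), "bytes")
```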

Ruby

 string="Hello world"
 print string.length

or

 puts "Hello World".length

Scheme

 (string-length "Hello world")

Smalltalk

 string := 'Hello, world!'.
 string size.

Standard ML

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Tcl

Basic version:

 string bytelength "Hello, world!"

A more elaborate version (requires any Tcl 8.x interpreter; tested on 8.4.12):

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in bytes is %d"  $s1 [string bytelength $s1]]
 puts [format "length of \"%s\" in bytes is %d"  $s2 [string bytelength $s2]]

UNIX Shell

With external utilities:

Interpreter: any Bourne shell

 string='Hello, world!'
 length=`printf '%s' "$string" | wc -c | tr -dc '0-9'` # printf avoids the non-portable echo -n
 echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell

 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal


VBScript

LenB(string|varname) 

Returns the number of bytes required to store a string in memory. Returns null if string|varname is null.

xTalk

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"