String Byte Length: Difference between revisions

From Rosetta Code
m (→‎{{header|Java}}: Corrected a method name.)
m (Switch to header template)
Revision as of 04:35, 13 November 2007

This task has been split off from another task. Its programming examples are in need of review to ensure that they fit the requirements of the new task.
Task
String Byte Length
You are encouraged to solve this task according to the task description, using any language you may know.

In this task, the goal is to find the byte length of a string. This means encodings like UTF-8 may need to be handled specially, as there is not necessarily a one-to-one relationship between bytes and characters, and some languages recognize this.

For character length, see String Character Length.
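
As a minimal illustration of why bytes and characters can differ, the following Python 3 sketch counts both for one short string. The sample text and variable names are chosen here for illustration only and are not part of the task.

```python
# Character count versus byte count for the same text.
text = "caf\u00e9"  # four characters; the last one is U+00E9 (e with acute accent)

char_count = len(text)                         # number of code points
utf8_bytes = len(text.encode("utf-8"))         # U+00E9 needs two bytes in UTF-8
latin1_bytes = len(text.encode("iso-8859-1"))  # one byte per character in Latin-1

print(char_count, utf8_bytes, latin1_bytes)
```

The same four characters occupy five bytes in UTF-8 but four in ISO 8859-1, so a byte length is only meaningful once an encoding has been chosen.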

4D

$length:=Length("Hello, world!")

ActionScript

myStrVar.length

Ada

Compiler: GCC 4.1.2

Str    : String := "Hello World";
Length : constant Natural := Str'Size / System.Storage_Unit;

The 'size attribute returns the size of an object in bits. System.Storage_Unit is the number of bits in a byte on the current machine.

AppleScript

count of "Hello World"

AWK

From within any code block:

w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example

Ad hoc program from command line:

echo "Hello, world!" | awk '{print length($0)}'

From executable script: (prints for every line arriving on stdin)

#!/usr/bin/awk -f
{print"The length of this line is "length($0)}

C

Standard: ANSI C (AKA C89):

Compiler: GCC 3.3.3

 #include <string.h>

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = strlen(string);
          
   return 0;
 }

or by hand:

 int main(void) 
 {
   const char *string = "Hello, world!";
   size_t length = 0;
   
   const char *p = string; /* no cast needed; keeps the const qualifier */
   while (*p++ != '\0') length++;                                         
   
   return 0;
 }

or (for arrays of char only)

 #include <stdlib.h>
 
 int main(void)
 {
   char const s[] = "Hello, world!";
   size_t length = sizeof s - 1;
   
   return 0;
 }

C++

Standard: ISO C++ (AKA C++98):

Compiler: g++ 4.0.2

 #include <string> // note: not <string.h>
 
 int main()
 {
   std::string s = "Hello, world!";
   std::string::size_type length = s.length(); // option 1: In Characters/Bytes
   std::string::size_type size = s.size();     // option 2: In Characters/Bytes
   // In bytes same as above since sizeof(char) == 1
   std::string::size_type bytes = s.length() * sizeof(std::string::value_type); 
 }

For wide character strings:

 #include <string>
 
 int main()
 {
   std::wstring s = L"\u304A\u306F\u3088\u3046";
   std::wstring::size_type length = s.length() * sizeof(std::wstring::value_type); // in bytes
 }
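
The result of the wide-string example depends on the size of wchar_t, which is implementation-defined (commonly 2 bytes on Windows and 4 bytes on Linux). The Python 3 sketch below is not a translation of the C++ code; it only shows, under that assumption, the byte counts the same four characters would need with 16-bit and with 32-bit units.

```python
greeting = "\u304a\u306f\u3088\u3046"  # the same four characters as the C++ example

# Little-endian codecs are used so that no byte-order mark is added.
bytes_with_16_bit_units = len(greeting.encode("utf-16-le"))  # 4 characters * 2 bytes
bytes_with_32_bit_units = len(greeting.encode("utf-32-le"))  # 4 characters * 4 bytes

print(bytes_with_16_bit_units, bytes_with_32_bit_units)
```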

C#

Platform: .NET Language Version: 1.0+

string s = "Hello, world!";
int clength = s.Length;  // In characters
int blength = System.Text.Encoding.UTF8.GetByteCount(s); // In bytes, when encoded as UTF-8

Clean

Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.

import StdEnv

strlen :: String -> Int
strlen string = size string 

Start = strlen "Hello, world!"

ColdFusion

  #len("Hello World")#

Common Lisp

  (length "Hello World")

Component Pascal

  LEN("Hello, World!")

Forth

Interpreter: ANS Forth

Strings in Forth come in two forms, neither of which is the null-terminated form commonly used in the C standard library.

Counted string

A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.

 CREATE s ," Hello world" \ create string "s"
 s C@ ( -- length=11 )
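
For readers unfamiliar with this layout, here is a rough Python 3 model of a counted string. The helper names to_counted and counted_length are hypothetical and exist only in this sketch; it mimics the memory layout described above and is not how a Forth system is implemented.

```python
def to_counted(text: str) -> bytes:
    """Build a counted string: one length byte followed by the character data."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a counted string holds at most 255 characters")
    return bytes([len(data)]) + data


def counted_length(counted: bytes) -> int:
    """Read the length the way C@ does: fetch the first byte."""
    return counted[0]


s = to_counted("Hello world")
print(counted_length(s))  # the count byte
print(len(s))             # total storage: count byte plus character data
```

The single count byte is what limits a counted string to 255 characters.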

Stack string

A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word COUNT converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, easy representation of substrings, and not being limited to 255 characters.

S" string" ( addr len)
DUP .   \ 6

Haskell

Interpreter: GHCi 6.6, Hugs

Compiler: GHC 6.6

strlen = length "Hello, world!"

IDL

Compiler: any IDL compiler should do

 length = strlen("Hello, world!")

Java

Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

String s = "Hello, world!";
int byteCount = s.length() * 2;

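Another way to find the byte length of a string is to encode it with an explicitly specified charset. Note that Java's "UTF-16" charset writes a 2-byte byte-order mark when encoding, so byteCountUTF16 below includes those 2 extra bytes.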

String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length;
int byteCountUTF8  = s.getBytes("UTF-8").length;
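
The doubling rule counts 16-bit code units, not characters. The Python 3 sketch below illustrates this with a character outside the Basic Multilingual Plane, which needs two code units. The function names are made up for this illustration, and the little-endian codec is used so that no byte-order mark is counted.

```python
def utf16_byte_count(text: str) -> int:
    """Bytes needed for text as UTF-16 code units, without a byte-order mark."""
    return len(text.encode("utf-16-le"))


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units; each unit is two bytes."""
    return utf16_byte_count(text) // 2


plain = "Hello, world!"
emoji = "\U0001F600"  # one character, stored as a surrogate pair in UTF-16

print(utf16_code_units(plain), utf16_byte_count(plain))
print(len(emoji), utf16_code_units(emoji), utf16_byte_count(emoji))
```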

JavaScript

JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.

var s = "Hello, world!";
var byteCount = s.length * 2; //26

JudoScript

 //Store length of hello world in length and print it
 . length = "Hello World".length();

LSE64

LSE stores strings as arrays of characters in 64-bit cells plus a count.

" Hello world" @ 1 + 8 * ,   # 96 = (11+1)*(size of a cell) = 12*8

Lua

Interpreter: Lua 5.0 or later.

 string="Hello world"
 length=#string

mIRC Scripting Language

Interpreter: mIRC

alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }

OCaml

Interpreter/Compiler: Ocaml 3.09

String.length "Hello world";;


Perl

Interpreter: perl 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).

use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);

print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.

print length encode 'UTF-16', "Hello, world! ☺";
# 32. 2 bytes for the BOM, then 15 byte pairs, one for each character.
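
The two totals in the comments can be cross-checked outside Perl. The following Python 3 sketch is an independent check rather than a translation: it breaks the UTF-8 length down per character and separates the byte-order mark from the UTF-16 payload.

```python
text = "Hello, world! \u263a"  # 15 characters; U+263A is the smiling face

# UTF-8: width in bytes of each character, then the total.
widths = [len(ch.encode("utf-8")) for ch in text]
utf8_total = sum(widths)

# UTF-16: Python's "utf-16" codec prepends a byte-order mark, "utf-16-le" does not.
utf16_with_bom = len(text.encode("utf-16"))
utf16_without_bom = len(text.encode("utf-16-le"))

print(utf8_total, utf16_with_bom, utf16_without_bom)
```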

PHP

 $length = strlen('Hello, world!');

PL/SQL

DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;

Pop11

Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so a user can, for example, use UTF-8 (however, built-in procedures will treat each byte as a single character). The length function for strings returns the length in bytes:

lvars str = 'Hello, world!';
lvars len = length(str);

Python

Interpreter: Python 2.x

Byte length depends on the encoding. Python uses 2 or 4 bytes per character internally for Unicode strings, depending on how it was built. The internal representation is not of interest to the user.

# The letter Alef
>>> len(u'\u05d0'.encode('utf-8'))
2
>>> len(u'\u05d0'.encode('iso-8859-8'))
1
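
The session above is for Python 2.x. A sketch of the same check in Python 3, where every str is already a Unicode string and the u prefix is optional, might look like this (the variable names are illustrative):

```python
alef = "\u05d0"  # the letter Alef

utf8_length = len(alef.encode("utf-8"))         # two bytes in UTF-8
hebrew_length = len(alef.encode("iso-8859-8"))  # one byte in ISO 8859-8

print(utf8_length, hebrew_length)
```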

Ruby

 string="Hello world"
 print string.length

or

 puts "Hello World".length

Scheme

 (string-length "Hello world")

Smalltalk

 string := 'Hello, world!'.
 string size.

Standard ML

Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)

Compiler: MLton 20061107

val strlen = size "Hello, world!";

Tcl

Basic version:

 string bytelength "Hello, world!"

Or, more elaborately (requires any Tcl 8.x interpreter; tested on 8.4.12):

 fconfigure stdout -encoding utf-8; #So that Unicode string will print correctly
 set s1 "hello, world"
 set s2 "\u304A\u306F\u3088\u3046"
 puts [format "length of \"%s\" in bytes is %d"  $s1 [string bytelength $s1]]
 puts [format "length of \"%s\" in bytes is %d"  $s2 [string bytelength $s2]]
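
Assuming that string bytelength reports the size of the string's UTF-8 representation, the two results can be estimated with the Python 3 sketch below. This is an approximation for illustration: Tcl's internal encoding is a UTF-8 variant and may differ for some characters, such as the null character.

```python
def utf8_length(text: str) -> int:
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


s1 = "hello, world"
s2 = "\u304a\u306f\u3088\u3046"  # four characters, three UTF-8 bytes each

s1_bytes = utf8_length(s1)
s2_bytes = utf8_length(s2)
print(s1_bytes, s2_bytes)
```

Both strings happen to need the same number of bytes even though one has twelve characters and the other four.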

Toka

 " hello, world!" string.getLength

UNIX Shell

With external utilities:

Interpreter: any Bourne shell

 string='Hello, world!'
 length=`printf '%s' "$string" | wc -c | tr -dc '0-9'`
 echo $length # if you want it printed to the terminal

With SUSv3 parameter expansion modifier:

Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell

 string='Hello, world!'
 length="${#string}"
 echo $length # if you want it printed to the terminal


VBScript

LenB(string|varname) 

Returns the number of bytes required to store a string in memory. Returns null if string|varname is null.

xTalk

Interpreter: HyperCard

 put the length of "Hello World"

or

 put the number of characters in "Hello World"