String Byte Length: Difference between revisions




Latest revision as of 19:32, 19 January 2008

Redirect to:

String length

Older revision: 19:15, 6 September 2007, by Ce (edit summary: "Undo revision 8281 by IjxJaq")
Latest revision: by Mwn3d, minor edit (edit summary: "Stupid case-sensitivity.")
(25 intermediate revisions by 13 users not shown)
#REDIRECT [[String length]]

Content of the older revision (removed in the latest revision):

{{Template:split-review}}
{{task}}
In this task, the goal is to find the <em>byte</em> length of a string. This means encodings like [[UTF-8]] may need to be handled specially, since there is not necessarily a one-to-one relationship between bytes and characters, and some languages account for this.

For character length, see [[String Character Length]].

==[[4D]]==
[[Category:4D]]
<pre>
$length:=Length("Hello, world!")
</pre>

==[[ActionScript]]==
[[Category:ActionScript]]
<pre>
myStrVar.length
</pre>

==[[Ada]]==
[[Category:Ada]]
'''Compiler:''' GCC 4.1.2
<pre>
Str : String := "Hello World";
Length : constant Natural := Str'Length;
</pre>

==[[AppleScript]]==
[[Category:AppleScript]]
<pre>
count of "Hello World"
</pre>

==[[AWK]]==
[[Category:AWK]]
From within any code block:
<pre>
w=length("Hello, world!")      # static string example
x=length("Hello," s " world!") # dynamic string example
y=length($1)                   # input field example
z=length(s)                    # variable name example
</pre>
Ad hoc program from the command line:
<pre>
echo "Hello, world!" | awk '{print length($0)}'
</pre>
From an executable script (prints the length of every line arriving on stdin):
<pre>
#!/usr/bin/awk -f
{print "The length of this line is " length($0)}
</pre>

==[[C]]==
[[Category:C]]
'''Standard:''' [[ANSI C]] (AKA [[C89]]):

'''Compiler:''' GCC 3.3.3
<pre>
#include <string.h>

int main(void)
{
  const char *string = "Hello, world!";
  size_t length = strlen(string);  /* bytes before the terminating '\0' */
  return 0;
}
</pre>
or by hand:
<pre>
#include <stddef.h>  /* size_t */

int main(void)
{
  const char *string = "Hello, world!";
  size_t length = 0;
  const char *p = string;
  while (*p++ != '\0') length++;
  return 0;
}
</pre>
or (for arrays of char only):
<pre>
#include <stdlib.h>

int main(void)
{
  char const s[] = "Hello, world!";
  size_t length = sizeof s - 1;  /* subtract the terminating '\0' */
  return 0;
}
</pre>

==[[C plus plus|C++]]==
[[Category:C plus plus|C++]]
'''Standard:''' [[ISO C plus plus|ISO C++]] (AKA [[C plus plus 98|C++98]]):

'''Compiler:''' g++ 4.0.2
<pre>
#include <string> // note: not <string.h>

int main()
{
  std::string s = "Hello, world!";
  std::string::size_type length = s.length(); // option 1: in characters/bytes
  std::string::size_type size = s.size();     // option 2: in characters/bytes
  // In bytes; same as above since sizeof(char) == 1
  std::string::size_type bytes = s.length() * sizeof(std::string::value_type);
}
</pre>
For wide character strings:
<pre>
#include <string>

int main()
{
  std::wstring s = L"\u304A\u306F\u3088\u3046";
  // In bytes of memory; sizeof(wchar_t) varies by platform (e.g. 2 on Windows, 4 on Linux)
  std::wstring::size_type length = s.length() * sizeof(std::wstring::value_type);
}
</pre>

==[[C sharp|C#]]==
[[Category:C sharp|C#]]
'''Platform:''' [[.NET]]

'''Language Version:''' 1.0+
<pre>
string s = "Hello, world!";
int clength = s.Length;                                     // in characters (UTF-16 code units)
int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // in bytes, encoded as UTF-8
</pre>

==[[Clean]]==
[[Category:Clean]]
Clean Strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.
<pre>
import StdEnv

strlen :: String -> Int
strlen string = size string

Start = strlen "Hello, world!"
</pre>

==[[ColdFusion]]==
[[Category:ColdFusion]]
<pre>
#len("Hello World")#
</pre>

==[[Common Lisp]]==
[[Category:Common Lisp]]
<pre>
(length "Hello World")
</pre>

==[[Component Pascal]]==
[[Category:Component Pascal]]
<pre>
LEN("Hello, World!")
</pre>

==[[Forth]]==
[[Category:Forth]]
'''Interpreter:''' ANS Forth

Strings in Forth come in two forms, neither of which is the null-terminated form commonly used in the C standard library.

===Counted string===
A counted string is a single pointer to a short string in memory. The string's first byte is the count of the number of characters in the string. This is how symbols are stored in a Forth dictionary.
<pre>
CREATE s ," Hello world"   \ create string "s"
s C@  ( -- length=11 )
</pre>

===Stack string===
A string on the stack is represented by a pair of cells: the address of the string data and the length of the string data (in characters). The word '''COUNT''' converts a counted string into a stack string. The STRING utility wordset of ANS Forth works on these addr-len pairs. This representation has the advantages of not requiring null-termination, allowing easy representation of substrings, and not being limited to 255 characters.
<pre>
S" string"  ( addr len )
DUP .  \ 6
</pre>

==[[Haskell]]==
[[Category:Haskell]]
'''Interpreter:''' [[GHC|GHCi]] 6.6, [[Hugs]]

'''Compiler:''' [[GHC]] 6.6
<pre>
strlen = length "Hello, world!"
</pre>

==[[IDL]]==
[[Category:IDL]]
'''Compiler:''' any IDL compiler should do
<pre>
length = strlen("Hello, world!")
</pre>

==[[Java]]==
[[Category:Java]]
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length method of String objects returns the number of 16-bit values used to encode a string, so the number of bytes in UTF-16 can be determined by doubling that number.
<pre>
String s = "Hello, world!";
int byteCount = s.length() * 2; // 26
</pre>
Another way to find the byte length of a string is to specify the desired charset explicitly. Note that getBytes(String) throws the checked UnsupportedEncodingException, so the calling code must catch or declare it.
<pre>
String s = "Hello, world!";
int byteCountUTF16 = s.getBytes("UTF-16").length; // 28: includes a 2-byte byte-order mark
int byteCountUTF8 = s.getBytes("UTF-8").length;   // 13
</pre>

==[[JavaScript]]==
[[Category:JavaScript]]
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The length property of string objects gives the number of 16-bit values used to encode a string, so the number of bytes can be determined by doubling that number.
<pre>
var s = "Hello, world!";
var byteCount = s.length * 2; // 26
</pre>

==[[JudoScript]]==
[[Category:JudoScript]]
<pre>
//Store length of hello world in length and print it
. length = "Hello World".length();
</pre>

==[[Lua]]==
[[Category:Lua]]
'''Interpreter:''' [[Lua]] 5.1 or later (the # length operator was introduced in 5.1).
<pre>
string="Hello world"
length=#string
</pre>

==[[mIRC Scripting Language]]==
[[Category:mIRC Scripting Language]]
'''Interpreter:''' [[mIRC]]
<pre>
alias stringlength { echo -a Your Name is: $len($$?="Whats your name") letters long! }
</pre>

==[[OCaml]]==
[[Category:OCaml]]
'''Interpreter'''/'''Compiler:''' [[Ocaml]] 3.09
<pre>
String.length "Hello world";;
</pre>

==[[Perl]]==
[[Category:Perl]]
'''Interpreter:''' [[perl]] 5.8

Strings in Perl consist of characters. Measuring the byte length therefore requires conversion to some binary representation (called encoding, both noun and verb).
<pre>
use utf8; # so we can use literal characters like ☺ in source
use Encode qw(encode);

print length encode 'UTF-8', "Hello, world! ☺";
# 17. The last character takes 3 bytes, the others 1 byte each.
print length encode 'UTF-16', "Hello, world! ☺";
# 32. 2 bytes for the BOM, then 2 bytes for each of the 15 characters.
</pre>

==[[PHP]]==
[[Category:PHP]]
<pre>
$length = strlen('Hello, world!');
</pre>

==[[PL/SQL|PL/SQL]]==
[[Category:PL/SQL|PL/SQL]]
<pre>
DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := lengthb( string ); -- LENGTHB counts bytes; LENGTH counts characters
END;
</pre>

==[[Pop11]]==
[[Category:Pop11]]
Currently Pop11 supports only strings consisting of 1-byte units. Strings can carry arbitrary binary data, so a user can, for example, use UTF-8 (however, built-in procedures will treat each byte as a single character). The length function for strings returns the length in bytes:
<pre>
lvars str = 'Hello, world!';
lvars len = length(str);
</pre>

==[[Python]]==
[[Category:Python]]
'''Interpreter:''' [[Python]] 2.4
<pre>
length = len("The length of this string will be determined")
</pre>

==[[Ruby]]==
[[Category:Ruby]]
<pre>
string="Hello world"
print string.length
</pre>
or
<pre>
puts "Hello World".length
</pre>

==[[Scheme]]==
[[Category:Scheme]]
<pre>
(string-length "Hello world")
</pre>

==[[Smalltalk]]==
[[Category:Smalltalk]]
<pre>
string := 'Hello, world!'.
string size.
</pre>

==[[Standard ML]]==
[[Category:Standard ML]]
'''Interpreter:''' [[Standard ML of New Jersey|SML/NJ]] 110.60, [[Moscow ML]] 2.01 (January 2004)

'''Compiler:''' [[MLton]] 20061107
<pre>
val strlen = size "Hello, world!";
</pre>

==[[Tcl]]==
[[Category:Tcl]]
Basic version:
<pre>
string bytelength "Hello, world!"
</pre>
A more elaborate version; '''Interpreter:''' any 8.x (tested on 8.4.12):
<pre>
fconfigure stdout -encoding utf-8; # so that the Unicode string will print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in bytes is %d" $s1 [string bytelength $s1]]
puts [format "length of \"%s\" in bytes is %d" $s2 [string bytelength $s2]]
</pre>

==[[Toka]]==
[[Category:Toka]]
This will include the terminating 0 in the length.
<pre>
" hello, world!" count
</pre>

==[[UNIX Shell]]==
[[Category:UNIX Shell]]
With external utilities:

'''Interpreter:''' any Bourne shell
<pre>
string='Hello, world!'
# printf '%s', unlike echo -n, portably omits the trailing newline
length=`printf '%s' "$string" | wc -c | tr -dc '0-9'`
echo $length # if you want it printed to the terminal
</pre>
With the SUSv3 parameter expansion modifier:

'''Interpreter:''' [[Almquist SHell]] (NetBSD 3.0), [[Bourne Again SHell]] 3.2, [[Korn SHell]] (5.2.14 99/07/13.2), [[Z SHell]]
<pre>
string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal
</pre>

==[[VBScript]]==
[[Category:VBScript]]
<pre>
LenB(string|varname)
</pre>
Returns the number of bytes required to store a string in memory. Returns Null if string|varname contains Null.

==[[xTalk]]==
[[Category:xTalk]]
'''Interpreter:''' HyperCard
<pre>
put the length of "Hello World"
</pre>
or
<pre>
put the number of characters in "Hello World"
</pre>
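The task statement says there is no one-to-one mapping between bytes and characters. The Java, JavaScript and Perl entries also rely on UTF-16 and UTF-8 sizes. Neither revision of the page spells out where those sizes come from.

The following is a supplementary Python 3 sketch, not taken from either revision. It computes both sizes by hand from code point ranges and cross-checks them against Python's built-in encoders. The helper names `utf8_byte_length` and `utf16_code_units` are hypothetical and chosen for illustration. The sketch assumes well-formed text with no lone surrogate code points.

```python
def utf8_byte_length(s: str) -> int:
    """Count the bytes UTF-8 needs for s, one code point at a time.

    Assumes s contains no lone surrogates (U+D800..U+DFFF), which UTF-8 cannot encode.
    """
    total = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            total += 1  # ASCII range: 1 byte
        elif cp < 0x800:
            total += 2  # U+0080..U+07FF: 2 bytes
        elif cp < 0x10000:
            total += 3  # rest of the Basic Multilingual Plane: 3 bytes
        else:
            total += 4  # supplementary planes: 4 bytes
    return total


def utf16_code_units(s: str) -> int:
    """Count 16-bit code units: code points above U+FFFF need a surrogate pair."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


if __name__ == "__main__":
    samples = ["Hello, world!", "Hello, world! \u263a", "\u304a\u306f\u3088\u3046", "\U0001F600"]
    for text in samples:
        print(repr(text),
              "chars:", len(text),
              "UTF-8 bytes:", utf8_byte_length(text),
              "UTF-16 bytes:", 2 * utf16_code_units(text))
        # Cross-check against Python's own encoders; "utf-16-le" writes no byte-order mark.
        assert utf8_byte_length(text) == len(text.encode("utf-8"))
        assert 2 * utf16_code_units(text) == len(text.encode("utf-16-le"))
```

For "Hello, world! ☺" the UTF-8 count is 17, matching the Perl entry. Python's plain "utf-16" codec, like Perl's and Java's "UTF-16", prepends a 2-byte byte-order mark. That is why the sketch compares against "utf-16-le". It is also why Perl reports 32 rather than 30 for the same string.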