String Character Length
Revision as of 23:09, 7 December 2007
You are encouraged to solve this task according to the task description, using any language you may know.
In this task, the goal is to find the character length of a string. This means encodings like UTF-8 need to be handled properly, as there is not necessarily a one-to-one relationship between bytes and characters.
For byte length, see String Byte Length.
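The following minimal Python 3 sketch illustrates the difference between the two lengths. It is not one of the task solutions, and the sample word is an arbitrary choice. Python's len counts code points, which coincide with characters for this text:

```python
# Python 3 sketch: character length vs. UTF-8 byte length of the same text.
text = "m\u00f8\u00f8se"                  # "møøse": 5 characters, each "ø" is U+00F8
char_length = len(text)                   # counts code points (characters here)
byte_length = len(text.encode("utf-8"))   # "ø" takes 2 bytes in UTF-8

print(char_length, byte_length)           # prints: 5 7
```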
ActionScript
myStrVar.length
Ada
Compiler: GCC 4.1.2
Str    : String := "Hello World";
Length : constant Natural := Str'Length;
ALGOL 68
STRING str := "hello, world";
INT length := UPB str;
printf(($"Length of """g""" is "g(3)$,str,length))
Result:
Length of "hello, world" is +12
AppleScript
count of "Hello World"
Or:
count "Hello World"
C
Compiler: ???
For wide character strings (usually Unicode uniform-width encodings such as UCS-2 or UCS-4):
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wchar_t *s = L"\x304A\x306F\x3088\x3046"; /* Japanese hiragana ohayou */
    size_t length;

    length = wcslen(s);
    printf("Length in characters = %zu\n", length);
    /* sizeof(s) would be the size of the pointer, so multiply the
       character count by the size of one wide character instead */
    printf("Length in bytes = %zu\n", length * sizeof(wchar_t));
    return 0;
}
TODO: non-standard library calls for system multi-byte encodings, such as _mbcslen()
Objective-C
// Return the length in UTF-16 code units (equal to the character count for text in the Basic Multilingual Plane)
NSUInteger length = [@"Hello World!" length];
C++
Standard: ISO C++ (AKA C++98):
Compiler: g++ 4.0.2
#include <string> // note: not <string.h>

int main()
{
  std::string s = "Hello, world!";
  // Counts chars, i.e. bytes (sizeof(char) == 1); for multi-byte
  // encodings such as UTF-8 this is the byte length, not the character length
  std::string::size_type length = s.length(); // option 1: In Characters/Bytes
  std::string::size_type size = s.size();     // option 2: In Characters/Bytes
}
For wide character strings:
#include <string>

int main()
{
  std::wstring s = L"\u304A\u306F\u3088\u3046";
  std::wstring::size_type length = s.length();
}
C#
Platform: .NET Language Version: 1.0+
string s = "Hello, world!";
int clength = s.Length; // In characters (UTF-16 code units; characters outside the BMP count as 2)
int blength = System.Text.Encoding.UTF8.GetBytes(s).Length; // In bytes (UTF-8)
Clean
Clean strings are unboxed arrays of characters. Characters are always a single byte. The function size returns the number of elements in an array.
import StdEnv

strlen :: String -> Int
strlen string = size string

Start = strlen "Hello, world!"
ColdFusion
#len("Hello World")#
Common Lisp
(length "Hello World")
Component Pascal
LEN("Hello, World!")
E
"Hello World".size()
Forth
The 1994 ANS standard does not have any notion of a particular character encoding, although it distinguishes between character and machine-word addresses. (There is some ongoing work on standardizing an "XCHAR" wordset for dealing with strings in particular encodings such as UTF-8.)
Interpreter: ANS Forth
The following code will count the number of UTF-8 characters in a null-terminated string. It relies on the fact that all bytes of a UTF-8 character except the first have the binary bit pattern "10xxxxxx".
2 base !
: utf8+ ( str -- str )
  begin
    char+
    dup c@ 11000000 and 10000000 <>
  until ;
decimal
: count-utf8 ( zstr -- n )
  0
  begin
    swap dup c@
  while
    utf8+
    swap 1+
  repeat
  drop ;
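The same continuation-byte rule can be sketched in Python 3 for comparison. This is an illustration, not part of the Forth solution. The name count_utf8 simply mirrors the Forth word, and the sketch assumes well-formed UTF-8 input. Unlike the Forth version, it needs no null terminator, because a bytes object carries its own length:

```python
def count_utf8(data):
    """Count UTF-8 characters by counting the bytes that are NOT
    continuation bytes (continuation bytes have the pattern 10xxxxxx).
    Assumes `data` is well-formed UTF-8."""
    return sum(1 for b in data if (b & 0b11000000) != 0b10000000)

sample = "\u304A\u306F\u3088\u3046".encode("utf-8")  # Japanese hiragana "ohayou"
print(len(sample), count_utf8(sample))  # prints: 12 4
```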
Haskell
Compiler: GHC 6.6
The base type Char defined by the standard is already intended for (plain) Unicode characters.
strlen = length "Hello, world!"
IDL
Compiler: any IDL compiler should do
length = strlen("Hello, world!")
Java
Java encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
The length method of String objects gives the number of 16-bit values used to encode a string.
String s = "Hello, world!"; int length = s.length();
Since Java 1.5, the actual number of characters can be determined by calling the codePointCount method.
String str = "\uD834\uDD2A"; // U+1D12A
int length1 = str.length(); // 2
int length2 = str.codePointCount(0, str.length()); // 1
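The idea behind codePointCount can be sketched in Python 3 over a list of UTF-16 code units. This is a hypothetical illustration, not Java's implementation, and the helper names count_code_points and utf16_units are invented here. The sketch assumes well-formed UTF-16, where each high surrogate (0xD800 to 0xDBFF) is followed by a low surrogate (0xDC00 to 0xDFFF). Because a surrogate pair is one character, counting every unit that is not a low surrogate gives the character count:

```python
def count_code_points(units):
    """Count code points in a sequence of UTF-16 code units.
    A surrogate pair (high 0xD800-0xDBFF, then low 0xDC00-0xDFFF) is one
    code point, so only units that are not low surrogates are counted.
    Assumes well-formed UTF-16."""
    return sum(1 for u in units if not (0xDC00 <= u <= 0xDFFF))

def utf16_units(text):
    """Hypothetical helper: encode text as UTF-16 (little-endian, no BOM)
    and return its 16-bit code units as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

units = utf16_units("\U0001D12A")  # U+1D12A, as in the Java example
print(len(units), count_code_points(units))  # prints: 2 1
```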
JavaScript
JavaScript encodes strings in UTF-16, which represents each character with one or two 16-bit values. The most commonly used characters are represented by one 16-bit value, while rarer ones like some mathematical symbols are represented by two.
JavaScript has no built-in way to determine how many characters are in a string. However, if the string only contains commonly used characters, the number of characters will be equal to the number of 16-bit values used to represent the characters.
var str1 = "Hello, world!";
var len1 = str1.length; // 13
var str2 = "\uD834\uDD2A"; // U+1D12A represented by a UTF-16 surrogate pair
var len2 = str2.length; // 2
JudoScript
// Store length of hello world in length and print it
. length = "Hello World".length();
LSE64
LSE uses counted strings: arrays of characters, where the first cell contains the number of characters in the string.
" Hello world" @ , # 11
Lua
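Interpreter: Lua 5.1 or later (the # length operator was introduced in 5.1).
str = "Hello world"
length = #str -- counts bytes, which equals characters only for single-byte text such as ASCII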
MAXScript
"Hello world".count
mIRC Scripting Language
Interpreter: mIRC
alias stringlength { echo -a Your name is $len($$?="What's your name?") letters long! }
OCaml
Interpreter/Compiler: Ocaml 3.09
String.length "Hello world";;
Perl
Interpreter: Perl any 5.X
my $length = length "Hello, world!";
PHP
$length = mb_strlen('Hello, world!', 'UTF-8'); // strlen() would return the byte length instead
PL/SQL
DECLARE
  string VARCHAR2( 50 ) := 'Hello, world!';
  stringlength NUMBER;
BEGIN
  stringlength := length( string );
END;
Python
Interpreter: Python 2.4
len() returns the number of characters in a unicode string or a plain ASCII string. To get the character length of an encoded string, decode it first:
>>> len('ascii')
5
>>> len(u'\u05d0') # the letter Alef as a unicode literal
1
>>> len('\xd7\x90'.decode('utf-8')) # the same letter, encoded as a UTF-8 byte string
1
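For comparison, here is a sketch of the same checks in Python 3 rather than the Python 2.4 shown above. In Python 3, str always holds Unicode code points and encoded data is a separate bytes type, so no u'' prefix is needed and bytes must be decoded before counting characters:

```python
# Python 3: str holds Unicode code points, bytes holds encoded data.
alef = "\u05d0"        # the letter Alef
encoded = b"\xd7\x90"  # the same letter encoded as UTF-8

print(len("ascii"))                  # 5
print(len(alef))                     # 1 (characters)
print(len(encoded))                  # 2 (bytes)
print(len(encoded.decode("utf-8")))  # 1 (characters after decoding)
```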
Ruby
Library: active_support
require 'active_support'
puts "Hello World".chars.length
Scheme
(string-length "Hello world")
Seed7
length("Hello, world!")
Smalltalk
string := 'Hello, world!'.
string size.
Standard ML
Interpreter: SML/NJ 110.60, Moscow ML 2.01 (January 2004)
Compiler: MLton 20061107
val strlen = size "Hello, world!";
Tcl
Basic version:
string length "Hello, world!"
Or, more elaborately (requires any 8.x interpreter; tested on 8.4.12):
fconfigure stdout -encoding utf-8; # So that Unicode strings print correctly
set s1 "hello, world"
set s2 "\u304A\u306F\u3088\u3046"
puts [format "length of \"%s\" in characters is %d" $s1 [string length $s1]]
puts [format "length of \"%s\" in characters is %d" $s2 [string length $s2]]
UNIX Shell
With external utilities:
Interpreter: any Bourne Shell
string='Hello, world!'
length=`printf '%s' "$string" | wc -m | tr -dc '0-9'` # wc -m counts characters in the current locale; wc -c would count bytes
echo $length # if you want it printed to the terminal
With SUSv3 parameter expansion modifier:
Interpreter: Almquist SHell (NetBSD 3.0), Bourne Again SHell 3.2, Korn SHell (5.2.14 99/07/13.2), Z SHell
string='Hello, world!'
length="${#string}"
echo $length # if you want it printed to the terminal
VBScript
Len(string|varname)
Returns the length of string or varname. Returns Null if the argument is Null.
XSLT
<?xml version="1.0" encoding="UTF-8"?>
...
<xsl:value-of select="string-length('møøse')" />
xTalk
Interpreter: HyperCard
put the length of "Hello World"
or
put the number of characters in "Hello World"