Special characters

From Rosetta Code

List the special characters and escape sequences in the language.

See also: Quotes

Ada

There are no escape sequences in character literals. Any character supported by the source encoding is allowed. The only escape sequence in string literals is "" (doubled double quotation marks), which denotes ". When characters need to be specified by their code positions (in Unicode), this is done using the 'Val attribute: <lang Ada>with Ada.Text_IO; use Ada.Text_IO;

procedure Test is
begin
   Put ("Quote """ & ''' & """" & Character'Val (10));
end Test;</lang> Sample output:

Quote "'"

Note that character and string literals serve all character and string types. For example, with Wide_Wide characters (32-bit) and strings: <lang Ada>with Ada.Wide_Wide_Text_IO; use Ada.Wide_Wide_Text_IO;

procedure Test is
begin
   Put ("Unicode """ & ''' & """" & Wide_Wide_Character'Val (10));
end Test;</lang>

ALGOL 68

ALGOL 68 has several built-in character constants. The following are, respectively, the representations of TRUE and FALSE, the blank character " ", the character displayed when a number cannot be printed in the width provided, and the null character indicating the end of characters in a BYTES array. <lang algol68>printf(($"flip:"g"!"l$,flip));
printf(($"flop:"g"!"l$,flop));
printf(($"blank:"g"!"l$,blank));
printf(($"error char:"g"!"l$,error char));
printf(($"null character:"g"!"l$,null character))</lang> Output:

flip:T!
flop:F!
blank: !
error char:*!
null character:

To handle the output movement to (and input movement from) a device, ALGOL 68 has the following four positioning procedures: <lang algol68>print(("new page:",new page));
print(("new line:",new line));
print(("space:",space));
print(("backspace:",backspace))</lang> These procedures may not all be supported on a particular device.

If a particular device (CHANNEL) is set possible, then there are three built-in procedures that allow movement about this device.

  • set char number - set the position in the current line.
  • reset - move to the first character of the first line of the first page. For example a home or tape rewind.
  • set - allows the movement to selected page, line and character.

ALGOL 68 pre-dates the current ASCII standard, and hence supports many non-ASCII characters. Moreover, ALGOL 68 had to work on hardware with 6-bit bytes, hence it was necessary to be able to write the same ALGOL 68 code strictly in upper case. Here are the special characters together with their upper-case alternatives (referred to as "worthy characters").

Character  ASCII       Worthy  bold
¢          #           CO      co
≥          >=          GE      ge
≤          <=          LE      le
≠          /= or ~=    NE      ne
¬          ~           NOT     not
∨          \/          OR      or
∧          /\ or &     AND     and
÷          %           OVER    over
×          *           TIMES   times
↑          **          UP      up
↓                      DOWN    down
→          ->          OF      of
⊥ or ×+    *+          I       i
⏨          \           E       e
○                      NIL     nil
□                      ELEM    elem
⌊                      LWB     lwb
⌈                      UPB     upb
⎩                      LWS     lws
⎧                      UPS     ups

Most of these characters made their way into European standard character sets (e.g. ALCOR and GOST). Ironically, the ¢ character was dropped from later versions of America's own ASCII character set.

The character "⏨" is one ALGOL 68 byte (not two bytes).

AutoHotkey

The escape character defaults to accent/backtick (`).

  • `, = , (literal comma). Note: Commas that appear within the last parameter of a command do not need to be escaped because the program knows to treat them literally. The same is true for all parameters of MsgBox because it has smart comma handling.
  • `% = % (literal percent)
  • `` = ` (literal accent; i.e. two consecutive escape characters result in a single literal character)
  • `; = ; (literal semicolon). Note: This is necessary only if a semicolon has a space or tab to its left. If it does not, it will be recognized correctly without being escaped.
  • `n = newline (linefeed/LF)
  • `r = carriage return (CR)
  • `b = backspace
  • `t = tab (the more typical horizontal variety)
  • `v = vertical tab -- corresponds to ASCII value 11. It can also be produced in some applications by typing Control+K.
  • `a = alert (bell) -- corresponds to ASCII value 7. It can also be produced in some applications by typing Control+G.
  • `f = formfeed -- corresponds to ASCII value 12. It can also be produced in some applications by typing Control+L.
  • Send = When the Send command or Hotstrings are used in their default (non-raw) mode, characters such as {}^!+# have special meaning. Therefore, to use them literally in these cases, enclose them in braces. For example: Send {^}{!}{{}
  • "" = Within an expression, two consecutive quotes enclosed inside a literal string resolve to a single literal quote. For example: Var := "The color ""red"" was found."

AWK

AWK uses the following special characters:

  • {...} body of code ("action", or body of if/for/while block)
  • (...) conditional constructs in for/while loops; arguments to a function, grouping expression components, regular expression subexpression enclosures
  • ; statement separator
  • /.../ regular expression
  • "..." string constant
  • a[b] element b of array a
  • # comment marker
  • $ field reference operator and regular expression anchor
  • * multiplication operator and regular expression operator
  • + addition operator and regular expression operator
  • - subtraction operator, regular expression range operator
  • , separates items in a list
  • . decimal point and regular expression operator
  • / division operator and regular expression enclosure symbol
  • ; statement and rule separator
  • \ begin escape sequence, as in C; e.g. \n, \t, \x1B, \\
  • ^ exponentiation operator, regular expression anchor and bracket-expression complement indicator
  • | regular expression alternation operator
  • ~ containment operator (!~ is the non-containment operator)
  • ++ increment nudge operator
  • -- decrement nudge operator
  • += addition compound assignment operator
  • -= subtraction compound assignment operator
  • *= multiplication compound assignment operator
  • /= division compound assignment operator
  • ^= exponent compound assignment operator
  • %= modulus compound assignment operator

In addition, regular expressions and (s)printf have their own "little languages".

Brainf***

The only characters that mean anything in BF are its commands:

> move the pointer one to the right

< move the pointer one to the left

+ increment the value at the pointer

- decrement the value at the pointer

, input one byte to memory at the pointer

. output one byte from memory at the pointer

[ begin loop if the value at the pointer is not 0

] end loop

All other characters are comments.

C

See C++.

As in C++, ?, #, \, ' and " have special meaning (together with { and }). Trigraphs also work (they are an "old" way to avoid the "old" difficulty of finding characters like { } etc. on some keyboards).

The C99 standard (but not earlier standards) also recognizes universal character names, as C++ does.

String and character literals are like C++ (or rather the other way around!), and even the meaning and usage of the # character is the same.

C++

C++ has several types of escape sequences, which are interpreted in various contexts. The main characters with special properties are the question mark (?), the pound sign (#), the backslash (\), the single quote (') and the double quote (").

Trigraphs

Trigraphs are certain character sequences starting with two question marks, which can be used instead of certain characters, and which are always and in all contexts interpreted as the replacement character. They can be used anywhere in the source, including, but not limited to string constants. The complete list is:

Trigraph  Replacement letter
  ??(       [
  ??)       ]
  ??<       {
  ??>       }
  ??/       \
  ??=       #
  ??'       ^
  ??!       |
  ??-       ~

Note that interpretation of those trigraphs is the very first step in C++ compilation; therefore the trigraphs can be used instead of their replacement letters everywhere, including in all of the following escape sequences (e.g. instead of \u00FC (see next section) you can also write ??/u00FC, and it will be interpreted the same way).

Also note that some compilers don't interpret trigraphs by default, since today's character sets all contain the replacement characters, and therefore trigraphs are practically not used. However, accidentally using them (e.g. in a string constant) may change the code semantics on some compilers, so one should still be aware of them.
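The replacement step can be illustrated outside of C++. The following is only a sketch in Python, not how any particular compiler implements it; the helper name replace_trigraphs is hypothetical, and the table is copied from the list above. It scans the text from left to right and substitutes every trigraph, regardless of context:

```python
# Trigraph table copied from the list above.
TRIGRAPHS = {
    "??(": "[", "??)": "]", "??<": "{", "??>": "}", "??/": "\\",
    "??=": "#", "??'": "^", "??!": "|", "??-": "~",
}

def replace_trigraphs(source: str) -> str:
    """Replace every trigraph in source, scanning left to right.

    Hypothetical helper for illustration only: it knows nothing about
    string literals or comments, which is exactly the point, since
    trigraphs are replaced in all contexts.
    """
    out = []
    i = 0
    while i < len(source):
        chunk = source[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)

print(replace_trigraphs("??=include <iostream>"))
print(replace_trigraphs('puts("What??!");'))
```

The second call shows the accidental case mentioned above: a string constant that merely contains two question marks followed by an exclamation mark is silently changed.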

Universal character names and escaping newlines

Moreover, C++ allows arbitrary Unicode characters to be written using only the basic source character set (which is a subset of ASCII), by means of a so-called universal character name. These have one of the forms

\uXXXX
\UXXXXXXXX

where each X is to be replaced by a hex digit. For example, the German umlaut letter ü can be written as

\u00FC

or

\U000000FC

However, characters in the basic source character set may not be written in this form (but since all those characters are in standard ASCII, writing them as universal character names would only obfuscate anyway). If the compiler accepts direct use of non-ASCII characters somewhere in the code, the result must be the same as with the corresponding universal character name. For example, the following two lines, if accepted by the compiler, should have the same effect: <lang cpp>std::cout << "Tür\n";
std::cout << "T\u00FCr\n";</lang> Note that in principle, C++ also allows such letters in identifiers, e.g. <lang cpp>extern int Tür;      // if the compiler allows literal ü
extern int T\u00FCr; // should in theory work everywhere</lang> but that's not generally supported by existing compilers (e.g. g++ 4.1.2 doesn't support it).

Another escape sequence that works everywhere is the escaped newline: if a backslash is at the end of a line, the next line is pasted to it without any space in between. For example: <lang cpp>int const\
ant; // defines a variable of type int named constant, not a variable of type int const named ant</lang>

String and character literal

A string literal is surrounded by double quotes ("). A character literal is surrounded by single quotes ('). Example: <lang cpp>char const str[] = "a string literal";
char c = 'x'; // a character literal</lang>

The following escape sequences are only allowed inside string constants and character constants:

escape seq.  meaning          ASCII character/codepoint
 \a           alert             BEL ^G/7
 \b           backspace         BS  ^H/8
 \f           form feed         FF  ^L/12
 \n           newline           LF  ^J/10
 \r           carriage return   CR  ^M/13
 \t           tab               TAB ^I/9
 \v           vertical tab      VT  ^K/11
 \'           single quote      '           (unescaped ' would end character constant)
 \"           double quote      "           (unescaped " would end string constant)
 \\           backslash         \           (unescaped \ would introduce escape sequence)
 \?           question mark     ?           (useful to break trigraphs in strings)
 \0           string end marker NUL ^@/0    (special case of octal char value)
 \nnn         (octal char value)            (each n must be an octal digit)
 \xnn         (hex char value)              (each n must be a hexadecimal digit)

Note that C++ doesn't guarantee ASCII. On non-ASCII platforms (e.g. EBCDIC), the rightmost column of course doesn't apply. However, \0 unconditionally has the value 0.

Also note that some compilers add the non-standard escape sequence \e for Escape (that is, the ASCII escape character).

The # character

The # character in C++ is special as it is interpreted only in the preprocessing phase, and shouldn't occur (outside of character/string constants) after preprocessing.

  • If # appears as first non-whitespace character in the line, it introduces a preprocessor directive. For example

<lang cpp>#include <iostream></lang>

  • Inside macro definitions, a single # is the stringification operator, which turns its argument into a string. For example:

<lang cpp>#include <iostream>
#define STR(x) #x
int main() {
  std::cout << STR(Hello world) << std::endl; // STR(Hello world) expands to "Hello world"
}</lang>

  • Also inside macro definitions, ## is the token pasting operator. For example:

<lang cpp>#define THE(x) the_ ## x
int THE(answer) = 42; // THE(answer) expands to the_answer</lang>

Note that the # character is not interpreted specially inside character or string literals.

E

E uses typical C-style backslash escapes within literals. The defined escapes are:

Sequence Unicode Meaning
\b U+0008 (Backspace)
\t U+0009 (Tab)
\n U+000A (Line feed)
\f U+000C (Form feed)
\r U+000D (Carriage return)
\" U+0022 "
\' U+0027 '
\\ U+005C \
\<newline> None (Line continuation -- stands for no characters)
\uXXXX U+XXXX (BMP Unicode character, 4 hex digits)

Consensus has not been reached on handling non-BMP characters. All other backslash-followed-by-character sequences are syntax errors.

Within E quasiliterals, backslash is not special and $\ plays the same role:

<lang e>? println(`1 + 1$\n= ${1 + 1}`)
1 + 1
= 2</lang>

Forth

When Forth fails to interpret a symbol as a defined word, an attempt is made to interpret it as a number. In numerical interpretation there arise a number of special characters:

<lang forth>

 10   \ single cell number
 -10  \ negative single cell number
 10.  \ double cell number
 10e  \ floating-point number</lang>

Many systems - and the Forth200x standard - extend this set with base prefixes:

<lang forth>

 #10  \ decimal
 $10  \ hex
 %10  \ binary</lang>

For strings, the Forth200x Escaped Strings extension adds a string-parsing word with the familiar backslash escapes.

There are otherwise no special characters or escapes in Forth.

Go

Within character literals and string literals, the backslash is a special character that begins an escape sequence. Examples are '\n' and "\xFF". These sequences are documented in the language specification.

Special-purpose escape sequences are also defined within the context of certain packages in the standard library, such as html and regexp.

Go keywords, operators, and delimiters are all predefined and are all composed of ASCII characters; however, the character encoding of Go source code is specified to be UTF-8. This allows user-defined identifiers and literals to incorporate non-ASCII characters.

Whitespace is generally ignored except where it delimits tokens, with one exception: newline is a very special character. As explained by the language specification, translation (that is, compilation) involves a step where the tokenizer converts (most) newlines to semicolons, which are then handled as terminators in the grammar of the formal language. The programmer is not involved in this intermediate stage of compilation, so the visible effect is somewhat different: the grammatical structure is partially determined by the 2D layout of the source code.

GUISS

  • , statement separator
  • : Used as a separator (usually between the user interface component and the component name or gist)
  • > Used to specify user input or selected item
  • [ ] Enclosure for symbol or digraph names

Haskell

Comments <lang haskell>-- comment here until end of line
{- comment here -}</lang>

Operator symbols (nearly any sequence can be used) <lang haskell>! # $ % & * + - . / < = > ? @ \ ^ | - ~ :
: as first character denotes constructor</lang>

Reserved symbol sequences <lang haskell>.. : :: = \ | <- -> @ ~ => _</lang>

Infix quotes <lang haskell>`identifier` (to use as infix operator)</lang>

Characters <lang haskell>'.' \ escapes</lang>

Strings <lang haskell>"..." \ escapes</lang>

Special escapes <lang haskell>\a alert
\b backspace
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab</lang>

Other <lang haskell>( )    (grouping)
( , )  (tuple type/tuple constructor)
{ ; }  (grouping inside let, where, do, case without layout)
[ , ]  (list type/list constructor)
[ | ]  (list comprehension)</lang>

Unicode characters, according to category: <lang haskell>Upper case (identifiers)
Lower case (identifiers)
Digits (numbers)
Symbol/punctuation (operators)</lang>

HicEst

HicEst has no escape characters. Strings may contain all characters. String constants can be delimited by most non-standard characters, usually ' or ".

  • ! starts a comment. The comment extends to the end of the line.
  • The global variable $ is the current linear left hand side array index in array expressions
  • The global variable $$ is set to the sequence number of the last activated toolbar button, menu item, or popup item
  • If # appears as the first character in a line, it starts the optional appendix section of the script. This terminates the program section. Appendix chapters are not compiled and are therefore not executable. They serve to store information that can be retrieved by the APPENDIX function.

Icon and Unicon

Icon and Unicon strings and csets may contain the following special characters

\b backspace
\d delete
\e escape
\f formfeed
\l linefeed
\n newline
\r return
\t horizontal tab
\v vertical tab
\' single quote
\" double quote
\\ backslash
\ddd octal code
\xdd hexadecimal code
\^c control code

J

The closest thing J has to an escape sequence is that paired quotes, in a character literal, represent a single quote character.

<lang J>   ''      NB. empty string

   ''''    NB. one quote character
'
   ''''''  NB. two quote characters
''
</lang>

Since it's not clear what "special characters" would mean, in the context of J, here is an informal treatment of J's word forming rules:

Lines are terminated by newline characters, and J sentences are separated by newline characters. J sometimes treats sequences of lines specially, in which case a line with a single right parenthesis terminates the sequence.

A character literal consists of paired quote characters with any other characters between them.

<lang J> 'For example, this is a character literal'</lang>

A numeric literal consists of a leading numeric character (a digit or _) followed by alphanumeric (numeric or alphabetic) characters, dots and spaces. A sequence of spaces will end a numeric literal if it is not immediately followed by a numeric character.

<lang J> 1

  1 0 1 0 1 0 1
  _3.14159e6</lang>

Some numeric literals are not implemented by the language:

<lang J>   3l1t3
|ill-formed number</lang>

Words consist of an alphabetic character (a-z or A-Z) followed by alphanumeric characters and optionally followed by a sequence of dots or colons. Words which do not contain . or : can be given definitions by the user. The special word NB. continues to the end of the line and is ignored (it's a comment) during execution.

<lang J> example=: ARGV NB. example and ARGV are user definable words</lang>

Tokens consist of any other printable character optionally followed by a sequence of dots or colons. (Tokens which begin with . or : must be preceded by a space character).

<lang J> +/ .* NB. + / . and * are all meaningful tokens in J</lang>

Java

Math: <lang java>& | ^ ~ //bitwise AND, OR, XOR, and NOT
>> << //bitwise arithmetic shift
>>> //bitwise logical shift
+ - * / = % //+ can also be used for String concatenation</lang> Any of the previous binary math operators can be placed in front of an equals sign to make a compound assignment: <lang java>x = x + 2 is the same as x += 2
++ -- //increment and decrement: before a variable for pre (++x), after for post (x++)
== < > != <= >= //comparison</lang> Boolean: <lang java>! //NOT
&& || //short-circuit AND, OR
^ & | //long-circuit XOR, AND, OR</lang> Other: <lang java>{ } //scope
( ) //for functions
; //statement terminator
[ ] //array index
" //string literal
' //character literal
? : //ternary operator</lang> Escape characters: <lang java>\b //Backspace
\n //Line Feed
\r //Carriage Return
\f //Form Feed
\t //Tab
\0 //Null (this is actually an octal escape, but handy nonetheless)
\' //Single Quote
\" //Double Quote
\\ //Backslash
\DDD //Octal Escape Sequence, D is a digit between 0 and 7; can only express characters from 0 to 255 (i.e. \0 to \377)</lang> Unicode escapes: <lang java>\uHHHH //Unicode Escape Sequence, H is any hexadecimal digit (0 to 9, A to F)</lang> Be extremely careful with Unicode escapes. They are special: each one is substituted with the specified character before the source code is parsed. In other words, they apply anywhere in the code, not just inside character and string literals. Variable names can contain foreign characters. It also means that you can use Unicode escapes to write any character in the source code, and it will work. For example, you can write \u002b instead of + for addition; you can write String\u0020foo and it will be interpreted as two identifiers: String foo; you can even write the entire Java source file with Unicode escapes, as a poor form of obfuscation.

However, this leads to many problems:

  • \u000A will become a line return in the code, which will terminate line-end comments:

<lang java>// hello \u000A this looks like a comment</lang>

is a syntax error, because the part after \u000A is on the next line and no longer in the comment
  • \u0022 will become a double-quote in the code, which ends / begins a string literal:

<lang java>"hello \u0022 is this a string?"</lang>

is a syntax error, because the part after \u0022 is outside the string literal
  • An invalid sequence of \u, even in comments that usually are ignored, will cause a parsing error:

<lang java>/*

* c:\unix\home\
*/</lang>
is a syntax error, because \unix is not a valid Unicode escape, even though you think that it should be inside a comment
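
The three problems above all follow from the same early translation step. As a rough illustration only (a Python sketch, not the Java compiler's actual code; the function name translate_unicode_escapes is hypothetical), the step can be imitated by replacing each \uHHHH before looking at anything else in the source text. The sketch assumes the rules described above plus one detail of the Java specification: a backslash that is itself escaped by a preceding backslash does not start a Unicode escape.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def translate_unicode_escapes(source: str) -> str:
    """Imitate Java's early \\uHHHH translation on a piece of source text.

    Hypothetical helper: it runs before any notion of comments or string
    literals exists, so escapes are replaced everywhere.
    """
    out = []
    i = 0
    n = len(source)
    while i < n:
        if source[i] != "\\":
            out.append(source[i])
            i += 1
        elif i + 1 < n and source[i + 1] == "u":
            j = i + 1
            while j < n and source[j] == "u":  # one or more 'u' may follow the backslash
                j += 1
            digits = source[j:j + 4]
            if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                raise SyntaxError(f"invalid Unicode escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i = j + 4
        else:
            # Backslash followed by something else: copy both characters
            # unchanged, so an escaped backslash cannot start a Unicode escape.
            out.append(source[i:i + 2])
            i += 2
    return "".join(out)

for text in ("// hello \\u000A this looks like a comment",
             '"hello \\u0022 is this a string?"',
             "String\\u0020foo"):
    print(repr(translate_unicode_escapes(text)))

try:
    translate_unicode_escapes("/* c:\\unix\\home\\ */")
except SyntaxError as err:
    print("rejected:", err)
```

After translation the first sample contains a real line break, so the text after it is no longer part of the comment; the last sample is rejected before the comment is ever recognized as a comment.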

JavaScript

See Java

LaTeX

LaTeX has ten special characters: # $ % & ~ _ ^ \ { }

To make any of these characters appear literally in output, prefix the character with a \. For example, to typeset 5% of $10 you would type

<lang latex>5\% of \$10</lang>

Note that the set of special characters in LaTeX isn't really fixed, but can be changed by LaTeX code. For example, the package ngerman (providing German-specific definitions, including easier access to umlaut letters) re-defines the double quote character (") as a special character, so you can more easily write German words like "hören" (as h"oren instead of h{\"o}ren).

MUMPS

MUMPS doesn't have any special characters among the printable ASCII set. The double quote character, ", is a bit odd when it is intended to be part of a string. You double it, which can look quite odd when it's adjacent to the delimiting edge of a string.

USER>Set S1="Hello, World!"  Write S1
Hello, World!
USER>Set S2=""Hello, World!"" Write S2
 
SET S2=""Hello, World!"" Write S2
^
<SYNTAX>
USER>Set S3="""Hello, World!"" she typed." Write S3
"Hello, World!" she typed.
USER>Set S4="""""""Wow""""""" Write S4
"""Wow"""

Objeck

<lang objeck>\b //Backspace
\n //Line Feed
\r //Carriage Return
\t //Tab
\0 //Null
\' //Single Quote
\" //Double Quote</lang>

Unicode escapes: <lang objeck>\uHHHH //Unicode Escape Sequence, H is any hexadecimal digit between 0 and 9 and between A and F</lang>

OCaml

Character escape sequences <lang ocaml>\\     backslash
\"     double quote
\'     single quote
\n     line feed
\r     carriage return
\t     tab
\b     backspace
\      (backslash followed by a space) space
\DDD   where D is a decimal digit; the character with code DDD in decimal
\xHH   where H is a hex digit; the character with code HH in hex</lang>

PARI/GP

\e escape
\t tab
\n newline

Any other character that is quoted simply becomes itself. In particular, \" is useful for adding quotes inside strings.

While not a special character as such, whitespace is handled differently in gp than in most languages. While whitespace is said to be ignored in free-form languages, it is truly ignored in gp scripts: the gp parser literally removes whitespace outside of strings. Thus <lang PARI/GP>is square(9)</lang> is interpreted the same as <lang PARI/GP>issquare(9)</lang> or even <lang PARI/GP>iss qua re(9)</lang>
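
The effect of this rule can be imitated in a few lines. The following Python sketch is only an approximation of the behaviour described above, not the gp parser itself; the function name strip_gp_whitespace is hypothetical, and the sketch assumes a single line of input in which strings are delimited by double quotes and \ quotes the next character.

```python
def strip_gp_whitespace(source: str) -> str:
    """Remove all whitespace outside double-quoted strings (hypothetical helper)."""
    out = []
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                out.append(source[i + 1])  # keep the quoted character, e.g. \"
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
        i += 1
    return "".join(out)

for text in ("is square(9)", "iss qua re(9)", 'print( "a b" )'):
    print(strip_gp_whitespace(text))
```

All three spellings of the function call collapse to the same text, while the space inside the string constant survives.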

Perl

Assignment operator symbols

  • = assignment operator

Arithmetic operator symbols

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • \ reference operator (Perl has no integer division operator; use int($a / $b) instead)
  • % modulus
  • ** exponent

Numeric Comparative operator symbols

  • == equality
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to
  • != inequality
  • <=> Tristate comparative

Comment markers

  • # prefixes comments

Concatenation operator symbols

  • . concatenation

Enclosures

Escape sequences

These escape sequences can be used in any construct with interpolation. See Quote-and-Quote-like-Operators for more info.

\t tab (HT,TAB)
\n newline (NL)
\r carriage return (CR)
\f form feed (FF)
\b backspace (BS)
\a alarm (BEL)
\e escape (ESC)
\0?? octal char example: \033 (ESC)
\x?? hex char example: \x1b (ESC)
\x{???} wide hex char example: \x{263a} (SMILEY)
\c? control char example: \c[ (ESC)
\N{U+????} Unicode character example: \N{U+263D} (FIRST QUARTER MOON)
\N{????} named Unicode character example: \N{FIRST QUARTER MOON}, see charnames

Look up operations

  • -> lookup element or associated container reference

Nudge operators

  • ++ incremental nudge operator
  • -- decremental nudge operator

Shift operators

  • << bitshift left (dyadic)
  • >> bitshift right (dyadic)

Combination assignment operators

Arithmetic Combination Assignment Operators

  • += addition
  • -= subtraction
  • *= multiplication
  • /= division
  • **= exponent
  • %= modulus

String Manipulation Combination Assignment Operators

  • x= repetition
  • .= concatenation

Shift Combination Assignment Operators

  • <<= Binary Shift Left
  • >>= Binary Shift Right

Logical Combination Assignment Operators

  • ||= OR
  • &&= AND

Bitwise Combination Assignment Operators

  • |= bitwise OR
  • &= bitwise AND
  • ^= bitwise XOR

Range operator

  • .. range operator

Ellipsis operator

  • ... ellipsis operator

Sequence numbers

  • $ sequence number, sigil, placeholder modifier (in format string)

Default variable

  • $_ default variable

PicoLisp

Markup:
   () []    List
   .        Dotted pair (when surrounded by white space)
   "        Transient symbol (string)
   {}       External symbol (database object)
   \        Escape for following character
   #        Comment line
   #{ }#    Comment block


Read macros:
   '        The 'quote' function
   `        Evaluate and insert a list element
   ~        Evaluate and splice a partial list
   ,        Indexed reference

Within strings:
   ^        ASCII control character
   \        At end of line: Continue on next line, skipping white space

Plain TeX

TeX attaches to each character a category code that determines its "meaning" for TeX. Macro packages can redefine the category code of any character. Ignoring category codes 10 (blank), 11 (letters) and 12 (a category embracing all characters that are neither letters nor "special" characters according to TeX), and a few more that are not interesting here, the only characters that have a "special" category code (for the purpose of this page) when TeX starts are

  • \ %

Plain TeX then assigns a few more (some non-printable characters that also get a "special" category code are not listed here)

  • { } $ & # ^ _ ~

and these all are "inherited" by a lot of other macro packages (among these, LaTeX).


PL/I

PL/I has no escape characters as such. However, in string constants, enclosed in apostrophes or (since PL/I for OS/2) quotation marks, a single apostrophe/quote in the string must be doubled, thus: <lang PL/I>'John''s pen' which is stored as <<John's pen>>
"He said ""Go!"" and opened the door" which is stored as <<He said "Go!" and opened the door>></lang> Of course, in either of the above the string can be enclosed with the "other" delimiter, in which case no doubling is required.

PowerShell

PowerShell is unusual in that it retains many of the escape sequences of languages descended from C, except that unlike these languages it uses a backtick ` as the escape character rather than a backslash \. For example `n is a new line and `t is a tab.

PureBasic

There are no escape sequences in character literals. Any character supported by the source encoding is allowed; to insert the double quote (") sign, either the constant #DOUBLEQUOTE$ or its ASCII code can be used.

The code is based on readable words; only a semicolon (;) as start-of-comment and a normal colon (:) as command separator are used. <lang PureBasic>a=1  ; The ';' indicates that a comment starts
b=2*a: a=b*33  ; b will now be 2, and a=66</lang>

Python

(From the Python Documentation):

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence  Meaning                                                       Notes
\newline         Ignored
\\               Backslash (\)
\'               Single quote (')
\"               Double quote (")
\a               ASCII Bell (BEL)
\b               ASCII Backspace (BS)
\f               ASCII Formfeed (FF)
\n               ASCII Linefeed (LF)
\N{name}         Character named name in the Unicode database (Unicode only)
\r               ASCII Carriage Return (CR)
\t               ASCII Horizontal Tab (TAB)
\uxxxx           Character with 16-bit hex value xxxx (Unicode only)           (1)
\Uxxxxxxxx       Character with 32-bit hex value xxxxxxxx (Unicode only)       (2)
\v               ASCII Vertical Tab (VT)
\ooo             Character with octal value ooo                                (3,5)
\xhh             Character with hex value hh                                   (4,5)

Notes:

  1. Individual code units which form parts of a surrogate pair can be encoded using this escape sequence.
  2. Any Unicode character can be encoded this way, but characters outside the Basic Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is compiled to use 16-bit code units (the default). Individual code units which form parts of a surrogate pair can be encoded using this escape sequence.
  3. As in Standard C, up to three octal digits are accepted.
  4. Unlike in Standard C, exactly two hex digits are required.
  5. In a string literal, hexadecimal and octal escapes denote the byte with the given value; it is not necessary that the byte encodes a character in the source character set. In a Unicode literal, these escapes denote a Unicode character with the given value.
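
A few of these rules can be checked directly. The snippet below is a small sketch that assumes Python 3, where every str literal is a Unicode string, so the "(Unicode only)" escapes are always available; the variable names are made up for the example.

```python
escaped = "\n"            # escape interpreted: a single linefeed character
raw = r"\n"               # 'r' prefix: backslash and 'n' are kept as two characters

# Four spellings of the letter A: hex, octal, 16-bit Unicode and named escape.
four_a = "\x41\101\u0041\N{LATIN CAPITAL LETTER A}"

octal_then_digit = "\1234"  # note 3: at most three octal digits, so this is 'S' + '4'
hex_then_digit = "\x414"    # note 4: exactly two hex digits, so this is 'A' + '4'

# \newline is ignored, which joins the two source lines.
joined = "abc\
def"

print(len(escaped), len(raw))
print(four_a)
print(octal_then_digit, hex_then_digit)
print(joined)
```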

REXX

Assignment operator symbols

  • = assignment operator

Arithmetic operator symbols

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • % integer division
  • // modulus
  • ** exponent

Comparative operator symbols

  • = equality
  • == strictly equal to
  • \= inequality
  • \== strictly not equal to
  • /== strictly not equal to
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to
  • \< not less than
  • \> not greater than
  • <> inequality
  • >< inequality
  • << strictly less than
  • >> strictly greater than
  • <<= strictly less than or equal to
  • >>= strictly greater than or equal to
  • \<< strictly not less than
  • \>> strictly not greater than

Concatenation operator symbols

  • || concatenation

Enclosures

The /* and */ symbols are used as enclosures for comments in REXX. The ' and " symbols are used as enclosures for literal strings.

Literal Character Representation

The lowercase x symbol acts as a literal character notation marker, enabling literal characters to be embedded into strings by using hexadecimal representation of character codes:

lf = '0A'x
cr = '0D'x
greeting = '48656C6C6F'x    /* Hello */
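
For readers unfamiliar with this notation, the decoding of such a literal can be sketched in Python. This is only an illustration of what the hexadecimal digits stand for, not part of REXX; the helper name rexx_hex_literal is hypothetical, and it assumes each pair of hex digits is one character code in a single-byte character set.

```python
def rexx_hex_literal(digits: str) -> str:
    """Return the characters denoted by the digits of a 'hh...'x literal.

    Hypothetical helper: each pair of hex digits is taken as one
    character code (Latin-1 is assumed for codes above 127).
    """
    return bytes.fromhex(digits).decode("latin-1")

lf = rexx_hex_literal("0A")
cr = rexx_hex_literal("0D")
greeting = rexx_hex_literal("48656C6C6F")

print(repr(lf), repr(cr), greeting)
```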

Literal character representation digraphs are not supported

The REXX language does not support the use of character representation digraphs (escape sequences) using a backslash symbol.

Logical operator symbols

  • & logical AND
  • | logical OR
  • && logical XOR

Quotation mark duplication

Apostrophe duplication

In string constants (enclosed in apostrophes), a literal apostrophe in the string must be duplicated, thus:

'John''s pen' which is stored as <<John's pen>>

Double quote duplication

Also, in string constants (enclosed in double quotes), a single double quote in the string must be duplicated, thus:

"John""s pen" which is stored as <<John"s pen>>

REXX allows strings to be enclosed in either apostrophes or double quotes, so the above examples could be expressed:

"John's pen"  which is stored as <<John's pen>>

and

'John"s pen' which is stored as <<John"s pen>>

Nudge operators

REXX does not support nudge (increment and decrement) operators, so the ++ and -- symbols are not special in REXX.

Compound assignment operators

REXX does not support compound assignment operators, so the +=, -=, *=, and /= digraphs are not special in REXX.

Seed7

Within character literals and string literals, the backslash is a special character that begins an escape sequence:

    audible alert    BEL      \a    backslash    (\)   \\
    backspace        BS       \b    apostrophe   (')   \'   
    escape           ESC      \e    double quote (")   \"
    formfeed         FF       \f
    newline          NL (LF)  \n    control-A          \A
    carriage return  CR       \r      ...
    horizontal tab   HT       \t    control-Z          \Z
    vertical tab     VT       \v

Additionally the following escape sequences can be used:

  • Two backslashes with an integer literal between them are interpreted as the character with the specified ordinal number. Note that the integer literal is interpreted as decimal unless it is written as a based integer.
  • Two backslashes with a sequence of blanks, horizontal tabs, carriage returns and new lines between them are completely ignored. The ignored characters are not part of the string. This can be used to continue a string on the following line. Note that in this case the leading spaces on the new line are not part of the string. Although this possibility also exists for character literals, it makes more sense to use it with string literals.

Tcl

As documented in man Tcl, the following special characters are defined: <lang Tcl>{...}  ;# group in one word, without substituting content; nests
"..."  ;# group in one word, with substituting content
[...]  ;# evaluate content as script, then substitute with its result; nests
$foo  ;# substitute with content of variable foo
$bar(foo) ;# substitute with content of element 'foo' of array 'bar'
\a  ;# audible alert (bell)
\b  ;# backspace
\f  ;# form feed
\n  ;# newline
\r  ;# carriage return
\t  ;# Tab
\v  ;# vertical tab
\\  ;# backslash
\ooo  ;# the Unicode character with octal value 'ooo'
\xhh  ;# the character with hexadecimal value 'hh'
\uhhhh  ;# the Unicode character with hexadecimal value 'hhhh'
#  ;# if first character of a word expected to be a command, begin comment
         ;# (extends till end of line)
{*}  ;# if first characters of a word, interpret as list of words to substitute,
         ;# not single word (introduced with Tcl 8.5)</lang>

TXR

Text not containing the character @ is a TXR query representing a match for that text. The sequence @@ encodes a single literal @.

All other special syntax is introduced by @:

  • @# comment
  • @\n # escaped character, embedded into surrounding text. Similar to C escapes, with \e for ASCII ESC.
  • @\x1234 @\1234 Hex or octal escapes: Unicode width, not byte.
  • @symbol variable ref
  • @*symbol variable ref with longest match semantics
  • @{symbol expr ...} variable ref extended syntax
  • @expr directive

Where expr is Lispy syntax which can be an atom, or a list of atoms or lists in parentheses, or possibly a dotted list (terminated by an atom other than nil):

  • (elem1 elem2 ... elemn) proper
  • (elem1 elem2 ... elemn . atom) dotted

Atoms can be:

  • ABc123_4 symbols, represented by tokens consisting of letters, underscores and digits, beginning with a letter. Symbols have packages, e.g., system:foo, but this is not accessible from the TXR lexical conventions.
  • :FoO42 keyword symbols, denoted by colon, which is not part of the symbol name.
  • "string literals"
  • `quasi @literals` with embedded @ syntax
  • 'c' characters
  • 123 integers
  • /reg/ regular expressions

Within literals and regexes:

  • \r various backslash escapes similar to C
  • \\ single backslash

Within literals, quasiliterals and character constants:

  • \' \" \` escape any of the quotes: not available within regex.

The regex syntax is fairly standard fare, with these extensions:

  • ~R complement of R: set of strings other than those that match R
  • R%S match shortest number of repetitions of R prior to S.
  • R&S match R and S simultaneously: the intersection of the set of strings matching S and the set matching R.
  • [] empty class; match nothing, not even the empty string.

UNIX Shell

The Bourne shell treats the following as special characters:

  • # comment marker
  • " interpolated string enclosure
  • ! logical not (within a test command), complement box operator
  • $ variable referencing prefix
  • & referencing open file descriptors and background process marker
  • ' non interpolated string enclosure
  • * filename and string matching wildcard
  • . inclusion command
  • / pathname separator
  • : parameter expansion and do nothing command
  • ; command separator
  • = assignment and parameter expansion operator
  • \ escape sequence prefix
  • * wildcard metacharacter
  • ? wildcard metacharacter
  • ) switch conditional component
  • ` external command enclosure
  • | pipeline connector
  • - parameter expansion operator
  • + parameter expansion operator
  • *) switch conditional component
  • #! hashbang
  • ;; switch conditional component
  • [ ] test command substitute and character range enclosures
  • ( ) subshell execution
  • `( )` external subshell execution
  • [! ] complement box enclosures
  • { } code block enclosures and variable name isolation operator
  • << here document operator
  • >> append redirection operator
  • $* single element command line expansion special variable
  • $@ multiple element command line expansion special variable
  • $# number of command line parameters special variable
  • ${ } variable name isolator
  • :- parameter expansion operator
  • := parameter expansion operator
  • :+ parameter expansion operator
  • :? parameter expansion operator

The Korn shell, Bourne Again Shell and POSIX shell provide the following additional special characters:

  • - Korn shell unary arithmetic operator
  • { brace expansion marker
  • ~ home directory expansion operator
  • && Extended syntax for execute if true (on success)
  • || Extended syntax for execute if false (on failure)
  • $( ) Extended syntax for external command capture construct
  • -- Extended syntax marker for end of command line switches
  • == string equality operator in tests (bash specific feature)
  • [[ ]] extended test command enclosures (bash and Korn shell feature)
  • (( )) arithmetic expansion enclosures
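
A few of these extensions can be sketched in a short script. The example below is only an illustration, restricted to constructs that a POSIX shell is expected to accept, so it uses the $(( )) form of arithmetic expansion rather than the bare (( )) command and leaves out brace expansion and [[ ]]. All variable names are invented for the example.

```sh
#!/bin/sh
# $( ) captures a command's output and, unlike backticks, nests
# without any extra escaping.
base=$(basename "$(dirname /usr/local/bin/tool)")

# $(( )) evaluates integer arithmetic, including the unary minus.
n=5
neg=$(( -$n + 2 ))

# && runs the next command only on success, || only on failure.
on_success=$(true && echo ok || echo failed)
on_failure=$(false && echo ok || echo failed)

# -- ends option parsing, so the leading dash in the format string
# is treated as data rather than as an option.
dashed=$(printf -- '-%s' x)

# ~ expands to the home directory when unquoted at the start of a word.
home_dir=~

printf '%s\n' "$base" "$neg" "$on_success" "$on_failure" "$dashed" "$home_dir"
```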

XSLT

XSLT is based on XML, and so has the same special characters which must be escaped using character entities:

  • & - &amp;
  • < - &lt;
  • > - &gt;
  • " - &quot;
  • ' - &apos;

Any Unicode character may also be represented via its decimal code point (&#nnnn;) or hexadecimal code point (&#xhhhh;).
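
The five entity substitutions listed above can be applied mechanically to plain text before it is placed in a stylesheet. The following is a minimal sketch using sed from a UNIX shell; xml_escape is a hypothetical helper name chosen for this example and is not part of XSLT or of any XML toolkit.

```sh
#!/bin/sh
# xml_escape (hypothetical helper): reads text on standard input and
# writes it with the five predefined XML entities substituted.
xml_escape() {
    # & must be replaced first, otherwise the ampersands introduced by
    # the other substitutions would be escaped a second time.
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}

printf '%s\n' 'if (a < b && c > "d") say '\''hi'\''' | xml_escape
```

Inside a sed replacement an unescaped & stands for the matched text, which is why each entity is written with a leading backslash.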