URL encoding

From Rosetta Code
Revision as of 23:50, 1 November 2011 by Vectorious (Added C# solution.)
You are encouraged to solve this task according to the task description, using any language you may know.

The task is to provide a function or mechanism that converts a given string into its URL-encoded representation.

In URL encoding, special characters, control characters and extended characters are converted into a percent sign followed by a two-digit hexadecimal code. For example, a space character is encoded as %20 within the string.
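A minimal Python sketch of that escape step. It assumes extended characters are first converted to UTF-8 bytes, which is a common choice but not something the task mandates; `percent_escape` is a hypothetical helper name.

```python
def percent_escape(ch: str) -> str:
    """Return the %XX form of one character, one escape per UTF-8 byte."""
    # A character above 7F hex may occupy several bytes in UTF-8,
    # and each byte gets its own percent escape.
    return "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))


print(percent_escape(" "))  # %20
print(percent_escape("é"))  # %C3%A9
```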

For the purposes of this task, every character except 0-9, A-Z and a-z requires conversion, so the following characters all require conversion by default:

  • ASCII control codes (character ranges 0-31 decimal (00-1F hex) and 127 decimal (7F hex))
  • ASCII symbols (character range 32-47 decimal (20-2F hex))
  • ASCII symbols (character range 58-64 decimal (3A-40 hex))
  • ASCII symbols (character range 91-96 decimal (5B-60 hex))
  • ASCII symbols (character range 123-126 decimal (7B-7E hex))
  • Extended characters with character codes of 128 decimal (80 hex) and above.
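As a sanity check on the list above, this Python sketch confirms that the listed ASCII ranges are exactly the 7-bit codes that are not 0-9, A-Z or a-z (extended characters, 80 hex and above, are left out of the check). The names `LISTED_RANGES`, `listed` and `non_alnum_ascii` are purely illustrative.

```python
import string

# The ASCII ranges from the bullets above, as (first, last) inclusive pairs.
LISTED_RANGES = [
    (0x00, 0x1F),  # control codes
    (0x7F, 0x7F),  # DEL
    (0x20, 0x2F),  # space through "/"
    (0x3A, 0x40),  # ":" through "@"
    (0x5B, 0x60),  # "[" through "`"
    (0x7B, 0x7E),  # "{" through "~"
]

listed = {c for lo, hi in LISTED_RANGES for c in range(lo, hi + 1)}

# Every 7-bit code except the ASCII letters and digits.
alnum = {ord(ch) for ch in string.ascii_letters + string.digits}
non_alnum_ascii = set(range(0x80)) - alnum

print(listed == non_alnum_ascii)  # True
```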


The string "http://foo bar/" would be encoded as "http%3A%2F%2Ffoo%20bar%2F".
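A minimal Python sketch of the task as stated, assuming (as above) that extended characters are encoded as UTF-8 bytes first. `url_encode` is a hypothetical name, not a library function.

```python
import string

# Bytes the task leaves alone: ASCII 0-9, A-Z, a-z.
UNRESERVED = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def url_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte of text except ASCII letters and digits."""
    out = []
    for b in text.encode("utf-8"):  # iterating over bytes yields ints 0-255
        if b in UNRESERVED:
            out.append(chr(b))
        else:
            out.append("%{:02X}".format(b))
    return "".join(out)


print(url_encode("http://foo bar/"))  # http%3A%2F%2Ffoo%20bar%2F
```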


  • Lowercase escapes are legal, as in "http%3a%2f%2ffoo%20bar%2f".
  • Some standards give different rules. RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, section 2.3, says that "-._~" should not be encoded. HTML 5, section URL-encoded form data, says to preserve "-._*" and to encode space " " as "+". An exception string lets a solution preserve (not encode) particular characters to meet a specific standard.


It is permissible to use an exception string (containing a set of symbols that do not need to be converted). However, this is an optional feature and is not a requirement of this task.
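One way to support an exception string is sketched below in Python. The parameter names `exceptions` and `space_as_plus` are hypothetical; the sketch assumes exception characters are ASCII and that text is encoded as UTF-8.

```python
import string

ALNUM = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def url_encode(text: str, exceptions: str = "", space_as_plus: bool = False) -> str:
    """Percent-encode text (as UTF-8), leaving alphanumerics and `exceptions` as-is.

    exceptions: ASCII characters to preserve, e.g. "-._~" for RFC 3986.
    space_as_plus: write " " as "+", as HTML 5 form encoding does.
    """
    keep = ALNUM | frozenset(exceptions.encode("ascii"))
    out = []
    for b in text.encode("utf-8"):
        if b in keep:
            out.append(chr(b))
        elif space_as_plus and b == 0x20:
            out.append("+")
        else:
            out.append("%{:02X}".format(b))
    return "".join(out)


s = "http://foo bar/~x*y"
print(url_encode(s))                                          # task default
print(url_encode(s, exceptions="-._~"))                       # RFC 3986
print(url_encode(s, exceptions="-._*", space_as_plus=True))   # HTML 5 form data
```

With no arguments it behaves like the plain task; the two presets differ only in which punctuation survives and how the space is written.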

See also

URL decoding


Ada

Library: AWS

<lang Ada>with AWS.URL;
with Ada.Text_IO; use Ada.Text_IO;

procedure Encode is
   Normal : constant String := "http://foo bar/";
begin
   Put_Line (AWS.URL.Encode (Normal));
end Encode;</lang>

Output:



AutoHotkey

<lang AutoHotkey>rawURL = http://foo bar/
SetFormat, Integer, Hex
Loop, Parse, rawURL
{
    If A_LoopField is not alnum ; not a-zA-Z0-9
        encURL .= "%" . SubStr(Asc(A_LoopField), 3)
    else
        encURL .= A_LoopField
}
MsgBox % encURL</lang>


AWK

<lang awk>BEGIN {
    for (i = 0; i <= 255; i++)
        ord[sprintf("%c", i)] = i
}

# Encode string with application/x-www-form-urlencoded escapes.
function escape(str,    c, i, len, res) {
    len = length(str)
    res = ""
    for (i = 1; i <= len; i++) {
        c = substr(str, i, 1);
        if (c ~ /[0-9A-Za-z]/)
        #if (c ~ /[-._*0-9A-Za-z]/)
            res = res c
        #else if (c == " ")
        #    res = res "+"
        else
            res = res "%" sprintf("%02X", ord[c])
    }
    return res
}

# Escape every line of input.
{ print escape($0) }</lang>

The array ord[] uses an idea from Character codes#AWK.

To follow the rules for HTML 5, uncomment the two lines that convert " " to "+", and use the regular expression that preserves "-._*".
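The AWK ord[] array and the C tables below both precompute something per byte. A Python sketch of the fully table-driven variant, assuming UTF-8 input: the 256-entry table maps each byte straight to its output text, so encoding is one lookup per byte. `TABLE` and `url_encode` are hypothetical names.

```python
import string

# Build the lookup table once: each byte value maps to its output text.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))
TABLE = [chr(b) if b in ALNUM else "%{:02X}".format(b) for b in range(256)]


def url_encode(text: str) -> str:
    """Encode by table lookup on each UTF-8 byte of text."""
    return "".join(TABLE[b] for b in text.encode("utf-8"))


print(url_encode("http://foo bar/"))  # http%3A%2F%2Ffoo%20bar%2F
```

Switching to the HTML 5 rules only means editing table entries, for example setting TABLE[0x20] to "+", much as the AWK version does by uncommenting lines.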


C

<lang c>#include <stdio.h>
#include <ctype.h>

/* Lookup tables: a nonzero entry is the output character, 0 means percent-encode. */
char rfc3986[256] = {0};
char html5[256] = {0};

/* caller responsible for memory */
void encode(unsigned char *s, char *enc, char *tb)
{
	*enc = '\0';	/* an empty input still yields a valid string */
	for (; *s; s++) {
		if (tb[*s]) sprintf(enc, "%c", tb[*s]);
		else        sprintf(enc, "%%%02X", *s);
		while (*++enc);	/* advance to the new terminator */
	}
}

int main(void)
{
	unsigned char url[] = "http://foo bar/";
	char enc[sizeof(url) * 3];

	int i;
	for (i = 0; i < 256; i++) {
		rfc3986[i] = isalnum(i)||i == '~'||i == '-'||i == '.'||i == '_'
			? i : 0;
		html5[i] = isalnum(i)||i == '*'||i == '-'||i == '.'||i == '_'
			? i : (i == ' ') ? '+' : 0;
	}

	encode(url, enc, rfc3986);
	puts(enc);

	return 0;
}</lang>


C++

Using Qt 4.6 as a library:

<lang cpp>#include <QByteArray>
#include <iostream>

int main( ) {
   QByteArray text ( "http://foo bar/" ) ;
   QByteArray encoded( text.toPercentEncoding( ) ) ;
   std::cout << encoded.data( ) << '\n' ;
   return 0 ;
}</lang>

Output:



C#

<lang c sharp>using System;

namespace URLEncode
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            Console.WriteLine(Encode("http://foo bar/"));
        }

        private static string Encode(string uri)
        {
            return Uri.EscapeDataString(uri);
        }
    }
}</lang>

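Go

<lang go>package main

import (
    "fmt"
    "net/url"
)

func main() {
    fmt.Println(url.QueryEscape("http://foo bar/"))
}</lang>

url.QueryEscape encodes a space as "+" rather than "%20", so this prints "http%3A%2F%2Ffoo+bar%2F".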

Icon and Unicon

<lang Icon>link hexcvt

procedure main()
write("text    = ",image(u := "http://foo bar/"))
write("encoded = ",image(ue := encodeURL(u)))
end

procedure encodeURL(s)                  #: encode data for inclusion in a URL/URI
static en
initial {                               # build lookup table for everything
   en := table()
   every en[c := !string(~(&digits++&letters))] := "%"||hexstring(ord(c),2)
   every /en[c := !string(&cset)] := c
   }

every (c := "") ||:= en[!s]             # re-encode everything
return c
end</lang>

hexcvt (from the Icon Programming Library) provides hexstring.

Output:


text    = "http://foo bar/"
encoded = "http%3A%2F%2Ffoo%20bar%2F"


J

J has a urlencode in the gethttp package, but this task requires that all non-alphanumeric characters be encoded.

Here's an implementation that does that:

<lang j>require'strings convert' urlencode=: rplc&((#~2|_1 47 57 64 90 96 122 I.i.@#)a.;"_1'%',.hfd i.#a.)</lang>

Example use:

<lang j>   urlencode 'http://foo bar/'
http%3A%2F%2Ffoo%20bar%2F</lang>


Java

The built-in URLEncoder in Java converts the space " " into a plus sign "+" instead of "%20":

<lang java>import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Main
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String normal = "http://foo bar/";
        String encoded = URLEncoder.encode(normal, "utf-8");
        System.out.println(encoded); // http%3A%2F%2Ffoo+bar%2F
    }
}</lang>

JavaScript

Confusingly, there are three different URI encoding functions in JavaScript: escape(), encodeURI(), and encodeURIComponent(). Each of them encodes a different set of characters; encodeURIComponent() is the only one of the three that escapes both ":" and "/".

<lang javascript>var normal = 'http://foo bar/';
var encoded = encodeURIComponent(normal); // "http%3A%2F%2Ffoo%20bar%2F"</lang>


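Objective-C

Works with: Cocoa version Mac OS X 10.3+

<lang objc>NSString *normal = @"http://foo bar/";
NSString *encoded = [normal stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", encoded);</lang>

This method only escapes characters that are not legal in a URL at all, so ":" and "/" are left unencoded. The Core Foundation function CFURLCreateStringByAddingPercentEscapes() provides more options, including a parameter that lists legal URL characters to escape anyway.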


Perl

<lang perl>use URI::Escape;

my $s = 'http://foo bar/';
print uri_escape($s);</lang>

Using the CGI module:

<lang perl>use 5.10.0;
use CGI;

my $s = 'http://foo bar/';
say $s = CGI::escape($s);
say $s = CGI::unescape($s);</lang>

Perl 6

<lang perl6>my $url = 'http://foo bar/';

say $url.subst(/<-[ A..Z a..z 0..9 ]>/, *.ord.fmt("%%%02X"), :g);</lang>
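The Perl 6 one-liner above and the Tcl solution below both use a single regex substitution. A Python sketch of that approach follows; unlike the Perl 6 version, which formats each character's code point directly (two hex digits only below 256), it assumes UTF-8 and escapes each byte, as the Tcl version does. `NEEDS_ESCAPE` and `url_encode` are hypothetical names.

```python
import re

# One or more characters outside ASCII 0-9, A-Z, a-z.
NEEDS_ESCAPE = re.compile(r"[^0-9A-Za-z]+")


def url_encode(text: str) -> str:
    """Replace each run of non-alphanumerics with the %XX escapes of its UTF-8 bytes."""
    return NEEDS_ESCAPE.sub(
        lambda m: "".join("%{:02X}".format(b) for b in m.group().encode("utf-8")),
        text,
    )


print(url_encode("http://foo bar/"))  # http%3A%2F%2Ffoo%20bar%2F
```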




PHP

<lang php><?php
$s = 'http://foo bar/';
$s = rawurlencode($s);
?></lang>

There is also urlencode(), which encodes spaces as "+" signs instead.


PicoLisp

<lang PicoLisp>(de urlEncodeTooMuch (Str)
   (pack
      (mapcar
         '((C)
            (if (or (>= "9" C "0") (>= "Z" (uppc C) "A"))
               C
               (list '% (hex (char C))) ) )
         (chop Str) ) ) )</lang>


: (urlEncodeTooMuch "http://foo bar/")
-> "http%3A%2F%2Ffoo%20bar%2F"


PureBasic

<lang PureBasic>URL$ = URLEncoder("http://foo bar/")</lang>


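Python

<lang python>import urllib.parse  # Python 3; Python 2 used urllib.quote

s = 'http://foo bar/'
s = urllib.parse.quote(s, safe='')  # the default safe='/' would leave "/" unencoded</lang>

There is also urllib.parse.quote_plus(), which additionally encodes spaces as "+" signs.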


Ruby

CGI.escape encodes all characters except '-.0-9A-Z_a-z'.

<lang ruby>require 'cgi'
puts CGI.escape("http://foo bar/").gsub("+", "%20")
# prints http%3A%2F%2Ffoo%20bar%2F</lang>

URI.encode_www_form_component was added in Ruby 1.9.2. It follows HTML 5 and encodes all characters except '-.0-9A-Z_a-z' and '*'.

Works with: Ruby version 1.9.2

<lang ruby>require 'uri'
puts URI.encode_www_form_component("http://foo bar/").gsub("+", "%20")
# prints http%3A%2F%2Ffoo%20bar%2F</lang>

Programs should not call URI.escape (alias URI.encode), because it fails to encode some characters. URI.escape has been obsolete since Ruby 1.9.2 and was removed in Ruby 3.0.
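The Ruby solutions above take a form encoder's output and turn each "+" back into "%20". That is safe because such encoders write a literal "+" as "%2B", so every "+" left in the output stands for a space. A Python sketch of the same idea using the standard urllib.parse.quote_plus; note that quote_plus also leaves "-._~" unencoded, so it does not follow the strict task rules. `url_encode_via_form` is a hypothetical name.

```python
from urllib.parse import quote_plus


def url_encode_via_form(text: str) -> str:
    # quote_plus writes " " as "+" and a literal "+" as "%2B",
    # so every "+" in its output can safely become "%20".
    return quote_plus(text, safe="").replace("+", "%20")


print(url_encode_via_form("http://foo bar/"))  # http%3A%2F%2Ffoo%20bar%2F
```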


Seed7

The library encoding.s7i defines functions for URL encoding and percent encoding. The function toPercentEncoded encodes every character except 0-9, A-Z, a-z and the characters '-', '.', '_', '~'. The function toUrlEncoded works like toPercentEncoded and additionally encodes a space as '+'. Both functions work on byte sequences (characters beyond '\255\' raise the exception RANGE_ERROR). To encode Unicode characters, first convert them to UTF-8 with striToUtf8.

<lang seed7>$ include "seed7_05.s7i";
  include "encoding.s7i";

const proc: main is func
  begin
    writeln(toPercentEncoded("http://foo bar/"));
    writeln(toUrlEncoded("http://foo bar/"));
  end func;</lang>




Tcl

<lang tcl># Encode all except "unreserved" characters; use UTF-8 for extended chars.
# See http://tools.ietf.org/html/rfc3986 §2.4 and §2.5
proc urlEncode {str} {
    set uStr [encoding convertto utf-8 $str]
    set chRE {[^-A-Za-z0-9._~\n]};		# Newline is special case!
    set replacement {%[format "%02X" [scan "\\\0" "%c"]]}
    return [string map {"\n" "%0A"} [subst [regsub -all $chRE $uStr $replacement]]]
}</lang>

Demonstrating:

<lang tcl>puts [urlEncode "http://foo bar/"]</lang>

Output:



TUSCRIPT

<lang tuscript>
$$ MODE TUSCRIPT
text="http://foo bar/"
BUILD S_TABLE spez_char="::>/:</::<%:"
spez_char=STRINGS (text,spez_char)
LOOP/CLEAR c=spez_char
c=ENCODE(c,hex),c=concat("%",c),spez_char=APPEND(spez_char,c)
ENDLOOP
url_encoded=SUBSTITUTE(text,spez_char,0,0,spez_char)
print "text: ", text
PRINT "encoded: ", url_encoded
</lang>

Output:

text:    http://foo bar/
encoded: http%3A%2F%2Ffoo%20bar%2F