Class idna_convert

Description

Encode/decode Internationalized Domain Names.

The class converts internationalized domain names (see RFC 3490 for details), as used by various registries worldwide, between their original (localized) form and the encoded form used in the DNS (Domain Name System).

The class provides two public methods, encode() and decode(), which do exactly what you would expect them to do. You may pass complete domain names, simple strings and complete email addresses. That means you can use any of the following notations:

  • www.nörgler.com
  • xn--nrgler-wxa
  • xn--brse-5qa.xn--knrz-1ra.info
Unicode input may be given as a UTF-8 string, a UCS-4 string or a UCS-4 array. Unicode output is available in the same formats. You can select your preferred format via set_parameter().

ACE input and output are always expected to be ASCII.
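
As an illustration of the two forms described above, here is a minimal Python sketch that uses Python's built-in "idna" codec rather than this PHP class. The helper names to_ace() and to_unicode() are hypothetical and only loosely mirror encode() and decode(); the codec applies the RFC 3490 conversions label by label.

```python
# Illustration only: Python's standard "idna" codec, not idna_convert itself.

def to_ace(name):
    """Return the ACE (ASCII) form of a Unicode domain name."""
    return name.encode("idna").decode("ascii")


def to_unicode(ace):
    """Return the Unicode form of an ACE domain name."""
    return ace.encode("ascii").decode("idna")


print(to_ace("www.nörgler.com"))  # www.xn--nrgler-wxa.com
print(to_unicode("xn--brse-5qa.xn--knrz-1ra.info"))
```

Only the labels that contain non-ASCII characters receive the 'xn--' prefix; plain ASCII labels such as "www" and "com" pass through unchanged.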

  • author: Matthias Sommerfeld <mso@phlylabs.de>
  • version: 0.5.1
  • copyright: 2004-2007 phlyLabs Berlin, http://phlylabs.de

Located in /libraries/simplepie/idn/idna_convert.class.php (line 54)

Direct descendants
Class Description
Net_IDNA_php4 Adapter class for aligning the API of idna_convert with that of Net_IDNA
Variable Summary
array $NP
mixed $_base
mixed $_damp
mixed $_error
mixed $_initial_n
mixed $_lbase
mixed $_lcount
mixed $_max_ucs
mixed $_ncount
mixed $_sbase
mixed $_scount
mixed $_skew
mixed $_tbase
mixed $_tcount
mixed $_tmax
mixed $_tmin
mixed $_vbase
mixed $_vcount
Method Summary
idna_convert idna_convert ([ $options = false])
string decode (string $input, [ $one_time_encoding = false])
string encode (string $decoded, [ $one_time_encoding = false])
string get_last_error ()
boolean set_parameter (mixed $option, [string $value = false])
void _adapt ( $delta,  $npoints,  $is_first)
array _apply_cannonical_ordering (array $input)
array _combine (array $input)
void _decode ( $encoded)
void _decode_digit ( $cp)
void _encode ( $decoded)
void _encode_digit ( $d)
void _error ([ $error = ''])
integer _get_combining_class (integer $char)
array _hangul_compose (array $input)
array _hangul_decompose (integer $char)
string _nameprep (array $input)
void _ucs4_string_to_ucs4 ( $input)
void _ucs4_to_ucs4_string ( $input)
void _ucs4_to_utf8 ( $input)
void _utf8_to_ucs4 ( $input)
Variables
array $NP = array() (line 63)

Holds all relevant mapping tables, loaded from a separate file on construction. See RFC 3454 for details.

  • access: private
mixed $_allow_overlong = false (line 90)
mixed $_api_encoding = 'utf8' (line 89)
mixed $_base = 36 (line 69)
mixed $_damp = 700 (line 73)
mixed $_error = false (line 85)
mixed $_initial_bias = 72 (line 74)
mixed $_initial_n = 0x80 (line 75)
mixed $_invalid_ucs = 0x80000000 (line 67)
mixed $_lbase = 0x1100 (line 77)
mixed $_lcount = 19 (line 80)
mixed $_max_ucs = 0x10FFFF (line 68)
mixed $_ncount = 588 (line 83)
mixed $_punycode_prefix = 'xn--' (line 66)
mixed $_sbase = 0xAC00 (line 76)
mixed $_scount = 11172 (line 84)
mixed $_skew = 38 (line 72)
mixed $_strict_mode = false (line 91)
mixed $_tbase = 0x11A7 (line 79)
mixed $_tcount = 28 (line 82)
mixed $_tmax = 26 (line 71)
mixed $_tmin = 1 (line 70)
mixed $_vbase = 0x1161 (line 78)
mixed $_vcount = 21 (line 81)
Methods
Constructor idna_convert (line 94)
idna_convert idna_convert ([ $options = false])
  • $options
decode (line 165)

Decode a given ACE domain name

  • return: Decoded Domain name (UTF-8 or UCS-4)
  • access: public
string decode (string $input, [ $one_time_encoding = false])
  • string $input: Domain name (ACE string)
  • $one_time_encoding: Desired output encoding (string), see set_parameter
encode (line 267)

Encode a given UTF-8 domain name

  • return: Encoded Domain name (ACE string)
  • access: public
string encode (string $decoded, [ $one_time_encoding = false])
  • string $decoded: Domain name (UTF-8 or UCS-4)
  • $one_time_encoding: Desired input encoding (string), see set_parameter
get_last_error (line 351)

Use this method to get the last error that occurred

  • return: The last error that occurred
  • access: public
string get_last_error ()
set_parameter (line 125)

Sets a new option value. Available options and values:

  • encoding: Use either UTF-8, UCS4 as array or UCS4 as string as input ('utf8' for UTF-8, 'ucs4_string' and 'ucs4_array' respectively for UCS4); the output is always UTF-8.
  • overlong: Unicode does not allow unnecessarily long encodings of characters; to allow them, set this parameter to true, else to false. The default is false.
  • strict: true selects strict mode, good for registration purposes, which causes errors on failures; false selects loose mode, ideal for "wildlife" applications, which silently ignores errors and returns the original input instead.

  • return: true on success, false otherwise
  • access: public
boolean set_parameter (mixed $option, [string $value = false])
  • mixed $option: Parameter to set (string: single parameter; array of Parameter => Value pairs)
  • string $value: Value to use (if parameter 1 is a string)
_adapt (line 517)

Adapt the bias according to the current code point and position

  • access: private
void _adapt ( $delta,  $npoints,  $is_first)
  • $delta
  • $npoints
  • $is_first
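
For orientation, the bias adaptation step can be sketched in Python following the generic procedure in RFC 3492, section 6.1. This is not the class's PHP code: the constants are taken from the $_base, $_tmin, $_tmax, $_skew and $_damp values listed above, and the function name adapt() and its argument order are assumptions modeled on the _adapt() signature.

```python
# Punycode parameters, matching the values documented for this class.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700


def adapt(delta, npoints, is_first):
    """Return the new bias after a delta has been encoded or decoded."""
    # The very first delta is damped strongly, later ones are halved.
    delta = delta // DAMP if is_first else delta // 2
    # Compensate for the growing length of the string.
    delta += delta // npoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)
```

For example, adapt(826, 7, True) yields 0: the delta is first damped to 1, which is too small to raise the bias.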
_apply_cannonical_ordering (line 718)

Applies the canonical ordering of a decomposed UCS4 sequence

  • return: Ordered UCS4 sequence
  • access: private
array _apply_cannonical_ordering (array $input)
  • array $input: Decomposed UCS4 sequence
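
Canonical ordering sorts each run of combining marks by combining class while leaving starters (class 0) in place. The Python sketch below illustrates the idea; it assumes Python's unicodedata module as the source of combining classes, whereas the class itself looks them up in its own $NP tables via _get_combining_class(). The function name canonical_order() is hypothetical.

```python
import unicodedata


def canonical_order(code_points):
    """Reorder combining marks so their combining classes are non-decreasing."""
    out = list(code_points)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(out)):
            prev_class = unicodedata.combining(chr(out[i - 1]))
            cur_class = unicodedata.combining(chr(out[i]))
            # Never move a mark across a starter (combining class 0).
            if cur_class != 0 and prev_class > cur_class:
                out[i - 1], out[i] = out[i], out[i - 1]
                swapped = True
    return out


# U+0301 (class 230) followed by U+0323 (class 220) gets swapped.
print([hex(cp) for cp in canonical_order([0x61, 0x0301, 0x0323])])
```

Adjacent swaps keep marks of equal class in their original relative order, which the Unicode normalization rules require.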
_combine (line 751)

Composes a sequence of starter and non-starter characters

  • return: Ordered UCS4 sequence
  • access: private
array _combine (array $input)
  • array $input: UCS4 Decomposed sequence
_decode (line 360)

The actual decoding algorithm

  • access: private
void _decode ( $encoded)
  • $encoded
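
The decoding of a single label can be sketched in Python along the lines of the generic decoder in RFC 3492, section 6.2. This is an illustrative sketch, not a transcription of _decode(): it works on a label without the 'xn--' prefix, and the names punycode_decode(), adapt() and threshold() are hypothetical.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta, npoints, is_first):
    delta = delta // DAMP if is_first else delta // 2
    delta += delta // npoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def threshold(k, bias):
    """Clamp k - bias into the range TMIN..TMAX."""
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias


def punycode_decode(encoded):
    """Decode one Punycode label (without the 'xn--' prefix)."""
    # Everything before the last '-' is copied literally.
    basic, _, tail = encoded.rpartition("-")
    out = [ord(c) for c in basic]
    n, i, bias, pos = INITIAL_N, 0, INITIAL_BIAS, 0
    while pos < len(tail):
        old_i, w, k = i, 1, BASE
        while True:
            if pos >= len(tail):
                raise ValueError("truncated Punycode input")
            digit = DIGITS.find(tail[pos].lower())
            if digit < 0:
                raise ValueError("invalid Punycode digit")
            pos += 1
            i += digit * w
            t = threshold(k, bias)
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = adapt(i - old_i, len(out) + 1, old_i == 0)
        # i encodes both the code point increment and the insert position.
        n += i // (len(out) + 1)
        i %= len(out) + 1
        out.insert(i, n)
        i += 1
    return "".join(chr(cp) for cp in out)


print(punycode_decode("nrgler-wxa"))  # nörgler
```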
_decode_digit (line 540)

Decode a certain digit

  • access: private
void _decode_digit ( $cp)
  • $cp
_encode (line 419)

The actual encoding algorithm

  • access: private
void _encode ( $decoded)
  • $decoded
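
The encoding of a single label can be sketched in Python along the lines of the generic encoder in RFC 3492, section 6.3. This is an illustrative sketch, not a transcription of _encode(): it skips Nameprep and the 'xn--' prefix, and the names punycode_encode(), adapt() and threshold() are hypothetical.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 0x80
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta, npoints, is_first):
    delta = delta // DAMP if is_first else delta // 2
    delta += delta // npoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def threshold(k, bias):
    """Clamp k - bias into the range TMIN..TMAX."""
    return TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias


def punycode_encode(label):
    """Encode one label as Punycode (without the 'xn--' prefix)."""
    cps = [ord(c) for c in label]
    basic = [cp for cp in cps if cp < INITIAL_N]
    out = [chr(cp) for cp in basic]
    handled = len(basic)
    if basic:
        out.append("-")  # delimiter between literal and encoded part
    n, delta, bias, first = INITIAL_N, 0, INITIAL_BIAS, True
    while handled < len(cps):
        # Next code point to insert: the smallest one not yet handled.
        m = min(cp for cp in cps if cp >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for cp in cps:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a variable-length base-36 integer.
                q, k = delta, BASE
                while True:
                    t = threshold(k, bias)
                    if q < t:
                        break
                    out.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                out.append(DIGITS[q])
                bias = adapt(delta, handled + 1, first)
                first = False
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(out)


print(punycode_encode("nörgler"))  # nrgler-wxa
```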
_encode_digit (line 531)

Encode a certain digit

  • access: private
void _encode_digit ( $d)
  • $d
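
Punycode uses 36 digits: the values 0 to 25 map to the letters a to z and the values 26 to 35 map to the numerals 0 to 9 (RFC 3492, section 5). A minimal Python sketch of both directions follows; the function names and the choice to return BASE for an invalid character are assumptions, not necessarily what _encode_digit() and _decode_digit() do.

```python
BASE = 36  # matches $_base


def encode_digit(d):
    """Map a digit value 0..35 to its lowercase Punycode character."""
    if not 0 <= d < BASE:
        raise ValueError("digit out of range")
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def decode_digit(ch):
    """Map a Punycode character to its value; return BASE if invalid."""
    cp = ord(ch)
    if 0x30 <= cp <= 0x39:   # '0'..'9' -> 26..35
        return cp - 0x30 + 26
    if 0x41 <= cp <= 0x5A:   # 'A'..'Z' -> 0..25
        return cp - 0x41
    if 0x61 <= cp <= 0x7A:   # 'a'..'z' -> 0..25
        return cp - 0x61
    return BASE
```

Decoding is case-insensitive, so 'W' and 'w' both yield 22.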
_error (line 550)

Internal error handling method

  • access: private
void _error ([ $error = ''])
  • $error
_get_combining_class (line 707)

Returns the combining class of a certain wide char

  • return: Combining class if found, else 0
  • access: private
integer _get_combining_class (integer $char)
  • integer $char: Wide char to check (32bit integer)
_hangul_compose (line 665)

Composes a Hangul syllable

(see http://www.unicode.org/unicode/reports/tr15/#Hangul)

  • return: UCS4 sequence with syllables composed
  • access: private
array _hangul_compose (array $input)
  • array $input: Decomposed UCS4 sequence
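
Hangul composition is purely arithmetic. The Python sketch below follows the algorithm described in the Unicode report linked above and uses the constants listed for $_sbase, $_lbase, $_vbase, $_tbase and the matching counts; the function name hangul_compose() is an assumption and the code is not a transcription of the PHP method.

```python
SBASE, LBASE, VBASE, TBASE = 0xAC00, 0x1100, 0x1161, 0x11A7
LCOUNT, VCOUNT, TCOUNT = 19, 21, 28
SCOUNT = LCOUNT * VCOUNT * TCOUNT  # 11172


def hangul_compose(code_points):
    """Combine leading/vowel/trailing jamo into precomposed syllables."""
    if not code_points:
        return []
    result = [code_points[0]]
    for cp in code_points[1:]:
        last = result[-1]
        # Leading consonant + vowel -> LV syllable.
        l_index, v_index = last - LBASE, cp - VBASE
        if 0 <= l_index < LCOUNT and 0 <= v_index < VCOUNT:
            result[-1] = SBASE + (l_index * VCOUNT + v_index) * TCOUNT
            continue
        # LV syllable + trailing consonant -> LVT syllable.
        s_index, t_index = last - SBASE, cp - TBASE
        if 0 <= s_index < SCOUNT and s_index % TCOUNT == 0 and 0 < t_index < TCOUNT:
            result[-1] = last + t_index
            continue
        result.append(cp)
    return result


print([hex(cp) for cp in hangul_compose([0x1112, 0x1161, 0x11AB])])  # ['0xd55c']
```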
_hangul_decompose (line 645)

Decomposes a Hangul syllable

(see http://www.unicode.org/unicode/reports/tr15/#Hangul)

  • return: Either Hangul Syllable decomposed or original 32bit value as one value array
  • access: private
array _hangul_decompose (integer $char)
  • integer $char: 32bit UCS4 code point
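
The decomposition is the inverse arithmetic. The Python sketch below follows the algorithm in the Unicode report linked above, using the constants listed for this class; the function name hangul_decompose() is an assumption, and a non-Hangul value is returned unchanged as a one-element list, as the return description states.

```python
SBASE, LBASE, VBASE, TBASE = 0xAC00, 0x1100, 0x1161, 0x11A7
LCOUNT, VCOUNT, TCOUNT = 19, 21, 28
NCOUNT = VCOUNT * TCOUNT   # 588
SCOUNT = LCOUNT * NCOUNT   # 11172


def hangul_decompose(char):
    """Split a precomposed Hangul syllable into its jamo code points."""
    s_index = char - SBASE
    if s_index < 0 or s_index >= SCOUNT:
        return [char]  # not a Hangul syllable
    result = [
        LBASE + s_index // NCOUNT,             # leading consonant
        VBASE + (s_index % NCOUNT) // TCOUNT,  # vowel
    ]
    t_index = s_index % TCOUNT
    if t_index:
        result.append(TBASE + t_index)         # optional trailing consonant
    return result


print([hex(cp) for cp in hangul_decompose(0xD55C)])  # ['0x1112', '0x1161', '0x11ab']
```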
_nameprep (line 561)

Do Nameprep according to RFC3491 and RFC3454

  • return: Unicode Characters, Nameprep'd
  • access: private
string _nameprep (array $input)
  • array $input: Unicode Characters
_ucs4_string_to_ucs4 (line 918)

Convert UCS-4 string into UCS-4 array

  • access: private
void _ucs4_string_to_ucs4 ( $input)
  • $input
_ucs4_to_ucs4_string (line 902)

Convert UCS-4 array into UCS-4 string

  • access: private
void _ucs4_to_ucs4_string ( $input)
  • $input
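
The two UCS-4 helpers convert between a list of code points and a byte string with four bytes per character. The Python sketch below shows one way to do that; big-endian byte order is an assumption, since this page does not state which order the class uses, and both function names are hypothetical.

```python
import struct


def ucs4_array_to_string(code_points):
    """Pack code points into a byte string, four bytes each (big-endian assumed)."""
    return struct.pack(">%dI" % len(code_points), *code_points)


def ucs4_string_to_array(data):
    """Unpack a four-bytes-per-character byte string into a list of code points."""
    if len(data) % 4:
        raise ValueError("UCS-4 string length must be a multiple of 4")
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


print(ucs4_array_to_string([0x6E, 0xF6]))  # b'\x00\x00\x00n\x00\x00\x00\xf6'
```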
_ucs4_to_utf8 (line 865)

Convert UCS-4 string into UTF-8 string

See _utf8_to_ucs4() for details

  • access: private
void _ucs4_to_utf8 ( $input)
  • $input
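
A rough Python sketch of the UCS-4 to UTF-8 direction follows. It covers sequences of one to six bytes, as in the original UTF-8 definition; the function name ucs4_to_utf8() and the error handling are assumptions rather than a transcription of the PHP method.

```python
# (exclusive upper limit, number of continuation bytes, lead byte prefix)
UTF8_RANGES = [
    (0x80, 0, 0x00),
    (0x800, 1, 0xC0),
    (0x10000, 2, 0xE0),
    (0x200000, 3, 0xF0),
    (0x4000000, 4, 0xF8),
    (0x80000000, 5, 0xFC),
]


def ucs4_to_utf8(code_points):
    """Encode a list of code points as UTF-8 bytes (1 to 6 bytes each)."""
    out = bytearray()
    for cp in code_points:
        if cp < 0:
            raise ValueError("negative code point")
        for limit, extra, prefix in UTF8_RANGES:
            if cp < limit:
                # Lead byte carries the highest bits.
                out.append(prefix | (cp >> (6 * extra)))
                # Each continuation byte carries six more bits.
                for shift in range(6 * (extra - 1), -1, -6):
                    out.append(0x80 | ((cp >> shift) & 0x3F))
                break
        else:
            raise ValueError("code point too large for UTF-8")
    return bytes(out)


print(ucs4_to_utf8([0x6E, 0xF6]))  # b'n\xc3\xb6'
```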
_utf8_to_ucs4 (line 788)

This converts a UTF-8 encoded string to its UCS-4 representation. By talking about UCS-4 "strings" we mean arrays of 32bit integers representing each of the "chars". This is due to PHP not being able to handle strings with bit depth different from 8. This applies to the reverse method _ucs4_to_utf8(), too.

The following UTF-8 encodings are supported:

  bytes  bits  representation
  1      7     0xxxxxxx
  2      11    110xxxxx 10xxxxxx
  3      16    1110xxxx 10xxxxxx 10xxxxxx
  4      21    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  5      26    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  6      31    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Each x represents a bit that can be used to store character data. The five and six byte sequences are part of Annex D of ISO/IEC 10646-1:2000.

  • access: private
void _utf8_to_ucs4 ( $input)
  • $input
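
The byte layouts listed for this method translate into a short decoder. The Python sketch below is an illustration under stated assumptions, not the PHP implementation: the function name utf8_to_ucs4() is hypothetical, and the sketch does not reject overlong encodings or apply the $_max_ucs limit.

```python
def utf8_to_ucs4(data):
    """Decode UTF-8 bytes (1 to 6 byte sequences) into a list of code points."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Determine payload bits of the lead byte and the number of
        # continuation bytes that must follow.
        if lead < 0x80:
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:
            cp, extra = lead & 0x07, 3
        elif lead >> 2 == 0b111110:
            cp, extra = lead & 0x03, 4
        elif lead >> 1 == 0b1111110:
            cp, extra = lead & 0x01, 5
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for j in range(1, extra + 1):
            cont = data[i + j]
            if cont >> 6 != 0b10:
                raise ValueError("invalid continuation byte at offset %d" % (i + j))
            cp = (cp << 6) | (cont & 0x3F)
        out.append(cp)
        i += extra + 1
    return out


print(utf8_to_ucs4("nö".encode("utf-8")))  # [110, 246]
```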

Documentation generated on Mon, 25 Jun 2012 13:55:39 -0500 by phpDocumentor 1.4.4