delimiter
[ temporary import ]
please note:
- the content below is remote from Wikipedia
- it has been imported raw for GetWiki
{{hatnote|This article is about delimiters in computing. For delimiters in actual human use, see Word divider and digit grouping.}}

[[File:csv delimited000.svg|thumb|right|200px|A stylistic depiction of a fragment from a CSV-formatted text file. The commas (shown in red) are used as field delimiters.]]

A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams (Federal Standard 1037C, Telecommunications: Glossary of Telecommunication Terms). An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values. Another example of a delimiter is the time gap used to separate letters and words in the transmission of Morse code.

Delimiters represent one of various means to specify boundaries in a data stream. Declarative notation, for example, is an alternate method that uses a length field at the start of a data stream to specify the number of characters that the data stream contains (Rohl, Jeffrey S., Programming in Fortran, Oxford University Press, Oxford Oxfordshire, 1973, ISBN 978-0-7190-0555-8, describing the method in Hollerith notation under the Fortran programming language).
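
To make the contrast with declarative notation concrete, the following is a minimal Python sketch of length-prefixed fields in the spirit of Hollerith notation (a character count, the letter H, then the text). The function names `encode_hollerith` and `decode_hollerith` are hypothetical and chosen only for illustration; the sketch assumes well-formed input.

```python
def encode_hollerith(text: str) -> str:
    """Prefix the text with its character count and an 'H' (hypothetical helper)."""
    return f"{len(text)}H{text}"


def decode_hollerith(stream: str) -> list[str]:
    """Split concatenated length-prefixed fields back into strings."""
    fields = []
    pos = 0
    while pos < len(stream):
        h = stream.index("H", pos)        # the 'H' ends the decimal length field
        length = int(stream[pos:h])
        start = h + 1
        fields.append(stream[start:start + length])
        pos = start + length              # jump past the payload; no scanning for delimiters
    return fields


# The payloads contain commas and even the letter 'H' without any ambiguity,
# because boundaries come from the declared lengths, not from the content.
stream = encode_hollerith("HELLO, WORLD") + encode_hollerith("a,b")
print(stream)
print(decode_hollerith(stream))
```

Because the reader never searches the payload for a boundary marker, no character in the content can be mistaken for one.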

Overview

Delimiters can be broken down into:
  • Field and record delimiters; and
  • Bracket delimiters.

Field and record delimiters

Field delimiters separate data fields. Record delimiters separate groups of fields (de Moor, Georges J., Progress in Standardization in Health Care Informatics, IOS Press, 1993, ISBN 90-5199-114-2, p. 141). For example, the CSV file format uses a comma as the delimiter between fields, and an end-of-line indicator as the delimiter between records. For instance:

    fname,lname,age,salary
    nancy,davolio,33,$30000
    erin,borakova,28,$25250
    tony,raphael,35,$28700

specifies a simple flat file database table using the CSV file format.
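
As a rough illustration, the table above can be read in Python by splitting first on the record delimiter and then on the field delimiter. The helper name `parse_flat_file` is an assumption made for this sketch, and the approach only works when no field contains either delimiter.

```python
data = (
    "fname,lname,age,salary\n"
    "nancy,davolio,33,$30000\n"
    "erin,borakova,28,$25250\n"
    "tony,raphael,35,$28700\n"
)


def parse_flat_file(text, field_delim=",", record_delim="\n"):
    """Split text into records, then each record into fields (hypothetical helper)."""
    records = [r for r in text.split(record_delim) if r]   # drop the empty trailing record
    header, *rows = [r.split(field_delim) for r in records]
    return [dict(zip(header, row)) for row in rows]


table = parse_flat_file(data)
for row in table:
    print(row["fname"], row["salary"])
```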

Bracket delimiters

Bracket delimiters (also called block delimiters, region delimiters or balanced delimiters) mark both the start and end of a region of text (Friedl, Jeffrey E. F., Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools, O'Reilly, 2002, ISBN 0-596-00289-0, p. 319; Scott, Michael Lee, Programming Language Pragmatics, Morgan Kaufmann, 1999, ISBN 1-55860-442-1). Common examples of bracket delimiters include (Wall, Larry; Orwant, Jon, Programming Perl, Third edition, O'Reilly, July 2000, ISBN 0-596-00027-8):

{| class="wikitable"
! Delimiters
! style="text-align:left" | Description
|-
! ( )
| Parentheses. The Lisp programming language syntax is cited as recognizable primarily by its use of parentheses (Kaufmann, Matt, Computer-Aided Reasoning: An Approach, Springer, 2000, ISBN 0-7923-7744-3, p. 3).
|-
! { }
| Curly brackets (Meyer, Mark, Oxford University Press, 2005, ISBN 978-0-7637-3832-7; references C-style programming languages prominently featuring curly brackets and semicolons).
|-
! [ ]
| Brackets (commonly used to denote a subscript)
|-
! < >
| Angle brackets (Dilligan, Robert, Oxford University Press, 1998, ISBN 978-0-306-45972-6; describes syntax and delimiters used in HTML).
|-
! " "
| commonly used to denote string literals (Schwartz, Learning Perl, Oxford Oxfordshire, ISBN 978-0-596-10105-3; describes string literals).
|-
! ' '
| commonly used to denote character literals.
|-
! <? ?>
| used to indicate XML processing instructions (Watt, Sams Teach Yourself XML in 10 Minutes, Oxford Oxfordshire, ISBN 978-0-672-32471-0; describes XML processing instruction, p. 21).
|-
! /* */
| used to denote comments in some programming languages (Cabrera, Harold, Oxford University Press, 2002, ISBN 978-1-931836-54-8; describes single-line and multi-line comments, p. 72).
|-
!
| used in some web templates to specify language boundaries. These are also called template delimiters (Smarty Template Documentation, retrieved 2010-03-12; see e.g. Smarty template system documentation, "escaping template delimiters").
|}
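
Because bracket delimiters come in matched pairs, a reader has to track which closing delimiter it expects next. The following Python sketch checks that the single-character pairs from the table are balanced; `PAIRS` and `is_balanced` are hypothetical names, and quotes, comments and multi-character delimiters are deliberately ignored.

```python
# Opening delimiter -> the closing delimiter it requires.
PAIRS = {"(": ")", "{": "}", "[": "]", "<": ">"}


def is_balanced(text: str) -> bool:
    """Return True if every opening bracket is closed in the right order."""
    closers = set(PAIRS.values())
    stack = []                              # closing delimiters still expected
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False                # unexpected or mismatched closer
    return not stack                        # leftovers mean an unclosed region


print(is_balanced("(a [b] {c})"))
print(is_balanced("(a [b)]"))
```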

Conventions

Computing platforms historically use certain delimiters by convention (International Organization for Standardization, The set of control characters for ISO 646, December 1, 1975; International Organization for Standardization, ASCII graphic character set, December 1, 1975). The following tables depict just a few examples for comparison.

Programming languages (see also Comparison of programming languages (syntax)).

{| class="wikitable"
!
! String Literal
! End of Statement
|-
! Pascal
| single quote
| semicolon
|-
! Python
| double quote, single quote
| end of line (EOL)
|}

Field and record delimiters (see also ASCII, Control character).

{| class="wikitable"
!
! End of Field
! End of Record
! End of File
|-
! Unix-like systems including macOS, AmigaOS
| Tab
| Line feed
| none
|-
! Windows, MS-DOS, OS/2, CP/M
| Tab
| CRLF
| none (except in CP/M), Control-Z (Lewine, Donald, Oxford University Press, 1991, ISBN 978-0-937175-73-6; describes use of control-z, p. 156)
|-
! Classic Mac OS, Apple DOS, ProDOS, GS/OS
| Tab
| Carriage return
| none
|-
! ASCII/Unicode
| UNIT SEPARATOR Position 31 (U+001F)
| RECORD SEPARATOR Position 30 (U+001E)
| FILE SEPARATOR Position 28 (U+001C)
|}

Delimiter collision

Delimiter collision is a problem that occurs when an author or programmer introduces delimiters into text without actually intending them to be interpreted as boundaries between separate regions (Friedl, Jeffrey, Mastering Regular Expressions, Oxford University Press, Oxford Oxfordshire, 2006, ISBN 978-0-596-52812-6, p. 472, describing solutions for embedded-delimiter problems). In the case of XML, for example, this can occur whenever an author attempts to specify an angle bracket character. In most file types there is both a field delimiter and a record delimiter, both of which are subject to collision. In the case of comma-separated values files, for example, field delimiter collision can occur whenever an author attempts to include a comma as part of a field value (e.g., salary = "$30,000"), and record delimiter collision would occur whenever a field contained multiple lines. Both record and field delimiter collision occur frequently in text files.

In some contexts, a malicious user or attacker may seek to exploit this problem intentionally. Consequently, delimiter collision can be the source of security vulnerabilities and exploits. Malicious users can take advantage of delimiter collision in languages such as SQL and HTML to deploy such well-known attacks as SQL injection and cross-site scripting, respectively.
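
Both kinds of collision can be reproduced with a naive splitter. This Python sketch is illustrative only; the variable names are assumptions and the data is taken from the salary example in the text.

```python
expected_columns = ["fname", "lname", "age", "salary"]

# Field delimiter collision: the comma inside "$30,000" is read as a boundary.
record = "nancy,davolio,33,$30,000"
fields = record.split(",")
collided = len(fields) != len(expected_columns)
print(fields, collided)

# Record delimiter collision: a field spanning two lines is read as two records.
note = "line one\nline two"
rows = f"erin,{note}".split("\n")
print(rows)
```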

Solutions

Because delimiter collision is a very common problem, various methods for avoiding it have been invented. Some authors may attempt to avoid the problem by choosing a delimiter character (or sequence of characters) that is not likely to appear in the data stream itself. This ad hoc approach may be suitable, but it necessarily depends on a correct guess of what will appear in the data stream, and offers no security against malicious collisions. Other, more formal conventions are therefore applied as well.

ASCII delimited text

The ASCII and Unicode character sets were designed to solve this problem by the provision of non-printing characters that can be used as delimiters. These are the characters in the range from ASCII 28 to 31.

{| class="wikitable"
! ASCII Dec
! Symbol
! Unicode Name
! Common Name
! Usage
|-
! 28
! {{resize|200%|␜}}
| INFORMATION SEPARATOR FOUR
| file separator
| End of file. Or between a concatenation of what might otherwise be separate files.
|-
! 29
! {{resize|200%|␝}}
| INFORMATION SEPARATOR THREE
| group separator
| Between sections of data. Not needed in simple data files.
|-
! 30
! {{resize|200%|␞}}
| INFORMATION SEPARATOR TWO
| record separator
| End of a record or row.
|-
! 31
! {{resize|200%|␟}}
| INFORMATION SEPARATOR ONE
| unit separator
| Between fields of a record, or members of a row.
|}

The use of ASCII 31 (unit separator) as a field separator and ASCII 30 (record separator) as a record separator solves the problem of both field and record delimiters that appear in a text data stream (Discussion on ASCII Delimited Text vs CSV and Tab Delimited).
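
A minimal Python sketch of ASCII delimited text follows. The helper names `dumps` and `loads` are hypothetical, and the sketch assumes that no field ever contains the characters 0x1E or 0x1F themselves.

```python
UNIT_SEP = "\x1f"      # ASCII 31, between fields
RECORD_SEP = "\x1e"    # ASCII 30, between records


def dumps(rows):
    """Join fields with the unit separator and records with the record separator."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def loads(text):
    """Inverse of dumps for non-empty input."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Commas and line breaks inside fields no longer collide with the delimiters.
rows = [["nancy", "davolio", "$30,000"], ["erin", "borakova", "two\nlines"]]
encoded = dumps(rows)
print(loads(encoded) == rows)
```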

Escape character

One method for avoiding delimiter collision is to use escape characters. From a language design standpoint, these are adequate, but they have drawbacks:
  • text can be rendered unreadable when littered with numerous escape characters, a problem referred to as leaning toothpick syndrome (due to use of \ to escape / in Perl regular expressions, leading to sequences such as "\/\/");
  • text becomes difficult to parse with regular expressions;
  • they require a mechanism to "escape the escapes" when not intended as escape characters;
  • although easy to type, they can be cryptic to someone unfamiliar with the language (Kahrel, Peter, Automating InDesign with Regular Expressions, O'Reilly, 2006, ISBN 0-596-52937-6, p. 11); and
  • they do not protect against injection attacks {{citation needed|date=March 2014}}.
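
The escaping mechanism, including the need to "escape the escapes", can be sketched in Python as follows. The backslash as escape character and the helper names `escape_field` and `split_escaped` are assumptions made for this illustration, not a description of any particular file format.

```python
ESC = "\\"   # a single backslash, used here as the escape character


def escape_field(field: str, delim: str = ",") -> str:
    """Escape the escape character first, then the delimiter."""
    return field.replace(ESC, ESC + ESC).replace(delim, ESC + delim)


def split_escaped(line: str, delim: str = ",") -> list[str]:
    """Split on unescaped delimiters and remove the escape characters."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == ESC:
            current.append(next(chars, ""))   # take the next character literally
        elif ch == delim:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


values = ["davolio", "$30,000", r"C:\temp"]
line = ",".join(escape_field(v) for v in values)
print(line)
print(split_escaped(line) == values)
```

Note that a plain `line.split(",")` no longer works on the escaped text, which is the parsing drawback listed above.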

Escape sequence

Escape sequences are similar to escape characters, except they usually consist of some kind of mnemonic instead of just a single character. One use is in string literals that include a double quote (") character. For example, in Perl, the code:

    print "Nancy said \x22Hello World!\x22 to the crowd.";    ### use \x22

produces the same output as:

    print "Nancy said \"Hello World!\" to the crowd.";    ### use escape char

One drawback of escape sequences, when used by people, is the need to memorize the codes that represent individual characters (see also: character entity reference, numeric character reference).
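
The same equivalence can be checked in Python, whose string literals accept a hexadecimal escape sequence, an escape character, and a named Unicode escape. The variable names are chosen only for this sketch.

```python
with_hex = "Nancy said \x22Hello World!\x22 to the crowd."       # hex escape sequence
with_escape = "Nancy said \"Hello World!\" to the crowd."        # escape character
with_name = "Nancy said \N{QUOTATION MARK}Hello World!\N{QUOTATION MARK} to the crowd."

# All three literals denote the same string.
print(with_hex == with_escape == with_name)
```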

Dual quoting delimiters

In contrast to escape sequences and escape characters, dual delimiters provide yet another way to avoid delimiter collision. Some languages, for example, allow the use of either a single quote (') or a double quote (") to specify a string literal. For example, in Perl:

    print 'Nancy said "Hello World!" to the crowd.';

produces the desired output without requiring escapes. This approach, however, only works when the string does not contain both types of quotation marks.

Padding quoting delimiters

In contrast to escape sequences and escape characters, padding delimiters provide yet another way to avoid delimiter collision. Visual Basic, for example, uses double quotes as string delimiters, and a double quote inside a string is written by doubling it. This is similar to escaping the delimiter.

    print "Nancy said ""Hello World!"" to the crowd."

produces the desired output without requiring a separate escape character. Like regular escaping it can, however, become confusing when many quotes are used. The code to print the above source code would look more confusing:

    print "print ""Nancy said """"Hello World!"""" to the crowd."""
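
Quote doubling is also how Python's standard `csv` module protects a quote character inside a quoted field by default. The following sketch writes one record and reads it back; the variable names are illustrative.

```python
import csv
import io

field = 'Nancy said "Hello World!" to the crowd.'

# Write one record; the writer quotes the field and doubles the embedded quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow([field, "33"])
encoded = buffer.getvalue()
print(encoded.rstrip("\r\n"))

# Reading it back collapses the doubled quotes again.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == [field, "33"])
```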

Configurable alternate quoting delimiters

In contrast to dual delimiters, multiple delimiters are even more flexible for avoiding delimiter collision (Wall, Larry; Orwant, Jon, Programming Perl, Third edition, O'Reilly, July 2000, ISBN 0-596-00027-8, p. 63). For example, in Perl:

    print qq^Nancy doesn't want to say "Hello World!" anymore.^;
    print qq@Nancy doesn't want to say "Hello World!" anymore.@;
    print qq(Nancy doesn't want to say "Hello World!" anymore.);

all produce the desired output through use of quote operators, which allow any convenient character to act as a delimiter. Although this method is more flexible, few languages support it. Perl and Ruby are two that do (Wall, Larry; Orwant, Jon, Programming Perl, Third edition, O'Reilly, July 2000, ISBN 0-596-00027-8, p. 62; Matsumoto, Yukihiro, Ruby in a Nutshell, O'Reilly, 2001, ISBN 0-596-00214-9, p. 11; in Ruby, these are indicated as general delimited strings).

Content boundary

A content boundary is a special type of delimiter that is specifically designed to resist delimiter collision. It works by allowing the author to specify a sequence of characters that is guaranteed to always indicate a boundary between parts in a multi-part message, with no other possible interpretation (Javvin Technologies, Incorporated, Network Protocols Handbook, Javvin Technologies Inc., 2005, ISBN 0-9740945-2-8, p. 26).

The delimiter is frequently generated from a random sequence of characters that is statistically improbable to occur in the content. This may be followed by an identifying mark such as a UUID, a timestamp, or some other distinguishing mark. Alternatively, the content may be scanned to guarantee that a delimiter does not appear in the text. This may allow the delimiter to be shorter or simpler, and increase the human readability of the document. (See e.g., MIME, Here documents).
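
The two strategies described above, a random boundary plus a scan of the content, can be combined in a short Python sketch. The layout is loosely modeled on multi-part messages but is not an implementation of MIME; the helper names `make_boundary`, `join_parts` and `split_parts` and the `boundary-` prefix are assumptions made for illustration.

```python
import uuid


def make_boundary(parts):
    """Generate a random boundary and rescan until it occurs in no part."""
    while True:
        boundary = "boundary-" + uuid.uuid4().hex
        if not any(boundary in part for part in parts):
            return boundary


def join_parts(parts):
    """Return (boundary, message) with every part introduced by the boundary line."""
    boundary = make_boundary(parts)
    body = "".join(f"--{boundary}\n{part}\n" for part in parts)
    return boundary, body + f"--{boundary}--\n"       # closing marker


def split_parts(boundary, message):
    """Recover the parts from a message produced by join_parts."""
    body, _, _ = message.rpartition(f"--{boundary}--\n")
    chunks = body.split(f"--{boundary}\n")[1:]
    return [chunk[:-1] for chunk in chunks]           # drop the newline join_parts added


parts = ["a,b\nc", "--x--", "<tag>"]
boundary, message = join_parts(parts)
print(split_parts(boundary, message) == parts)
```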

Whitespace or indentation

Some programming and computer languages allow the use of whitespace delimiters or indentation as a means of specifying boundaries between independent regions in text (CICLing, Computational Linguistics and Intelligent Text Processing, Oxford University Press, Oxford Oxfordshire, 2001, ISBN 978-3-540-41687-6; describes whitespace delimiters, p. 258).

Regular expression syntax

{{see also|Regular expression examples}}

In specifying a regular expression, alternate delimiters may also be used to simplify the syntax for match and substitution operations in Perl (Friedl, Jeffrey, Mastering Regular Expressions, Oxford University Press, Oxford Oxfordshire, 2006, ISBN 978-0-596-52812-6, p. 472). For example, a simple match operation may be specified in Perl with the following syntax:

    $string1 = 'Nancy said "Hello World!" to the crowd.';    # specify a target string
    print $string1 =~ m/[aeiou]+/;                           # match one or more vowels

The syntax is flexible enough to specify match operations with alternate delimiters, making it easy to avoid delimiter collision:

    $string1 = 'Nancy said "http://Hello/World.htm" is not a valid address.';    # target string
    print $string1 =~ m@http://@;    # match using alternate regular expression delimiter
    print $string1 =~ m{http://};    # same as previous, but different delimiter
    print $string1 =~ m!http://!;    # same as previous, but different delimiter.

Here document

A here document allows the inclusion of arbitrary content by describing a special end sequence. Many languages support this, including PHP, Bash scripts, Ruby and Perl. A here document starts by describing what the end sequence will be and continues until that sequence is seen at the start of a new line (Perl operators and precedence). Here is an example in Perl:

    print
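
The mechanism described in this section can be sketched in Python as a small reader that takes the end sequence from the opening line and collects lines until it appears. The function name `read_here_document`, the `<<` marker and the `EOF` terminator are assumptions for this illustration; as a simplification, the sketch only ends the document on a line that consists solely of the end sequence.

```python
def read_here_document(lines, start):
    """lines[start] must contain '<<TERMINATOR'; returns (body, index after the terminator)."""
    terminator = lines[start].split("<<", 1)[1].strip()
    body = []
    for index in range(start + 1, len(lines)):
        if lines[index] == terminator:          # end sequence alone at the start of a line
            return "\n".join(body), index + 1
        body.append(lines[index])               # anything else is content, verbatim
    raise ValueError(f"terminator {terminator!r} not found")


script = [
    "print <<EOF",
    'Nancy said "Hello, World!"',
    "  EOF (indented, so still content)",
    "EOF",
    "next statement",
]
body, next_index = read_here_document(script, 0)
print(body)
print(script[next_index])
```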

- content above as imported from Wikipedia
- "delimiter" does not exist on GetWiki (yet)
- time: 1:23am EDT - Tue, Jul 17 2018
[ this remote article is provided by Wikipedia ]