XML Xdoc - easy data conversion from delimited text to XML

CSV and delimited file formats

Delimited files

CSV data is the most common form of delimited file and has fields delimited by a comma.
eg data,data,data
Files can also be delimited by other characters, such as a tab.
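
As a rough illustration, the sketch below uses Python's standard csv module to read the same two rows from comma delimited and tab delimited text. The function name split_rows and the sample data are assumptions made for this example and are not part of XML Xdoc.

```python
import csv
import io

def split_rows(text, delimiter=","):
    """Split delimited text into a list of rows, each a list of field strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# The same two records, once comma delimited and once tab delimited.
comma_rows = split_rows("123,456\nJohn,72\n")
tab_rows = split_rows("123\t456\nJohn\t72\n", delimiter="\t")
```

Both calls should give the same list of rows, since only the delimiter differs.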

The CSV 'standard' file format

Almost all applications are able to output data in "CSV (comma delimited)(.csv)" format.
However, there is no defined 'standard' for the csv file format, so applications vary in how they format the data.

XML Xdoc is able to process a wide variety of CSV and other delimited file formats.
It even has a special check facility to identify problems.

A simple example

In its basic form a delimited file consists of rows of data fields which are separated by a delimiter.
DataDelimiterData
eg 123,456 or John,72

Think of fence posts and panels: PostPanelPost

Header row

CSV and other delimited files can contain an optional header row that names the fields in the data that follows.
XML Xdoc can include, or exclude, a header when converting to XML.
Furthermore, XML Xdoc can also create XML tags using the names supplied in the header row.
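
The idea of using header names as XML tags can be sketched in Python as follows. This is not how XML Xdoc itself works internally; the function rows_to_xml, the tag names "rows" and "row", and the sample data are assumptions for illustration, and the sketch assumes every header name is already a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

def rows_to_xml(text, root_tag="rows", row_tag="row"):
    """Build an XML string, using the header row names as element tags."""
    reader = csv.DictReader(io.StringIO(text))  # first row is taken as the header
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            # Assumption: each header name is a valid XML element name.
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml("name,age\nJohn,72\n")
# xml_text is '<rows><row><name>John</name><age>72</age></row></rows>'
```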

Empty data

When data are empty they can be expressed in one of two ways.
eg 123,,456 or 123,"",456

XML Xdoc can handle both these formats of empty data.
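
For comparison, a short sketch with Python's standard csv module (used here only as an example of a quote-aware reader) shows both forms being read as the same empty field.

```python
import csv
import io

# One row uses two adjacent commas, the other an empty pair of double quotes.
rows = list(csv.reader(io.StringIO('123,,456\n123,"",456\n')))

# Both rows come back as ['123', '', '456'].
same = rows[0] == rows[1]
```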

Empty space padding

There is no standard as to whether spaces padding the data should be treated as part of the data or removed.
eg 123 , 456 or John , 72

XML Xdoc is able to remove leading spaces, trailing spaces or leading and trailing spaces.
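
The three trimming choices can be sketched in Python as below. The function read_trimmed and its mode names are assumptions for this example only. Note that this simple version trims after parsing, so it would also trim spaces inside quoted fields.

```python
import csv
import io

def read_trimmed(text, mode="both"):
    """Read CSV rows and strip spaces; mode is 'leading', 'trailing' or 'both'."""
    strip = {"leading": str.lstrip, "trailing": str.rstrip, "both": str.strip}[mode]
    return [[strip(field, " ") for field in row]
            for row in csv.reader(io.StringIO(text))]

padded = "123 , 456\nJohn , 72\n"
both = read_trimmed(padded)               # [['123', '456'], ['John', '72']]
leading = read_trimmed(padded, "leading")   # trailing spaces are kept
trailing = read_trimmed(padded, "trailing") # leading spaces are kept
```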

Where standards differ most

The real variation in the CSV, or delimited, file format lies in how double quotes, commas or other delimiters within data are treated.

Here is a three field csv delimited file with the second field containing a comma.
The comma as a data item is surrounded by double quotes.
eg 123,",",456

However, some software would treat the above example as four fields with the second and third fields containing a double quote.
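
The two readings of the example can be reproduced with a small Python sketch. The quote-aware reading uses the standard csv module; the naive reading simply splits on every comma. Neither is claimed to be the behaviour of any particular product.

```python
import csv
import io

line = '123,",",456'

# A quote-aware reader keeps the quoted comma as data: three fields.
quote_aware = next(csv.reader(io.StringIO(line)))

# Splitting on every comma ignores the quotes: four fields,
# with the second and third each holding a single double quote.
naive = line.split(",")
```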

Here is a three field csv delimited file with the second field containing a double quote.
The double quote is surrounded by double quotes.
eg 123,""",456

This data can also be represented by escaping the quote with an escape character such as a backslash.
eg 123,\",456

Microsoft's Excel spreadsheet has an option to export to comma separated value format. Excel escapes quote literals with two quotes. It also does not quote values that have leading or trailing whitespace.

Some software does not quote literals with two quotes but uses the backslash as an escape character.
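
As a hedged illustration of the two escaping conventions, the sketch below reads the same data written both ways using Python's standard csv module. The helper parse_line and the sample text are assumptions for this example; the doublequote and escapechar options are real csv module parameters.

```python
import csv
import io

def parse_line(line, **options):
    """Parse a single line of CSV text with the given csv reader options."""
    return next(csv.reader(io.StringIO(line), **options))

# Excel-style: a quote inside a quoted field is written as two quotes.
doubled = parse_line('123,"say ""hi""",456')

# Backslash-style: a quote inside a quoted field is preceded by a backslash.
backslash = parse_line('123,"say \\"hi\\"",456',
                       doublequote=False, escapechar="\\")
```

Both calls should yield the same three fields, with the middle field containing the quoted word.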

XML Xdoc

XML Xdoc uses a working definition of CSV format which has been adopted and used within software such as Microsoft Excel.

  1. Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde).
    In binary mode all characters are accepted, at least in quoted fields.
  2. A field within CSV may be surrounded by double-quotes.
  3. A field within CSV must be surrounded by double-quotes to contain a comma (or other delimiter character).
  4. A field within CSV must be surrounded by double-quotes to contain an embedded double-quote, represented by a pair of consecutive double-quotes.
    In binary mode you may additionally use the sequence "0 for representation of a NUL byte.
  5. A CSV string may be terminated by 0x0A (line feed) or by 0x0D,0x0A (carriage return, line feed).
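
The rules above can be sketched as a small Python parser. This is only an illustration written for this page, not XML Xdoc's own code: the names has_only_allowed_chars and parse_record are hypothetical, binary mode is not covered, and quoted fields that span several lines are not handled.

```python
def has_only_allowed_chars(field):
    """Rule 1: tab (0x09) or the range 0x20 (space) through 0x7E (tilde)."""
    return all(ch == "\t" or 0x20 <= ord(ch) <= 0x7E for ch in field)

def parse_record(line, delimiter=","):
    """Split one record into fields following rules 2 to 5."""
    line = line.rstrip("\r\n")            # rule 5: drop a trailing LF or CR LF
    fields, field, in_quotes, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"' and line[i + 1:i + 2] == '"':
                field.append('"')         # rule 4: a pair of quotes is one quote
                i += 1                    # skip the second quote of the pair
            elif ch == '"':
                in_quotes = False         # closing quote of the field
            else:
                field.append(ch)          # rule 3: delimiter is data inside quotes
        elif ch == '"' and not field:
            in_quotes = True              # rule 2: field opens with a quote
        elif ch == delimiter:
            fields.append("".join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    fields.append("".join(field))
    return fields

quoted_comma = parse_record('123,",",456\r\n')   # ['123', ',', '456']
quoted_quote = parse_record('123,"""",456')      # ['123', '"', '456']
```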
www.xdoc.co.uk
Copyright © 2001-2006 Trah®