blob: 69d0419448b95a2fe2c44b16c67665ef5980576c [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="texttables-chapt"
xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:ns5="http://www.w3.org/1999/xhtml"
xmlns:ns4="http://www.w3.org/2000/svg"
xmlns:ns3="http://www.w3.org/1998/Math/MathML"
xmlns:ns2="http://www.w3.org/1999/xlink"
xmlns:ns="http://docbook.org/ns/docbook">
<title xml:id="texttables-title">Text Tables</title>
<subtitle>Text Tables as a Standard Feature of Hsqldb</subtitle>
<info>
<authorgroup>
<author>
<personname><firstname>Bob</firstname>
<surname>Preston</surname></personname>
<affiliation>
<orgname>The HSQL Development Group</orgname>
</affiliation>
</author>
<author>
<personname><firstname>Fred</firstname>
<surname>Toussi</surname></personname>
<affiliation>
<orgname>The HSQL Development Group</orgname>
</affiliation>
</author>
</authorgroup>
<releaseinfo>$Revision: 3601 $</releaseinfo>
<pubdate>$Date: 2010-05-31 20:17:47 -0400 (Mon, 31 May 2010) $</pubdate>
<keywordset>
<keyword>HyperSQL</keyword>
<keyword>HSQLDB</keyword>
<keyword>SQL</keyword>
<keyword>Text</keyword>
<keyword>Tables</keyword>
</keywordset>
<legalnotice>
<para>Copyright 2002-2010 Bob Preston and Fred Toussi. Permission is
granted to distribute this document without any alteration under the
terms of the HSQLDB license. Additional permission is granted to the
HSQL Development Group to distribute this document with or without
alterations under the terms of the HSQLDB license.</para>
</legalnotice>
</info>
<section xml:id="texttables_overview-sect">
<title>Overview</title>
<para>Text Table support for HSQLDB was originally developed by Bob
Preston independently from the Project. Subsequently Bob joined the
Project and incorporated this feature into version 1.7.0, with a number of
enhancements, especially the use of conventional SQL commands for
specifying the files used for Text Tables.</para>
<para>In a nutshell, Text Tables are CSV or other delimited files treated
as SQL tables. Any ordinary CSV or other delimited file can be used. The
full range of SQL queries can be performed on these files, including
SELECT, INSERT, UPDATE and DELETE. Indexes and unique constraints can be
set up, and foreign key constraints can be used to enforce referential
integrity between Text Tables themselves or with conventional
tables.</para>
<para>The delimited file can be created by the engine, or an existing file
can be used.</para>
<para>HyperSQL with Text Table support is the only comprehensive solution
that employs the power of SQL and the universal reach of JDBC to handle
data stored in text files.</para>
</section>
<section xml:id="texttables_impl-sect">
<title>The Implementation</title>
<section>
<title>Definition of Tables</title>
<para>Text Tables are defined similarly to conventional tables with the
added TEXT keyword:</para>
<programlisting> CREATE TEXT TABLE &lt;tablename&gt; (&lt;column definition&gt; [&lt;constraint definition&gt;])</programlisting>
<para>The table is at first empty and cannot be written to. An
additional SET command specifies the file and the separator character
that the Text table uses:</para>
<programlisting> SET TABLE &lt;tablename&gt; SOURCE &lt;quoted_filename_and_options&gt; [DESC]</programlisting>
<para>Text Tables cannot be created in <glossterm>mem:</glossterm>
(all-in-memory) databases (databases that have no script file).</para>
</section>
<section>
<title>Scope and Reassignment</title>
<itemizedlist>
<listitem>
<para>A Text table without a file assigned to it is READ ONLY and
EMPTY.</para>
</listitem>
<listitem>
<para>Reassigning a Text Table definition to a new file has
implications in the following areas:</para>
<orderedlist>
<listitem>
<para>The user is required to be an administrator.</para>
</listitem>
<listitem>
<para>Existing transactions are committed at this point.</para>
</listitem>
<listitem>
<para>Constraints, including foreign keys referencing this
table, are kept intact. It is the responsibility of the
administrator to ensure their integrity.</para>
</listitem>
</orderedlist>
<para>The new source file is scanned and indexes are built when it
is assigned to the table. At this point any violation of NOT NULL,
UNIQUE or PRIMARY KEY constraints are caught and the assignment is
aborted. However, foreign key constraints are not checked at the
time of assignment or reassignment of the source file.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Null Values in Columns of Text Tables</title>
<itemizedlist>
<listitem>
<para>Empty fields are treated as NULL. These are fields where there
is nothing or just spaces between the separators.</para>
</listitem>
<listitem>
<para>Quoted empty strings are treated as empty strings.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Configuration</title>
<para>The default field separator is a comma (,). A different field
separator can be specified within the SET TABLE SOURCE statement. For
example, to change the field separator for the table mytable to a
vertical bar, place the following in the SET TABLE SOURCE statement, for
example:</para>
<informalexample>
<programlisting> SET TABLE mytable SOURCE "myfile;fs=|"</programlisting>
</informalexample>
<para>Since HSQLDB treats CHAR and VARCHAR strings the same, the ability
to assign a different separator to the latter is provided. When a
different separator is assigned to a VARCHAR, it will terminate any CSV
field of that type. For example, if the first field is CHAR, and the
second field VARCHAR, and the separator fs has been defined as the pipe
(|) and vs as the period (.) then the data in the CSV file for a row
will look like:</para>
<screen> First field data|Second field data.Third field data</screen>
<para>This facility in effect offers an extra, special separator which
can be used in addition to the global separator. The following example
shows how to change the default separator to the pipe (|), VARCHAR
separator to the period (.) within a SET TABLE SOURCE statement:</para>
<informalexample>
<programlisting> SET TABLE mytable SOURCE "myfile;fs=|;vs=."</programlisting>
</informalexample>
<para>HSQLDB also recognises the following special indicators for
separators:</para>
<variablelist>
<title>special indicators for separators</title>
<varlistentry>
<term>\semi</term>
<listitem>
<para>semicolon</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\quote</term>
<listitem>
<para>single-quote</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\space</term>
<listitem>
<para>space character</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\apos</term>
<listitem>
<para>apostrophe</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\n</term>
<listitem>
<para>newline - Used as an end anchor (like $ in regular
expressions)</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\r</term>
<listitem>
<para>carriage return</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\t</term>
<listitem>
<para>tab</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\\</term>
<listitem>
<para>backslash</para>
</listitem>
</varlistentry>
<varlistentry>
<term>\u####</term>
<listitem>
<para>a Unicode character specified in hexadecimal</para>
</listitem>
</varlistentry>
</variablelist>
<para>Furthermore, HSQLDB provides csv file support with three
additional boolean options: <varname>ignore_first</varname>,
<varname>quoted</varname> and <varname>all_quoted</varname>. The
<varname>ignore_first</varname> option (default false) tells HSQLDB to
ignore the first line in a file. This option is used when the first line
of the file contains column headings. The <varname>all_quoted</varname>
option (default false) tells the program that it should use quotes
around all character fields when writing to the source file. The
<varname>quoted</varname> option (default true) uses quotes only when
necessary to distinguish a field that contains the separator character.
It can be set to false to prevent the use of quoting altogether and
treat quote characters as normal characters. These options may be
specified within the <literal>SET TABLE SOURCE</literal>
statement:</para>
<programlisting> SET TABLE mytable SOURCE "myfile;ignore_first=true;all_quoted=true"</programlisting>
<para>When the default options <literal>all_quoted=</literal>
<literal>false</literal> and <literal>quoted=true</literal> are in
force, fields that are written to a line of the csv file will be quoted
only if they contain the separator or the quote character. The quote
character is doubled when used inside a string. When
<literal>all_quoted=false</literal> and <literal>quoted=false</literal>
the quote character is not doubled. With this option, it is not possible
to insert any string containing the separator into the table, as it
would become impossible to distinguish from a separator. While reading
an existing data source file, the program treats each individual field
separately. It determines that a field is quoted only if the first
character is the quote character. It interprets the rest of the field on
this basis.</para>
<para>The character encoding for the source file is<literal> ASCII
</literal>by default. To support UNICODE or source files prepared with
different encodings this can be changed to <literal>UTF-8</literal> or
any other encoding. The default is <literal>encoding=ASCII </literal>and
the option <literal>encoding=UTF-8</literal> or other supported
encodings can be used.</para>
<para>Finally, HSQLDB provides the ability to read a text file as READ
ONLY, by placing the keyword "DESC" at the end of the SET TABLE SOURCE
statement:</para>
<programlisting> SET TABLE mytable SOURCE "myfile" DESC</programlisting>
<para>Text table source files are cached in memory. The maximum number
of rows of data that are in memory at any time is controlled by the
<varname>textdb.cache_scale</varname> property. The default value for
<varname>textdb.cache_scale</varname> is 10 and can be changed by
setting the property in the .properties file for the database. The
number of rows in memory is calculated as 3*(2**scale), which translates
to 3072 rows for the default textdb.cache_scale setting (10). The
property can also be set for individual text tables:</para>
<programlisting> SET TABLE mytable SOURCE "myfile;ignore_first=true;all_quoted=true;cache_scale=12"</programlisting>
</section>
<section xml:id="disconnecting_text_tables">
<title>Disconnecting Text Tables</title>
<para>Text tables may be <glossterm>disconnected</glossterm> from their
underlying data source, i.e. the text file.</para>
<para>You can explicitly disconnect a text table from its file by
issuing the following statement: <programlisting> SET TABLE mytable SOURCE OFF</programlisting></para>
<para>Subsequently, <literal>mytable</literal> will be empty and
read-only. However, the data source description will be preserved, and
the table can be re-connected to it with <programlisting> SET TABLE mytable SOURCE ON</programlisting></para>
<para>When a database is opened, if the source file for an existing text
table is missing the table remains disconnected from its data source,
but the source description is preserved. This allows the missing source
file to be added to the directory and the table re-connected to it with
the above command.</para>
<para>Disconnecting text tables from their source has several uses.
While disconnected, the text source can be edited outside HSQLDB
provided data integrity is respected. When large text sources are used,
and several constraints or indexes need to be created on the table, it
is possible to disconnect the source during the creation of constraints
and indexes and reduce the time it takes to perform the
operation.</para>
</section>
</section>
<section xml:id="texttables_issues-sect">
<title>Text File Usage</title>
<para>The following information applies to the usage of text
tables.</para>
<itemizedlist>
<title>Text File Issues</title>
<listitem>
<para>File locations are restricted to below the directory that
contains the database, unless the
<varname>textdb.allow_full_path</varname> property is set true as a
Java system property. This feature is for security, otherwise an admin
database user may be able to open random files.</para>
</listitem>
<listitem>
<para>Blank lines are allowed anywhere in the text file, and are
ignored.</para>
</listitem>
<listitem>
<para>It is possible to define a primary key, identity column, unique,
foreign key and check constraints for text tables.</para>
</listitem>
<listitem>
<para>When a table source file is used with the<literal>
ignore_first=true </literal>option, the first, ignored line is
replaced with a blank line after a SHUTDOWN COMPACT, unless the SOURCE
HEADER statement has been used.</para>
</listitem>
<listitem>
<para>An existing table source file may include CHARACTER fields that
do not begin with the quote character but contain instances of the
quote character. These fields are read as literal strings.
Alternatively, if any field begins with the quote character, then it
is interpreted as a quoted string that should end with the quote
character and any instances of the quote character within the string
is doubled. When any field containing the quote character or the
separator is written out to the source file by the program, the field
is enclosed in quote character and any instance of the quote character
inside the field is doubled.</para>
</listitem>
<listitem>
<para>Inserts or updates of CHARACTER type field values are allowed
with strings that contains the linefeed or the carriage return
character. This feature is disabled when both quoted and all_quoted
properties are false.</para>
</listitem>
<listitem>
<para>ALTER TABLE commands that add or drop columns or constraints
(apart from check constraints) are not supported with text tables that
are connected to a source. First use the SET TABLE &lt;name&gt; SOURCE
OFF, make the changes, then turn the source ON.</para>
</listitem>
</itemizedlist>
</section>
<section xml:id="texttables_globalprops-sect">
<title>Text File Global Properties</title>
<itemizedlist>
<title>Complete list of supported global properties in *.properties
files</title>
<listitem>
<para><varname>textdb.fs</varname></para>
</listitem>
<listitem>
<para><varname>textdb.vs</varname></para>
</listitem>
<listitem>
<para><varname>textdb.quoted</varname></para>
</listitem>
<listitem>
<para><varname>textdb.all_quoted</varname></para>
</listitem>
<listitem>
<para><varname>textdb.ignore_first</varname></para>
</listitem>
<listitem>
<para><varname>textdb.encoding</varname></para>
</listitem>
<listitem>
<para><varname>textdb.cache_scale</varname></para>
</listitem>
<listitem>
<para><varname>textdb.allow_full_path</varname></para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Transactions</title>
<para>Text tables fully support transactions. New or changed rows that
have not been committed are not updated in the source file. Therefore the
source file always contains committed rows.</para>
<para>However, text tables are not as resilient to machine crashes as
other types of tables. If the crash happens while the text source is being
written to, the text source may contain only some of the changes made
during a committed transaction. With other types of tables, additional
mechanisms ensure the integrity of the data and this situation will not
arise.</para>
</section>
</chapter>