Anyone familiar with spreadsheets has probably had to import from or export to a CSV file at one time or another. The CSV file format is very popular for conveyance of two-dimensional data between software applications. Unfortunately, what is considered a valid CSV file varies among applications and systems. This variation can cause incompatibilities between software applications and often considerable frustration.
CSV is widely considered a simple, common format to exchange tabular data with spreadsheet applications such as Microsoft Excel and OpenOffice. It is strictly a two-dimensional format: rows, columns, and cell contents at each intersection. It is not intended to represent three or more dimensions and makes a poor choice when forced to do so.
Many users often attempt to capture multi-dimensional data by creating additional columns (e.g. Phone1, Phone2, Phone3). It is Aligni’s policy to not encourage or support this functionality. Instead, we direct you to the Aligni API to export multi-dimensional data and manipulate it however your application requires.
The official specification for CSV file formats is RFC 4180. We’ve summarized some of the main points below.
- CRLF – In the notation below, “CRLF” refers to a line break or carriage return / line feed. Note that UNIX (including Mac) and Windows systems have slightly different standards for CRLF. Luckily, this is often taken care of at the browser level when uploading or downloading a CSV file.
- Record – A record in a CSV file corresponds to a single row in the source data or spreadsheet and contains one or more fields. In many cases, it will also correspond to a single line of text in the file, but it’s important to note that CSV does support embedded CRLF or line breaks within a field.
- Field – A field in a CSV file corresponds to the contents of a single row/column location in the source data or spreadsheet.
- Each record is located on a separate line, delimited by a CRLF.
- Each record must contain the same number of fields throughout the file.
- Spaces are considered part of a field and are not ignored.
- Each field may or may not be enclosed by double quotes. IMPORTANT NOTE: Microsoft Excel does not use double quotes at all. This may lead to incompatible CSV files that must be edited prior to importing into Aligni.
- Fields containing CRLF, double quotes, and commas must be enclosed in double-quotes.
- Double quote characters within a field must be “escaped” by including an additional double quote character before them.
Aligni considers true (case insensitive) to be in the affirmative for any imported booleans. Any other value will imply false including: false, no, Yes, Tru, true43, Harold, and so on. Aligni will always export true or false (lower case) when exporting booleans.
The term File Encoding or Character Encoding refers to the set of characters that are allowed to be in the file. This is established so that any software reading or writing to that file can operate with the contents. Common examples are: ASCII, ISO-8859 and Unicode. ASCII is a character set common on American computers but is limited to English characters.
Microsoft Excel exports CSV files in ANSI encoding. While ANSI itself is not a character set, it often refers to one of several ISO-8859 encodings which are locale-specific. Excel is supposed to save files in the machine-specific ANSI encoding, but doesn’t always do so due to bugs.
One workaround is to open the resulting CSV file in Notepad, then click “File > Save As” and select UTF-8 as the encoding format.
Importing CSV Files into Aligni
When importing a CSV file, Aligni will attempt to import the file in UTF-8, ASCII, and then ISO8859-1, in that order.
An example CSV file is shown below along with the table form of its parsed contents to illustrate how Aligni will interpret this file. A few points are notable of the contents of this example:
- The description for part 100123 includes a comma so the description field must be enclosed in double quotes.
- The description for part 100124 includes a CRLF so the description field must be enclosed in double quotes.
- The description for part 100125 includes a double quote. Therefore, the quote character is replaced by a two double quotes. This is called “escaping” the double quote.
Part Number,Manufacturer,"Manufacturer P/N" CRLF 100123,Texas Instruments,TPS73218DBV,"1.8 volt regulator, 250 mA" CRLF 100124,Analog Devices,AD860,"Something Second Line Description" CRLF 100125,Consolidated Diversified,3055BK-3IN,"4"" wire segment" CRLF
|PART NUMBER||MANUFACTURER||MANUFACTURER P/N||DESCRIPTION|
|100123||Texas Instruments||TPS73218DBV||1.8 volt regulator, 250 mA|
Second Line Description
|100125||Consolidated Diversified||3055BK-3IN||3.25″ wire segment|
Aligni CSV Usage and Multidimensional Data
CSV has become a very common way to exchange tabular or “tabular-like” data between software applications. It is a familiar and adequate way to represent mostly text in tabular form since it works reasonably well with spreadsheet applications like Excel and, well, who isn’t familiar with Excel? We consider CSV suitable for two-dimensional data sets but draw the line at exporting or importing multi-dimensional data.
For example, consider that a part record has a single manufacturer and that manufacturer has distribution at multiple vendors. If we were to export (or import) data with the vendor information, we would need to represent that in a third dimension (rows are the part records, columns are the associated record attributes). There are several ways to hack this third column into a spreadsheet and we’ve seen them all. It is common, for example, to add “Vendor 1”, “Vendor 2”, “Vendor 3” to the columns. But where does this madness stop? It stops at what most people would consider a reasonable N, until someone says they need N+1. Another solution would be to put all the vendors in a single cell and separate them by a newline. Which method is right? (Hint: they’re both wrong — CSV is not suitable for multidimensional representation)
So our policy is to take a stand against multidimensional data representation in CSV. There are better representations available for that. Our Aligni API is the best way to transfer multi-dimensional data between Aligni and another software application and we encourage that.