Introduction
csv++ is a superset of CSV which aims to make spreadsheet authoring more akin to traditional software development with a compiler that can output to Excel, Google Sheets or CSV. That means you can edit your spreadsheet in plain text, share functions between various spreadsheets and commit your code to version control.
Installation
Just go to releases on github, download the latest release, unpack it and put it in your $PATH.
Installing From Source
To install from source check out the csv++ repository and run:
cargo install --path .
This assumes you've installed Rust
Google Sheets Setup
To publish to Google Sheets you will need to authenticate using the gcloud
CLI tools. First you
need to install the gcloud CLI and then run:
$ gcloud init
$ gcloud auth login --enable-gdrive-access --update-adc
Language Reference
While csv++ is a superset of CSV, it adds several features that we'll explore.
- Re-usable functions and variables
- A module-loading import system
- The ability to specify cell-level options (text formatting, alignment, etc)
- A
fill
feature that is akin to dragging the bottom-right of a cell in a spreadsheet.
File Structure
Each csv++ source code file contains two distinct parts: an optional code section and a CSV section.
Code Section
The code section is where variables and functions can be defined and subsequently used in the CSV
section. It is terminated by the ---
separator and anything following that will be treated as the
CSV section.
Functions
Functions basically act as macros, where variables are interpolated but they are otherwise inserted into the spreadsheet without being evaluated. The language is a superset of the formula language used in spreadsheets - you should be able to do everything you can in a spreadsheet but also define functions and variables.
Functions are defined by:
fn <function-name> ( <arg1>, <arg2>, ... )
They can have any number of arguments and will be evaluated in the context of the cell in which they are called.
Examples
fn fees(quantity)
quantity * 0.10
fn profit(price, quantity)
(price * quantity) - fees(quantity)
---
Variables
Variables can be defined in the code section by giving a name (a combination of letters, numbers
and underscores), the expression :=
and followed with a value:
<variable-name> := <expression>
To reference a variable you just use it by it's name - there is no special operator to dereference.
Examples
foo := 42
bar := foo * 2
---
=foo,=bar,
will evaluate to:
=42,=(42 * 2),
Cell References
You can use A1-style cell references in the code section and within function definitions. Since the
syntax for variable references and for cell references are overlapping (for example ABC
is both a
valid cell reference and variable reference), csv++ will only interpolate the variable if it is
defined. Otherwise it will be left alone and treated as a cell reference.
CSV Section
The CSV section defines the stucture of your spreadsheet and makes use of the functions and variables defined above.
Variables
An important feature of the CSV section is being able to assign variables to cells and reference
them elsewhere. You do this using the [[var=<var-name>]]
syntax:
profit := income - fee
---
[[var=fee]], [[var=income]], =profit,
which will evaluate to:
,,=(B1 - A1)
Row Variables
You can also assign a variable to reference an entire row using the row option syntax (![[
/]]
):
![[var=row_1]],,,
Fills
A common feature of spreadsheets is to drag the bottom right corner and that formula will be applied
over a range of rows. The ![[fill=<number-of-rows>]]
provides similar functionality.
![[fill=3]]foo,bar,baz,
evaluates to:
foo,bar,baz,
foo,bar,baz,
foo,bar,baz,
which is not very helpful but becomes interesting when you combine it with variables.
# fees are a fixed $2.00 per trade
fees := 2
fn profit(quantity, price_each)
(price_each * quantity) - fees
---
![[fill=3]][[var=number_of_shares]], [[var=price]], "=profit(number_of_shares, price)",
becomes:
,,"=(B1 * A1) - 2",
,,"=(B2 * A2) - 2",
,,"=(B3 * A3) - 2",
A full list of cell & row options will be discussed in the Cell & Row Options section.
Comments
Any line in the CSV section will be treated as a comment and ignored if it starts with a #
:
foo,bar,baz,
# this is a comment
foo1,bar1,baz1,
Multi-line Cells
When using lots of cell options the lines can get pretty long. To help with this you can split a
single spreadsheet row over multiple lines using the \\
syntax like so:
---
![[halign=center \
valign=top \
text=bold \
text=underline \
]] This, Is, A, Header, Row,
[[var=one]] , \
[[var=two]] , \
[[var=three]] , \
[[var=four]] ,
which is equivalent to:
---
![[halign=center valign=top text=bold text=underline]] This, Is, A, Header, Row,
[[var=one]] , [[var=two]] , [[var=three]] , [[var=four]] ,
Modules
Modules are the basic building block of making reusable spreadsheets. Module names are derived from
the name of the file - so for example if you have foo.csvpp
, it's module name is foo
.
Requiring Modules
If you want to re-use variables or functions from another file, you can just import that module:
foo.csvpp
my_constant := 42
---
bar.csvpp
use foo
fn my_fn(a) a * my_constant
---
[[var=a]],=my_fn(a),
Currently when importing a module, all of the variables
The Main Module
Similar to C/Java/Rust etc, csv++ requires that there is a "main" module. This is generally implicit as the file being compiled without a need to specify it. Only the spreadsheet section of the main module will be built - for any required/non-main modules the spreadsheet section will be ignored and only the code section brought into scope.
Variable Scoping
The variable scoping semantics are pretty unique because every function call is evaluated relative
to the cell where it is used. As you've seen so far you can use [[var=...]]
to bind a variable
name to a given cell. As an example of scoping semantics we'll use this csv++ template:
foo_from_code_section := 42
---
[[var=bar_outside_fill]] ,,,,,
![[fill=2]]bar , \
[[var=bar_in_fill]] , \
=bar_in_fill , \
=bar_outside_fill , \
=foo_from_code_section ,
which will compile to:
, , , ,
bar , , =B2 , =A1 , =42
bar , , =B3 , =A1 , =42
Breaking this down:
foo_from_code_section
- Is always42
no matter where it is used.bar_in_fill
- Since it is defined within an![[fill]]
, it's value depends on the final row, which will beB2
orB3
bar_outside_fill
- Will always beA1
, pointing to the cell where it was defined. There is no relative aspect to it since it's not defined in anfill
.
Cell & Row Options
Options
You can change the formatting of a cell or row, bind cells to variables, apply validation, change
alignment, etc. All of the normal rules of CSV apply with the addition that each cell can have
options specified in [[
/]]
for cells and ![[
/]]
for rows:
foo,[[...]]bar,baz
specifying formatting or various other options to the cell. Additionally a row can start with:
![[...]]foo,bar,baz
which will apply to all cells in the row.
All Options
border = all | top | bottom | left | right
Sets a border on the given side (or all four sides if all
). Can be repeated multiple times to set
multiple borders.
Alias
b = a | t | b | l | r
bordercolor = HEX_COLOR
The color of the border, where HEX_COLOR
is a 6-character hex color code.
Alias
bc = HEX_COLOR
borderstyle = dashed | dotted | double | solid | solid_medium | solid_thick
The style of the border. solid
is the default if a border is set and it is not specified.
Alias
bs = dash | dot | dbl | 1 | 2 | 3
color = HEX_COLOR
The color of the cell, where HEX_COLOR
is a 6-character hex color code.
Alias
c = HEX_COLOR
fill
fill = AMOUNT
Duplicate the row AMOUNT
times. If AMOUNT
is not supplied, the row will be repeated for the rest of the sheet.
Alias
f = AMOUNT
(optional)
fontcolor = HEX_COLOR
The color of the font, where HEX_COLOR
is a 6-character hex color code.
Alias
fc = HEX_COLOR
fontfamily = Arial | Helvetica | ...
The font family to use. It must be a valid font, compatible with your target spreadsheet
Alias
ff = FONT_FAMILY
fontsize = INTEGER
The font size to use, as a whole number.
Alias
fs = INTEGER
halign = left | center | right
The horizontal alignment.
Alias
ha = l | c | r
lock
Prevent the cell or row from being modified.
Alias
l
note = STRING
A note to associate with the cell. The STRING
should be quoted with single quotes and you can escape quotes like: note='You\\'re taking a note'
Alias
n = STRING
numberformat = currency | date | datetime | number | percent | text | time | scientific
The number format to apply to the cell.
Alias
nf = c | d | dt | n | p | text | t | s
text = bold | italic | strikethrough | underline
Applies the given format. Can be repeated multiple times to set multiple formats.
Alias
t = b | i | s | u
validate
Validations that can be applied to the data in the cell.
validate=custom(FORMULA)
(alias:validate=c(FORMULA)
)validate=date_after(DATE)
(alias:validate=date_gt(DATE)
)validate=date_before(DATE)
(alias:validate=date_lt(DATE)
)validate=date_between(DATE DATE)
(alias:validate=date_btwn(DATE DATE)
)validate=date_equal_to(DATE)
(alias:validate=date_eq(DATE)
)validate=in_list(..)
validate=in_range(A1)
validate=date_is_valid
(alias:validate=is_date
)validate=is_valid_email
(alias:validate=is_email
)validate=is_valid_url
(alias:validate=is_url
)validate=date_not_between(DATE DATE)
(alias:validate=date_nbtwn(DATE DATE)
)validate=date_on_or_after(DATE)
(alias:validate=date_gte(DATE)
)validate=date_on_or_before(DATE)
(alias:validate=date_lte(DATE)
)validate=number_between(NUMBER NUMBER)
(alias:validate=number_btwn(NUMBER NUMBER)
)validate=number_equal_to(NUMBER)
(alias:validate=number_eq(NUMBER)
)validate=number_greater_than(NUMBER)
(alias:validate=number_gt(NUMBER)
)validate=number_greater_than_or_equal_to(NUMBER)
(alias:validate=number_gte(NUMBER)
)validate=number_less_than(NUMBER)
(alias:validate=number_lt(NUMBER)
)validate=number_less_than_or_equal_to(NUMBER)
(alias:validate=number_lte(NUMBER)
)validate=number_not_between(NUMBER NUMBER)
(alias:validate=number_nbtwn(NUMBER NUMBER)
)validate=number_not_equal_to(NUMBER)
(alias:validate=number_neq(NUMBER)
)validate=text_contains(TEXT)
validate=text_does_not_contain(TEXT)
validate=text_equal_to(TEXT)
(alias:text_eq
)
valign = bottom | center | top
The vertical alignment.
Alias
va = b | c | t
var = VARIABLE_ID
Bind a variable (specified by VARIABLE_ID
) to reference this cell.
Alias
v = VARIABLE_ID
wrap = wrap | overflow | clip
Whether to wrap the text in the cell. By default it is wrap
Alias
w = w | o | c
Examples
- Align the second cell left, align the last cell to the center and make it bold and italicized:
Date,[[ha=l]]Amount,Quantity,[[ha=c t=b t=i]]Price
- Underline and center-align an entire row:
![[ha=c t=u]]Date,Amount,Quantity,Price
- A header for the first row, then some formulas that repeat for each row for the rest of the spreadsheet:
![[ha=c t=b]]Date ,Price ,Quantity ,Profit
![[fill]] ,[[var=price]] ,[[var=quantity]] ,"=MULTIPLY(price, quantity)"
Using the CLI
The most common use of the CLI is just to compile a csv++ source file to you desired target. The
target is inferred from the CLI options. Given that you have your source code in my_sheet.csvpp
:
Building to Excel
$ csvpp -o my_sheet.xlsx my_sheet.csvpp
Building to Google Sheets
$ csvpp --google-sheet-id "the-google-sheet-id" my_sheet.csvpp
This assumes that you have set up Google Sheets access.
Building to CSV
While you will lose all formatting options, you can also compile back to CSV:
$ csvpp -o my_sheet.csv my_sheet.csvpp
Increasing Verbosity
If you're having trouble debugging an issue, you can increase the verbosity of the output by adding
the -vvvv
option.
Coding Conventions
While you're free to use whatever coding conventions make sense for you and your existing spreadsheets, these recommendations provide a good starting point.
Variables and Functions
Variables and functions should be lowercase and contain only letters, numbers and underscores (\_
).
Good
foo := 1
foo_bar := "value"
fn my_long_descriptive_function(a)
a * foo
---
Bad
FOO := 1 # not preferred, but this will compile
foo-bar := "value" # this will not compile!
# we prefer snake_case over camelCase but this will compile
fn myLongDescriptiveFunction(a)
a * foo
---
Module Names
Similar to variables & functions, module names should only contain letters, numbers and underscores
and can be joined with a /
if it resides in a sub-module.
Indentation
Indenting with 2 spaces is preferred.
Common Patterns
Building a Multi-Sheet Workbook
TODO
Referencing Another Sheet in the Same Workbook
TODO
Build Systems
Other than being a compiler csv++ does not ship with any kinds of build systems or package
management. The simplest approach is to use a Makefile
to build your spreadsheets. Here is one
that will build all .csvpp
files in the current directory into CSV and Excel files:
srcs := $(wildcard *.csvpp)
xlsx_files := $(srcs:%.csvpp=%.xlsx)
csv_files := $(srcs:%.csvpp=%.csv)
all: $(xlsx_files) $(csv_files)
%.xlsx: %.csvpp
csvpp $(CSVPPFLAGS) -o $@ $^
%.csv: %.csvpp
csvpp $(CSVPPFLAGS) -o $@ $^
clean:
rm -f *.csv *.csvpo *.xlsx
.PHONY: all clean
You can see more examples in the csv++ examples repo.
Why Rust?
There were several factors deciding to use Rust to write the csv++ compiler:
- Fast
- Fearless concurrency. Compiling a spreadsheet cell-by-cell can take advantage of concurrency, so we needed a language that can support it.
- Can build for native compilation targets. We want to be able to distribute the compiler with zero or minimal dependencies (preferably no runtime) and compile a static binary for each target OS.
- Spreadsheet library support. We need to be able to write Excel files and to the Google Sheets API which means we need a language that has libraries to support that.