Introduction

csv++ is a superset of CSV which aims to make spreadsheet authoring more akin to traditional software development with a compiler that can output to Excel, Google Sheets or CSV. That means you can edit your spreadsheet in plain text, share functions between various spreadsheets and commit your code to version control.

Installation

Just go to releases on github, download the latest release, unpack it and put it in your $PATH.

Installing From Source

To install from source check out the csv++ repository and run:

cargo install --path .

This assumes you've installed Rust

Google Sheets Setup

To publish to Google Sheets you will need to authenticate using the gcloud CLI tools. First you need to install the gcloud CLI and then run:

$ gcloud init
$ gcloud auth login --enable-gdrive-access --update-adc

Language Reference

While csv++ is a superset of CSV, it adds several features that we'll explore.

  • Re-usable functions and variables
  • A module-loading import system
  • The ability to specify cell-level options (text formatting, alignment, etc)
  • A fill feature that is akin to dragging the bottom-right of a cell in a spreadsheet.

File Structure

Each csv++ source code file contains two distinct parts: an optional code section and a CSV section.

Code Section

The code section is where variables and functions can be defined and subsequently used in the CSV section. It is terminated by the --- separator and anything following that will be treated as the CSV section.

Functions

Functions basically act as macros, where variables are interpolated but they are otherwise inserted into the spreadsheet without being evaluated. The language is a superset of the formula language used in spreadsheets - you should be able to do everything you can in a spreadsheet but also define functions and variables.

Functions are defined by:

fn <function-name> ( <arg1>, <arg2>, ... )

They can have any number of arguments and will be evaluated in the context of the cell in which they are called.

Examples

fn fees(quantity) 
  quantity * 0.10

fn profit(price, quantity) 
  (price * quantity) - fees(quantity)
---

Variables

Variables can be defined in the code section by giving a name (a combination of letters, numbers and underscores), the expression := and followed with a value:

<variable-name> := <expression>

To reference a variable you just use it by it's name - there is no special operator to dereference.

Examples

foo := 42
bar := foo * 2
---
=foo,=bar,

will evaluate to:

=42,=(42 * 2),

Cell References

You can use A1-style cell references in the code section and within function definitions. Since the syntax for variable references and for cell references are overlapping (for example ABC is both a valid cell reference and variable reference), csv++ will only interpolate the variable if it is defined. Otherwise it will be left alone and treated as a cell reference.

CSV Section

The CSV section defines the stucture of your spreadsheet and makes use of the functions and variables defined above.

Variables

An important feature of the CSV section is being able to assign variables to cells and reference them elsewhere. You do this using the [[var=<var-name>]] syntax:

profit := income - fee
---
[[var=fee]],    [[var=income]],     =profit,

which will evaluate to:

,,=(B1 - A1)

Row Variables

You can also assign a variable to reference an entire row using the row option syntax (![[/]]):

![[var=row_1]],,,

Fills

A common feature of spreadsheets is to drag the bottom right corner and that formula will be applied over a range of rows. The ![[fill=<number-of-rows>]] provides similar functionality.

![[fill=3]]foo,bar,baz,

evaluates to:

foo,bar,baz,
foo,bar,baz,
foo,bar,baz,

which is not very helpful but becomes interesting when you combine it with variables.

# fees are a fixed $2.00 per trade
fees := 2

fn profit(quantity, price_each)
  (price_each * quantity) - fees

---
![[fill=3]][[var=number_of_shares]],    [[var=price]],  "=profit(number_of_shares, price)",

becomes:

,,"=(B1 * A1) - 2",
,,"=(B2 * A2) - 2",
,,"=(B3 * A3) - 2",

A full list of cell & row options will be discussed in the Cell & Row Options section.

Comments

Any line in the CSV section will be treated as a comment and ignored if it starts with a #:

foo,bar,baz,
# this is a comment
foo1,bar1,baz1,

Multi-line Cells

When using lots of cell options the lines can get pretty long. To help with this you can split a single spreadsheet row over multiple lines using the \\ syntax like so:

---
![[halign=center \
  valign=top \
  text=bold \
  text=underline \
  ]] This,  Is, A,  Header, Row,
[[var=one]] , \
  [[var=two]] , \
  [[var=three]] , \
  [[var=four]] ,

which is equivalent to:

---
![[halign=center valign=top text=bold text=underline]] This,  Is, A,  Header, Row,
[[var=one]] , [[var=two]] , [[var=three]] , [[var=four]] ,

Modules

Modules are the basic building block of making reusable spreadsheets. Module names are derived from the name of the file - so for example if you have foo.csvpp, it's module name is foo.

Requiring Modules

If you want to re-use variables or functions from another file, you can just import that module:

foo.csvpp
my_constant := 42
---
bar.csvpp
use foo

fn my_fn(a) a * my_constant
---
[[var=a]],=my_fn(a),

Currently when importing a module, all of the variables

The Main Module

Similar to C/Java/Rust etc, csv++ requires that there is a "main" module. This is generally implicit as the file being compiled without a need to specify it. Only the spreadsheet section of the main module will be built - for any required/non-main modules the spreadsheet section will be ignored and only the code section brought into scope.

Variable Scoping

The variable scoping semantics are pretty unique because every function call is evaluated relative to the cell where it is used. As you've seen so far you can use [[var=...]] to bind a variable name to a given cell. As an example of scoping semantics we'll use this csv++ template:

foo_from_code_section := 42
---
[[var=bar_outside_fill]]    ,,,,,
![[fill=2]]bar              , \
  [[var=bar_in_fill]]       , \
  =bar_in_fill              , \
  =bar_outside_fill         , \
  =foo_from_code_section    ,

which will compile to:

     ,     ,     ,     ,
bar  ,     , =B2 , =A1 , =42
bar  ,     , =B3 , =A1 , =42

Breaking this down:

  • foo_from_code_section - Is always 42 no matter where it is used.
  • bar_in_fill - Since it is defined within an ![[fill]], it's value depends on the final row, which will be B2 or B3
  • bar_outside_fill - Will always be A1, pointing to the cell where it was defined. There is no relative aspect to it since it's not defined in an fill.

Cell & Row Options

Options

You can change the formatting of a cell or row, bind cells to variables, apply validation, change alignment, etc. All of the normal rules of CSV apply with the addition that each cell can have options specified in [[/]] for cells and ![[/]] for rows:

foo,[[...]]bar,baz

specifying formatting or various other options to the cell. Additionally a row can start with:

![[...]]foo,bar,baz

which will apply to all cells in the row.

All Options

border = all | top | bottom | left | right

Sets a border on the given side (or all four sides if all). Can be repeated multiple times to set multiple borders.

Alias

b = a | t | b | l | r

bordercolor = HEX_COLOR

The color of the border, where HEX_COLOR is a 6-character hex color code.

Alias

bc = HEX_COLOR

borderstyle = dashed | dotted | double | solid | solid_medium | solid_thick

The style of the border. solid is the default if a border is set and it is not specified.

Alias

bs = dash | dot | dbl | 1 | 2 | 3

color = HEX_COLOR

The color of the cell, where HEX_COLOR is a 6-character hex color code.

Alias

c = HEX_COLOR

fill

fill = AMOUNT

Duplicate the row AMOUNT times. If AMOUNT is not supplied, the row will be repeated for the rest of the sheet.

Alias

f = AMOUNT (optional)

fontcolor = HEX_COLOR

The color of the font, where HEX_COLOR is a 6-character hex color code.

Alias

fc = HEX_COLOR

fontfamily = Arial | Helvetica | ...

The font family to use. It must be a valid font, compatible with your target spreadsheet

Alias

ff = FONT_FAMILY

fontsize = INTEGER

The font size to use, as a whole number.

Alias

fs = INTEGER

halign = left | center | right

The horizontal alignment.

Alias

ha = l | c | r

lock

Prevent the cell or row from being modified.

Alias

l

note = STRING

A note to associate with the cell. The STRING should be quoted with single quotes and you can escape quotes like: note='You\\'re taking a note'

Alias

n = STRING

numberformat = currency | date | datetime | number | percent | text | time | scientific

The number format to apply to the cell.

Alias

nf = c | d | dt | n | p | text | t | s

text = bold | italic | strikethrough | underline

Applies the given format. Can be repeated multiple times to set multiple formats.

Alias

t = b | i | s | u

validate

Validations that can be applied to the data in the cell.

  • validate=custom(FORMULA) (alias: validate=c(FORMULA))
  • validate=date_after(DATE) (alias: validate=date_gt(DATE))
  • validate=date_before(DATE) (alias: validate=date_lt(DATE))
  • validate=date_between(DATE DATE) (alias: validate=date_btwn(DATE DATE))
  • validate=date_equal_to(DATE) (alias: validate=date_eq(DATE))
  • validate=in_list(..)
  • validate=in_range(A1)
  • validate=date_is_valid (alias: validate=is_date)
  • validate=is_valid_email (alias: validate=is_email)
  • validate=is_valid_url (alias: validate=is_url)
  • validate=date_not_between(DATE DATE) (alias: validate=date_nbtwn(DATE DATE))
  • validate=date_on_or_after(DATE) (alias: validate=date_gte(DATE))
  • validate=date_on_or_before(DATE) (alias: validate=date_lte(DATE))
  • validate=number_between(NUMBER NUMBER) (alias: validate=number_btwn(NUMBER NUMBER))
  • validate=number_equal_to(NUMBER) (alias: validate=number_eq(NUMBER))
  • validate=number_greater_than(NUMBER) (alias: validate=number_gt(NUMBER))
  • validate=number_greater_than_or_equal_to(NUMBER) (alias: validate=number_gte(NUMBER))
  • validate=number_less_than(NUMBER) (alias: validate=number_lt(NUMBER))
  • validate=number_less_than_or_equal_to(NUMBER) (alias: validate=number_lte(NUMBER))
  • validate=number_not_between(NUMBER NUMBER) (alias: validate=number_nbtwn(NUMBER NUMBER))
  • validate=number_not_equal_to(NUMBER) (alias: validate=number_neq(NUMBER))
  • validate=text_contains(TEXT)
  • validate=text_does_not_contain(TEXT)
  • validate=text_equal_to(TEXT) (alias: text_eq)

valign = bottom | center | top

The vertical alignment.

Alias

va = b | c | t

var = VARIABLE_ID

Bind a variable (specified by VARIABLE_ID) to reference this cell.

Alias

v = VARIABLE_ID

wrap = wrap | overflow | clip

Whether to wrap the text in the cell. By default it is wrap

Alias

w = w | o | c

Examples

  • Align the second cell left, align the last cell to the center and make it bold and italicized:
Date,[[ha=l]]Amount,Quantity,[[ha=c t=b t=i]]Price
  • Underline and center-align an entire row:
![[ha=c t=u]]Date,Amount,Quantity,Price
  • A header for the first row, then some formulas that repeat for each row for the rest of the spreadsheet:
![[ha=c t=b]]Date   ,Price          ,Quantity           ,Profit
![[fill]]           ,[[var=price]]  ,[[var=quantity]]   ,"=MULTIPLY(price, quantity)"

Using the CLI

The most common use of the CLI is just to compile a csv++ source file to you desired target. The target is inferred from the CLI options. Given that you have your source code in my_sheet.csvpp:

Building to Excel

$ csvpp -o my_sheet.xlsx my_sheet.csvpp

Building to Google Sheets

$ csvpp --google-sheet-id "the-google-sheet-id" my_sheet.csvpp

This assumes that you have set up Google Sheets access.

Building to CSV

While you will lose all formatting options, you can also compile back to CSV:

$ csvpp -o my_sheet.csv my_sheet.csvpp

Increasing Verbosity

If you're having trouble debugging an issue, you can increase the verbosity of the output by adding the -vvvv option.

Coding Conventions

While you're free to use whatever coding conventions make sense for you and your existing spreadsheets, these recommendations provide a good starting point.

Variables and Functions

Variables and functions should be lowercase and contain only letters, numbers and underscores (\_).

Good

foo := 1
foo_bar := "value"

fn my_long_descriptive_function(a)
  a * foo
---

Bad

FOO := 1            # not preferred, but this will compile
foo-bar := "value"  # this will not compile!

# we prefer snake_case over camelCase but this will compile
fn myLongDescriptiveFunction(a)
  a * foo
---

Module Names

Similar to variables & functions, module names should only contain letters, numbers and underscores and can be joined with a / if it resides in a sub-module.

Indentation

Indenting with 2 spaces is preferred.

Common Patterns

Building a Multi-Sheet Workbook

TODO

Referencing Another Sheet in the Same Workbook

TODO

Build Systems

Other than being a compiler csv++ does not ship with any kinds of build systems or package management. The simplest approach is to use a Makefile to build your spreadsheets. Here is one that will build all .csvpp files in the current directory into CSV and Excel files:

srcs := $(wildcard *.csvpp)
xlsx_files := $(srcs:%.csvpp=%.xlsx)
csv_files := $(srcs:%.csvpp=%.csv)

all: $(xlsx_files) $(csv_files)

%.xlsx: %.csvpp
	csvpp $(CSVPPFLAGS) -o $@ $^

%.csv: %.csvpp
	csvpp $(CSVPPFLAGS) -o $@ $^

clean:
	rm -f *.csv *.csvpo *.xlsx

.PHONY: all clean

You can see more examples in the csv++ examples repo.

Why Rust?

There were several factors deciding to use Rust to write the csv++ compiler:

  • Fast
  • Fearless concurrency. Compiling a spreadsheet cell-by-cell can take advantage of concurrency, so we needed a language that can support it.
  • Can build for native compilation targets. We want to be able to distribute the compiler with zero or minimal dependencies (preferably no runtime) and compile a static binary for each target OS.
  • Spreadsheet library support. We need to be able to write Excel files and to the Google Sheets API which means we need a language that has libraries to support that.