The Passerine Codex

An Extensive Overview of Passerine

Foreword

By Isaac Clayton

Passerine is in an early stage of development, so it's hard to pin down exactly what it will become. In this respect, we have a single point on the wall: as there are a lot of lines that can be drawn through a point, there are a lot of different directions this language could take.

So in the following foreword, I would like to lay out my history with computers, and discuss what I've come to learn about them. Through this, I hope to put another metaphorical 'point on the wall': two points form a line, of course, and I'm hoping the line drawn by this codex and the code we have now will be enough to find the 'figure in the marble' and chisel out the rest of Passerine.

Too heavy on the metaphor? Probably.

I first thought of developing a programming language for myself when I was around 12 or 13. At the time, I had just stumbled across lisp, having learned Python a few years before (by working through Downey's Think Python). I was quite enthralled by the possibilities lisp opened in my mind: if programs are sets of transformations on data (Functional Programming), and the programs themselves are just data (Metaprogramming), what can't we create? Wanting to learn more, I learned Common Lisp (through On Lisp), and then Scheme (through the Wizard Book, which I'm still particularly fond of).

So, with lisp on my mind, I wrote one. It was nothing more than a toy, of course—a thin veneer of a language tacked onto Norvig's Scheme in Python—but it was a language nonetheless.

Lisp, of course, is more a category of languages than a single standard; the featureless landscape of parentheses, while beautiful in all its symmetries, lacks the benefit concise notation brings. I extended the small lisp I wrote earlier into a language called Aspen: a small scheme with inferred parentheses and richer syntax. This endeavor was simplistic at best—I felt I had exhausted most of what lisp had to offer me at that time—

So I moved on to other languages. I juggled pointers with C, categorized with Haskell, tore apart data structures with OCaml, threw exceptions with Java (then threw Java out the window), marched fractals in GLSL, beamed messages around with Erlang and Elixir, reinforced agents in Python; this is just the tip of the iceberg. Languages were the tools I used to rip through the universe of computation, exposing new corners for breaking down abstractions, if only to build them back up a bit more airtight later, perhaps. Once you begin to understand how one language maps to another—everything old being new again—minimal languages that exemplify a core aspect of computation stand out above the rest.

The brilliance of lisp stems from the fact that it is, at its core, a LISt Processing language. Everything in lisp is built as a graph of cons—list cells. Because the abstract syntax tree (AST) of the language itself is nested lists of lists, and lisp is a language built around easily manipulating lists, it's no wonder that lisp's macro system is so powerful. Lisp is specifically built for manipulating lists, and lisp itself is lists!

One thing that I think is viewed incorrectly about lisps, though, is homoiconicity, or at least the usual way the word is interpreted. After all, homoiconicity isn't the point.

For a metaprogramming system to be powerful, the language representation should match the representation that the language itself is designed to manipulate. Lisp's macro system is powerful because lisp is designed to operate on lists.

Lisp seems to be one of the only languages to get this right: its representation is exactly what it is built to manipulate. Lists, however, are not the best tool for modeling a wide range of structures and behaviors.

Data should be presented in the simplest form that still preserves visual uniqueness. The semantic operators that determine how the underlying data is processed should be distinctive enough to easily be recognizable and actionable:

"[This engages] the brain's visual processing machinery, which operates largely subconsciously, and tells the consciousness part of what it sees."

— Guido van Rossum

If what matters in metaprogramming is self-representation, then requiring everything to share the same syntax—as lisp does—seems a tad counterproductive. Our brains come equipped with rich built-in visual processors: when all code and data look the same, it's hard to get a feel for what the code is doing just by examining its shape.

Algebraic data types (ADTs)—structs and enums—are a much richer way to represent data, and ergo programming languages. As lisp is built on the axioms of list processing, MLs (as in Standard ML & OCaml) are built on the axioms of ADTs: pattern matching, type inference, construction.

I guess I find it a little surprising, then, that lisp-style macro systems built on manipulating ADTs are uncommon in ML-style languages. It seems like the natural next step after lisp. Lisp is built for manipulating lists and has a powerful macro system built on it. ADTs are richer than lists, ML is built for manipulating ADTs — wouldn't an ML macro system built on ADTs rival lisp's expressive power?

Languages are, after all, just tools to express computation: no matter how hard we try, there will never be a language that won't make us clarify our ideas.

But we can, at the very least, build languages that make those ideas easier to express.

About two years ago now (as of 2020), I stumbled across Rust (I was 14 at the time). Rust opened my mind yet again to another world of possibilities. Ownership and borrowing allow us to express fine-grained control over resources. Being functional yet imperative, Rust also challenged my assumptions about what it meant to follow a paradigm. In the words of Saoirse:

Pure functional programming is an ingenious trick to show you can code without mutation. Rust is an even cleverer trick to show you can just have mutation.

— Saoirse (withoutboats)

Rust is now my preferred programming language. I'm not going to wax poetic and tell you that it's the best language in existence, but it's certainly a very, very nice one. Although it may be superficially complex, it's conceptually elegant (and I mean that in the best way possible). More importantly, the community that has developed around Rust is exceptional, and I'm grateful for the interactions I've had (and have yet to have) with the Rust community.

Now, what does any of this have to do with Passerine?

Designing a language without taking inspiration from others is a fool's errand. I hope I've shown you the path I have followed to reach the point where I'm at now. Making a programming language is a bit like making a soup: take a little of everything, let it simmer for a while, and serve it while it's hot.

So, over the past few years, an idea for a novel programming language has been simmering in my mind. This language is the compounded framework of ideas I've developed since I started programming, boiled down to a refined core.

This language, of course, is Passerine: a small core language that is easily extensible, with roots in Scheme and ML-flavored languages. Above all else, Passerine is the tool I've always wished I had. I find it to be useful for me, and I hope that it will be useful for you too.

— Isaac Clayton (slightknack)

The Road Ahead

A winding mountain desert road thing, much like the journey we're about to have!

This codex is organized into a bunch of sections. Each section builds on the last. By the end, you should have a pretty good picture of what Passerine is, and what it aspires to be.

We'll start by exploring the functional foundations on which Passerine is built. We'll then work towards patterns for building up and tearing down data.

With the core in place, it's time to shift gears. We'll rip through Passerine's trait and effect systems, and learn that, maybe, they're not so different after all. We'll also discuss how types and effects can be composed for greater control over the runtime and execution environment.

With that sprint behind us, we'll then take a short detour, and chat about modules and package management. Here you'll meet aspen, and get some real dirt under your nails as we write a few projects together.

With practical experience under our belts, some hidden patterns will slowly become obvious. In the final section of this Codex, we'll dive deep into the heart of macros and compile-time evaluation, and leverage the ultimate abstraction to bend Passerine to our will. If that doesn't sound crazy, I don't know what does!

It's going to be a wild ride, dear reader, so you'd better buckle up!

Getting Started

In this chapter, we will install Passerine and run a simple program to ensure that everything is working.

Installation

Passerine is still very much a work in progress. We've done a lot, but there's still so much more to do!

For you pioneers out there, the best way to get a feel for Passerine is to install Aspen¹, Passerine's package manager and CLI.

If you use a *nix-style² system, run³:

sh <(curl -sSf https://www.passerine.io/install.sh)
  1. If you're having trouble getting started, reach out on the community Discord server.
  2. Tested on Arch (btw) and macOS.
  3. Now tested on Windows™!
  4. (Also, experimentally supports Wasm.)
  5. In the future, we plan to distribute prebuilt binaries, but for now, both Git and Cargo are required.

Hello, world!

Let's test our installation!

Open up a terminal and create a new Passerine project in a directory called hello_world by running:

$ aspen new hello_world

Alternatively, you can run aspen new to create a new project in the current directory.

Enter the directory, and open main.pn using your favorite IDE or text editor.

You should see:

println "Hello, Passerine!"

You can execute this file in the terminal by running:

$ aspen run

You should see the output:

Hello, Passerine!

Now, let's move on to the core language!

The Core Language

In this chapter, we will take a look at the core language features offered by Passerine.

Syntax

The goal of Passerine's syntax is to make all expressions as concise as possible while still conserving the 'feel' of different types of expressions.

We'll start simple; here's a function that squares a number:

square = x -> x * x
square 4 -- is 16

By the way: Passerine uses -- for single-line comments and -{ ... }- for nestable multi-line comments.
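Since -{ ... }- comments nest, you can safely comment out a region of code that already contains a multi-line comment. A quick sketch:

-- a single-line comment
-{
    a multi-line comment
    -{ nested inside it }-
    still part of the outer comment
}-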

There are already some important things we can learn about Passerine from this short example:

Like other programming languages, Passerine uses = for assignment. On the left hand side is a pattern – in this case, just the variable square – which destructures an expression into a set of bindings. On the right hand side is an expression; in this case the expression is a function definition.

Because Passerine is expression-oriented, the distinction between statements and expressions isn't made. In the case that an expression produces no useful value, it should return the Unit type, (). Assignment, for instance, returns Unit.
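For example, here's a small sketch of this in action; because assignment returns Unit, a block ending in an assignment evaluates to ():

result = { x = 5 }
result -- is ()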

The function call square 4 may look a bit alien to you; this is because Passerine uses whitespace for function calls. A function call takes the form l e₀ ... eₙ, where l is a function and e is an expression. square 4 is a simple case because square only takes one argument, x... let's try writing a function that takes two arguments!

Using our definition of square, here's a function that returns the Euclidean distance between two points:

distance = (x1, y1) (x2, y2) -> {
    sqrt (square (x1 - x2) + square (y1 - y2))
}

origin = (0, 0)
distance origin (3, 4) -- is 5

Passerine is an expression-oriented language; because of this, it makes sense that all functions are anonymous. All functions take the form p₀ ... pₙ -> e, where p is a pattern and e is an expression. If a function takes multiple arguments, they can be written one after another; a b c -> d is equivalent to a -> (b -> (c -> d)).
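To make that concrete, here's a sketch with a hypothetical add function, written both ways:

add = a b -> a + b
-- desugars to
add = a -> (b -> a + b)

add 1 2 -- is 3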

The function distance is a bit more complex than square, because the two arguments are bound using tuple destructuring.

As you can see, distance takes two pairs, (x1, y1) and (x2, y2), and destructures each pair into its component parts. For instance, when we call distance in distance origin (3, 4) the function pulls out the numbers that make up the pair:

  • origin, i.e. (0, 0), is matched against (x1, y1), creating the bindings x1 = 0 and y1 = 0.
  • the tuple (3, 4) is matched against (x2, y2), creating the bindings x2 = 3 and y2 = 4.

The body of distance, sqrt (...), is then evaluated in a new scope where the variables defined above are bound. In the case of the above example:

-- call and bind
distance origin (3, 4)
distance (0, 0) (3, 4)

-- substitute and evaluate
sqrt (square (0 - 3) + square (0 - 4))
sqrt (9 + 16)
sqrt 25
5

Now, you may have noticed that distance is actually two functions. It may be more obvious if we remove some syntactic sugar and rewrite it like so:

distance = (x1, y1) -> { (x2, y2) -> { ... } }

The first function binds the first argument, then returns a new function that binds the second argument, which evaluates to a value. This is known as currying, and can be really useful when writing functional code.

To leverage currying, function calls are left-associative. The call a b c d is equivalent to ((a b) c) d, not a (b (c d)). This syntax comes from functional languages like Haskell and OCaml, and makes currying (partial application) quite intuitive.

In the above example, we used distance to measure how far away (3, 4) was from the origin. Coincidentally, this is known as the length of a vector. Wouldn't it be nice if we could define length in terms of distance?

length = distance origin
length (5, 12) -- is 13

Because distance is curried, we can call it with only one of its arguments. For this reason, length is a function that takes a pair and returns its distance from the origin. In essence, we can read the above definition of length as:

length is the distance from origin.

Transforming data through the use of functions and pattern matching is a central paradigm of Passerine. In the following sections, we'll dive deep and show how this small core language is enough to build a powerful and flexible language.

A Quick(-sort) Example

Here's another slightly more complex example – a recursive quick-sort with questionable pivot selection:

sort = list -> match list {
    -- base case
    [] -> []

    -- pivot is head, tail is remaining
    [pivot, ..tail] -> {
        higher = filter { x -> x >= pivot } tail
        lower  = filter { x -> x <  pivot } tail

        (sorted_lower, sorted_higher) = (sort lower, sort higher)

        sorted_lower
            + [pivot]
            + sorted_higher
    }
}

The first thing that you should notice is the use of a match expression. Like ML-style languages, Passerine makes extensive use of pattern matching and destructuring as a driver of computation. A match expression takes the form:

match value {
    pattern₀ -> expression₀
    ...
    patternₙ -> expressionₙ
}

Each pattern -> expression pair is a branch – the value is matched against each branch's pattern in order until one successfully matches and evaluates – the match expression takes on the value of the matching branch. We'll take a deep dive into match expressions later, so keep this in mind.

You might've also noticed the use of curly braces { ... } after [pivot, ..tail]. This is a block: a group of expressions executed one after another. Each expression in a block is separated by a newline or semicolon; the block takes on the value of its last expression.
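Here's a small sketch of how blocks behave:

x = { 1 + 1; 2 + 2 }
x -- is 4, the value of the block's last expression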

The next thing to notice is this line:

(sorted_lower, sorted_higher) = (sort lower, sort higher)

This is a more complex assignment than the first one we saw. In this example, the pattern (sorted_lower, sorted_higher) is being matched against the expression (sort lower, sort higher). This pattern is a tuple destructuring; if you've ever used Python or Rust, I'm sure you're familiar with it. This assignment is equivalent to:

sorted_lower  = sort lower
sorted_higher = sort higher

There's no real reason to use tuple destructuring here – idiomatically, just using sort lower and sort higher is the better solution. We discuss pattern matching in depth in the next section.

Passerine also supports higher order functions (this should come as no surprise):

filter { x -> x >= pivot } tail

filter takes a predicate (a function) and an iterable (like a list), and produces a new iterable where the predicate is true for all items. Although parentheses could be used to group the inline function definition after filter, it's stylistically more coherent to use blocks for regions of computation. What's a region of computation? A region of computation is a series of multiple expressions, or a single expression that creates new bindings, like an assignment or a function definition.
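To illustrate, both of these calls should behave identically; the second is the preferred style:

filter (x -> x >= pivot) tail
filter { x -> x >= pivot } tail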

Passerine also allows lines to be split around operators to break up long expressions:

sorted_lower
    + [pivot]
    + sorted_higher

Although this is not a particularly long expression, splitting up lines by operations can help improve the legibility of some expressions.

Function Application

Before we move on, here's a clever implementation of FizzBuzz in Passerine:

fizzbuzz = n -> {
    test = d s x
        -> if n % d == 0 {
            _ -> s + x ""
        } else { x }

    fizz = test 3 "Fizz"
    buzz = test 5 "Buzz"
    "{n}" . fizz (buzz (i -> i))
}

1..100 . fizzbuzz . print

. is the function application operator:

-- the composition
a . b c . d

-- is left-associative
(a . (b c)) . d

-- and equivalent to
d ((b c) a)
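In practice, . lets a chain of transformations read from left to right. A small sketch, assuming a hypothetical double function:

double = x -> x * 2
5 . double . print -- prints 10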

Pattern Matching

In the last section, we touched on pattern matching a little. I hope to now go one step further and build a strong argument as to why pattern matching in Passerine is such a powerful construct. Patterns are used in three places:

  1. Assignments,
  2. Functions,
  3. and Type Definitions.

We'll briefly discuss each type of pattern and the context in which they are used.

What are patterns?

Patterns extract data from types by mirroring the structure of those types. The act of applying a pattern to a type is called matching or destructuring – when a pattern matches some data successfully, a number of bindings are produced.

Passerine supports algebraic data types, and all of these types can be matched and destructured against. Here is an overview of Passerine's patterns; in each entry, p stands for a nested sub-pattern:

  • variable, e.g. foo – Terminal pattern, binds a variable to a value.
  • data, e.g. 420.69 – Terminal pattern, data that must match; raises an error otherwise. See the following section on fibers and concurrency to get an idea of how errors work in Passerine.
  • discard, e.g. _ – Terminal pattern, matches any data, does not produce a binding.
  • label, e.g. Baz p – Matches a label, i.e. a named type in Passerine.
  • tuple, e.g. (p₀, ...) – Matches each element of a tuple, which is a group of elements of potentially different types. Unit () is the empty tuple.
  • list, e.g. [] or [p₀, ..p₁] – [] matches an empty list; p₀ matches the head of a list, and ..p₁ matches the tail.
  • record, e.g. {f₀: p₀, ...} – A record, i.e. a struct: a series of field-pattern pairs. If a field in the pattern does not exist in the target record, an error is raised.
  • enum, e.g. {p₀; ...} – An enumeration, i.e. a union. Matches if any of the patterns hold.
  • is, e.g. p₀: p₁ – A type annotation: matches p₀ only if the annotation p₁ holds; errors otherwise.
  • where, e.g. p | e – A bit different from the other patterns so far: matches p only if the expression e is true.

That's quite a lot of information, so let's work through it. The simplest case is a standard assignment:

a = b
-- produces the binding a = b

This is very straightforward and we've already covered this, so let's begin by discussing matching against data. The following function will return the second argument if the first argument passed to the function is true.

true second -> second

If the first argument is not true, say false, Passerine will yell at us:

Fatal Traceback, most recent call last:
In src/main.pn:1:1
   |
 1 | (true second -> second) false "Bananas!"
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
In src/main.pn:1:2
   |
 1 | (true second -> second) false "Bananas!"
   |  ^^^^
   |
Runtime Pattern Matching Error: The data 'false' does not match the expected data 'true'

Discard is another simple pattern – it does nothing. It's most useful when used in conjunction with other patterns:

-- to ensure a fruit has the type Banana:
banana: Banana _ = mystery_fruit

-- to ignore an item in a tuple:
(_, plays_tennis, height_in_feet) = ("Juan Milan", true, 27.5)

-- to ignore a field on a record:
{ name: "Isaac Clayton", age: _, skill } = isaac

A label is a name given to a type. Of course, names do not imply type safety, but they do do a darn good job most of the time:

-- make a soft yellow banana:
banana = Banana ("yellow", "soft")

-- check that the banana flesh is soft:
if { Banana (_, flesh) = banana; flesh == "soft" } {
    print "Delicioso!"
}

Pattern matching on labels is used to extract the raw data that is used to construct that label.

Tuples are fairly simple – we already covered them – so we'll cover records next. A record is a set of fields:

-- define the Person type
type Person {
    name:  String,
    age:   Natural,
    skill, -- polymorphic over skill
}

-- Make a person. It's me!
isaac = Person {
    name:  "Isaac Clayton",
    age:   16,
    skill: Wizard "High enough ¯\_(ツ)_/¯",
}

Here's how you can pattern match on isaac:

Person {
    -- aliasing field `name` as `full_name`
    name: full_name,
    -- `age` is ignored
    age: _,
    -- short for `skill: skill`
    skill,
} = isaac

Of course, the pattern after the field is a full pattern, and can be matched against further.
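For example, here's a sketch that matches a nested label pattern inside the skill field of isaac, as defined above:

Person { name: _, age: _, skill: Wizard level } = isaac
level -- is "High enough ¯\_(ツ)_/¯"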

The is pattern is a type annotation:

Banana (color, _): Banana (_, "soft") = fruit

In this example, color will be bound if fruit is a Banana whose 1nd† tuple item is "soft".

† Read as 'firnd', corresponds to the 1-indexed second item. Zerost, firnd, secord, thirth, fourth, fifth...

Finally, we'll address my favorite pattern, where. Where allows for arbitrary code to check the validity of a pattern. This can go a long way. For example, let's define natural numbers in terms of integers:

type Natural n: Integer | n >= 0

This should be interpreted as:

The type Natural is an Integer n where n is greater than or equal to 0.

With this definition in place:

-- this is valid
seven = Natural 7

-- this is not
negative_six = Natural -6

Where clauses in patterns ensure that the underlying data of a type can never break an invariant. As you can imagine, this is more powerful than ensuring name-safety through type constructors.

TODO: traits and impls.

Pattern matching and algebraic data types allow for quickly building up and tearing down expressive data schemas. As data (and the transformation applied to it) are the core of any program, constructs for quickly building up and tearing down complex datatypes are an incredible tool for scripting expressive applications.

Fibers

What do prime sieves, exceptions, and for-loops have in common? If you guessed concurrency, you won a bajillion points! Structured concurrency is a difficult problem to tackle, given how pervasive it is in the field of language design.

It's important to point out that concurrency is not the same thing as parallelism. Concurrent systems may be parallel, but that's not always the case. Passerine subscribes to the coroutine model of structured concurrency – more succinctly, Passerine uses fibers – as exemplified by Wren.

TODO: Algebraic Effects?

Concurrency

A fiber is a lightweight process of execution that is cooperatively scheduled with other fibers. Each fiber is like a little system unto itself that can pass messages to other fibers.

To create a fiber, use the fiber keyword:

counter = fiber {
    i = 0
    loop { yield i; i = i + 1 }
}

print counter () -- prints 0
print counter () -- prints 1
print counter () -- ...

The yield keyword suspends the current fiber and returns a value to the calling fiber. yield can also be used to pass data into a fiber.

passing_yield = fiber {
    print "hello"
    result = yield "banana"
    print result
    "yes"
}

passing_yield "first"  -- prints "hello" then evaluates to "banana"
passing_yield "second" -- prints "second" then evaluates to "yes"
passing_yield "uh oh"  -- raises an error, fiber is done

To build more complex systems, you can build fibers with functions:

-- a function that returns a fiber
flopper = this that -> fiber {
    loop {
        yield this
        yield that
    }
}

apple_banana = flopper "Apple" "Banana"

apple_banana () -- evaluates to "Apple"
apple_banana () -- evaluates to "Banana"
apple_banana () -- evaluates to "Apple"
apple_banana () -- ... you get the point

Of course, the possibilities are endless.

In addition, fibers, while usually run in the context of another, all act as peers to one another. If you have a reference to a fiber, it's possible to transfer to it directly, forgetting the context in which it was called. To switch to a fiber, use switch.

banana_and_end = fiber {
    print "banana ending!"
}

print "the beginning..."
switch banana_and_end
print "the end."

"the end." is never displayed.

'Tis not the end, 'tis but the beginning... 'tis hackin' time!

Error handling

Passerine uses a combination of exceptions and algebraic data types to handle errors. Errors that are expected to happen should be wrapped in a Result type:

validate_length = n -> {
    if length n < 5 {
        Result.Error "It's too short!"
    } else {
        Result.Ok n
    }
}
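Here's how validate_length might behave, assuming length returns the number of characters in a string:

validate_length "hi"    -- is Result.Error "It's too short!"
validate_length "hello" -- is Result.Ok "hello"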

Some errors, however, are unexpected. There are countless ways software can fail; to account for all external circumstances is flat-out impossible in some cases.

For this reason, in the case that something that isn't expected to fail fails, an exception is raised. For example, trying to open a file that should exist may throw an error if it has been lost, moved, corrupted, or otherwise has incorrect permissions.

config = Config.parse (open "config.toml")

. is the indexing operator. On a list or tuple, items.0 returns the zerost item; on a record, record.field returns the specified field; on a label, Label.method returns the specified associated method.
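For instance, with some hypothetical values:

point = (4, 2)
point.0 -- is 4, the zerost item

me = { name: "Isaac", age: 16 }
me.name -- is "Isaac"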

The reason we don't always need to handle these errors is that Passerine follows a fail-fast, fail-safe philosophy. In this regard, Passerine subscribes to the philosophy of Erlang/Elixir:

"Keep calm and let it crash."

The good news is that crashes are local to the fiber they occur in – a single fiber crashing does not bring down the whole system. The idiomatic way to handle an operation that we know may fail is to try it. try performs the operation in a new fiber and converts any exceptions that may occur into a Result:

config = match try (open "config.toml") {
    Result.Ok    file  -> Config.parse file
    Result.Error error -> Config.default ()
}

We know that some functions may raise errors, but how can we signal that something exceptionally bad has happened? We use the error keyword!

doof = platypus -> {
    if platypus == "Perry" {
        -- crashes the current fiber
        error "What!? Perry the platypus!?"
    } else {
        -- oh, it's just a platypus...
        work_on_inator ()
    }
}

-- oh no!
inator = doof "Perry"

Note that the value of raised errors can be any data. This allows for programmatic recovery from errors:

-- socket must not be disconnected
send_data = (
    socket
    data
) -> match socket.connection {
    -- `Disconnected` is a labeled record
    Option.None -> error Disconnected {
        socket,
        message: "Could not send data; disconnected",
    }
    -- if the connection is open, we send the data
    Option.Some connection -> connection.write data
}

Let's say the above code tries to send some data through a socket. To handle a disconnection, we can wrap the send in a try:

ping = socket -> try (send_data socket "ping")

socket = Socket.new "isaac@passerine.io:42069" -- whatever
socket.disconnect () -- oh no!

result = ping socket

match result {
    Result.Ok "pong" -> ()
    Result.Error Disconnected { socket, message: _ } -> socket.connect ()
}

Why make the distinction between expected errors (Result) and unexpected errors (fiber crashes)? Programs only produce valid results if the environments they run in are valid. When a fiber crashes, it's signaling that something about the environment it's running in is not valid. This is very useful to developers during development, and very useful to programs in contexts where complex long-running applications may fail for any number of reasons.

Why not only use exceptions then? Because it's perfectly possible for an error to occur that is not exceptional at all. Malformed input, incorrect permissions, missing items – these are all things that can occur and do occur on a regular basis. It's always important to use the right tool for the job; prefer expected errors over unexpected errors.

Macros

Passerine has a rich hygienic* syntactic macro system that extends the language itself.

Syntactic macros, quite simply, are bits of code that hygienically produce more code when invoked at compile time. Macros use a small, simple-yet-powerful set of rules to transform code.

* Having read Doug Hoyte's excellent Let Over Lambda, I understand the raw power of a rich unhygienic macro system. However, such systems are hard to comprehend, and harder to master. Passerine aims to be as simple and powerful as possible without losing transparency: hygienic macro systems are much more transparent than their opaque unhygienic counterparts.

Hygiene

Extensions are defined with the syntax keyword, followed by some argument patterns, followed by the code that the captured arguments will be spliced into. Here's a simple example: we're using a macro to define a swap operator:

syntax a 'swap b {
    tmp = a
    a = b
    b = tmp
}

x = 7
y = 3
x swap y

Note that the above code is completely hygienic. The expanded macro looks something like this:

_tmp = x
x = y
y = _tmp

Because tmp was not passed as a macro pattern parameter, all uses of tmp in the macro body are unique unrepresentable variables that do not collide with any other variables currently bound in scope. Essentially:

tmp = 1
x = 2
y = 3
x swap y

Will not affect the value of tmp; tmp will still be 1.

Argument Patterns

So, what is an argument pattern (an arg-pat)? Arg-pats are what go between:

syntax ... { }

Each item between syntax and the macro body is an arg-pat. Arg-pats can be:

  • Syntactic variables, like foo and bar.
  • Literal syntactic identifiers, which are prefixed with a quote ('), like 'let.
  • Nested argument patterns, followed by optional modifiers.

Let's start with syntactic identifiers. Identifiers are literal names that must be present for the pattern to match. Each syntactic extension is required to have at least one. For example, here's a macro that matches a for loop:

syntax 'for binding 'in values do { ... }

In this case, 'for and 'in are syntactic identifiers. This definition could be used as follows:

for a in [1, 2, 3] {
    print a
}

Syntactic variables are the other identifiers in the pattern that are bound to actual values. In the above example, a is bound to binding, [1, 2, 3] is bound to values, and { print a } is bound to do.

Macros can also be used to define operators†:

syntax sequence 'contains value {
    c = seq -> match seq {
        [] -> False
        [head, ..] | head == value -> True
        [_, ..tail] -> c tail
    }

    c sequence
}

This defines a contains operator that could be used as follows:

print {
    if [1, 2, 3] contains 2 {
        "It contains 2"
    } else {
        "It does not contain 2"
    }
}

Evidently, It contains 2 would be printed.

† Custom operators defined in this manner will always have the lowest precedence, and must be explicitly grouped when ambiguous. For this reason, Passerine already has a number of built-in operators (with proper precedence) which can be overloaded. It's important to note that macros serve to introduce new constructs that just happen to be composable – syntactic macros can be used to make custom operators, but they can be used for so much more. I think this is a fair trade-off to make.

Modifiers are postfix symbols that allow for flexibility within argument patterns. Here are some modifiers:

  • Zero or more (...)
  • Optional (?)
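For instance, here's a sketch of an arg-pat for a hypothetical unless construct, using ? to make the else clause optional:

syntax 'unless condition do ('else otherwise)? { ... }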

Additionally, parentheses are used for grouping, and { ... } is used to match expressions within blocks. Let's construct some syntactic arguments that match an if else statement, like this one:

if x == 0 {
    print "Zero"
} else if x % 2 == 0 {
    print "Even"
} else {
    print "Not even or zero"
}

The arg-pat must match a beginning if:

syntax 'if { ... }

Then, a condition:

syntax 'if condition { ... }

Then, the first block:

syntax 'if condition then { ... }

Next, we'll need a number of else if <condition> statements:

syntax 'if condition then ('else 'if others do)... { ... }

Followed by a required closing else (Passerine is expression-oriented and type-checked, so a closing else ensures that an if expression always returns a value):

syntax 'if condition then ('else 'if others do)... 'else finally { ... }

Of course, if statements are already baked into the language – let's build something else – a match expression.

Building a match expression

A match expression takes a value and a number of functions, and tries to apply the value to each function until one successfully matches and runs. A match expression looks like this:

l = Some (1, 2, 3)

match l {
    Some (m, x, b) -> m * x + b
    None           -> 0
}

Here's how we can match a match expression:

syntax 'match value { arms... } {
    -- TODO: implement the body
}

This should be read as:

The syntax for a match expression starts with the pseudokeyword match, followed by the value to match against, followed by a block where each item in the block gets collected into the list arms.

What about the body? Well:

  1. If no branches are matched, an error is raised.
  2. If there are branches remaining, we try the first branch in a new fiber and see if it matches.
  3. If the function doesn't raise a match error, we found a match!
  4. If the function does raise a match error, we try again with the remaining branches.

Let's start by implementing this as a function:

-- takes a value to match against
-- and a list of functions, branches
match_function = value branches -> {
    if branches.is_empty {
        error Match "No branches matched in match expression"
    }

    result = try { (head branches) value }

    if result is Result.Ok _ {
        Result.Ok v = result; v
    } else if result is Result.Error Match _ {
        match_function value (tail branches)
    } else if result is Result.Error _ {
        -- a different error occurred, so we re-raise it
        Result.Error e = result; error e
    }
}

I know the use of if to handle tasks that pattern matching excels at hurts a little, but remember, that's why we're building a match expression! Using base constructs to create higher-level affordances with little overhead is a core theme of Passerine development.

Here's how you could use match_function, by the way:

-- Note that we're passing a list of functions
description = match_function (Banana ("yellow", "soft")) [
    Banana ("brown", "mushy") -> "That's not my banana!",

    Banana ("yellow", flesh)
        | flesh != "soft"
        -> "I mean it's yellow, but not soft",

    Banana (peel, "soft")
        | peel != "yellow"
        -> "I mean it's soft, but not yellow",

    Banana ("yellow", "soft") -> "That's my banana!",

    Banana (color, texture)
        -> "Hmm. I've never seen a { texture } { color } banana before...",
]

This is already orders of magnitude better, but passing a list of functions still feels a bit... hacky. Let's use our match macro definition from earlier to make this more ergonomic:

syntax 'match value { arms... } {
    match_function value arms
}

We've added match expressions to Passerine, and they already feel like language features! Isn't that incredible? Here's the above example we used with match_function, adapted to match†:

description = match Banana ("yellow", "soft") {
    Banana ("brown", "mushy") -> "That's not my banana!"

    Banana ("yellow", flesh)
        | flesh != "soft"
        -> "I mean it's yellow, but not soft"

    Banana (peel, "soft")
        | peel != "yellow"
        -> "I mean it's soft, but not yellow"

    Banana ("yellow", "soft") -> "That's my banana!"

    Banana (color, texture)
        -> "Hmm. I've never seen a { texture } { color } banana before..."
}

† Plot twist: we just defined the match expression we've been using throughout this entire overview.

Modules

Passerine's module system allows large codebases to be broken out into individual reusable components. A module is a scope turned into a struct, and isn't necessarily tied to the file system.

Modules are defined using the mod keyword, which must be followed by a block { ... }. Here's a simple module that defines some math utilities:

circle = mod {
    PI     = 3.14159265358979
    area   = r -> r * r * PI
    circum = r -> r * PI * 2
}

pizza_radius = 12
slices = 8
slice_area = (circle::area pizza_radius) / slices

mod takes all top-level declarations in a block - in this case, PI, area, and circum - and turns them into a struct with those fields. In essence, the above is equivalent to this struct:

circle = {
    PI:     3.14159265358979
    area:   r -> r * r * PI
    circum: r -> r * PI * 2
}

mod is nice because it's an easy way to have multiple returns. In essence, the mod keyword allows for first-class scoping by turning scopes into structs:

index = pos numbers
    -> floor (len numbers * pos)

quartiles = numbers -> mod {
    sorted = (sort numbers)
    med = sorted::(index (1/2) sorted)
    q1  = sorted::(index (1/4) sorted)
    q3  = sorted::(index (3/4) sorted)
}

Because we used the mod keyword in the above example, instead of returning a single value from the function, we return a struct containing all the values defined in the function:

-- calculate statistics
numbers = [1, 2, 3, 4, 5]
stats   = quartiles numbers

-- use `q1` and `q3` to calculate the interquartile range of `numbers`
iqr     = stats::q3 - stats::q1
print "the IQR of { numbers } is { iqr } "

This is really useful for writing functions that return multiple values.

Aside from allowing us to group sets of related values into a single namespace, modules can be defined in different files, then imported. Here's a module defined in a different file:

-- list_util.pn
reduce = f start list -> match list {
    [] -> start,
    [head, ..tail] -> f (reduce f start tail, head)
}

sum     = reduce { (a, b) -> a + b   } 0
reverse = reduce { (a, b) -> [a.., b] } []

This file defines a number of useful list utilities, defined in a traditional recursive style. If we want to use this module in main.pn, we import it using the use keyword:

-- main.pn
use list_util

numbers = [1, 1, 2, 3, 5]
print (list_util::sum numbers)

Note that the use keyword is essentially the same thing as wrapping the contents of the imported file with the mod keyword:

-- use list_util
list_util = mod { <list_util.pn> }

Once imported, list_util is just a struct. Because of this, features of the module system naturally arise from Passerine's existing semantics for manipulating structs. To import a subset of a module, we can do something like this:

reverse = { use list_util; list_util::reverse }

Likewise, we can import a module within a block scope to rename it:

list_stuff = { use list_util; list_util }

There are a number of nice properties that arise from this module system; we've just scratched the surface. As modules are just structs, the full power of Passerine and its macro system is at your disposal for building extensible systems that compose well.

Concluding Thoughts

Thanks for reading this far. Passerine has been a joy for me to work on, and I hope you find it a joy to use.

A few of the features discussed in this overview haven't been implemented yet. We're not trying to sell you short; it's just that developing a programming language and bootstrapping a community around it at the same time is not exactly easy.

If you'd like something added to Passerine, open an issue or pull request, or check out the Project Roadmap to get a feel for what's currently under development.

Contributing

Contributions are welcome! Read our Contribution Guide and join the Discord server to get started!

If you'd like to contribute to the project but don't have much time to spare, consider donating. Thank you!

FAQ

Q: Is Passerine ready for production use?

A: Not yet. Passerine is still in early stages of development, with frequent breaking changes. See the Project Roadmap to get an idea of what's in development.

Q: Is Passerine statically typed?

A: Currently, Passerine is strongly and dynamically typed (technically, structurally typed). This is partially out of necessity – types are defined by patterns, and patterns can include where predicates. However, I've been doing a lot of research into Hindley-Milner type systems, and the various extensions that can be applied to them.

I'm working towards making a compile-time type-checker for the language, based on Hindley-Milner type inference. With this system in place, I can make some assumptions to speed up the interpreter further and perhaps monomorphize/generate LLVM IR / WASM.

This type checker is actually the target of the next release, so stay tuned!

Q: What about algebraic effects and kind-based macros?

A: I'm interested in eventually adding both of these things to the language, but first I need to implement a nice type-checker and do some more research. Algebraic effects would fill the design space of fibers, and kind-based macros would provide a more solid base for Passerine's macro system. Got any fresh language features you think would complement Passerine's design philosophy? Reach out!

Q: What is vaporization memory management?

A: When I was first designing Passerine, I was big into automatic compile-time memory management. Currently, there are a few ways to do this: from Rust's borrow-checker, to µ-Mitten's Proust ASAP, to Koka's Perceus, there are a lot of new and exciting ways to approach this problem.

Vaporization is an automatic memory management system that allows for 'functional but in place' style programming. For vaporization to work, three invariants must hold:

  1. All function params are passed by value via a copy-on-write reference. This means that only the lifetimes of the returned objects need to be preserved; all others will be deleted when they go out of scope.
  2. A form of SSA is performed, where the last usage of any value is not a copy of that value.
  3. All closure references are immutable copies of a value. These copies may be reference-counted in an acyclical manner.

With these invariants in place, vaporization ensures two things:

  1. Values are only alive where they are still useful.
  2. Code may be written in a functional style, but all mutations occur in-place as per rule 2.
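As a sketch of what this enables, assuming a hypothetical push function that appends an item to a list:

numbers = [1, 2, 3]
bigger  = push numbers 4
-- numbers is not used after the call to push,
-- so the append can safely happen in place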

What's most interesting is that this system requires minimal interference from the compiler when used in conjunction with a VM. All the compiler has to do is annotate the last usage of the value of any variable; the rest can be done automatically and very efficiently at runtime.

Why not use this? Mainly because of rule 3: 'closure references are immutable'. Passerine is pass-by-value, but currently allows mutation in the current scope a la let-style redefinition. But this is subject to change; and once it does, it's vaporization all the way, baby!

Q: Aren't there already enough programming languages?

A: Frankly, I think we've barely scratched the surface of programming language design. To say that programming language design is saturated and at a local maximum is to not understand the nature of software development. Passerine is largely a test as to whether I can build a modern compiler pipeline. But what I'm even more interested in is the tooling that surrounds development environments.

Case in point: text-based entry for programming languages has been around forever because it's fast. However, it's not always semantically correct. The number of correct programs is infinitely smaller than the number of possible text files. Yet it's still possible to make text-based entry systems that ensure semantic correctness while encouraging exploration. In the future, we need to develop new tools that more closely blur the line between language and environment. Pharo is a step in the right direction, as are Unison and similar efforts.

I'd like to focus more on this in the future. An interesting project would be an editor/environment like Pharo/Unison for a small minimal language, like Scheme, or perhaps even Passerine.