Schema Definition Language
A language for defining a common language for new commercial networks
The protocol will define data types and messages that can be universally referenced within the network. This document proposes the adoption of a schema definition language called the "Nosh Schema Definition Language (SDL)" to define data schemas that represent objects related to events in the lifecycle of a commercial transaction. SDL is similar to an OpenAPI specification with added semantic nice-to-haves that are useful for our architecture.
Introduction
The SDL is used for the definition of RPC methods and record types, providing developers with a standardized approach and workflow for crafting and specifying new data structures within the network. Schemas are defined using RDSIDs. RDSIDs facilitate a network-wide domain categorization of methods and Data Models.
Motivation
Our motivation for a schema definition language is largely the same as the motivation behind raw IPLD Schemas. A standard for the network's data will make coordinating groups of developers and their applications much easier. IPLD Schemas have rich support for describing immutable document graphs based on content-addressable linking in distributed systems.
An open network like nosh needs a way to agree on data structures, transport, and semantics. The SDL solves this problem by giving a uniform schema definition language. This primitive allows new markets to permissionlessly emerge.
For example, the company will initially define the basic data models and API definitions for food-delivery, rideshare, and other e-commerce networks, but it is unlikely that we will be able to create the schemas for all categories of commercial applications. In order for the network to grow into many new categories independently, we need a common way for developers to describe data. Further, this schema language enables code generation with types and validation, which makes life much easier for developers.
We considered adopting RDF standards, but their generality and lack of strictness were a poor fit. RDF works well for generic or easily generalizable use cases, but not for the highly contractual commercial setting that applications built on this network require. We wanted a stricter schema language that is easy for developers to use and that offers assurances for strongly typed APIs, with runtime correctness validation over HTTP.
Examples
example namespace methods:
xyz.nosh.buyer.updateAddress()
xyz.nosh.provider.getCatalog()
example record types:
xyz.nosh.buyer.address
xyz.nosh.provider.catalog
example api call:
await nosh.server.buyer.updateAddress({
user: 'alice',
})
In the above API call, SDL establishes a shared method id (xyz.nosh.buyer.updateAddress) and the expected query params, input body, and output body. By using SDL, the call inherits runtime checks on the inputs and outputs of all requests, which is vital in distributed systems.
Proposal
Below you can see an overview of types. This RFC builds on top of the Data Models RFC.
Overview of Types
SDL Type | Data Model Type | Category |
---|---|---|
null | Null | concrete |
boolean | Boolean | concrete |
integer | Integer | concrete |
string | String | concrete |
bytes | Bytes | concrete |
cid-link | Link | concrete |
blob | Blob | concrete |
array | Array | container |
object | Object | container |
params | | container |
token | | meta |
ref | | meta |
union | | meta |
unknown | | meta |
record | | primary |
query | | primary |
mutation | | primary |
context | | primary |
subscription | | primary |
SDL Files (Schema Definition Documents)
SDL files are JSON documents associated with a single RDSID. Each file includes one or more definitions, each marked by a unique short name. A definition named main may optionally signify the "primary" definition for the entire document. An SDL file lacking definitions is considered invalid.
An SDL JSON file is an object, akin to a .yaml or .json file in OpenAPI. Each SDL JSON file delineates a specific piece of information or communication method for the network. SDL JSON files uniformly define client-to-server and server-to-server interactions within the network.
- SDL (integer, required): indicates the SDL language version. In this version, a fixed value of 1 is used.
- id (string, required): the RDSID of the SDL document.
- revision (integer, optional): indicates the version of this SDL document, if changes have occurred.
- description (string, optional): a description of the SDL document, usually one or two sentences useful for developers to understand.
- defs (map of strings-to-objects, required): set of definitions, each with a distinct name (key).
Schema definitions under defs all have a type field to distinguish their type. A file can have at most one definition with one of the "primary" types. Primary types should always have the name main. It is possible for main to describe a non-primary type.
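The document structure described above can be sketched in TypeScript as follows. This is only an illustration: the interface names, the checkSdlDocument helper, the address fields, and the placeholder Record Key value are assumptions made for the example and are not defined by this RFC.

```typescript
// Hypothetical shape of a single definition under `defs`.
interface SdlDefinition {
  type: string;
  description?: string;
  [key: string]: unknown;
}

// Hypothetical shape of an SDL document, mirroring the top-level fields above.
interface SdlDocument {
  SDL: number;
  id: string;
  revision?: number;
  description?: string;
  defs: Record<string, SdlDefinition>;
}

const PRIMARY_TYPES = ["record", "query", "mutation", "context", "subscription"];

// Returns a list of problems; an empty list means the basic structural rules hold.
function checkSdlDocument(doc: SdlDocument): string[] {
  const problems: string[] = [];
  if (doc.SDL !== 1) problems.push("SDL must be 1 in this version");
  const names = Object.keys(doc.defs);
  if (names.length === 0) problems.push("an SDL file must contain at least one definition");
  const primaries = names.filter((name) => {
    const def = doc.defs[name];
    return def !== undefined && PRIMARY_TYPES.indexOf(def.type) !== -1;
  });
  if (primaries.length > 1) problems.push("at most one primary definition is allowed");
  for (const name of primaries) {
    if (name !== "main") problems.push(`primary definition "${name}" should be named main`);
  }
  return problems;
}

// Illustrative record document; the property names are invented for this example.
const addressDoc: SdlDocument = {
  SDL: 1,
  id: "xyz.nosh.buyer.address",
  description: "A buyer's delivery address.",
  defs: {
    main: {
      type: "record",
      key: "placeholder-key-type", // assumed value; Record Key types are specified elsewhere
      record: {
        type: "object",
        required: ["street"],
        properties: { street: { type: "string" }, unit: { type: "string" } },
      },
    },
  },
};
```

Calling checkSdlDocument(addressDoc) returns an empty list, while a document whose defs map is empty is reported as invalid.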
References to specific definitions within an SDL document use fragment syntax, like com.referenceDomain.defs#someView. If a main definition exists, it can be referenced without a fragment, just using the RDSID. In the $type fields of data objects themselves (e.g., records or contents of a union), omitting the fragment is mandatory: a #main suffix is invalid. For example, use com.referenceDomain.record, not com.referenceDomain.record#main.
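As a rough sketch of the fragment syntax, the helpers below split a reference into its RDSID and definition name and check the $type rule. The names ParsedRef, parseRef, and isValidTypeValue are hypothetical and chosen only for this example.

```typescript
// Hypothetical result of splitting "rdsid#name".
interface ParsedRef {
  rdsid: string;
  name: string;
}

// Split a reference; a missing fragment is treated as pointing at `main`.
function parseRef(ref: string): ParsedRef {
  const hash = ref.indexOf("#");
  if (hash === -1) return { rdsid: ref, name: "main" };
  return { rdsid: ref.slice(0, hash), name: ref.slice(hash + 1) };
}

// In $type fields, spelling out the #main suffix is invalid.
function isValidTypeValue(value: string): boolean {
  return !/#main$/.test(value);
}
```

For example, parseRef("com.referenceDomain.defs#someView") yields the RDSID com.referenceDomain.defs and the name someView, and isValidTypeValue rejects com.referenceDomain.record#main.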
Related SDL documents are often grouped together under an RDSID hierarchy. For example, a Buyer entity might have its own namespace to group data models under a single Buyer domain. As a convention, any definitions used by multiple SDL documents are defined in a dedicated *.defs SDL document (e.g., com.referenceDomain.psn.defs) within the group. A *.defs SDL document should generally not include a definition named main, though it is not strictly invalid to do so.
Primary Type Definitions
The primary types are:
- query: describes an NRPC Query (HTTP GET)
- mutation: describes an NRPC Mutation (HTTP POST)
- subscription: describes an Event Stream (WebSocket)
- context: describes a cryptographically signed binary data object used to validate the integrity of a record and its creator (signer)
- record: describes an object that can be stored in a repository record
Each primary definition schema object includes these fields:
- type (string, required): the type value (e.g., record for records)
- description (string, optional): short, usually only a sentence or two
Record
Type-specific fields:
- key (string, required): specifies the Record Key type
- record (object, required): a schema definition with type object, which specifies this type of record
Query, Mutation, and Context
Type-specific fields:
- parameters (object, optional): a schema definition with type params, describing the HTTP query parameters for this endpoint
- output (object, optional): describes the HTTP response body
  - description (string, optional): short description
  - encoding (string, required): MIME type for body contents. Must use application/json for JSON responses.
  - schema (object, optional): schema definition, either an object, a ref, or a union of refs. Used to describe JSON encoded responses, though schema is optional even for JSON responses.
- input (object, optional, only for mutation): describes HTTP request body schema, with the same format as the output field
- errors (array of objects, optional): set of string error codes which might be returned
  - name (string, required): short name for the error type, with no whitespace
  - description (string, optional): short description, one or two sentences
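To make the field layout above concrete, here is a hedged sketch of what a query definition for xyz.nosh.provider.getCatalog might look like, written as a TypeScript object literal. The parameter names, limits, and the CatalogNotFound error are invented for illustration and are not taken from any published schema.

```typescript
// Illustrative SDL document for a query; all field values are assumptions.
const getCatalogDoc = {
  SDL: 1,
  id: "xyz.nosh.provider.getCatalog",
  defs: {
    main: {
      type: "query",
      description: "Fetch a provider's catalog.",
      parameters: {
        type: "params",
        required: ["provider"],
        properties: {
          provider: { type: "string", description: "Hypothetical provider identifier." },
          limit: { type: "integer", minimum: 1, maximum: 100, default: 50 },
        },
      },
      output: {
        encoding: "application/json",
        // The response body reuses the catalog record type by reference.
        schema: { type: "ref", ref: "xyz.nosh.provider.catalog" },
      },
      errors: [
        { name: "CatalogNotFound", description: "No catalog exists for this provider." },
      ],
    },
  },
};

// Collect the declared error names, e.g. when generating a typed client.
const errorNames = getCatalogDoc.defs.main.errors.map((e) => e.name);
```

A mutation would look the same with type set to mutation and an additional input object shaped like output.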
Subscription (Events)
Type-specific fields:
- parameters (object, optional): same as Query and Mutation
- message (object, optional): specifies what messages can be
  - description (string, optional): short description
  - schema (object, required): schema definition, which must be a union of refs
- errors (array of objects, optional): same as Query and Mutation
Subscription schemas (referenced by the schema field under message) must be a union of refs, not an object type.
Field Type Definitions
Every schema object includes these fields:
- type (string, required): fixed value for each type
- description (string, optional): short, usually only a sentence or two
null
No additional fields.
boolean
Type-specific fields:
- default (boolean, optional): a default value for this field
- const (boolean, optional): a constant value for this field
When included as an HTTP query parameter, should be rendered as true or false (raw text with no quotes).
integer
A signed positive (+) or negative (-) integer number.
Type-specific fields:
- format (integer, optional): integer format restriction
- minimum (integer, optional): minimum acceptable value
- maximum (integer, optional): maximum acceptable value
- enum (array of integers, optional): a closed set of allowed values
- default (integer, optional): a default value for this field
- const (integer, optional): a fixed (constant) value for this field
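A minimal sketch of how a validator might apply these integer constraints is shown below. The IntegerSchema interface and the checkInteger function are hypothetical names, and the format field is left out because format handling is described separately.

```typescript
// Hypothetical subset of an integer schema definition.
interface IntegerSchema {
  type: "integer";
  minimum?: number;
  maximum?: number;
  enum?: number[];
  default?: number;
  const?: number;
}

// Returns true when `value` is a whole number satisfying every declared constraint.
function checkInteger(schema: IntegerSchema, value: unknown): boolean {
  if (typeof value !== "number" || !isFinite(value) || Math.floor(value) !== value) return false;
  if (schema.minimum !== undefined && value < schema.minimum) return false;
  if (schema.maximum !== undefined && value > schema.maximum) return false;
  if (schema.enum !== undefined && schema.enum.indexOf(value) === -1) return false;
  if (schema.const !== undefined && value !== schema.const) return false;
  return true;
}

// Example schema: a page size between 1 and 100.
const limitSchema: IntegerSchema = { type: "integer", minimum: 1, maximum: 100, default: 50 };
```

With limitSchema, the value 50 passes, while 0, 1.5, and the string "5" are rejected.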
string
Type-specific fields:
- format (string, optional): string format restriction
- maxLength (integer, optional): maximum length of value, in UTF-8 bytes
- minLength (integer, optional): minimum length of value, in UTF-8 bytes
- maxGraphemes (integer, optional): maximum length of value, counted as Unicode Grapheme Clusters
- minGraphemes (integer, optional): minimum length of value, counted as Unicode Grapheme Clusters
- knownValues (array of strings, optional): a set of suggested or common values for this field. Values are not limited to this set (aka, not a closed enum).
- enum (array of strings, optional): a closed set of allowed values
- default (string, optional): a default value for this field
- const (string, optional): a fixed (constant) value for this field
Strings must be in Unicode. If using non-Unicode encodings, switch to bytes. The minLength and maxLength constraints are measured in UTF-8 bytes, but remember, JavaScript defaults to UTF-16 for strings, so conversion for accurate counts is necessary. The minGraphemes and maxGraphemes constraints are based on Grapheme Clusters, essentially "visual characters" such as emojis, which may encompass multiple Unicode codepoints and a larger number of UTF-8 bytes.
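The difference between the two length measures can be illustrated with a short TypeScript sketch. The function names are hypothetical. Grapheme counting here relies on Intl.Segmenter, which is assumed to be available in the runtime (it is accessed through a cast because older TypeScript library typings do not declare it).

```typescript
// Count UTF-8 bytes from the UTF-16 code units that JavaScript strings use.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    const next = s.charCodeAt(i + 1); // NaN at the end of the string
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // a surrogate pair encodes one code point as 4 bytes
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Count grapheme clusters ("visual characters"); assumes Intl.Segmenter exists.
function graphemeLength(s: string): number {
  const segmenter = new (Intl as any).Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(s)).length;
}

// Hypothetical subset of a string schema definition.
interface StringSchema {
  type: "string";
  minLength?: number;
  maxLength?: number;
  minGraphemes?: number;
  maxGraphemes?: number;
}

function checkString(schema: StringSchema, value: string): boolean {
  const bytes = utf8Length(value);
  const graphemes = graphemeLength(value);
  if (schema.minLength !== undefined && bytes < schema.minLength) return false;
  if (schema.maxLength !== undefined && bytes > schema.maxLength) return false;
  if (schema.minGraphemes !== undefined && graphemes < schema.minGraphemes) return false;
  if (schema.maxGraphemes !== undefined && graphemes > schema.maxGraphemes) return false;
  return true;
}
```

For the string "e\u0301" (an e followed by a combining acute accent), String.length is 2, utf8Length is 3, and graphemeLength is 1, so a maxLength of 2 rejects it while a maxGraphemes of 1 accepts it.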
format constrains the string format for further validation assurances. Refer to the Data Model specification for the available format types and their definitions.
default and const are mutually exclusive; a schema that provides both is invalid.
bytes
Type-specific fields:
- minLength (integer, optional): minimum size of value, as raw bytes with no encoding
- maxLength (integer, optional): maximum size of value, as raw bytes with no encoding
cid-link
No type-specific fields. See Data Model spec for CID restrictions.
array
Type-specific fields:
- items (object, required): describes the schema of the elements of this array
- minLength (integer, optional): minimum count of elements in array
- maxLength (integer, optional): maximum count of elements in array
Although arrays are usually homogeneous (all elements share the same type), union types mean this is not guaranteed. Because of unions, implementations should not presume uniformity in element types within an array.
object
A generic object schema.
Type-specific fields:
- properties (map of strings-to-objects, required): defines the properties (fields) by name, each with their own schema
- required (array of strings, optional): indicates which properties are required
- nullable (array of strings, optional): indicates which properties can have null as a value
Following the data model guidelines, there's a crucial distinction in how data is interpreted based on whether a field is omitted, included with a null value, or included with a "false-y" value (such as false, 0, or an empty array).
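The distinction between omitted, null, and false-y values can be sketched as follows. ObjectSchema and checkObject are hypothetical names, and the sketch checks only presence and nullability, not the per-property schemas.

```typescript
// Hypothetical subset of an object schema definition.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
  nullable?: string[];
}

// Returns a list of problems related to required and nullable only.
function checkObject(schema: ObjectSchema, data: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const required = schema.required !== undefined ? schema.required : [];
  const nullable = schema.nullable !== undefined ? schema.nullable : [];
  for (const name of required) {
    // A present key with a false-y value (false, 0, []) still counts as present.
    if (!(name in data)) problems.push(`${name} is required but omitted`);
  }
  for (const name of Object.keys(schema.properties)) {
    if (name in data && data[name] === null && nullable.indexOf(name) === -1) {
      problems.push(`${name} must not be null`);
    }
  }
  return problems;
}

// Illustrative schema; the property names are invented for this example.
const addressSchema: ObjectSchema = {
  type: "object",
  properties: {
    street: { type: "string" },
    unit: { type: "string" },
    verified: { type: "boolean" },
  },
  required: ["street", "verified"],
  nullable: ["unit"],
};
```

Under this sketch, { street: "1 Main St", unit: null, verified: false } is accepted: unit may be null, and verified is present even though its value is false. Omitting verified, or setting street to null, produces a problem.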
blob
Type-specific fields:
- accept (array of strings, optional): specifies a list of acceptable MIME types. Entries can use * as a wildcard, allowing glob patterns like image/*. To accept any MIME type, use */*.
- maxSize (integer, optional): defines the maximum allowable size in bytes for the blob.
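One possible way to evaluate accept and maxSize is sketched below. The function names are hypothetical, and the matching is deliberately simplified: it assumes a * wildcard replaces a whole type or subtype segment (as in image/* or */*) and does not handle MIME parameters.

```typescript
// Match a MIME type against a pattern such as "image/*" or "*/*".
function mimeMatches(pattern: string, mime: string): boolean {
  const p = pattern.toLowerCase().split("/");
  const m = mime.toLowerCase().split("/");
  if (p.length !== 2 || m.length !== 2) return false;
  return (p[0] === "*" || p[0] === m[0]) && (p[1] === "*" || p[1] === m[1]);
}

// Hypothetical subset of a blob schema definition.
interface BlobSchema {
  type: "blob";
  accept?: string[];
  maxSize?: number;
}

// Returns true when the blob's MIME type and size satisfy the schema.
function blobAccepted(schema: BlobSchema, mime: string, sizeInBytes: number): boolean {
  if (schema.maxSize !== undefined && sizeInBytes > schema.maxSize) return false;
  if (schema.accept === undefined) return true;
  return schema.accept.some((pattern) => mimeMatches(pattern, mime));
}

// Example schema: images up to one megabyte.
const imageSchema: BlobSchema = { type: "blob", accept: ["image/*"], maxSize: 1000000 };
```

With imageSchema, a 500 kB image/png blob is accepted, while a video/mp4 blob or a 2 MB image is rejected.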
params
This type is specifically designed for use with the parameters field found in the primary types query, mutation, and subscription. It corresponds to HTTP query parameters, so its scope is deliberately narrow.
Type-specific fields:
- required (array of strings, optional): same semantics as field on object
- properties: similar to properties under object, but can only include the types boolean, integer, string, and unknown; or an array of one of these types
In contrast to the object type, the params type does not include a nullable field. This reflects a deliberate design choice: params handles the presence or absence of values differently from object.
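As a hedged illustration of how params values might be rendered into an HTTP query string, the sketch below serializes booleans as the raw text true or false. Repeating the key for each array element is an assumption of this example, not something this RFC specifies, and the renderQuery name is hypothetical.

```typescript
// Scalar values permitted in params (unknown is omitted from this sketch).
type ParamValue = boolean | number | string;

// Render params as a query string; arrays repeat the key (an assumed convention).
function renderQuery(params: Record<string, ParamValue | ParamValue[]>): string {
  const parts: string[] = [];
  for (const key of Object.keys(params)) {
    const raw = params[key];
    const values: ParamValue[] = Array.isArray(raw) ? raw : [raw];
    for (const value of values) {
      // String(false) is "false" and String(true) is "true", with no quotes.
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(value)));
    }
  }
  return parts.join("&");
}

const query = renderQuery({ includeClosed: false, limit: 10, tags: ["a b", "c"] });
```

Here query is includeClosed=false&limit=10&tags=a%20b&tags=c. A missing parameter is simply left out of the object, which is why params has no notion of a null value.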
token
Tokens are like the "symbol" in certain programming languages, serving as unique identifiers separate from strings, variables, keywords, or any other form of identification. They're used for representing discrete values, such as the states of a state machine or an enumerated set of categories.
ref
Type-specific fields:
- ref (string, required): reference to another schema definition
Refs serve as a method for reusing a schema definition. The ref string may be either a global reference to an SDL type definition (an RDSID, which may include a #-delimited name to specify a definition other than main) or a local reference to a definition within the same SDL document. A local reference is indicated by a # followed by the name. This enables the reuse of definitions either globally across different files or locally within the same document.
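Resolving local and global references to one canonical form could look roughly like this. The resolveRef helper is hypothetical, and normalizing an explicit #main fragment to the bare RDSID is an assumption made to keep the results comparable.

```typescript
// Resolve `ref` relative to the RDSID of the document that contains it.
function resolveRef(ref: string, currentId: string): string {
  const hash = ref.indexOf("#");
  // "#name" is local to the current document; otherwise the RDSID precedes the "#".
  const rdsid = hash === 0 ? currentId : hash === -1 ? ref : ref.slice(0, hash);
  const name = hash === -1 ? "main" : ref.slice(hash + 1);
  // Assumed normalization: the main definition is addressed by the bare RDSID.
  return name === "main" ? rdsid : rdsid + "#" + name;
}
```

For a document with id com.referenceDomain.defs, the local ref #someView resolves to com.referenceDomain.defs#someView, while the global ref com.referenceDomain.record is returned unchanged.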
union
Type-specific fields:
- refs (array of strings, required): references to schema definitions
- closed (boolean, optional): indicates if a union is "open" or "closed". Defaults to false (open union)
Unions in a schema signify the presence of multiple possible types at a specific location, functioning similarly to polymorphic types. These unions use the ref syntax for referencing, allowing them to point to either global or local schema definitions. A union does not amalgamate fields from different schemas into one or create a new "hybrid" type. Instead, the actual data must match one specific type within the union; these types are referred to as variants.
Unions are typically open. Openness allows more types to be added to the list of refs over time. For this reason, implementations should validate data leniently to accommodate potential updates they haven't yet received.
A closed boolean flag exists to signify that a union's set of types is permanently fixed, preventing any future amendments.
A unique aspect of unions is that a schema definition without any refs is allowed, making it akin to an unknown type, provided the closed flag is unset (false by default). Conversely, a union marked as closed without any refs is an invalid schema, since a closed union needs at least one possible type.
The types within a union are typically objects, or types easily translated into objects, like a record. Each variant within the union is expected to be represented by a CBOR map (or JSON Object) and must include a $type field to denote the specific variant type.
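The open and closed semantics can be sketched as a small classifier. The names UnionSchema, UnionCheck, checkUnionValue, and isValidUnionSchema are hypothetical, and the sketch assumes the refs have already been resolved to the same form used in $type values.

```typescript
// Hypothetical subset of a union schema definition.
interface UnionSchema {
  type: "union";
  refs: string[];
  closed?: boolean;
}

// "known": matches a listed variant; "unknown-variant": tolerated by an open union.
type UnionCheck = "known" | "unknown-variant" | "invalid";

// A closed union without any refs is an invalid schema.
function isValidUnionSchema(schema: UnionSchema): boolean {
  return !(schema.closed === true && schema.refs.length === 0);
}

// Classify a value by its $type field against the union's refs.
function checkUnionValue(schema: UnionSchema, value: unknown): UnionCheck {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return "invalid";
  const typeField = (value as Record<string, unknown>)["$type"];
  if (typeof typeField !== "string") return "invalid";
  if (schema.refs.indexOf(typeField) !== -1) return "known";
  // Open unions tolerate variants that this implementation has not seen yet.
  return schema.closed === true ? "invalid" : "unknown-variant";
}

// Example unions over two illustrative record types.
const openUnion: UnionSchema = { type: "union", refs: ["xyz.nosh.buyer.address", "xyz.nosh.provider.catalog"] };
const closedUnion: UnionSchema = { type: "union", refs: openUnion.refs, closed: true };
```

A value whose $type is not listed is reported as unknown-variant by openUnion but as invalid by closedUnion, and a value with no $type is invalid in both cases.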
unknown
Any type of data is permitted to appear at the designated location without undergoing type-specific validation. It's crucial to note that while this offers flexibility, the data must still conform to the overarching data model requirements. This implies that the data cannot include elements unsupported by the model, such as Floats.
String Formats
Strings can optionally be constrained to one of the following format types:
- nosh-uri: NOSH-URI
- cid: CID in string format, details specified in Data Model
- datetime: timestamp, details specified below
- rdsid: a reverse domain schema identifier
- uri: generic URI, details specified below
- language: language code, details specified below
- currency: currency code, details specified below
- country: country code, details specified below
- eth: the custody address for an account identifier
- h3: a string of hexadecimal characters representing a geospatial index
datetime
Full-precision date and time, with timezone information.
This format is specifically intended for use with computer-generated timestamps after the UNIX epoch. Datetimes before year zero or in distant future times are disallowed and you should opt for a different format.
Datetime format standards vary widely and often overlap. Datetime strings are expected to conform to the intersecting requirements of RFC 3339, ISO 8601, and WHATWG HTML datetime standards.
The character separating "date" and "time" parts must be an upper-case T.
Timezone specification is mandatory. It is highly recommended to use the UTC timezone and represent it with a capital Z suffix. The use of lowercase z is not allowed. While the hour/minute suffix syntax (e.g., +01:00 or -10:30) is supported, "negative zero" (-00:00) is disallowed according to ISO 8601.
Whole seconds precision is mandatory, with the allowance for arbitrary fractional precision digits. It's recommended to adhere to at least millisecond precision and fill with zeros to match the generated precision. For instance, use a trailing :12.340Z instead of :12.34Z.
Implementations "should" ensure that the semantics of the datetime are valid. For instance, dates with a month or day set to 00 should be considered invalid.
Valid examples:
# preferred
1985-04-12T23:20:50.123Z
1985-04-12T23:20:50.123456Z
1985-04-12T23:20:50.120Z
1985-04-12T23:20:50.120000Z
# supported
1985-04-12T23:20:50.12345678912345Z
1985-04-12T23:20:50Z
1985-04-12T23:20:50.0Z
1985-04-12T23:20:50.123+00:00
1985-04-12T23:20:50.123-07:00
Invalid examples:
1985-04-12
1985-04-12T23:20Z
1985-04-12T23:20:5Z
1985-04-12T23:20:50.123
+001985-04-12T23:20:50.123Z
23:20:50.123Z
-1985-04-12T23:20:50.123Z
1985-4-12T23:20:50.123Z
01985-04-12T23:20:50.123Z
1985-04-12T23:20:50.123+00
1985-04-12T23:20:50.123+0000
# ISO-8601 strict capitalization
1985-04-12t23:20:50.123Z
1985-04-12T23:20:50.123z
# RFC-3339, but not ISO-8601
1985-04-12T23:20:50.123-00:00
1985-04-12 23:20:50.123Z
# timezone is required
1985-04-12T23:20:50.123
# syntax looks ok, but datetime is not valid
1985-04-12T23:99:50.123Z
1985-00-12T23:20:50.123Z
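The syntax and semantic rules above can be approximated with a regular expression plus range checks, as in the following sketch. The isValidDatetime and daysInMonth names are hypothetical, and rejecting a seconds value above 59 is an assumption of this example rather than a rule stated in this document.

```typescript
// Date, upper-case T, time with whole seconds, optional fraction, mandatory zone.
const DATETIME_RE = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function daysInMonth(year: number, month: number): number {
  if (month === 2) {
    const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
    return leap ? 29 : 28;
  }
  return [4, 6, 9, 11].indexOf(month) !== -1 ? 30 : 31;
}

function isValidDatetime(value: string): boolean {
  const m = DATETIME_RE.exec(value);
  if (m === null) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const hour = Number(m[4]);
  const minute = Number(m[5]);
  const second = Number(m[6]);
  const zone = m[8];
  // "Negative zero" offsets are disallowed.
  if (zone === "-00:00") return false;
  if (zone !== "Z") {
    const zoneHour = Number(zone.slice(1, 3));
    const zoneMinute = Number(zone.slice(4, 6));
    if (zoneHour > 23 || zoneMinute > 59) return false;
  }
  // Semantic checks: a month or day of 00, or a minute of 99, is invalid.
  if (month < 1 || month > 12) return false;
  if (day < 1 || day > daysInMonth(year, month)) return false;
  return hour <= 23 && minute <= 59 && second <= 59;
}
```

Under these assumptions, isValidDatetime accepts every string in the valid list above and rejects every string in the invalid list.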
rdsid
Represents a syntactically valid Reverse Domain Schema Identifier
Examples:
nosh.example.fooBar
users.alice.hello
a-0.b-1.f
x.y.z
xn.2.test.thing
uri
Flexible to any URI scheme, following the generic RFC 3986 on URIs. This includes, but isn't limited to: https, wss, ipfs (for CIDs), dns, and nosh. Maximum length is 8 KBytes.
language
A string formatted as an IETF Language Tag should adhere to the BCP 47 standard outlined in RFC 5646. This standard is widely used in web technologies like HTTP and HTML for language identification. The string must be a "well-formed" language tag as defined by the RFC. Clients should disregard "well-formed" tags that are not considered "valid" according to the RFC's specifications.
Language tags can include ISO 639 codes. These codes may be extended with regional subtags (e.g., pt-BR for Brazilian Portuguese) and additional subtags (e.g., hy-Latn-IT-arevela).
Examples
- en for English
- fr-CA for Canadian French
- zh-Hant for Traditional Chinese
currency
A currency code in ISO 4217 format. This format is used internationally to define the codes of currencies. The value should be a three-letter uppercase string that adheres to the ISO 4217 standard.
Examples
- The US dollar is represented as USD: the US comes from the ISO 3166 country code and the D stands for dollar.
- The Swiss franc is represented as CHF: the CH is the ISO 3166 code for Switzerland and the F stands for franc.
country
A country code, in the two-letter format of ISO 3166. These codes are internationally recognized codes assigned to each country and certain territories. They are two-letter codes written in uppercase. This is often used for setting locales, addressing, and other internationalization functions.
Examples
- US for United States
- JP for Japan
- GB for United Kingdom
eth
The custody address or recovery address of a registered account in the nosh-protocol network.
Example
0xb794f5ea0ba39494ce839613fffba74279579268
h3
An H3 geospatial index. The H3 system is a framework for geospatial indexing that divides the world into a hexagonal grid. Each cell in the grid is identified by a unique index, represented as a 15-character hexadecimal string. H3 indices can be used for various applications, including mapping, spatial analysis, and geospatial data management.
Example
8f2830828052d25
- represents a hexagonal cell covering a physical position on the globe
Integer Formats
Integers can be constrained to the following format type:
- aid: generic Account Identifier
aid
A shorthand format for an Account Identifier
Examples
198663
- an integer nonce representing registered account 198663 in the Identity Contracts
When to use $type
In data objects, the $type field indicates their SDL type. This field is necessary whenever there could be ambiguity about the content type during data validation.
The rules regarding the $type field are as follows:
- record objects must always include $type. Even though the type is often inferred from context (such as the collection part of the path for records stored in a repository), record objects may be passed around outside of repos and therefore need to be self-describing.
- union variants must always include $type, except when they are at the top level of subscription messages.
It's important to note that blob objects always include $type, facilitating generic processing.
As a reminder, main types must be referenced in $type fields using only the RDSID, without including a #main suffix.