Data Models

data model format specification overview


Records and messages need to be stored, transmitted, encoded, and decoded in a consistent way to promote interoperability in distributed systems. This document proposes the adoption of a slightly modified Interplanetary Linked Data (IPLD) specification for the network's data models.

Motivation

Content addressing through hashes has become a widely used means of connecting data in distributed systems (see footnote 1). IPLD enables decentralized data structures that are universally addressable and linkable. Its data model is similar to JSON, with additional convenient types for bytes and links. These additions are useful for defining linked data structures in distributed systems and let anyone explore data regardless of the underlying protocol. Common types can then be represented easily in common programming languages. Such a standard lowers friction for tooling and libraries within the nosh-protocol network.

Our motivation for standardizing data models is largely the same as the raw IPLD Data Model motivation.

Note that we do not strictly adhere to the raw IPLD data model definitions. The core nosh data model is based on the IPLD specification, but we make a few modifications and define specific types for the nosh-protocol.

Proposal

When data needs to be authenticated (signed), referenced (linked by content hash), or stored, it is encoded in Concise Binary Object Representation (CBOR). CBOR is an IETF standard whose data model is roughly based on JSON. Interplanetary Linked Data (IPLD) defines a specification for hash-linked data structures and a standard data model for interoperable protocols; it originates from the IPFS ecosystem.

IPLD introduces a specialized version of CBOR known as DAG-CBOR, which is what we intend to use. DAG-CBOR is a binary, length-prefixed format. These traits are usually associated with good performance, and DAG-CBOR is typically considered to be fast [1]. The same traits make it not very human-friendly: it is not easy to read or edit by hand [2]. Because of the on-demand nature of the hypothesized transactions within the nosh-protocol network (like rideshare or delivery), we optimize for performance. Depending on feedback from the community, we may update these standards to improve human-friendliness and logging.

In IPLD, nodes refer to points in a graph of data. When nodes are encoded in binary (DAG-CBOR), the result is a block.

In IPLD, nodes can reference each other by URL or URI (as in standard JSON), and they can also reference each other strongly by hash; such a reference is called a link in IPLD. A set of linked nodes can form DAGs, Merkle trees, or other complex linked data types that are useful when defining distributed networking topologies and distributed computing architectures.
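The relationship between nodes, blocks, and hash links can be sketched with the Python standard library alone. This is a toy illustration, not the nosh or IPLD implementation: it assumes sorted-key JSON as a stand-in for DAG-CBOR and a SHA-256 hex digest as a stand-in for a real CID, and the names `block_bytes`, `address`, `link`, `store`, and `resolve` are hypothetical.

```python
import hashlib
import json

def block_bytes(node):
    # Deterministic stand-in for DAG-CBOR: sorted keys, compact separators.
    return json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")

def address(node):
    # Stand-in for a CID: the SHA-256 hex digest of the encoded block.
    return hashlib.sha256(block_bytes(node)).hexdigest()

def link(node):
    # A link is a reference to another node by the hash of its block.
    return {"$link": address(node)}

leaf_a = {"name": "item-a"}
leaf_b = {"name": "item-b"}
# The root only stores links, so the three nodes form a small Merkle DAG.
root = {"children": [link(leaf_a), link(leaf_b)]}

# A content-addressed block store: address -> encoded block.
store = {address(n): block_bytes(n) for n in (leaf_a, leaf_b, root)}

def resolve(lnk):
    # Fetch a block by address and verify it before decoding.
    data = store[lnk["$link"]]
    if hashlib.sha256(data).hexdigest() != lnk["$link"]:
        raise ValueError("block does not match its address")
    return json.loads(data)
```

Because the root's address is derived from the links it contains, changing any leaf changes the root's address as well, which is what makes a hash link a strong reference.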

We inherit most properties from the IPLD Data Model, with the following modifications:

Data Types

| SDL Type | IPLD Type | JSON | CBOR | Mods |
|----------|-----------|------|------|------|
| null | null | Null | Special Value (major 7) | strict IPLD format |
| boolean | boolean | Boolean | Special Value (major 7) | strict IPLD format |
| integer | integer | Number | Integer (majors 0,1) | signed, 64-bit |
| string | string | String | UTF-8 String (major 3) | Unicode, UTF-8 |
| - | float | Number | Special (major 7) | disallowed |
| bytes | bytes | $bytes Object | Byte String (major 2) | strict IPLD format |
| cid-link | link | $link Object | CID (tag 42) | CID |
| array | list | Array | Array (major 4) | |
| object | map | Object | Map (major 5) | keys are always strings |
| blob | - | $type: blob Object | $type: blob Map | an image or file |
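To make the CBOR column concrete, here is a minimal encoder sketch for the non-link types listed above. It is an illustration under stated assumptions, not the nosh codec: `encode` and `_head` are hypothetical names, links (tag 42) and blobs are omitted, and map keys are sorted by encoded length and then bytewise, which is the ordering DAG-CBOR is understood to require.

```python
import struct

def _head(major, n):
    # A CBOR head: 3-bit major type plus the shortest form of the value/length.
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 2**8:
        return bytes([(major << 5) | 24]) + struct.pack(">B", n)
    if n < 2**16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 2**64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("value does not fit in 64 bits")

def encode(value):
    if value is None:
        return b"\xf6"                      # null (major 7)
    if value is True:
        return b"\xf5"                      # true (major 7)
    if value is False:
        return b"\xf4"                      # false (major 7)
    if isinstance(value, float):
        raise TypeError("floats are disallowed")
    if isinstance(value, int):
        if not -(2**63) <= value < 2**63:
            raise ValueError("integers must fit in a signed 64-bit range")
        # Major 0 for non-negative, major 1 for negative (stored as -1 - n).
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value  # byte string (major 2)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw      # UTF-8 string (major 3)
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        if not all(isinstance(k, str) for k in value):
            raise TypeError("map keys are always strings")
        # Assumed DAG-CBOR ordering: by encoded key length, then bytewise.
        items = sorted(value.items(),
                       key=lambda kv: (len(kv[0].encode("utf-8")), kv[0].encode("utf-8")))
        return _head(5, len(items)) + b"".join(encode(k) + encode(v) for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Every string, byte string, list, and map starts with its length, which is what "length-prefixed" means in practice: a decoder knows how many bytes or items follow without scanning for a terminator.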

The following types strictly adhere to original IPLD Data model definitions:

The following types are removed from the original IPLD Data model definitions:

  • float

Notes on floats: The IPLD Float Kind documentation explicitly recommends avoiding floats entirely when developing systems on IPLD. To avoid complexity in implementations of the protocol, we simply remove the type. Content addressing works best when the content being addressed has a stable meaning for the address it produces. Alternative representations, which encode fractional numbers with greater precision and less variability, are used instead. In practice, this means that currency values are universally represented as integers in the smallest denomination of the local currency (for example, cents rather than dollars).
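As a rough sketch of what integer currency values look like in practice, the snippet below stores an amount in minor units and parses decimal strings without going through a float. The `Money` class, its field names, the `to_minor_units` helper, and the default exponent of 2 are all assumptions made for illustration; they are not defined by this specification.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Money:
    amount_minor: int  # amount in the smallest denomination, e.g. cents
    currency: str      # currency code; the exact format is an assumption

    def __post_init__(self):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(self.amount_minor, bool) or not isinstance(self.amount_minor, int):
            raise TypeError("amount_minor must be an integer")

def to_minor_units(text, exponent=2):
    # Parse a decimal string such as "12.34" into integer minor units.
    scaled = Decimal(text).scaleb(exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError("amount has more precision than the currency allows")
    return int(scaled)

price = Money(to_minor_units("12.34"), "USD")
```

Integer arithmetic on minor units is exact (10 + 20 is always 30), whereas the float expression `0.1 + 0.2 == 0.3` evaluates to false, so two implementations could otherwise produce different bytes, and therefore different content addresses, for what is meant to be the same price.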

The following types are modified from the original IPLD Data model definitions:

The following types are added to the original IPLD Data model definitions:

  • blob

Nullable and False-y

There's a critical semantic distinction between explicitly assigning a null value to a map field and omitting the field entirely. This distinction is mirrored in both JSON and Concise Binary Object Representation (CBOR), where the treatment of null or absent fields is intentionally differentiated from "false-y" values. These "false-y" values include false (for booleans), 0 (for integers), as well as empty lists or empty objects, each carrying its own significance distinct from null or missing fields.
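The distinction can be illustrated with a small helper that classifies a field in a decoded JSON map. This is only a sketch: the `field_state` function and the `tip` field name are hypothetical.

```python
import json

def field_state(record, key):
    # Classify a map field as "absent", "null", or "present".
    if key not in record:
        return "absent"
    if record[key] is None:
        return "null"
    # False-y values such as False, 0, [] and {} are still present.
    return "present"

omitted = json.loads('{}')
explicit_null = json.loads('{"tip": null}')
zero = json.loads('{"tip": 0}')
```

A truthiness check such as `if not record.get("tip")` would treat all three records the same way, which is why implementations should test for key presence and for null explicitly.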

Extended blob Type Notes

We add a blob type to represent a specific IPLD map for file content. This type is especially useful for describing arbitrary physical objects or entities within the network, such as catalog items, which will likely be represented in a variety of media formats (video, images, PDFs). Blobs are self-describing data structures that include references to metadata about themselves.

Blob nodes are objects with the following fields:

  • $type (string, required): constant value of blob (blob is not recognized as a valid RDSI)
  • ref (link, required): This is a CID reference to a blob, specifically with a multicodec type of raw. Within JSON, this is represented through the $link object.
  • mimeType (string, required, not empty): This specifies the content type of the blob. The default value is application/octet-stream in cases where the content type is unknown.
  • size (integer, required, positive, non-zero): The length of the blob in bytes.
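The field rules above can be sketched as a validation function over the JSON form of a blob node. This is a minimal illustration, not a reference validator: `validate_blob` is a hypothetical name, the CID string is the placeholder used elsewhere in this document, and the check that the CID has a multicodec type of raw is left out because it would require a CID parser.

```python
def validate_blob(node):
    # Return a list of problems; an empty list means the node looks valid.
    if not isinstance(node, dict):
        return ["blob must be a map"]
    errors = []
    if node.get("$type") != "blob":
        errors.append("$type must be the constant string 'blob'")
    ref = node.get("ref")
    if not (isinstance(ref, dict) and set(ref) == {"$link"}
            and isinstance(ref["$link"], str) and ref["$link"]):
        errors.append("ref must be a $link object holding a CID string")
    mime = node.get("mimeType")
    if not isinstance(mime, str) or not mime:
        errors.append("mimeType must be a non-empty string")
    size = node.get("size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        errors.append("size must be a positive, non-zero integer")
    return errors

example_blob = {
    "$type": "blob",
    "ref": {"$link": "GgoAAAA38G"},          # placeholder, not a real CID
    "mimeType": "application/octet-stream",  # default for unknown content
    "size": 1024,
}
```

Returning a list of errors rather than raising on the first one is a design choice made for this sketch; it lets a caller report every problem with a record at once.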

Notes on JSON

We do not implement IPLD DAG-JSON directly.

The DAG-JSON specification notes that the main difference from regular JSON is the encoding of links: you take the CID, encode it as a string (base58 for CIDv0, base32 for CIDv1), and wrap it in a map with a "/" key, so it looks like this:

Confusing DAG-JSON example with link reference

{
   "/":"Qmfoo"
}

Confusing DAG-JSON example with bytes reference

{
  "data": {
    "/": {
      "bytes": "iVBORK21"
    }
  }
}

We introduce a modified field key for each of the link and bytes types for a better developer experience (DX).

For example, a node with a single field "exampleLink" of type link is encoded in JSON with $link as the key name instead of /.

{
  "exampleLink": {
    "$link": "GgoAAAA38G"
  }
}

Similarly, a node with a single field "exampleBytes" of type bytes is encoded in JSON with a single $bytes key instead of two nested objects (outer key /, inner key bytes). The value uses the same base64 encoding scheme, referenced in RFC 4648, Section 4.

{
  "exampleBytes": {
    "$bytes": "IaxDIBKe032"
  }
}
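The key substitution described above can be sketched as a recursive conversion from DAG-JSON-style values to the $link and $bytes forms. The function name `dag_json_to_nosh` is hypothetical, and the sketch only rewrites keys: it assumes the CID and base64 strings carry over unchanged.

```python
def dag_json_to_nosh(value):
    # Rewrite {"/": "<cid>"} to {"$link": "<cid>"} and
    # {"/": {"bytes": "<base64>"}} to {"$bytes": "<base64>"}.
    if isinstance(value, dict):
        if set(value) == {"/"}:
            inner = value["/"]
            if isinstance(inner, str):
                return {"$link": inner}
            if isinstance(inner, dict) and set(inner) == {"bytes"}:
                return {"$bytes": inner["bytes"]}
        # Any other map: convert its values recursively.
        return {key: dag_json_to_nosh(item) for key, item in value.items()}
    if isinstance(value, list):
        return [dag_json_to_nosh(item) for item in value]
    return value  # strings, integers, booleans and null pass through

dag_json_record = {
    "exampleLink": {"/": "Qmfoo"},
    "data": {"/": {"bytes": "iVBORK21"}},
}
nosh_record = dag_json_to_nosh(dag_json_record)
```

With the two DAG-JSON examples from this section as input, the result uses a single `$link` or `$bytes` key per value, so a reader no longer has to inspect the nesting under "/" to tell a link from a byte string.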

References

Footnotes

  1. ipld-website: https://ipld.io/#:~:text=IPLD%20is%20the%20data%20model,hashes%20as%20instances%20of%20IPLD.