From 1dfbcc1c397d25eb9cfe658ac560caaa1e99400f Mon Sep 17 00:00:00 2001
From: Kevin Atkinson
Date: Fri, 27 Oct 2017 23:43:50 -0400
Subject: [PATCH 1/3] Initial Draft

---
 draft.md | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 draft.md

diff --git a/draft.md b/draft.md
new file mode 100644
index 0000000..6c2f499
--- /dev/null
+++ b/draft.md
@@ -0,0 +1,81 @@
+# Draft Ipld Unixfs Spec
+
+## Basic Structure
+
+ - Some sort of header that indicates that this a directory and included a version number. The header could also have fields to give additional information on the meaning of the extended attributes.
+
+ - CBOR Map
+   - Key: CBOR Byte or Text String: File Name
+   - Value: CBOR Array of:
+     - Type: CBOR Unsigned Int
+     - Link or Data: CBOR Type varies
+     - Optional file size: CBOR Unsigned Int
+     - Optional Standard Attributes: CBOR Map
+     - Optional Extended Attributes: CBOR Map
+
+The file size is only defined for regular files and is the size of the file contents.
+
+All maps should be ordered based on the binary values of the key,
+duplicates are not allowed.
+
+### Notes
+
+ * An array makes sense to be as this is more compact and the value of
+   the fields are unambiguous, it also allows for a separation of
+   standard and extended attributes
+
+ * The key type can either be a byte or text string as POSIX makes no
+   requirements that file names be utf-8 and it is important that any
+   file name can be faithfully represented, if the string is utf-8
+   then the type will be Text.
+
+## Types
+
+The type field should be limited to a set of well defined values so it
+makes sense that this is an integer rather than a text string. The
+value is the ascii value of a letter. When converting to JSON the
+integer can be represented as a single character string.
+
+Possible values are as follows:
+
+ * 0, '', `file`: regular file
+ * `e`, `exe`: executable file
+ * `d`, `dir`: directory entry
+ * `s`, `special`: special file type (fifo, device, etc). The second field is a CBOR Map with at least one field to describe the type.
+ * `l`, `symlink`: symbolic link. The second field is the contents of the link
+ * `o`, `other`: link to other ipld object, links followed for GC and related operations
+ * `u`, `unknown`: link to unknown objects, links not followed
+
+### Notes
+
+ * Rather than have a special attribute for an executable bit it is more compact if we just make this a different type
+ * It is very useful to be able to determine if a link is a directory or an ordinary file so I made it as separate type, also there can be multiple ways to define a file size for a directory so it is best to just leave it out as it is of limited usefulness
+
+## Standard Attributes:
+
+The standard set of attributes should be limited to a small set of meaningful values.
+Stripping this filed SHOULD not change the meaning of the directory entry.
+Clients SHOULD be able to understand these attributes when reading a directory entry.
+
+Possible entries:
+
+ * `mtime`
+ * `ro`: Boolean, set if the file or directory should be readonly when copied to the filesystem
+
+## Extended Attributes
+
+The extended attributes set is not well defined and can be used for vendor extensions and posix attributes that don't make sense on non-unix systems.
+Stripping this field MUST not change the meaning of the directory entry.
+These attributes SHOULD be passed along but do not have to be understood.
+The directory header MAY include information on the meaning of the attributes;
+for example it could indicate that this is a copy of a unix filesystem and to expect a standard set of corresponding attributes.
+
+Possible entries:
+
+ * `user`: unix user name
+ * `uid`: unix numeric uid
+ * `group`: unix group name
+ * `gid`: unix numeric gid
+ * `perm`: full unix permissions
+ * extended posix attributes
+ * windows specific attributes

From 1b83f0b42bfdf52c53fd68de3455c0d6b6a32d1b Mon Sep 17 00:00:00 2001
From: Kevin Atkinson
Date: Thu, 2 Nov 2017 22:53:57 -0400
Subject: [PATCH 2/3] Rewrite draft.

---
 draft.md | 122 +++++++++++++++++++++++++++++++------------------------
 1 file changed, 70 insertions(+), 52 deletions(-)

diff --git a/draft.md b/draft.md
index 6c2f499..b721b70 100644
--- a/draft.md
+++ b/draft.md
@@ -1,74 +1,66 @@
-# Draft Ipld Unixfs Spec
+# Draft IPLD Unixfs Spec
 
 ## Basic Structure
 
- - Some sort of header that indicates that this a directory and included a version number. The header could also have fields to give additional information on the meaning of the extended attributes.
+A Unixfs is either a file or a directory.
+The top level IPLD object is a CBOR map with at least two fields: `type` and `data`
+and possibly a few others, such as a version string or a set of flags.
+The `type` field is either `file` or `dir`.
 
- - CBOR Map
-   - Key: CBOR Byte or Text String: File Name
-   - Value: CBOR Array of:
-     - Type: CBOR Unsigned Int
-     - Link or Data: CBOR Type varies
-     - Optional file size: CBOR Unsigned Int
-     - Optional Standard Attributes: CBOR Map
-     - Optional Extended Attributes: CBOR Map
+## IPLD `file`
 
-The file size is only defined for regular files and is the size of the file contents.
+If an IPLD file is a leaf its CID type is `raw` (0x55) and has no structure.
+Otherwise its CID type is `dag-cbor` (0x71).
+The `type` field is set to `file` and the `data` field is a CBOR array.
+Each element of the array is a CBOR map with the following fields:
 
-All maps should be ordered based on the binary values of the key,
-duplicates are not allowed.
+ - `data`: link
+ - `size`: cumulative size of `data`
+ - `fsize`: (file size) cumulative size of the payload of `data`
+
+The `fsize` field is omitted if the link is `raw` as it is the same value as `size`.
 
-### Notes
-
- * An array makes sense to be as this is more compact and the value of
-   the fields are unambiguous, it also allows for a separation of
-   standard and extended attributes
-
- * The key type can either be a byte or text string as POSIX makes no
-   requirements that file names be utf-8 and it is important that any
-   file name can be faithfully represented, if the string is utf-8
-   then the type will be Text.
-
-## Types
-
-The type field should be limited to a set of well defined values so it
-makes sense that this is an integer rather than a text string. The
-value is the ascii value of a letter. When converting to JSON the
-integer can be represented as a single character string.
+## IPLD `dir`
 
-Possible values are as follows:
+An IPLD `dir` represents a directory.
+Its CID type is `dag-cbor` (0x71).
+The `type` field is set to `dir` and the `data` field is a CBOR map.
+The key of the map is a filename and is a CBOR text string encoded in UTF-8.
+The value of the map is another CBOR map with the following standard fields:
 
- * 0, '', `file`: regular file
- * `e`, `exe`: executable file
- * `d`, `dir`: directory entry
- * `s`, `special`: special file type (fifo, device, etc). The second field is a CBOR Map with at least one field to describe the type.
- * `l`, `symlink`: symbolic link. The second field is the contents of the link
- * `o`, `other`: link to other ipld object, links followed for GC and related operations
- * `u`, `unknown`: link to unknown objects, links not followed
+ - `type`
+ - `exe`: CBOR boolean: executable bit
+ - `data`: normally a CBOR link, but can be other types depending on the value of the `type` field
+ - `size`: cumulative size of `data`
+ - `fsize`: (file size) cumulative size of the payload of `data`
+ - `fname`: CBOR byte string: original filename if it differs from the key
 
-### Notes
+And at least the following optional fields:
 
- * Rather than have a special attribute for an executable bit it is more compact if we just make this a different type
- * It is very useful to be able to determine if a link is a directory or an ordinary file so I made it as separate type, also there can be multiple ways to define a file size for a directory so it is best to just leave it out as it is of limited usefulness
+ - `ro`: CBOR boolean: read only
+ - `mtime`: Modification time
+ - `attr`: CBOR Map: Extended attributes
 
-## Standard Attributes:
+Additional fields may be defined. All implementation specific or user
+defined fields should be stored under the `attr` field.
 
-The standard set of attributes should be limited to a small set of meaningful values.
-Stripping this filed SHOULD not change the meaning of the directory entry.
-Clients SHOULD be able to understand these attributes when reading a directory entry.
+### Directory Types
 
-Possible entries:
+The type field is limited to a set of well defined values:
 
- * `mtime`
- * `ro`: Boolean, set if the file or directory should be readonly when copied to the filesystem
+ * _omitted_: regular file
+ * `dir`: directory entry
+ * `special`: special file type (fifo, device, etc).
+   The `data` field is a CBOR Map with at least one field to describe the type.
+ * `symlink`: symbolic link. The `data` field is the contents of the link.
+ * `other`: link to other IPLD object, links followed for GC and related operations
+ * `unknown`: link to unknown objects, links not followed
 
-## Extended Attributes
+### Extended Attributes
 
-The extended attributes set is not well defined and can be used for vendor extensions and posix attributes that don't make sense on non-unix systems.
+The extended attributes set is not well defined and can be used for vendor extensions and POSIX attributes that don't make sense on non-unix systems.
 Stripping this field MUST not change the meaning of the directory entry.
 These attributes SHOULD be passed along but do not have to be understood.
-The directory header MAY include information on the meaning of the attributes;
-for example it could indicate that this is a copy of a unix filesystem and to expect a standard set of corresponding attributes.
 
 Possible entries:
 
@@ -79,3 +71,29 @@ Possible entries:
  * `perm`: full unix permissions
  * extended posix attributes
  * windows specific attributes
+
+### Notes
+
+* Not all standard fields need to be defined for all file types.
+
+ * The `type` field is omitted for regular files.
+ * The `exe` field is only present when true and only makes sense for regular files
+ * The `size` and `fsize` are only required when the type is a regular file and possibly a `dir`.
+   For other types they may be defined if they have a meaningful value.
+ * The `fsize` field is omitted for files that are leaves (i.e. `raw`) as it is the same value as `size`.
+
+* IPLD filenames must at minimum be valid UTF-8 strings and may not contain the null (0x00) or '/' characters.
+  Other restricts may be put in place.
+  If the original filename does not meet these requirements then an implementation MAY transform the file from
+  the original, so it is valid IPLD file, and store the original file in the `fname` field.
+  When extracting a directory to the filesystem an implementation
+  MAY make use of `fname` to restore the original name.
+  Implementations SHOULD reject files with invalid names by default
+  and only translate files when a special flag is given.
+  When extracting, implementations SHOULD use the IPLD name and not `fname` unless a special flag is given.
+
+* To save space, fields of a directory may be assigned integer values.
+  Integers have the added benefit of conveying additional meaning based on their values;
+  for example, to distinguish between standard and optional fields.
+
+* The `type` field may also be assigned integer values.

From 3027a35e2a8ea1a04011bbda827f79128e86599b Mon Sep 17 00:00:00 2001
From: Kevin Atkinson
Date: Fri, 3 Nov 2017 13:52:42 -0400
Subject: [PATCH 3/3] Disallow "." and ".." as filenames.

---
 draft.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/draft.md b/draft.md
index b721b70..d952bca 100644
--- a/draft.md
+++ b/draft.md
@@ -82,7 +82,9 @@ Possible entries:
    For other types they may be defined if they have a meaningful value.
  * The `fsize` field is omitted for files that are leaves (i.e. `raw`) as it is the same value as `size`.
 
-* IPLD filenames must at minimum be valid UTF-8 strings and may not contain the null (0x00) or '/' characters.
+* IPLD filenames must be valid UTF-8 strings with the following additional constraints:
+  (1) cannot contain the null (0x00) or `/` characters
+  (2) cannot be the strings: `.` or `..`
   Other restricts may be put in place.
   If the original filename does not meet these requirements then an implementation MAY transform the file from
   the original, so it is valid IPLD file, and store the original file in the `fname` field.
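
The filename constraints introduced by this last patch can be illustrated with a short check. This is only a sketch under stated assumptions, not part of the draft: the function name `is_valid_ipld_filename` is hypothetical, the input is assumed to be the raw bytes of the original filename, and both the unspecified "other" restrictions and the empty name (which the draft does not address) are left unhandled.

```python
def is_valid_ipld_filename(name: bytes) -> bool:
    """Return True if `name` meets the filename constraints listed in the draft.

    Hypothetical helper for illustration; the draft defines no such function.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Constraint (1): no null (0x00) or '/' characters.
    if "\x00" in text or "/" in text:
        return False
    # Constraint (2): the name cannot be "." or "..".
    if text in (".", ".."):
        return False
    return True
```

Following the draft, an implementation would by default reject a name for which such a check fails, and would only translate it (keeping the original bytes in `fname`) when a special flag is given.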