[io-2] Conceptual design for IO format #62

Closed
altavir opened this issue Oct 27, 2019 · 8 comments

Comments

@altavir

altavir commented Oct 27, 2019

Here are some thoughts about IO format functionality I would like to add as soon as IO-2 is out. The idea is that we can add an easy way to write objects to streams and read them from streams. The resulting API could then simplify the work with serialization and file IO.

The idea is shown in the following example (using io-1 API):

interface IOFormat<T : Any> {
    fun Output.writeThis(obj: T)
    fun Input.readThis(): T
}

fun <T : Any> Input.readWith(format: IOFormat<T>): T = format.run { readThis() }
fun <T : Any> Output.writeWith(format: IOFormat<T>, obj: T) = format.run { writeThis(obj) }

class ListIOFormat<T : Any>(val format: IOFormat<T>) : IOFormat<List<T>> {
    override fun Output.writeThis(obj: List<T>) {
        writeInt(obj.size)
        format.run {
            obj.forEach {
                writeThis(it)
            }
        }
    }

    override fun Input.readThis(): List<T> {
        val size = readInt()
        return format.run {
            List(size) { readThis() }
        }
    }
}

val <T: Any> IOFormat<T>.list get() = ListIOFormat(this)

IOFormat represents a way to read and write data with Input and Output. One can construct a format for a complex object from the formats of its individual parts. It could probably also be used as a way to customize serialization backends. A similar idea is currently used in kmath for in-memory operations on data: https://github.com/mipt-npm/kmath/blob/dev/kmath-memory/src/commonMain/kotlin/scientifik/memory/MemorySpec.kt.
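A minimal round-trip sketch of the idea above. It assumes `java.io.DataInput` and `java.io.DataOutput` as stand-ins for the kotlinx-io `Input` and `Output` types; `StringFormat` and `roundTrip` are hypothetical names used only for illustration. The list format calls the `readWith`/`writeWith` helpers so that the element format is selected explicitly.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream

// Assumption: DataInput/DataOutput stand in for the kotlinx-io Input/Output types.
interface IOFormat<T : Any> {
    fun DataOutput.writeThis(obj: T)
    fun DataInput.readThis(): T
}

fun <T : Any> DataInput.readWith(format: IOFormat<T>): T = format.run { readThis() }
fun <T : Any> DataOutput.writeWith(format: IOFormat<T>, obj: T) = format.run { writeThis(obj) }

// Hypothetical leaf format: a String encoded with writeUTF/readUTF.
object StringFormat : IOFormat<String> {
    override fun DataOutput.writeThis(obj: String) = writeUTF(obj)
    override fun DataInput.readThis(): String = readUTF()
}

// List format from the proposal: a 4-byte size followed by the elements.
class ListIOFormat<T : Any>(val format: IOFormat<T>) : IOFormat<List<T>> {
    override fun DataOutput.writeThis(obj: List<T>) {
        writeInt(obj.size)
        obj.forEach { writeWith(format, it) }
    }

    override fun DataInput.readThis(): List<T> {
        val size = readInt()
        return List(size) { readWith(format) }
    }
}

val <T : Any> IOFormat<T>.list: IOFormat<List<T>> get() = ListIOFormat(this)

// Writes the list to an in-memory buffer and reads it back with the same format.
fun roundTrip(values: List<String>): List<String> {
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).writeWith(StringFormat.list, values)
    return DataInputStream(ByteArrayInputStream(bytes.toByteArray())).readWith(StringFormat.list)
}
```

The same `StringFormat.list` value is used on both sides, so the reader and writer cannot disagree about the layout.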

@fvasco

fvasco commented Oct 28, 2019

Hi @altavir,
the readThis() and writeThis(obj) functions look confusing to me. I propose something like readObject and writeObject, similar to readInt and writeInt.

Moreover, I don't understand why the IOFormat interface contains only extension methods. Do you wish to use it as an implicit parameter?

with (output) {
 with (UserFormat) {
  with (BookFormat) {
   write (user)
   write (book)
  }
 }
}

In closing, I would like to understand why this feature should be included in this library. Serialization libraries already perform this task (see kotlinx.serialization for example).

@altavir
Author

altavir commented Oct 28, 2019

@fvasco The readObject and writeObject names are possible. I used them at some point but then changed them (I don't remember why). Yes, I want to use the feature in a context-oriented way, so that additional serialization capabilities can be added with additional nested contexts (or with KEEP-176). The methods are also easier to implement this way: readDouble looks better than input.readDouble, and one can use nested formats. But I am not sure that it is the best solution.

Serialization sadly does not have anything similar at the moment. It has KSerializer, which works in a similar way, but with Encoder and Decoder rather than raw byte IO. Serialization also uses a very limited, old version of IO internally. I think the idea is to make this library a base for the serialization runtime, ktor and other things in the future, and to remove code duplication.

As for why I think it is needed here: as you said, several libraries already use similar features, but we have a problem of interface compatibility. When moving functionality from one library to another, one needs to re-implement everything against the new interface. Declaring a single interface and a few very basic implementations (like the list format I've mentioned) would not increase the size of the library, but would significantly increase pluggability.

After I wrote the post, I also realized that what I am trying to do is similar to Python's pickle, meaning a fast way to serialize and deserialize everything. The difference is that I propose to use explicit IOFormats instead of some kind of implicit inner implementation.
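To illustrate what "explicit IOFormats" could mean for a composite object, here is a sketch that builds a format for a hypothetical `User` class from the formats of its fields. `User`, `UserFormat`, `StringFormat`, `IntFormat` and `userRoundTrip` are illustrative names, and `java.io.DataInput`/`DataOutput` are assumed as stand-ins for the kotlinx-io `Input`/`Output` types.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream

// Assumption: DataInput/DataOutput stand in for the kotlinx-io Input/Output types.
interface IOFormat<T : Any> {
    fun DataOutput.writeThis(obj: T)
    fun DataInput.readThis(): T
}

fun <T : Any> DataInput.readWith(format: IOFormat<T>): T = format.run { readThis() }
fun <T : Any> DataOutput.writeWith(format: IOFormat<T>, obj: T) = format.run { writeThis(obj) }

// Hypothetical leaf formats for the individual parts.
object StringFormat : IOFormat<String> {
    override fun DataOutput.writeThis(obj: String) = writeUTF(obj)
    override fun DataInput.readThis(): String = readUTF()
}

object IntFormat : IOFormat<Int> {
    override fun DataOutput.writeThis(obj: Int) = writeInt(obj)
    override fun DataInput.readThis(): Int = readInt()
}

// Hypothetical domain class.
data class User(val name: String, val age: Int)

// The composite format names every part format explicitly; nothing is discovered implicitly.
object UserFormat : IOFormat<User> {
    override fun DataOutput.writeThis(obj: User) {
        writeWith(StringFormat, obj.name)
        writeWith(IntFormat, obj.age)
    }

    // Fields are read in the same order they were written (arguments evaluate left to right).
    override fun DataInput.readThis(): User = User(readWith(StringFormat), readWith(IntFormat))
}

fun userRoundTrip(user: User): User {
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).writeWith(UserFormat, user)
    return DataInputStream(ByteArrayInputStream(bytes.toByteArray())).readWith(UserFormat)
}
```

Unlike pickle, the byte layout here is fully determined by the format object passed at the call site, so two libraries can exchange data as long as they share the format.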

@fvasco

fvasco commented Oct 28, 2019

I am sorry, @altavir,
but I do not agree with the above reply.

> I want to use the feature in a context-oriented way, so it is possible to add additional serialization possibilities with additional nested contexts (or with KEEP-176)

It looks like a poor man's implementation of KEEP-87.

> But I am not sure that it is the best solution.

Neither am I.

> I think that the idea is to make this library a base for serialization runtime, ktor and other things in the future and remove code duplication.

I don't understand.
Is serializing arbitrary objects and deprecating kotlinx.serialization a hidden goal of this library?

> Declaring a single interface and a few very basic implementations (like list I've mentioned) would not increase the size of the library, but will significantly increase plug-ability.

This is the seed of a holy war like the one between big-endian and little-endian, where each side assumes that there is only one right way to serialize a word.
You are assuming that every list MUST be encoded using your format. Do you really think that this implementation will cover all use cases?

Is it better to use a variable-size encoding for `list.size`, to reduce the encoded size of frequently used small lists?

I'm not interested in the answer (if one exists). I think that this is not the right place to debate the data carried by byte buffers.
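For reference, the variable-size alternative mentioned above could look like the sketch below: an unsigned LEB128-style varint that stores 7 bits per byte, so sizes below 128 take one byte instead of the four written by `writeInt`. The helper names (`writeVarInt`, `readVarInt`, `encodedSize`, `varIntRoundTrip`) are hypothetical, and `java.io.DataInput`/`DataOutput` are assumed as stand-ins for the kotlinx-io types.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream

// Writes a non-negative Int, 7 bits per byte; the high bit marks "more bytes follow".
fun DataOutput.writeVarInt(value: Int) {
    require(value >= 0) { "value must be non-negative" }
    var rest = value
    while (rest >= 0x80) {
        writeByte((rest and 0x7F) or 0x80)
        rest = rest ushr 7
    }
    writeByte(rest)
}

// Reads a value written by writeVarInt; rejects encodings longer than 5 bytes.
fun DataInput.readVarInt(): Int {
    var result = 0
    var shift = 0
    var b: Int
    do {
        require(shift < 35) { "varint is too long for an Int" }
        b = readUnsignedByte()
        result = result or ((b and 0x7F) shl shift)
        shift += 7
    } while ((b and 0x80) != 0)
    return result
}

// Number of bytes the varint encoding uses for the given value.
fun encodedSize(value: Int): Int {
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).writeVarInt(value)
    return bytes.size()
}

fun varIntRoundTrip(value: Int): Int {
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).writeVarInt(value)
    return DataInputStream(ByteArrayInputStream(bytes.toByteArray())).readVarInt()
}
```

The trade-off is that the largest sizes need five bytes rather than four, which is exactly the kind of choice a single built-in list format would fix for all users.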

@altavir
Author

altavir commented Oct 28, 2019

The serialization library works in two stages. The first stage transforms Kotlin objects into an intermediate representation (Encoder/Decoder). Then this representation is transformed or streamed into actual byte IO. While the front part is covered by compiler magic and the serialization-runtime library, the back part is actually quite a limited version of this library. I think it was planned to bring everything into a single ecosystem one day. For example, we want a streaming engine for serialization, which is possible only with kotlinx.io.

As for one-size-fits-all: that is not what I am proposing. You can have any number of different formats for the same object. You can have a parameter in the format which switches between little and big endian. Or you can have and use different objects for that. The proposal includes only the common interface.
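A sketch of the "parameter in the format" idea: one format class for `Int` whose byte order is chosen at construction time. `OrderedIntFormat` and `encode` are hypothetical names, and `java.io.DataInput`/`DataOutput` are assumed as stand-ins for the kotlinx-io types.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataOutput
import java.io.DataOutputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumption: DataInput/DataOutput stand in for the kotlinx-io Input/Output types.
interface IOFormat<T : Any> {
    fun DataOutput.writeThis(obj: T)
    fun DataInput.readThis(): T
}

// Hypothetical Int format; the byte order is a constructor parameter.
class OrderedIntFormat(val byteOrder: ByteOrder) : IOFormat<Int> {
    override fun DataOutput.writeThis(obj: Int) {
        write(ByteBuffer.allocate(4).order(byteOrder).putInt(obj).array())
    }

    override fun DataInput.readThis(): Int {
        val raw = ByteArray(4)
        readFully(raw)
        return ByteBuffer.wrap(raw).order(byteOrder).getInt()
    }
}

// Returns the bytes produced for a value by the given format.
fun encode(format: IOFormat<Int>, value: Int): List<Byte> {
    val bytes = ByteArrayOutputStream()
    format.run { DataOutputStream(bytes).writeThis(value) }
    return bytes.toByteArray().toList()
}
```

With this sketch, `encode(OrderedIntFormat(ByteOrder.BIG_ENDIAN), 1)` and `encode(OrderedIntFormat(ByteOrder.LITTLE_ENDIAN), 1)` produce the same four bytes in opposite order, and both formats coexist behind the same interface.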

@fvasco

fvasco commented Oct 28, 2019

Hi @altavir,
so, to recap:

You are proposing to add only the IOFormat interface?

interface IOFormat<T : Any> {
    fun Output.writeThis(obj: T)
    fun Input.readThis(): T
}

fun <T : Any> Input.readWith(format: IOFormat<T>): T = format.run { readThis() }
fun <T : Any> Output.writeWith(format: IOFormat<T>, obj: T) = format.run { writeThis(obj) }

If so, apart from the API design points I have already commented on, it looks good.
I am sorry for the misunderstanding.

@altavir
Author

altavir commented Oct 28, 2019

@fvasco Yes, the idea is to introduce only the interface and maybe two small extensions for lists and maps. The rest should be done in external plugins.

I've replaced writeThis with writeObject. It looks more concise.
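The map extension is not spelled out in this thread; one possible shape, using the renamed `writeObject`/`readObject` methods, is sketched below. `MapIOFormat`, `StringFormat`, `IntFormat` and `mapRoundTrip` are hypothetical, and `java.io.DataInput`/`DataOutput` are assumed as stand-ins for the kotlinx-io types.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInput
import java.io.DataInputStream
import java.io.DataOutput
import java.io.DataOutputStream

// Assumption: DataInput/DataOutput stand in for the kotlinx-io Input/Output types.
interface IOFormat<T : Any> {
    fun DataOutput.writeObject(obj: T)
    fun DataInput.readObject(): T
}

fun <T : Any> DataInput.readWith(format: IOFormat<T>): T = format.run { readObject() }
fun <T : Any> DataOutput.writeWith(format: IOFormat<T>, obj: T) = format.run { writeObject(obj) }

// Hypothetical leaf formats used only for the demonstration.
object StringFormat : IOFormat<String> {
    override fun DataOutput.writeObject(obj: String) = writeUTF(obj)
    override fun DataInput.readObject(): String = readUTF()
}

object IntFormat : IOFormat<Int> {
    override fun DataOutput.writeObject(obj: Int) = writeInt(obj)
    override fun DataInput.readObject(): Int = readInt()
}

// Hypothetical map format: entry count, then key/value pairs in iteration order.
class MapIOFormat<K : Any, V : Any>(
    val keyFormat: IOFormat<K>,
    val valueFormat: IOFormat<V>
) : IOFormat<Map<K, V>> {
    override fun DataOutput.writeObject(obj: Map<K, V>) {
        writeInt(obj.size)
        for ((key, value) in obj) {
            writeWith(keyFormat, key)
            writeWith(valueFormat, value)
        }
    }

    override fun DataInput.readObject(): Map<K, V> {
        val size = readInt()
        val result = LinkedHashMap<K, V>()
        repeat(size) {
            val key = readWith(keyFormat)
            result[key] = readWith(valueFormat)
        }
        return result
    }
}

fun mapRoundTrip(map: Map<String, Int>): Map<String, Int> {
    val format = MapIOFormat(StringFormat, IntFormat)
    val bytes = ByteArrayOutputStream()
    DataOutputStream(bytes).writeWith(format, map)
    return DataInputStream(ByteArrayInputStream(bytes.toByteArray())).readWith(format)
}
```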

@fvasco

fvasco commented Oct 28, 2019

> I've replaced writeThis by writeObject. It looks more concise.

Please think twice about the motivation.
`this` has a special meaning in Kotlin.

> maybe two small extensions for lists and maps

So my disapproval is still valid.

@fzhinkin
Collaborator

We're rebooting the kotlinx-io development (see #131), all issues related to the previous versions will be closed. Consider reopening it if the issue remains (or the feature is still missing) in a new version.

3 participants