I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.
In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.
I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.
In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.
Thanks for the advice, but this is just the format of some eyetracking software I had to use not something I develop myself
Ah, well, such is software dependencies.