• NeatNit@discuss.tchncs.de · 19 hours ago

    I think SQLite is a great middle ground. It saves the database as a single .db file, and can do everything an SQL database can do. Querying the data is a lot more flexible and a lot faster. The tools for manipulating the data in any way you want are very good and very robust.

    However, I’m not sure how it would affect file size. It might be smaller because JSON/YAML wastes a lot of characters on redundant information (field names) and storing numbers as text, which the database would store as binary data in a defined structure. On the other hand, extra space is used to make common SQL operations happen much faster using fancy data structures. I don’t know which effect is greater so file size could be bigger or smaller.
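
A rough sketch of what that interface looks like, using Python's built-in sqlite3 module (the table name, columns, and values here are invented purely for illustration):

```python
import sqlite3

# Everything lives in one ordinary file on disk.
conn = sqlite3.connect("data.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS packages (name TEXT PRIMARY KEY, downloads INTEGER)"
)
conn.executemany(
    "INSERT OR REPLACE INTO packages VALUES (?, ?)",
    [("left-pad", 900000), ("is-even", 450000)],
)
conn.commit()

# Flexible, indexed querying, no hand-rolled parsing of a text file:
rows = conn.execute(
    "SELECT name FROM packages WHERE downloads > 500000"
).fetchall()
print(rows)  # [('left-pad',)]
conn.close()
```

The primary-key index is one example of the "fancy data structures" that trade a bit of file size for much faster lookups.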

    • GenderNeutralBro@lemmy.sdf.org · 12 hours ago

      SQLite would definitely be smaller, faster, and require less memory.

      Thing is, it’s 2025, roughly 20 years since anybody’s given half a shit about storage efficiency, memory efficiency, or even CPU efficiency for anything so small. Presumably this is not something they need to query dynamically.

      • NeatNit@discuss.tchncs.de · 11 hours ago

        True (in most contexts, probably including this one), but I think that only makes the case for SQLite stronger. What people do still care about is a good, flexible, usable, and reliable interface. I’m not sure how to get that with YAML.

    • Scrath@lemmy.dbzer0.com · 16 hours ago

      I didn’t look too much at the data, but I think CSV might actually be an appropriate format for this?

      Nice simple plaintext, and very easy to parse into a data structure for analysing/using it in Python or similar.
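
For instance, with Python's built-in csv module (the column names and values here are made up, not from the actual data):

```python
import csv
import io

# Stand-in for a real file opened with open("data.csv", newline="")
text = "name,count\nfoo,3\nbar,7\n"

# DictReader turns each row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(text)))
total = sum(int(r["count"]) for r in rows)
print(total)  # 10
```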

      • nous@programming.dev · 10 hours ago

        CSV would be fine. The big problem with the data as presented is that it is a YAML list, so the whole file needs to be read into memory and decoded before you get any values out of it. Any line-based encoding would be vastly better and allow line-based processing to be done. CSV, JSON objects encoded one per line, or some other streaming binary format. It does not make much difference overall as long as it is line-based, or at least streamable.
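
A sketch of the JSON-objects-per-line idea (field names invented for illustration): each line decodes on its own, so a file of any size can be processed in constant memory.

```python
import json
import io

# Stand-in for a real file object; one JSON object per line (NDJSON).
stream = io.StringIO('{"name": "foo", "count": 3}\n{"name": "bar", "count": 7}\n')

total = 0
for line in stream:  # never holds more than one record in memory
    record = json.loads(line)
    total += record["count"]
print(total)  # 10
```

A YAML list, by contrast, is one big document: the parser has to see the closing of the structure before it can hand back anything at all.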