map-persister


Thoughts

A file path on disk might point to a flat serialised object (foo.ser) or to the root directory of a layered serialisation (foo/). (This assumes .ser as the extension for flat files and no extension for top-level directories.) What happens if we try to save in one format when the other already exists? And what happens when we attempt to persist to a directory that already exists?
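A minimal Java sketch of how the two on-disk forms might be told apart, and of the "one form replaces the other" rule described in the unit-test notes. It assumes the naming convention stated above (foo.ser for flat, foo/ for layered); `StoredForm`, `detect` and `clearOtherForm` are hypothetical names, not part of MapPersister.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, assuming "foo.ser" = flat state and "foo/" = layered state.
final class StoredForm {
    enum Form { NONE, FLAT, LAYERED, BOTH }

    // "dir/foo" -> "dir/foo.ser" (same parent directory).
    static Path flatPath(Path base) {
        return base.resolveSibling(base.getFileName() + ".ser");
    }

    static Form detect(Path base) {
        boolean flat = Files.isRegularFile(flatPath(base));
        boolean layered = Files.isDirectory(base);
        if (flat && layered) return Form.BOTH;   // inconsistent: the rules should prevent this
        if (flat) return Form.FLAT;
        if (layered) return Form.LAYERED;
        return Form.NONE;
    }

    // Before a layered save, drop any flat file; before a flat save, drop any tree.
    static void clearOtherForm(Path base, boolean savingLayered) throws IOException {
        if (savingLayered) {
            Files.deleteIfExists(flatPath(base));
        } else if (Files.isDirectory(base)) {
            deleteRecursively(base);
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        List<Path> all;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse lexical order visits children before their parent directories.
            all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) Files.delete(p);
    }
}
```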

Here's how the unit tests are currently shaking down:

  • At top level: creating a layered state (foo/) deletes an unlayered one (foo.ser), and vice versa. The rule applies recursively, at any level where a state is saved at a different depth from the previously saved one.

  • A layered state may be saved to a new directory, or to one which already holds a saved state. It may not be saved to an arbitrary directory that doesn't hold a saved state (this is to prevent accidental erasure). At the moment we're marking the top level of saved states with a placeholder file, but we should probably use a dedicated directory name extension instead (foo.saved.d?).

  • Below the top level we don't do this sanity check, although we still need the layered vs. flat checks all the way down.

  • Items in a layered tree are added or removed as required, depending on the keys present in the map at that level. On save we have to remove stray entries that happen to be in a directory: otherwise unpersist will attempt to read them, and in any case they might correspond to keys that have been explicitly removed. It's only safe to add things manually after a save and before the next read.

  • We'll allow arbitrary characters in (string) keys: if we were to slug them, there would be no reliable mapping back into memory. Instead we'll escape the characters we regard as illegal, URL-style (percent-encoding).

  • The jury is still out on timestamps (or, put another way, on not overwriting files for entries that haven't changed). The brute-force method is to re-import and equality-test everything we're exporting; a slightly neater way is probably to "attach" the layered persister object to a location and have it keep a clone of the original map, so that it can compute a delta on save. (This could be the fall-back for a persister that's created on the fly: it re-reads into a notional cloned map, then saves according to the delta.)
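The URL-style key escaping mentioned in the list above could look roughly like this Java sketch. The whitelist (ASCII letters, digits, `_` and `-`) is an assumption, as are the names `KeyCodec`, `escape` and `unescape`; every other UTF-8 byte is percent-encoded. Escaping `.` as well keeps a key such as `bar.ser` from colliding with the flat file written for key `bar`, and makes the keys `.` and `..` harmless.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical key <-> filename codec: percent-escapes every UTF-8 byte outside [A-Za-z0-9_-].
final class KeyCodec {
    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
            || (b >= '0' && b <= '9') || b == '_' || b == '-';
    }

    static String escape(String key) {
        StringBuilder out = new StringBuilder();
        for (byte raw : key.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;                        // treat the byte as unsigned
            if (isSafe(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    // Expects a name produced by escape(), i.e. pure ASCII.
    static String unescape(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '%') {
                bytes.write(Integer.parseInt(name.substring(i + 1, i + 3), 16));
                i += 2;                                // skip the two hex digits
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Upper-case letters pass through unchanged, so this alone does nothing for case-insensitive file systems, and an empty key would need separate handling because it escapes to an empty name.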

Operation

We want to write a nested hashmap M_new, at folder depth D, to a location which we assume contains a representation (to some arbitrary depth) of hashmap M_old.

Initial pseudocode (here for historical reasons). For the actual code, look at persistMap() and persistNode() in MapPersister.

function write(M_new, M_old, location, D):
        for all keys k in M_old which are not in M_new:
                erase from file system (directory or flat file) at location/k

        for all keys k in M_new which are not in M_old:
                create structure (directory or flat file) for M_new[k] at location/k according to depth D

        for all keys k in both M_old and M_new:
                let obj_old = M_old[k] and obj_new = M_new[k]

                structurally compare obj_old and obj_new

                if not (obj_old equals obj_new):        // Need to replace tree here.
                        if location/k is dir {implies obj_old is map} and obj_new is map and D > 0:
                                RECURSE(obj_new, obj_old, location/k, D-1)
                        else:
                                erase at location/k
                                create structure for obj_new at location/k according to depth D
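For comparison with the pseudocode above, here is a runnable Java sketch of the same three-pass algorithm. It assumes flat entries are written with standard Java serialisation (`ObjectOutputStream`) to `<key>.ser`, and that a map value becomes a sub-directory while the remaining depth is above zero. `DeltaWriter` is a hypothetical name; this is not MapPersister's `persistMap()`/`persistNode()`, and key escaping, the top-level sanity check and removal of stray files not listed in M_old are left out.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DeltaWriter {
    // Writes newMap into dir, assuming dir currently holds oldMap at the same depth.
    @SuppressWarnings("unchecked")
    static void write(Map<String, ?> newMap, Map<String, ?> oldMap, Path dir, int depth)
            throws IOException {
        Files.createDirectories(dir);
        for (String k : oldMap.keySet()) {
            if (!newMap.containsKey(k)) erase(dir, k);             // key was removed
        }
        for (Map.Entry<String, ?> e : newMap.entrySet()) {
            String k = e.getKey();
            Object objNew = e.getValue();
            if (!oldMap.containsKey(k)) {
                create(objNew, dir, k, depth);                     // key is new
                continue;
            }
            Object objOld = oldMap.get(k);
            if (Objects.equals(objOld, objNew)) continue;          // unchanged: files untouched
            if (Files.isDirectory(dir.resolve(k)) && objOld instanceof Map
                    && objNew instanceof Map && depth > 0) {
                write((Map<String, ?>) objNew, (Map<String, ?>) objOld, dir.resolve(k), depth - 1);
            } else {
                create(objNew, dir, k, depth);                     // replaces the whole subtree
            }
        }
    }

    // Erases any existing form for k, then writes obj as a directory or a flat file.
    @SuppressWarnings("unchecked")
    private static void create(Object obj, Path dir, String k, int depth) throws IOException {
        erase(dir, k);
        if (obj instanceof Map && depth > 0) {
            write((Map<String, ?>) obj, Collections.emptyMap(), dir.resolve(k), depth - 1);
        } else {
            // obj must be Serializable, otherwise writeObject throws NotSerializableException.
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(dir.resolve(k + ".ser")))) {
                out.writeObject(obj);
            }
        }
    }

    // Removes both possible forms for key k: "<k>.ser" and the directory "<k>/".
    private static void erase(Path dir, String k) throws IOException {
        Files.deleteIfExists(dir.resolve(k + ".ser"));
        Path sub = dir.resolve(k);
        if (!Files.isDirectory(sub)) return;
        List<Path> all;
        try (Stream<Path> walk = Files.walk(sub)) {
            all = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : all) Files.delete(p);
    }
}
```

Because `create()` always erases first, a key that switches between flat and layered storage never leaves both `foo.ser` and `foo/` behind.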

Issues

Distinctive name for top-level persisted directory?

At the moment, the file root passed to MapPersister is under the persister's total control, and so might get completely blown away (for example, if the persister saves a changed map with depth=0, in which case the directory will be replaced with <file>.ser). We really don't want to accidentally point a persister at the wrong directory even when reading.

Suggestion: an enforced convention that the top-level (only!) directory have a specific kind of name (foo.persisted or something).
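A guard for such a convention might be as small as this Java sketch, run before any read or write. The `.persisted` suffix is taken from the suggestion above but is still an assumption, as is the `RootGuard` name; in line with "top-level (only!)", sub-directories created by the persister are not checked.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard: refuse any top-level root whose own name lacks the agreed suffix.
final class RootGuard {
    static final String SUFFIX = ".persisted";   // assumed; the issue says "foo.persisted or something"

    static void checkRoot(Path root) {
        Path name = root.getFileName();
        if (name == null || !name.toString().endsWith(SUFFIX)) {
            throw new IllegalArgumentException(
                "refusing to use " + root + ": top-level name must end in " + SUFFIX);
        }
        if (Files.exists(root) && !Files.isDirectory(root)) {
            throw new IllegalArgumentException(
                "refusing to use " + root + ": it exists but is not a directory");
        }
    }
}
```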

Key/directory name sanitisation

String keys in maps are interpreted as filenames regardless of their content. We need to do some URL-style escaping of characters that we consider illegal (or at least undesirable).

External file system changes

The persister will get very confused indeed if files are changed externally between an unpersist (read) and a persist (write) to the same location; the persister holds a snapshot of the last-read state so that it can save only items which have changed. It's up to the application to handle this gracefully (allowing some kind of "refresh from disk" option, for example, if desired).
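A minimal Java sketch of the snapshot this relies on, assuming it is a deep copy of the nested maps taken at unpersist time (`Snapshot`, `deepCopy` and `changedKeys` are hypothetical names, not the library's API). It shows why a plain reference or shallow copy would not do: an in-place edit to a nested map would change the snapshot too, and the delta would miss it. Equally, nothing here looks at the disk, so an external edit between unpersist and persist goes unnoticed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class Snapshot {
    // Copies nested maps recursively; leaf values are shared and assumed immutable.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deepCopy(Map<String, ?> m) {
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, ?> e : m.entrySet()) {
            Object v = e.getValue();
            copy.put(e.getKey(), v instanceof Map ? deepCopy((Map<String, ?>) v) : v);
        }
        return copy;
    }

    // Top-level keys that must be erased, created or rewritten on the next save.
    static Set<String> changedKeys(Map<String, ?> snapshot, Map<String, ?> current) {
        Set<String> changed = new TreeSet<>();
        for (String k : snapshot.keySet()) {
            if (!current.containsKey(k)) changed.add(k);                    // removed
        }
        for (Map.Entry<String, ?> e : current.entrySet()) {
            String k = e.getKey();
            if (!snapshot.containsKey(k) || !Objects.equals(snapshot.get(k), e.getValue())) {
                changed.add(k);                                             // added or modified
            }
        }
        return changed;
    }
}
```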

OS X case sensitivity

OS X is case-preserving but not case-sensitive (in its default configuration); an attempt to read file foo will find FOO or Foo, and only one of these can exist in a directory. How does this interact with our attempt to persist maps whose keys are case-significant?

Obviously, we can't persist maps containing keys that are identical modulo case (unless the flat-file depth is reached above this map, so that it lives inside a single serialised file). We do need some way to encode such names that stays readable in the resulting filenames.
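One readable encoding, sketched in Java under the assumption that keys are mostly ASCII: store each upper-case letter in lower case behind a `^` marker, and double any literal `^`. `CaseCodec` and the `^` marker are hypothetical choices, not something the project specifies. Encoded names contain no ASCII upper case and the mapping is reversible, so ASCII keys that differ only in case get names that still differ under case-insensitive comparison.

```java
// Hypothetical case-safe name encoding: 'X' -> "^x", '^' -> "^^", everything else unchanged.
final class CaseCodec {
    static String encode(String key) {
        StringBuilder out = new StringBuilder();
        for (char c : key.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append('^').append((char) (c - 'A' + 'a'));
            } else if (c == '^') {
                out.append("^^");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String decode(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '^' && i + 1 < name.length()) {
                char next = name.charAt(++i);      // consume the marked character
                out.append(next == '^' ? '^' : Character.toUpperCase(next));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this scheme `Foo` and `foo` become `^foo` and `foo`, which can coexist on a case-insensitive volume. Non-ASCII letters pass through unchanged, so keys such as `É` and `é` could still collide.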

If no external changes are made to a file tree between unpersist and persist, I think we're OK: we remove old keys before adding new ones, so a map updated with a case change in a key will be saved properly. (We should unit test that.)
