nervous-systems / hildebrand

Asynchronous DynamoDB client for Clojure & Clojurescript/Node

License: The Unlicense

Clojure 95.79% JavaScript 4.21%

Hildebrand

Hildebrand is a high-level client for Amazon's DynamoDB, built on top of Eulalie.

core.async-based API
Targets both Clojure and Clojurescript/Node
Survives the Google Closure compiler's :advanced optimizations, for e.g. Clojurescript AWS Lambda functions
Exposes advanced Dynamo features, including the Dynamo Streams service
Plain EDN representations of Dynamo tables, items, queries, and their components: conditional writes, atomic updates, filters, and so on.

Documentation

The API introduction on the wiki is a good place to start.
Introducing Hildebrand, a blog post, has a bunch of usage examples in it. The namespace layout has changed since then (hildebrand -> hildebrand.core).

Examples

Querying

(require '[clojure.core.async :as async]
         '[hildebrand.channeled :refer [query!]])

(async/into []
  (query! creds :games
          {:user-id [:= "moea"]}
          {:filter [:< [:score] 50]
           :sort :desc
           :limit 10}
          {:chan (async/chan 10 (map :score))}))
;; => a channel that yields [15 10]

Querying + Batched Deletes

(require '[clojure.core.async :as async]
         '[hildebrand.channeled :refer [query! batching-deletes]])

(let [[results errors]
      (->> (query! creds :games
                   {:user-id [:= "moea"]
                    :game-title [:begins-with "Super"]}
                   {:filter [:< [:score] 100]
                    :limit  100})
           (async/split map?))
      {delete-chan :in-chan} (batching-deletes creds {:table :games})]
  (async/pipe results delete-chan))

Clojurescript

All of the functionality (barring the synchronous convenience functions) is exposed via Clojurescript. The implementation specifically targets Node, and uses lein-npm for declaring its dependency on bignumber.js. The wiki contains more information about number handling, which is the only substantial difference from the Clojure implementation.

The specific use-case I had in mind for Node support is writing AWS Lambda functions in Clojurescript.

See the Eulalie README for other Node-relevant details.

Development

Most of the integration tests expect an instance of DynamoDB Local. If the LOCAL_DYNAMO_URL environment variable isn't set, those tests will be skipped.

A couple of the tests expect to get capacity information back from Dynamo, and so can't run against a local instance. If AWS_ACCESS_KEY and AWS_SECRET_KEY are set, these tests will try to connect and interact with a table (in Dynamo's default region, us-east-1).

Assuming a local Node install, lein cljsbuild once test-none will run the Clojurescript tests. test-advanced will run the tests under :optimizations :advanced.

Contributions welcomed.

License

hildebrand is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.

hildebrand's Issues

Build a simple schema migration system

I haven't thought much about this, but I think it would be pretty easy and super useful. We'd probably be reading partial create/update statements from disk (?), or whatever, and storing the current version numbers in an internal Dynamo table. If we were worried about lack of atomicity, it would be possible to encode the version number in the name of an attribute which is added to the table each time an update is issued. Anyway, generally you could have a sequence of files on disk like this:

0 (create):

{:name :xyz
 :keys [:x :y]
 :attrs [:z]
 :throughput {:read 1 :write 1}
 ...}

1 (update throughput):

{:throughput {:read 2}}

2 (add/remove index):

{:indexes [:global [:add {:name :excellent ...}] 
                   [:remove {:name :terrible ...}]] ...}

(I'm imagining the table name is encoded in the file name, so not required for updates - or not)

There are some constraints imposed by Dynamo, e.g. the inability to add a local index after table creation time, and we would want to coalesce successive throughput adjustments.

The whole thing would be exposed really unobtrusively, like maybe a version of ensure-table! which brings the table up to date.
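A minimal sketch of the "bring the table up to date" step, assuming the migration files have already been read into a vector whose index is the version number. The names throughput-only?, coalesce-throughput and pending are hypothetical and not part of hildebrand; the sketch only selects and coalesces the pending statements and does not talk to Dynamo.

```clojure
;; Hypothetical helpers, not part of hildebrand.

(defn throughput-only?
  "True when a migration map adjusts nothing but :throughput."
  [m]
  (= #{:throughput} (set (keys m))))

(defn coalesce-throughput
  "Merges runs of consecutive throughput-only updates into one update."
  [migrations]
  (reduce
   (fn [acc m]
     (let [prev (peek acc)]
       (if (and prev (throughput-only? prev) (throughput-only? m))
         ;; Later values win, so the table ends at the final throughput.
         (conj (pop acc)
               {:throughput (merge (:throughput prev) (:throughput m))})
         (conj acc m))))
   []
   migrations))

(defn pending
  "Migrations still to apply, given the version currently recorded for
  the table (use -1 when the table does not exist yet)."
  [migrations current-version]
  (coalesce-throughput (subvec migrations (inc current-version))))
```

With the three example files above plus one more throughput change, (pending migrations 0) would return a single merged throughput update followed by the index update.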

Adjust currying notes to specify the culinary variety

https://nervous.io/clojure/aws/dynamo/hildebrand/2015/06/01/hildebrand/

For 5 paragraphs I confusedly thought curry was referring to functions being decomposed into their basic elements ;-)

Benchmark against AWS client

Doing this meaningfully is pretty involved. I ran some benchmarks with large batch gets of large items, and the numbers looked good, but I'm pretty far from being an expert in JVM benchmarking.

Batch operations throttling / rate limiting

There's a :capacity option, but I can't find anything about throttling / rate limiting. Something like this:

https://java.awsblog.com/post/Tx3VAYQIZ3Q0ZVW/Rate-Limited-Scans-in-Amazon-DynamoDB

In other words, internally throttle requests based on consumed / provisioned capacity, just to avoid exceptions about exceeded capacity. We have automatic DynamoDB scaling based on usage, but it's slow and can't respond quickly.

Is there anything like this planned? If not, we are willing to implement it. Thoughts about this topic? Recommendations?
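As a rough illustration of the idea (not an existing hildebrand feature), the pause between requests can be derived from the capacity a request reports having consumed. pause-ms is a hypothetical helper, and how the consumed units are read out of a response is left as an assumption.

```clojure
;; Hypothetical helper, not part of hildebrand.

(defn pause-ms
  "Milliseconds to wait after a request that consumed `consumed`
  capacity units, so that average consumption stays at or below
  `units-per-sec`."
  [consumed units-per-sec]
  (long (Math/ceil (* 1000.0 (/ consumed units-per-sec)))))
```

A caller would sleep (or park on a core.async timeout) for that long before issuing the next page or batch, so a request that consumed 25 units against a 50 units/second budget would be followed by a half-second pause.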

//cc @rarous @salax

Shard ID is always null in production

I have literally pulled the following code out of your blog post on consuming streams. When I run this code against a local instance of DynamoDB I can retrieve the last shard-id, but when I try against an actual DynamoDB table with a stream, there never seems to be a shard ID available. It always comes back as null. I am not really sure what's causing the problem here.

(defn read! [creds out-chan table-name]
  (go
    (let [stream-id (<! (latest-stream-arn! creds table-name))
          shard-id  (-> (describe-stream! creds stream-id)
                        <!! :shards last :shard-id)]
      (log/info (str "Consuming for Stream " stream-id " and shard " shard-id))
      (get-records! creds stream-id shard-id :latest {} {:chan out-chan}))))

I've tried debugging the describe-stream! function and I get the following exception.

(log/info (-> (describe-stream! creds stream-id)
              <!!))

INFO  com.pav.notification.comments.component: #error {
 :cause unknown-operation-exception: 
 :data {:type :unknown-operation-exception, :message nil}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message unknown-operation-exception: 
   :data {:type :unknown-operation-exception, :message nil}
   :at [clojure.core$ex_info invoke core.clj 4593]}]
 :trace
 [[clojure.core$ex_info invoke core.clj 4593]
  [eulalie.support$error__GT_throwable invoke support.cljc 17]
  [eulalie.support$issue_request_BANG_$fn__13408$state_machine__5777__auto____13409$fn__13411 invoke support.cljc 29]
  [eulalie.support$issue_request_BANG_$fn__13408$state_machine__5777__auto____13409 invoke support.cljc 22]
  [clojure.core.async.impl.ioc_macros$run_state_machine invoke ioc_macros.clj 940]
  [clojure.core.async.impl.ioc_macros$run_state_machine_wrapped invoke ioc_macros.clj 944]
  [clojure.core.async.impl.ioc_macros$take_BANG_$fn__5793 invoke ioc_macros.clj 953]
  [clojure.core.async.impl.channels.ManyToManyChannel$fn__1448 invoke channels.clj 102]
  [clojure.lang.AFn run AFn.java 22]
  [java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1142]
  [java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 617]
  [java.lang.Thread run Thread.java 745]]}

crc32-mismatch for big requests

I've just started tracking this issue and am not sure where the problem is, but maybe someone has already seen it.

I've got a simple lambda function ...

(defn- get-participants [org survey]
  (query!
    (lambda/credentials)
    :frank-participant
    {:nyx-organization [:= org]
     :id               [:begins-with (str survey ":")]}
    {:project [:nyx-organization :nyx-user :id :timestamp :employee :participant :answers]}
    {:chan (async/chan 1024)}))

(def ^:export list
  (async-lambda-fn
    (fn [ev ctx]
      (go
        (try
          (if-let [_ (schema/check ParticipantListEventSchema ev)]
            (lambda/bad-request! ctx)
            (let [org (:nyx-organization ev)
                  id (:survey ev)
                  participants (get-participants org id)
                  participants (<! (async/into [] participants))]
              (if (some #(instance? js/Error %) participants)
                (lambda/internal-server-error! ctx)
                (lambda/succeed! ctx {:participants (participants-for-user participants "admin")}))))
          (catch :default _
            (lambda/internal-server-error! ctx)))))))

... this function works perfectly for ~100 participants. But when I try to fetch 500 participants (or roughly anything over 100), I get crc32-mismatch for almost all requests. When I try this in Python, it works like a charm even for 1000 participants.

Any idea what could be wrong?

create-table! without :indexes. "number of attributes in key schema must match .."

Thank you for hildebrand, it looks great!

I'm seeing some strange behaviour when trying to use create-table! without the :indexes key in the table-spec. (Which I assume is optional based on http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html ?)
E.g. this works:

(hc/create-table!!
 creds
 {:table :curries
  :throughput {:read 1 :write 1}
  :attrs {:name :string :region :string :spiciness :number}
  :keys  [:name]
  :indexes {:global
            [{:name :curries-by-region-spiciness
              :keys [:region :spiciness]
              :project [:all]
              :throughput {:read 1 :write 1}}]}})

But this gives me an error (“validation-exception: The number of attributes in key schema must match the number of attributesdefinined in attribute definitions.”):

(hc/create-table!!
 creds
 {:table :curries1
  :throughput {:read 1 :write 1}
  :attrs {:name :string :region :string :spiciness :number}
  :keys  [:name]})

On the other hand this works:

(hc/create-table!!
 creds
 {:table :curries2
  :throughput {:read 1 :write 1}
  :attrs {:name :string}
  :keys  [:name]})

However this gives me the same error as above:

(hc/create-table!!
 creds
 {:table :curries3
  :throughput {:read 1 :write 1}
  :attrs {:name :string :region :string}
  :keys  [:name]})

This again works:

(hc/create-table!!
 creds
 {:table :curries4
  :throughput {:read 1 :write 1}
  :attrs {:name :string :region :string}
  :keys  [:name :region]})

.. which made me expect that this would work, but it fails with the same error as above:

(hc/create-table!!
 creds
 {:table :curries5
  :throughput {:read 1 :write 1}
  :attrs {:name :string :region :string :spiciness :number}
  :keys  [:name :region :spiciness]})

The way I've understood it so far:

:attrs just defines mandatory fields
:keys are primary index columns which need to be mandatory fields defined in :attrs
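For context, DynamoDB's attribute definitions may only name attributes that appear in the table's key schema or in an index's key schema, and a key schema holds at most two attributes (partition key and sort key). That matches the pattern of failures above. The sketch below assumes :attrs maps directly onto the attribute definitions; unused-attrs is a hypothetical checker, not a hildebrand function.

```clojure
(require '[clojure.set :as set])

;; Hypothetical checker, not part of hildebrand.
(defn unused-attrs
  "Returns the attributes declared in :attrs that no table key or
  index key refers to. A non-empty result would trigger the
  validation error quoted above."
  [{:keys [attrs indexes] table-keys :keys}]
  (let [index-keys (for [[_ specs] indexes
                         spec      specs
                         k         (:keys spec)]
                     k)
        used       (set (concat table-keys index-keys))]
    (set/difference (set (keys attrs)) used)))
```

Applied to the :curries1 spec this returns #{:region :spiciness}, while the first spec (with the global index) and :curries4 return the empty set. The :curries5 spec fails for the other reason: three attributes in :keys exceeds the two-attribute key schema.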

BigNumber conversion

Great work on the libs, it is a pleasure using them. I am running into one thing, and that is the BigNumber conversion. When I start doing arithmetic, e.g. adding 0.1 to a 200 stored as a Number type in DynamoDB, I get "200.1" as a string, the next addition gives "200.1.1", and so on.

As a quick solution, since I store deeply nested data, I just do a deep walk converting all BigNumbers to regular numbers, and all is normal again, except that I don't like doing a deep walk at the edges of my system.

Is this case foreseen and by design, or not? And what is a way to work around it?
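For reference, the deep-walk workaround described above can be kept small and generic with clojure.walk. This is only a sketch: convert-numbers is a hypothetical helper, and the predicate and conversion function are supplied by the caller (in ClojureScript they would test for and unwrap a bignumber.js value; on the JVM the same shape works for BigDecimal).

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper, not part of hildebrand.
(defn convert-numbers
  "Walks nested data and replaces every value for which `big?` is
  true with the result of calling `->native` on it."
  [big? ->native data]
  (walk/postwalk #(if (big? %) (->native %) %) data))
```

On the JVM, (convert-numbers #(instance? BigDecimal %) double item) would turn every BigDecimal in a nested item into a double, at the usual cost in precision.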

Remove implementation details from hildebrand.clj

I don't want to make anything private, and don't want to clutter anybody's autocomplete - so maybe an auxiliary module would be best, for rename-error, defissuer - all that stuff.

Self-hosted compatibility

Hello folks!

Given the new developments in lumo, I was playfully trying to see if I could make hildebrand work in a cljs-in-cljs environment:

Lumo 1.2.0
ClojureScript 1.9.482

So I set up my dependencies:

(def dependencies '[[org.clojure/clojurescript "1.9.473" :scope "provided"] ;; maybe I don't need this one
                    [io.nervous/hildebrand "0.4.5" :scope "test" :exclusions #{org.clojure/clojurescript org.clojure/clojure org.clojure/tools.reader}]
                    [andare "0.4.0"] ;; replacing core.async
                    [cljsjs/bignumber "2.1.4-1" :scope "test"]])

but when I required the namespace I got:

cljs.user=> (require 'hildebrand.core)
                        ⬆
Can't recur here at line 88 clojure/core.clj
WARNING: list already refers to: cljs.core/list being replaced by: clojure.core$macros/list at line 16 clojure/core.clj
WARNING: cons already refers to: cljs.core/cons being replaced by: clojure.core$macros/cons at line 22 clojure/core.clj
WARNING: Can't take value of macro cljs.core/let at line 32 clojure/core.clj
WARNING: let already refers to: cljs.core/let being replaced by: clojure.core$macros/let at line 32 clojure/core.clj
WARNING: Can't take value of macro cljs.core/loop at line 37 clojure/core.clj
WARNING: loop already refers to: cljs.core/loop being replaced by: clojure.core$macros/loop at line 37 clojure/core.clj
WARNING: Can't take value of macro cljs.core/fn at line 42 clojure/core.clj
WARNING: fn already refers to: cljs.core/fn being replaced by: clojure.core$macros/fn at line 42 clojure/core.clj
WARNING: first already refers to: cljs.core/first being replaced by: clojure.core$macros/first at line 49 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 55 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 55 clojure/core.clj
WARNING: next already refers to: cljs.core/next being replaced by: clojure.core$macros/next at line 57 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 64 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 64 clojure/core.clj
WARNING: rest already refers to: cljs.core/rest being replaced by: clojure.core$macros/rest at line 66 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 73 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 73 clojure/core.clj
WARNING: conj already refers to: cljs.core/conj being replaced by: clojure.core$macros/conj at line 75 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 84 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 84 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 85 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 85 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 85 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 85 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 86 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 86 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/& at line 86 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/xs at line 86 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/xs at line 87 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/coll at line 88 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/x at line 88 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/xs at line 88 clojure/core.clj
WARNING: Use of undeclared Var clojure.core$macros/xs at line 88 clojure/core.clj

Which is very unfortunate. Maybe not a priority but it would be nice if this could work. I'll try some debugging as well.

query-count returns only the size of the first partition of the result set

i.e. the number of items that fit in the first 1 MB of results, even if more items match the query.
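A complete count has to follow the pagination: each page reports its own count plus a key from which the next page starts. The loop below sketches that, with fetch-page standing in as a hypothetical function (not a hildebrand API) that takes a start key, nil for the first page, and returns a map with :count and :last-key.

```clojure
;; Hypothetical helper; `fetch-page` is supplied by the caller.
(defn total-count
  "Sums the per-page counts until a page comes back without a
  :last-key, i.e. until the result set is exhausted."
  [fetch-page]
  (loop [start-key nil
         total     0]
    (let [{n :count last-key :last-key} (fetch-page start-key)
          total (+ total n)]
      (if last-key
        (recur last-key total)
        total))))
```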

Unprocessed items do not seem to be returned.

It appears that the unprocessed data is ignored in the transform result, returning only the hildebrand/error via the ex-info.

https://github.com/nervous-systems/hildebrand/blob/master/src/hildebrand/internal.cljc#L18

This is very undesirable, as one would most likely wish to retry the unprocessed items.
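If the unprocessed items were returned, a caller-side retry could look roughly like the sketch below. write-batch is a hypothetical function (not a hildebrand API) that takes a collection of items and returns whatever was left unprocessed; a real version would also back off between attempts.

```clojure
;; Hypothetical helper; `write-batch` is supplied by the caller.
(defn write-all
  "Resubmits unprocessed items until none remain or `max-attempts`
  is reached. Returns the items that were still unprocessed."
  [write-batch items max-attempts]
  (loop [pending items
         attempt 1]
    (let [unprocessed (write-batch pending)]
      (if (or (empty? unprocessed) (>= attempt max-attempts))
        unprocessed
        (recur unprocessed (inc attempt))))))
```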

More offline/unit tests

Dead link in the readme

The link to the introduction blog post is dead. I think it should point here

https://nervous.io/clojure/aws/dynamo/hildebrand/2015/06/01/hildebrand/

and not here:

https://nervous.io/clojure/aws/dynamo/hildebrand/2015/06/08/hildebrand/

Write API documentation

The API is going to be pretty stable, in terms of top-level functions, so I think manually typing out meticulously formatted examples alongside explanations of possible options, corner cases for each top-level function would be better than trying to get fancy in docstrings and generating anything.

Issue loading Hildebrand using Boot?

Hello,

Caveat: I suspect that this is probably an issue with how I'm loading things, and not with hildebrand itself, but any advice here would be appreciated.

I'm trying to use hildebrand on a toy project to create an AWS Lambda Function using CLJS. I'm using boot as my build tool, instead of leiningen.

When I run my lambda function with aws-sam-local (functions.js is the compiled file), I get:

module initialization error: TypeError
    at Object.<anonymous> (/var/task/functions.js:3251:21)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.require (module.js:497:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (/var/task/tournaments.js:2:9)
    at Module._compile (module.js:570:32)
    at Object.Module._extensions..js (module.js:579:10)
    at Module.load (module.js:487:32)
    at tryModuleLoad (module.js:446:12)
    at Function.Module._load (module.js:438:3)
    at Module.require (module.js:497:17)
    at require (internal/module.js:20:19)

This is happening even with this minimal file:

(ns functions.core
  (:require [cljs-lambda.macros :refer-macros [defgateway]]
            [promesa.core :as p]
            [hildebrand.channeled :refer [query!]]))

However, given I can require other libraries without issue, I think there must be something at least related to Hildebrand here.

I'm compiling using the Google Closure compiler with simple optimizations.

Any advice? Thank you!

Implement handling of binary data types

There's a hack allowing literal attributes to be passed through, e.g. {:x #hildebrand/literal {:BS ...}}, but there's nothing binary specific. B and BS will be returned as-is when retrieving.