annon.api's People

Contributors: alexeybondarenko, andrewdryga, aslafy-z, fossabot, gmile, kjir, mbatrak, pavelvesnin, samorai, shamus03, troush, vyarovoy

annon.api's Issues

Pretty print option

Allow passing an X-Pretty-Print: true header so that the gateway adds whitespace to the JSON response, making it readable in browsers.

This will make our responses friendlier to developers.
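A minimal sketch of how this could work, assuming Poison as the JSON encoder (it accepts a `pretty: true` option); the module name is hypothetical:

```elixir
defmodule Annon.PrettyPrint do
  # Derive the JSON encoder options from the request headers.
  # `req_headers` is a list of {name, value} tuples, as in Plug.Conn.req_headers.
  def encode_opts(req_headers) do
    case List.keyfind(req_headers, "x-pretty-print", 0) do
      {"x-pretty-print", "true"} -> [pretty: true]
      _ -> []
    end
  end
end
```

The result would then be passed straight to the encoder when rendering, e.g. `Poison.encode!(body, opts)`.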

Race condition when settings are applied

{"time":"2017-06-21T10:02:13.889Z","sourceLocation":{"moduleName":null,"line":null,"functionName":null,"file":null},"severity":"ERROR","metadata":{"error_logger":"format"},"logMessage":"#PID<0.31977.8> running Annon.ManagementAPI.Router terminated\nServer: 35.187.188.208:8080 (http)\nRequest: PUT /apis/ca9e5956-1cb3-47a1-94ba-d5c489301edb/plugins/cors\n** (exit) an exception was raised:\n    ** (Ecto.ConstraintError) constraint error when attempting to insert struct:\n\n    * unique: plugins_pkey\n\nIf you would like to convert this constraint into an error, please\ncall unique_constraint/3 in your changeset and define the proper\nconstraint name. The changeset defined the following constraints:\n\n    * unique: plugins_api_id_name_index\n    * foreign_key: plugins_api_id_fkey\n\n        (ecto) lib/ecto/repo/schema.ex:493: anonymous fn/4 in Ecto.Repo.Schema.constraints_to_errors/3\n        (elixir) lib/enum.ex:1229: Enum.\"-map/2-lists^map/1-0-\"/2\n        (ecto) lib/ecto/repo/schema.ex:479: Ecto.Repo.Schema.constraints_to_errors/3\n        (ecto) lib/ecto/repo/schema.ex:213: anonymous fn/13 in Ecto.Repo.Schema.do_insert/4\n        (ecto) lib/ecto/repo/schema.ex:684: anonymous fn/3 in Ecto.Repo.Schema.wrap_in_transaction/6\n        (ecto) lib/ecto/adapters/sql.ex:620: anonymous fn/3 in Ecto.Adapters.SQL.do_transaction/3\n        (db_connection) lib/db_connection.ex:1275: DBConnection.transaction_run/4\n        (db_connection) lib/db_connection.ex:1199: DBConnection.run_begin/3"}
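The constraint error suggests two concurrent PUTs both took the INSERT path. One hedged fix sketch: let PostgreSQL resolve the conflict atomically with an upsert (module and repo names are illustrative, not Annon's actual code):

```elixir
defmodule Annon.PluginUpsert do
  # ON CONFLICT on the (api_id, name) unique index replaces the row in place,
  # so two concurrent PUTs can no longer race between SELECT and INSERT
  # and trip over plugins_pkey.
  def upsert_plugin(repo, changeset) do
    repo.insert(changeset,
      on_conflict: :replace_all,
      conflict_target: [:api_id, :name]
    )
  end
end
```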

Refactor Annon to use umbrella apps

Possible app structure:

- gateway (configurations, plugins, public HTTP server, configuration repo)
- management_api (service and management endpoints)
- cluster (cluster communication, distributed counters, presence, service discovery)
- runtime_tools (logging, monitoring, tracing, requests repo)

Introduce action fallbacks

We need to standardize fallbacks (as Phoenix did) to reduce code complexity. This would also give plugins a simple API they can use to generate responses, moving all of that logic out of them and into the fallbacks.
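A rough sketch of the idea (names and the response vocabulary are illustrative, not Annon's actual API): plugins return plain tagged tuples, and a single fallback translates them into HTTP responses.

```elixir
defmodule Annon.Helpers.Fallback do
  # Translate a plugin result into a {status, body} response, in one place.
  def call({:ok, body}), do: {200, body}
  def call({:error, :not_found}), do: {404, %{error: %{type: "not_found"}}}
  def call({:error, :access_denied}), do: {403, %{error: %{type: "access_denied"}}}
  def call({:error, changeset}), do: {422, %{error: %{type: "validation_failed", invalid: changeset}}}
end
```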

More data in logging

Some data that would make logs more practical for debugging is missing:

  1. Did this request actually hit an upstream?
  2. What was the full upstream response?
  3. Which plugins took effect? What was the API config at the time?

Additionally, we need to get rid of Plug.Parsers when writing logs.
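A possible shape for an enriched log record covering the three points above (field names are hypothetical, not the current schema):

```elixir
defmodule Annon.LogRecord do
  defstruct upstream_hit?: false,    # 1. did the request actually reach an upstream?
            upstream_response: nil,  # 2. full upstream response (status, headers, body)
            api_config: nil,         # 3. the API config resolved at request time
            plugins_applied: []      #    plugins that actually took effect
end
```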

RFC: Rework request URL (path, host, etc.) matching

Right now we are using PostgreSQL-style pattern matching with % and _. Even though it works, there are limits, and features we want to support:

  1. We want to match the paths /some_api and /some_api/123 as different APIs. The current workaround is defining /some_api_ and /some_api/_% with different match_priority. It looks ugly and will also match /some_api1, which would be unexpected for most users.

  2. It would be really awesome to take parts of the request and substitute them into the upstream path. Eg: /blog/:id/comments -> /comments/:id.

  3. Regex matching is expensive.
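An illustrative sketch of segment-based matching with parameter capture, as an alternative to SQL LIKE patterns (not Annon's current implementation). Because matching is per segment, /some_api and /some_api/123 are naturally different, and /some_api1 never matches /some_api:

```elixir
defmodule PathMatch do
  # match("/blog/:id/comments", "/blog/42/comments") => {:ok, %{"id" => "42"}}
  def match(pattern, path) do
    do_match(String.split(pattern, "/"), String.split(path, "/"), %{})
  end

  defp do_match([], [], params), do: {:ok, params}
  defp do_match([":" <> name | pt], [seg | st], params),
    do: do_match(pt, st, Map.put(params, name, seg))
  defp do_match([seg | pt], [seg | st], params), do: do_match(pt, st, params)
  defp do_match(_, _, _), do: :error

  # Substitute captured params into an upstream path: "/comments/:id" -> "/comments/42"
  def expand(upstream, params) do
    upstream
    |> String.split("/")
    |> Enum.map(fn
      ":" <> name -> Map.fetch!(params, name)
      seg -> seg
    end)
    |> Enum.join("/")
  end
end
```

This also avoids regex cost: matching is a single pass over path segments.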

RFC: All new logging/tracing system

Logging has a few issues, and that is starting to worry me:

  1. We need a good way to sanitize logs from sensitive data;
  2. To log the request and response we need them as binary data, which works well for small JSON bodies that fit in a request context, but really badly for all other use cases;
  3. PostgreSQL is a bad place to store logs;
  4. The real usage demand is more "I want to trace everything that happened in my system" than "I want to know what the request and response were like".

I guess it is time to reconsider this part of Annon as a global adapter-based tracing module, which can keep working as-is (which is okay for small deployments) and gain many more features for production systems: sending data to SaaS services, to other data stores (Elasticsearch or Cassandra), and to tracing systems (Datadog APM, New Relic APM, AppSignal, or Scout).

The adapter API would be large and demanding, so not every service could be integrated, but that is okay, since most of the issues above would then have an approach to solving them.
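One possible shape for such an adapter behaviour (all names hypothetical; each backend, from PostgreSQL to Datadog APM, would implement these callbacks):

```elixir
defmodule Annon.Tracing.Adapter do
  # Start a span for a unit of work (request, plugin run, upstream call).
  @callback start_span(name :: String.t(), metadata :: map()) :: term()
  # Finish the span, attaching final metadata (status code, latency, ...).
  @callback finish_span(span :: term(), metadata :: map()) :: :ok
  # Record a one-off event that is not tied to a span's duration.
  @callback log_event(event :: atom(), metadata :: map()) :: :ok
end
```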

Rework validation

Validation responses are not yet perfect:

  • sometimes they don't match the API Manifest
  • paths may be inconsistent (at least in the latest version they are full paths)
  • the nex_json_schema response is inconsistent with the changeset one

We need to find a nice way to render validations that include both changeset and JSON schema errors.

Issue with creating API through PUT/POST

Even with empty DB, this request currently fails for me locally:

curl --silent --request PUT --header "Content-Type: application/json" --data '{"name":"\"Somename\"","request":"{\"host\":\"%\",\"path\":\"/apis_status\",\"port\":80,\"scheme\":\"http\",\"methods\":[\"GET\"]}"}' "http://localhost:4001/apis/9C556157-2BAC-4901-8E3C-4471F0302D70"

The error is this:

{
  "meta": {
    "url": "http://localhost:4001/apis/9C556157-2BAC-4901-8E3C-4471F0302D70",
    "type": "object",
    "request_id": "7avvvl3ahin30empcu9kg7s1rjj88c2i",
    "code": 422
  },
  "error": {
    "type": "validation_failed",
    "message": "Validation failed. You can find validators description at our API Manifest: http://docs.apimanifest.apiary.io/#introduction/interacting-with-api/errors.",
    "invalid": [
      {
        "rules": [
          {
            "rule": null,
            "params": [],
            "description": "is invalid"
          }
        ],
        "entry_type": "json_data_property",
        "entry": "$.request"
      }
    ]
  }
}

RFC: Mock/Request termination plugin

Respond with a static JSON content.

Use cases:

  • mocking API responses;
  • sending maintenance messages;
  • serving most common content from the Edge servers (eg. for #257).

Refactor configs

Configuration has become a little messy; group related things and split the config into separate files.

Fix 500 error when no auth token is provided

If no token is provided, an error is raised:

** (FunctionClauseError) no function clause matching in anonymous fn/1 in Annon.Plugins.Scopes.get_scopes/2
        (annon_api) lib/annon_api/plugins/scopes.ex:39: anonymous fn([]) in Annon.Plugins.Scopes.get_scopes/2
        (annon_api) lib/annon_api/plugins/scopes.ex:39: Annon.Plugins.Scopes.get_scopes/2
        (annon_api) lib/annon_api/public_api/router.ex:1: Annon.PublicRouter.plug_builder_call/2
        (annon_api) lib/plug/error_handler.ex:64: Annon.PublicRouter.call/2
        (plug) lib/plug/adapters/cowboy/handler.ex:15: Plug.Adapters.Cowboy.Handler.upgrade/4
        (cowboy) /opt/app/deps/cowboy/src/cowboy_protocol.erl:442: :cowboy_protocol.execute/4
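The crash comes from an anonymous function that has no clause for the "no token" case. A hedged sketch of the fix (module name is illustrative; the real code lives in lib/annon_api/plugins/scopes.ex): handle nil/empty input explicitly and return an empty scope list, so downstream ACL checks can reject the request with 401/403 instead of a 500.

```elixir
defmodule Annon.Plugins.Scopes.Fix do
  # No token at all: no scopes, let the ACL layer deny the request.
  def get_scopes(nil, _strategy), do: []
  def get_scopes([], _strategy), do: []
  def get_scopes(token, strategy), do: resolve(token, strategy)

  # Placeholder for the existing strategy-based resolution logic.
  defp resolve(token, _strategy), do: Map.get(token, :scopes, [])
end
```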

Show entry of validation errors for properties and not for whole objects

The first error is for the age_groups property of borrowers, but the entry is $.buckets.[0].criterias.borrowers rather than $.buckets.[0].criterias.borrowers.age_groups, and so on.

{
   "meta":{
      "url":"http://os-dev-gateway.nebo15.com/tr/portfolio_subscriptions",
      "type":"object",
      "request_id":"l6hjqeuvfcf27eu1ldp1pbo3qrrssdg6",
      "code":422
   },
   "error":{
      "type":"validation_failed",
      "message":"Validation failed. You can find validators description at our API Manifest: http://docs.apimanifest.apiary.io/#introduction/interacting-with-api/errors.",
      "invalid":[
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property age_groups was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.borrowers"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property income_groups was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.borrowers"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property currencies was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.loans"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property outstanding_amount_principal was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.loans"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property apr was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.loans"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property is_prolonged was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0].criterias.loans"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property risk_class was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property currency was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property term_to_maturity was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property term_unit was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property loans_investment_min was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property assignment_schema was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property guarantee_type was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         },
         {
            "rules":[
               {
                  "rule":"required",
                  "params":[

                  ],
                  "description":"required property buy_back_available was not present"
               }
            ],
            "entry_type":"json_data_property",
            "entry":"$.buckets.[0]"
         }
      ]
   }
}

Status endpoint

/status

{
  nodes: [],
  cluster_size: N,
  latencies: {...},
  load: {}
}

Auth plugin

The Scopes plugin became tech debt that we need to pay off. It started as a plugin that resolves user scopes via different strategies, but ended up as a mess, with tokens in the Plug.Conn and pattern matching on them in different places.

JWT and Scopes should be merged into an Auth plugin, which should:

  1. Have configurable consumer_id resolution strategies: OAuth access token, JWT.
  2. Have configurable consumer_scope resolution strategies: a request to the auth server or extraction from the token.
  3. Set UpstreamRequest headers by itself.
  4. Set x-consumer-id and x-consumer-scope headers in the response.
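A sketch of points 3 and 4 (the helper and module names are hypothetical; header names are the ones from this issue):

```elixir
defmodule Annon.Plugins.Auth.Headers do
  # Set the consumer headers on a header list ({name, value} tuples),
  # replacing any existing values so clients cannot spoof them.
  def put_consumer_headers(headers, consumer_id, scopes) do
    headers
    |> List.keystore("x-consumer-id", 0, {"x-consumer-id", consumer_id})
    |> List.keystore("x-consumer-scope", 0, {"x-consumer-scope", Enum.join(scopes, " ")})
  end
end
```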

RFC: Rate-limiter API

The rate limit plugin requires us to pick one of the following options:

First: store a distributed CRDT counter that is merged via the clustering protocol on an event or timing basis.

Pros:

  • Each node is able to rate limit requests even when a split brain happens; counters are re-merged when nodes see each other again.
  • Low latency overhead, since all calls are local.
  • Annon users would benefit from using load balancers that route requests to the same gateway instance for a single client (eg. by balancing on IP or auth header). This CRDT would be local and only replication would be involved.

Cons:

  • If requests are distributed over the whole cluster, memory consumption grows quickly (= NODES × CONSUMERS) with each new API client.
  • Increased network traffic due to CRDT synchronization.
  • We can provide only an estimated rate limit: when a split brain happens, two separate counters co-exist, allowing more requests through; TTLs for counters are not 100% reliable, and some counters could be reset before others (leverage generation counters?).
  • We need to implement a CRDT and a gossip protocol.

--

Second: build a hash ring out of all known Annon nodes and route counter increments (or the whole request) to the node responsible for limiting it (probably via RPC calls; Kademlia DHT?).

Pros:

  • Annon users would benefit from using load balancers that route requests to the same gateway instance for a single client (eg. by balancing on IP or auth header).
  • Less memory consumption, only one counter for each API client.

Cons:

  • Latency hit for rate-limited requests because of the RPC call.
  • Integration with a load balancer would require it to use exactly the same hashing as Annon, which is not feasible in most cases.

--

Third: provide only a very limited rate limiter that works either with a load balancer that supports sticky sessions (routing requests for each consumer to the same Annon instance) or when there is only one Annon instance at all.

This is similar to option one, except we don't need to sync the CRDTs.
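A minimal sketch of option three: a node-local fixed-window counter in ETS, with no replication at all (module name hypothetical):

```elixir
defmodule LocalRateLimit do
  def init, do: :ets.new(:rate_counters, [:set, :public, :named_table])

  # The current fixed window for a consumer, derived from the node-local clock.
  def window_key(consumer_id, window_ms) do
    {consumer_id, div(System.monotonic_time(:millisecond), window_ms)}
  end

  # Atomically increment the counter for this window and check the limit.
  # Counters are local to this node, so limits are exact only with sticky
  # sessions or a single instance, as described above.
  def allow?(key, limit) do
    count = :ets.update_counter(:rate_counters, key, 1, {key, 0})
    count <= limit
  end
end
```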

--

Fourth: use persistent backends.

Pros:

  • Exact rate limits are guaranteed.
  • Offloads most of the development to the integrated third-party service (eg. Redis).

Cons:

  • A split brain would result in downtime;
  • A storage outage would result in downtime;
  • We would probably require more infrastructure dependencies.

Open-Source Checklist

General

  • Refactor tests (more acceptance testing, stable test result, fully functional smoke test)
  • Write docs (setup with docker, without it, in kubernetes, available environment variables)
  • Write @doc in most important methods, expand @moduledoc's to match full module responsibility
  • Get rid of PostgreSQL in the requests log (and probably in the configs); we need a low-write-latency DB, like Cassandra.
  • Rename the private API to the management API.
  • Coverage > 90%
  • Clean all TODO's from code
  • Allow updating settings (PUT request) partially, without the need to pass full objects.
  • Deal with duplicate // in path and relative URIs in config resolving and proxy.
  • Add tests for consumer plugins overrides (acceptance on each plugin)
  • Rename ip_blacklist and ip_whitelist settings to blacklist and whitelist.
  • Smarter check for duplicate APIs (methods can override each other)
  • Rename everything to Annon.
  • Review API doc and refactor code to match latest API manifest.

Proxy

  • Support X-Host-Override header
  • Add a "hide credentials" option in the proxy settings to hide the Authorization header but proxy the rest of the headers
  • Allow wildcard *.example.com domain names
  • Add a strip_request_path option that tells whether the API-related path should be stripped from the upstream request URI
  • Add strip_request_host and proxy HOST header by default
  • Rename strip_request_path to strip_api_path
  • Test cookie support
  • Send X-Forwarded-Proto header (http or https)
  • Add connection_timeout, receive_timeout settings
  • Add retries_limit and set it to 0 by default.

Auth

  • Add Basic authorization strategy with sample auth server (as separate project that is called via RPC/REST)
  • Set additional headers when proxying authorized requests: X-Consumer-External-ID, X-Consumer-Scopes, X-Forwarded-For
  • ~~ Add api key authorization strategy ~~

Security

  • CORS plugin
  • SSL native support (and investigate slow SSL implementation in Erlang)
  • Rate Limiting plugin
  • Size limiting plugin (Cowboy does not allow stopping a request before it is fully received, so we need some kind of workaround)
  • Provide a way to hide secrets from logs (user passwords, tokens, etc)
  • Add token-based auth for private API

GUI

  • Beta version for management GUI

Performance

  • Stress tests, describe limitations, clustering strategies
  • Optimize places where we work with the DB (not the cache): Logger, Consumers.
  • Speed up config resolving

Nice to have

  • Cache plugin and hex package to read from it via RPC
  • Auto-generated status pages
  • Binary protocol parser

Explicitly handle error from authorization server

When the auth plugin is enabled, and the auth server is given a token to verify during the auth phase, it does not only pull the token from the DB and render it. It also performs some basic sanity checks:

  1. the token is not expired,
  2. the approval (between the token's client_id and user_id) still exists.

If either of the two fails, the auth server will return a response containing an error key instead of a data key. In turn, Annon must render that error back to the requester, along with the response code provided by auth (403, aka the access_denied error).
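A hedged sketch of the handling (names are hypothetical): propagate the auth server's error and status code instead of treating any body as a success.

```elixir
defmodule AuthResponse do
  # Successful verification: the auth server returned a data key.
  def handle({status, %{"data" => token_data}}) when status in 200..299,
    do: {:ok, token_data}

  # Failed sanity check: render the auth server's error back to the requester,
  # keeping the status code it provided (eg. 403 / access_denied).
  def handle({status, %{"error" => error}}),
    do: {:error, status, error}
end
```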

RFC: Annon as a supernode for distributed Erlang

Erlang distribution is limited to 45-60 nodes. There is academic research on SD (Scalable Distributed) Erlang which suggests two things:

  • Split the cluster into smaller islands interconnected via supernodes; for a gateway, delivering a request to the desired cluster is an essential task;
  • Name these islands by environment/host tags; the gateway could pick an upstream based on those cluster properties ("this API sends requests to nodes with an available GPU");

We could provide an RPC protocol with service discovery on top of Erlang.

This is far-future work, since we don't have demand for clusters of that size.

RFC: Request metadata ingress

In a microservice architecture it is common to have a single shared dictionary whose entries are used by a bunch of upstream services. Examples of these common dictionaries: the list of USA states, a list of item colors available on an eCommerce website, address types, date range units, and all sorts of general configuration.

To address these cases I want to propose an approach similar to the one we use when building libraries. It is bad to hardcode configuration within a library, or to expose an API that configures it (either via the Application environment or via the System environment). It is much more practical to accept all options as function call arguments, offloading configuration management to the caller app.

We can do the same in a microservice environment: a downstream service may take responsibility for fulfilling an API call with all the data required to handle the request and response, without the need for the upstream to store state.

(Diagram: gateway config proxy.)

The gateway can store an in-memory (ETS table) cache of well-known dictionaries that is either pulled from the configuration service (when the gateway starts or the cache is missing) or pushed from it (when a configuration changes and needs to be propagated).

A new upstream metadata plugin needs to be developed, which can be added to an API and contains settings for multiple dictionaries and rules for how to fetch them.

A configuration update (or cache drop) API needs to be developed so that the responsible service can reload the cache when it changes.
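A minimal sketch of that ETS cache (module name hypothetical; a real plugin would pull from the configuration service on a miss):

```elixir
defmodule DictCache do
  def init, do: :ets.new(:annon_dicts, [:set, :public, :named_table])

  # Push path: the configuration service propagates an updated dictionary.
  def put(name, entries), do: :ets.insert(:annon_dicts, {name, entries})

  # Read path: on :miss, the plugin would pull from the configuration service.
  def fetch(name) do
    case :ets.lookup(:annon_dicts, name) do
      [{^name, entries}] -> {:ok, entries}
      [] -> :miss
    end
  end

  # Cache-drop API: invalidate one dictionary so it gets re-pulled.
  def invalidate(name), do: :ets.delete(:annon_dicts, name)
end
```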

Pros

  1. No need to cache or set all that data in an upstream service. Easier deployment (you won't ever forget to add a new configuration environment variable) and easier configuration management for the whole system (only non-business-related stuff is stored in app configuration). The microservice is closer to a pure stateless app.
  2. No additional latency is added by requesting the configuration from its store.
  3. Configuration updates propagate almost instantly, without the need to build any kind of cache management in all configuration consumers.

Concerns

  1. A configuration may be expensive to send on each request (network IO hit).
  2. People may start to load too much data in this configuration store. Best practices and limitations must be well described in the documentation.

RFC: Circuit Breaker Plugin

Possible failure detection approaches for the implementation:

  1. Heartbeat to a static endpoint on upstream services, and serve a static response when the error threshold is exceeded. (Hint: it's common for LBs to ping more often when the failure rate starts to grow.)
  2. Track response status codes of existing requests and responses and use them as the service threshold;
  3. Set a list of status codes on which the response is replaced with static content, and allow explicitly limiting the request timeout for an API.

For implementation options for the shared failure-rate counter, refer to #219.
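A sketch of approach 2: derive the breaker state as a pure function over recent status codes (names and the threshold semantics are illustrative):

```elixir
defmodule Breaker do
  # Open the circuit when the share of 5xx responses in the recent window
  # exceeds `threshold` (a float between 0.0 and 1.0).
  def state(recent_statuses, threshold) do
    failures = Enum.count(recent_statuses, &(&1 >= 500))

    if length(recent_statuses) > 0 and failures / length(recent_statuses) > threshold do
      :open
    else
      :closed
    end
  end
end
```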

Mini-review

  1. https://github.com/Nebo15/os.gateway/blob/master/lib/gateway.ex#L12
    The port should come from the settings. In general, I would extract building the child spec into a separate function (or module, if appropriate) so the supervisor definition looks more conventional.
  2. https://github.com/Nebo15/os.gateway/blob/master/rel/config.exs#L13
    Here it is :os_gateway, but everywhere else it is :gateway. The project won't build.
  3. Poison 3.0 is out. It seems to be compatible. Better to use the latest dependencies so we don't end up doing a migration two months from now.
    https://github.com/Nebo15/os.gateway/blob/master/mix.exs#L58
  4. We need to discuss where to store configs. Maybe put everything in Cassandra right away instead of maintaining two dependencies: a DB for configs and a DB for everything else.
