
xata-py's People

Contributors

dependabot[bot], doublevcodes, paulaguijarro, philkra, richardgill, sepal, sferadev, snide, tsg

xata-py's Issues

Make `workspace_id` a smart default.

Implement the same smart default approach as for db_name and branch_name. These values are taken from the bootstrapped client and are optional in the endpoint APIs: if a value is not provided, the one from the client is used. This creates leaner, less cumbersome API interfaces.
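The fallback described above could look roughly like the sketch below. The `Client` and `Workspaces` classes, the `get_path` method and the path layout are hypothetical stand-ins for illustration, not the SDK's actual classes.

```python
from typing import Optional


class Client:
    """Hypothetical stand-in for the bootstrapped client."""

    def __init__(self, workspace_id: str, db_name: str, branch_name: str):
        self.workspace_id = workspace_id
        self.db_name = db_name
        self.branch_name = branch_name


class Workspaces:
    """Hypothetical namespace; method name and path are illustrative only."""

    def __init__(self, client: Client):
        self.client = client

    def get_path(self, workspace_id: Optional[str] = None) -> str:
        # Fall back to the client's value when the caller omits the argument.
        ws = workspace_id if workspace_id is not None else self.client.workspace_id
        return f"/workspaces/{ws}"


client = Client(workspace_id="ws-123", db_name="test", branch_name="main")
workspaces = Workspaces(client)
print(workspaces.get_path())          # uses the client's workspace
print(workspaces.get_path("ws-999"))  # an explicit value wins
```

Checking against `None` rather than truthiness keeps an explicitly passed value from being silently replaced.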

Deprecate non-namespaced endpoints

With the code generator in place (#16), all endpoints are organized in namespaces. This makes the hand-curated, directly accessible endpoints obsolete. They should be deprecated in favor of the new namespaced endpoints. Use the deprecation package to mark the methods.

  • Methods will be deprecated with the next pre-release version 0.3.0
  • Methods will be removed with the release of the 1.0.0 version of the sdk
  • change integration tests to use new endpoints
  • add the deprecation notice to methods and point to replacement endpoint
  • remove methods with 1.0.0

Why not turn the methods into shims for the generated endpoints? That would be a breaking change, as the return type would change from dict to requests.Response.

If no direct replacement exists, we should explore the path of adding them into a helpers package.

Methods to deprecate

  • client.post()
  • client.get()
  • client.put()
  • client.delete()
  • client.patch()
  • client.query()
  • client.get_first()
  • client.get_by_id()
  • client.create()
  • client.create_or_update()
  • client.create_or_replace()
  • client.update()
  • client.delete_record()
  • client.search()
  • client.search_table()
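A deprecation notice that points to the replacement endpoint could be sketched as follows. The issue names the deprecation package; this sketch swaps in the standard library's `warnings` module so that it runs without extra dependencies. The `Client` class and the replacement name `client.records().get()` are assumptions for illustration.

```python
import functools
import warnings


def deprecated(deprecated_in: str, removed_in: str, replacement: str):
    """Mark a method as deprecated and point to its replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated since {deprecated_in} and will "
                f"be removed in {removed_in}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class Client:
    """Hypothetical stand-in for the hand-curated client."""

    # The replacement endpoint name below is an assumption.
    @deprecated("0.3.0", "1.0.0", "client.records().get()")
    def get_by_id(self, table: str, record_id: str) -> dict:
        return {"id": record_id, "table": table}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    record = Client().get_by_id("test", "rec_1")

print(record)
print(caught[0].message)
```

The method keeps returning its original value, so existing callers are not broken before the 1.0.0 removal.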

[Meta] Python SDK initial release

For the initial release we plan a base client that doesn't yet create models via code generation, but still simplifies working with Xata in Python.

  • Connection, workspace, API keys configuration mimicking the TypeScript SDK
  • Automatic e2e tests
  • Documentation published on readthedocs.io
  • Client installable via pip and published automatically to PyPI
  • Support arbitrary requests via the API (POST, PUT, GET, DELETE)
    • To data plane (i.e. xata.sh)
    • To control plane (i.e. api.xata.io)
    • To staging/local systems
  • Data plane API coverage:
    • create, insert, createOrUpdate record
    • update record
    • query
    • getFirst
    • getUnique
    • delete record
    • search
    • aggregations

Road to GA

DLQ for BulkProcessor

Add the unprocessable batches to a dead-letter queue (DLQ) in the BulkProcessor (BP). Each entry has the following shape:

{
"timestamp": utcnow(),
"exception": Exception(e),
"data": batch,
}

Update: keep the error count in the stats object.
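A minimal, single-threaded sketch of the idea, assuming a hypothetical `MiniBulkProcessor` with a pluggable `send` callable. The real BulkProcessor is threaded and its internals may differ. The entry shape follows the one given above, and the attribute names `stats` and `failed_batches_queue` mirror those used in the repro script further down.

```python
from datetime import datetime, timezone


class MiniBulkProcessor:
    """Hypothetical, single-threaded stand-in for the BulkProcessor."""

    def __init__(self, send):
        self.send = send  # callable that ships one batch; may raise
        self.failed_batches_queue = []
        self.stats = {"total": 0, "failed_batches": 0}

    def process(self, batch: list) -> None:
        try:
            self.send(batch)
            self.stats["total"] += len(batch)
        except Exception as exc:
            # Park the unprocessable batch instead of losing it.
            self.failed_batches_queue.append(
                {
                    "timestamp": datetime.now(timezone.utc),
                    "exception": exc,
                    "data": batch,
                }
            )
            # Keep the error count in the stats object as well.
            self.stats["failed_batches"] += 1


def flaky_send(batch):
    """Illustrative sender that rejects negative values."""
    if any(record["values"] < 0 for record in batch):
        raise ValueError("negative value")


bp = MiniBulkProcessor(flaky_send)
bp.process([{"values": 1}, {"values": 2}])
bp.process([{"values": -1}])
print(bp.stats)
print(len(bp.failed_batches_queue))
```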

Remove unnecessary `requests` shims

remove the methods:

  • client.get()
  • client.post()
  • client.put()
  • client.patch()
  • client.delete()

They call into client.request() and provide only limited value.

BUG: Ensure `flush_queue` blocks if timing is flaky

The flush_queue method in the bulk processor can consider a queue to be empty, and hence simply terminate the flush, even if the queue is populated. This is a race condition: flush_queue does not read the thread-safe queue size, and the value it reads instead is only updated after the first batch has been processed.
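One way to avoid relying on a cached size is to let the queue itself track outstanding work. The sketch below uses `queue.Queue.join()` and `task_done()` from the standard library. `MiniProcessor` is a hypothetical stand-in and not the SDK's implementation.

```python
import queue
import threading


class MiniProcessor:
    """Hypothetical stand-in that counts records instead of sending them."""

    def __init__(self):
        self.q = queue.Queue()
        self.processed = 0
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._work, daemon=True)
        worker.start()

    def _work(self):
        while True:
            batch = self.q.get()
            try:
                with self._lock:
                    self.processed += len(batch)
            finally:
                # Mark the batch done only after it was fully handled.
                self.q.task_done()

    def put(self, batch):
        self.q.put(batch)

    def flush_queue(self):
        # Blocks until every queued batch has been marked done, so there is
        # no window in which a stale size makes the queue look empty.
        self.q.join()


p = MiniProcessor()
for start in range(0, 10000, 25):
    p.put(list(range(start, start + 25)))
p.flush_queue()
print(p.processed)
```

Because `task_done()` is called after the batch is counted, `flush_queue()` cannot return while a batch is still in flight.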

Bulk processor flush_queue leaks batches

BulkProcessor flush_queue completes without emitting all records to Xata.

In the following repro script it may miss one or two batches.

Occurs with: Python SDK 1.2.0 on Python 3.11.6

Repro script:

from xata.client import XataClient
from xata.helpers import BulkProcessor

TARGET_DB = "https://repro-q867qv.us-east-1.xata.sh/db/test"
BRANCH = "main"

client = XataClient(db_url=f"{TARGET_DB}:{BRANCH}")
bp = BulkProcessor(client)

data = []

for i in range(0, 10000):
    data.append({"values": i})

bp.put_records("test", data)

bp.flush_queue()

print(bp.stats)
print(bp.failed_batches_queue)

Script output:

python3 test.py
{'total': 9925, 'queue': 0, 'failed_batches': 0, 'tables': {'test': 9925}}
[]

Xata contains 9975 records. The last batch with number 9974-9999 is missing from Xata.
The stats output (9925) is different from the Xata content (9975), neither of which matches the number of records given to the bulk processor.

Initially reported on Discord.

Align naming conventions

There is a mix of snake case and camel case in the codebase. Opt for the Pythonic PEP 8 naming style.

This also applies to the client.get_config() method, whose keys are camel-cased.
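For the config keys, the conversion could be sketched as below. The helper names and the example keys are assumptions for illustration and do not reflect the actual get_config() output.

```python
import re


def to_snake_case(name: str) -> str:
    # Insert an underscore before each capital letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_case_keys(config: dict) -> dict:
    """Return a copy of a flat dict with snake_case keys."""
    return {to_snake_case(key): value for key, value in config.items()}


# Illustrative keys only.
converted = snake_case_keys(
    {"dbName": "test", "branchName": "main", "region": "us-east-1"}
)
print(converted)
```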

Empty list of items in `failed_batches`

The list of failed_batches in the BulkProcessor keeps growing as long as records are failing. Introduce a mechanism to fetch the items and free the space again (pop).
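A minimal sketch of such a pop mechanism, assuming a hypothetical `FailedBatches` holder. The class and method names are illustrative and not part of the SDK.

```python
import threading


class FailedBatches:
    """Hypothetical holder for failed batches."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def add(self, batch: dict) -> None:
        with self._lock:
            self._items.append(batch)

    def pop_all(self) -> list:
        # Swap the list out under the lock; the holder starts fresh and the
        # memory is released once the caller drops the returned items.
        with self._lock:
            items, self._items = self._items, []
        return items

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)


failed = FailedBatches()
failed.add({"data": [{"values": 1}]})
failed.add({"data": [{"values": 2}]})
drained = failed.pop_all()
print(len(drained), len(failed))
```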

`throw_exception` option in BulkProcessor

Currently, every error in the bulk processor raises an exception and terminates the thread. Add a boolean option throw_exception that controls whether the exception is raised or not. Default: False.
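The flag could behave as in the sketch below. `process_batch` and `send` are hypothetical names, and logging the error when it is swallowed is an assumption rather than documented behaviour.

```python
import logging

logger = logging.getLogger("bulk_processor_sketch")


def process_batch(send, batch, throw_exception: bool = False) -> bool:
    """Ship one batch; return True on success, False on a swallowed error."""
    try:
        send(batch)
        return True
    except Exception as exc:
        if throw_exception:
            # Propagate the error to the caller; in a worker thread this
            # would terminate the thread.
            raise
        logger.warning("batch failed, continuing: %s", exc)
        return False


def boom(batch):
    """Illustrative sender that always fails."""
    raise RuntimeError("backend unavailable")


print(process_batch(boom, [{"values": 1}]))  # error is swallowed
try:
    process_batch(boom, [{"values": 1}], throw_exception=True)
except RuntimeError as exc:
    print(f"re-raised: {exc}")
```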

Investigate direct namespace invocation

If you initialize the SDK directly with a namespace, which is valid usage, as follows:

records = XataClient(api_key="", ..., region="region").records()

The bootstrapping of the internals does not work: if you set, for example, a non-default region, it is ignored and the default one is used.

[Feedback] Transaction helper

Tasks

revisit API names for redundancy, e.g. shorten `xata.database().createDatabase()` to `xata.database().create()`

databases

  • getDatabaseMetadata -> getMetadata
  • createDatabase -> create
  • deleteDatabase -> delete
  • updateDatabaseMetadata -> updateMetadata
  • listRegions -> getRegions

users

  • getUser -> get
  • updateUser -> update
  • deleteUser -> delete

workspaces

  • getWorkspacesList -> getWorkspaces
  • createWorkspace -> create
  • getWorkspace -> get
  • updateWorkspace -> update
  • deleteWorkspace -> delete
  • getWorkspaceMembersList -> getMembers
  • updateWorkspaceMemberRole -> updateMember
  • removeWorkspaceMember -> removeMember

branch

  • getBranchList -> getBranches
  • getBranchDetails -> getDetails
  • createBranch -> create
  • deleteBranch -> delete
  • getBranchMetadata -> getMetadata
  • updateBranchMetadata -> updateMetadata
  • getBranchStats -> getStats
  • resolveBranch -> resolve

migrations

  • getBranchMigrationHistory -> getHistory
  • getBranchMigrationPlan -> getPlan
  • executeBranchMigrationPlan -> executePlan
  • getBranchSchemaHistory -> getHistory
  • compareBranchSchemas -> compare
  • updateBranchSchemas -> update
  • previewBranchSchemaEdit -> preview
  • applyBranchSchemaEdit -> apply
  • pushBranchMigrations -> push

records

  • insertRecord -> insert
  • getRecord -> get
  • insertRecordWithID -> insertWithId
  • upsertRecordWithID -> upsertWithId
  • deleteRecord -> delete
  • updateRecordWithID -> updateWithId
  • bulkInsertTableRecords -> bulkInsert

search_and_filter

  • queryTable -> query
  • vectorSearchTable -> vectorSearch
  • askTable -> ask
  • summarizeTable -> summarize
  • aggregateTable -> aggregate

table

  • createTable -> create
  • deleteTable -> delete
  • updateTable -> update
  • getTableSchema -> getSchema
  • setTableSchema -> setSchema
  • getTableColumns -> getColumns
  • addTableColumn -> addColumn

Move Mako dependency into poetry

In order to unblock #41 we had to implement a workaround in #55 to bypass the missing dependency. Move the Mako dependency into the Poetry dependency management file.

Breakup test files into domains

The unit and integration tests currently each reside in a single file, which makes the setup slow and the files bloated. Break each file up into more domain-specific test files.

[FEEDBACK] Return the data immediately as response from the API

Every API call returns a requests.Response instance. While this is convenient for getting all the information about the HTTP response, it is also clunky to handle. The expectation is to get the data (dict) back directly, shortcutting the .json() call. Nevertheless, the response should still provide access to the status code and headers.
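One possible shape for such a return value is a dict subclass that carries the HTTP metadata alongside the body. The `ApiResponse` name and its constructor are assumptions for this sketch, which builds the object from plain values instead of a real requests.Response.

```python
class ApiResponse(dict):
    """Dict of the JSON body that still exposes HTTP metadata."""

    def __init__(self, body: dict, status_code: int, headers: dict):
        super().__init__(body)
        self._status_code = status_code
        self._headers = dict(headers)

    @property
    def status_code(self) -> int:
        return self._status_code

    @property
    def headers(self) -> dict:
        return self._headers

    def is_success(self) -> bool:
        return 200 <= self._status_code < 300


resp = ApiResponse({"id": "rec_1"}, 201, {"content-type": "application/json"})
print(resp["id"])          # data is available without calling .json()
print(resp.status_code)    # metadata stays reachable
print(resp.is_success())
```

Since the object is a dict, code that expects the old dict return type of the hand-curated methods keeps working.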

[Feedback][Helper] Pagination `getAll()` like

Create a query and search helper that handles the pagination under the hood. A user would provide only a query and would get the full result set without the need for a pagination routine, which is handled within the helper.
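A cursor-following helper could be sketched as a generator. The page layout assumed here (`records` plus `meta.page.cursor` and `meta.page.more`) and the `fetch_all` and `fake_query` names are assumptions for illustration. The fake backend stands in for a real query call.

```python
from typing import Callable, Iterator, Optional


def fetch_all(query_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, requesting follow-up pages until none remain."""
    cursor = None
    while True:
        page = query_page(cursor)
        yield from page["records"]
        meta = page["meta"]["page"]
        if not meta.get("more"):
            return
        cursor = meta["cursor"]


# Fake backend with 7 records, served 3 per page.
DATA = [{"id": i} for i in range(7)]


def fake_query(cursor: Optional[str], size: int = 3) -> dict:
    start = int(cursor) if cursor else 0
    end = start + size
    return {
        "records": DATA[start:end],
        "meta": {"page": {"cursor": str(end), "more": end < len(DATA)}},
    }


all_records = list(fetch_all(fake_query))
print(len(all_records))
```

A generator keeps memory flat for large result sets; callers who want the full list can wrap it in `list()` as shown.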

Create Database is missing properties

The xata.databases().create() method is missing the properties ui and metadata.

put:
      operationId: createDatabase
      summary: Create Database
      description: Create Database with identifier name
      requestBody:
        description: ''
        content:
          application/json:
            schema:
              description: ''
              type: object
              properties:
                branchName:
                  type: string
                  minLength: 1
                region:
                  type: string
                  minLength: 1
                ui:
                  type: object
                  properties:
                    color:
                      type: string
                metadata:
                  $ref: '#/components/schemas/BranchMetadata'
              example:
                branchName: main
                region: us-east-1
                metadata:
                  repository: github.com/my/repository
                  branch: github repository
                  stage: testing
                  labels:
                    - development
              required:
                - region

Make base URLs configurable

Currently, the client allows the initialization of different base URLs for core and workspace. This is not reflected in the namespaced endpoints, as these use a static value. Rework the Namespace class so that a different base_url can be injected for core. The workspace URL will be derived from the db_url param.
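A rough sketch of the injection, assuming a hypothetical `Namespace` constructor and `url` method. The default control plane host follows the api.xata.io value mentioned in the initial release notes, and the staging URL is made up for the example.

```python
from urllib.parse import urlsplit

DEFAULT_CORE_BASE_URL = "https://api.xata.io"


class Namespace:
    """Hypothetical namespace base class with an injectable core base URL."""

    def __init__(self, db_url: str, base_url: str = DEFAULT_CORE_BASE_URL):
        self.base_url = base_url.rstrip("/")
        # Derive the workspace base URL from the scheme and host of db_url.
        parts = urlsplit(db_url)
        self.workspace_url = f"{parts.scheme}://{parts.netloc}"

    def url(self, path: str, scope: str = "core") -> str:
        root = self.base_url if scope == "core" else self.workspace_url
        return f"{root}/{path.lstrip('/')}"


ns = Namespace(
    db_url="https://repro-q867qv.us-east-1.xata.sh/db/test",
    base_url="https://staging.example.invalid/",  # made-up staging host
)
print(ns.url("/workspaces"))
print(ns.url("/db/test:main", scope="workspace"))
```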
