apify / apify-client-python

Apify API client for Python

Home Page: https://docs.apify.com/api/client/python/

License: Apache License 2.0

Python 92.33% Makefile 0.39% JavaScript 5.84% Shell 0.38% CSS 1.07%

api apify client python scraping

apify-client-python's Introduction

Apify API client for Python

The Apify API Client for Python is the official library to access the Apify API from your Python applications. It provides useful features like automatic retries and convenience functions to improve your experience with the Apify API.

If you want to develop Apify Actors in Python, check out the Apify SDK for Python instead.

Installation

Requires Python 3.8+

You can install the package from its PyPI listing. To do that, simply run pip install apify-client in your terminal.

Usage

For usage instructions, check the documentation on Apify Docs.

Quick Start

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an actor and wait for it to finish
actor_call = apify_client.actor('john-doe/my-cool-actor').call()

# Fetch results from the actor's default dataset
dataset_items = apify_client.dataset(actor_call['defaultDatasetId']).list_items().items

Features

Besides greatly simplifying the process of querying the Apify API, the client provides other useful features.

Automatic parsing and error handling

Based on the endpoint, the client automatically extracts the relevant data and returns it in the expected format. Date strings are automatically converted to datetime.datetime objects. When a request fails, the client raises an ApifyApiError, which wraps the plain JSON error returned by the API and enriches it with additional context for easier debugging.
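To illustrate the date conversion described above, here is a minimal sketch, not the client's actual implementation. It assumes that date fields are identified by a key ending in "At" and that the API returns timestamps in the form 2023-01-02T03:04:05.678Z; both the rule and the helper name parse_date_fields are assumptions made for this example.

```python
from datetime import datetime, timezone
from typing import Any


def parse_date_fields(data: Any) -> Any:
    """Recursively convert date strings stored under keys ending in 'At'.

    The key-suffix rule and the timestamp format are assumptions of this sketch.
    """
    if isinstance(data, list):
        return [parse_date_fields(item) for item in data]
    if isinstance(data, dict):
        parsed = {}
        for key, value in data.items():
            if key.endswith('At') and isinstance(value, str):
                try:
                    parsed[key] = datetime.strptime(
                        value, '%Y-%m-%dT%H:%M:%S.%fZ'
                    ).replace(tzinfo=timezone.utc)
                except ValueError:
                    # Leave strings that are not in the assumed format untouched.
                    parsed[key] = value
            else:
                parsed[key] = parse_date_fields(value)
        return parsed
    return data
```

Calling parse_date_fields on a decoded JSON response would leave every other field as it was and replace only the matching strings with timezone-aware datetime.datetime objects.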

Retries with exponential backoff

Network communication sometimes fails. The client automatically retries requests that fail due to a network error, an internal error of the Apify API (HTTP 500+), or a rate limit error (HTTP 429). By default, it retries up to 8 times. The first retry is attempted after ~500 ms, the second after ~1000 ms, and so on. You can configure these parameters using the max_retries and min_delay_between_retries_millis options of the ApifyClient constructor.
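The retry schedule can be sketched as follows. This is only an illustration: it assumes the delay doubles on every retry, which matches the ~500 ms and ~1000 ms figures above but is not confirmed for later retries, and it leaves out any random jitter. The function name backoff_delays_millis is hypothetical.

```python
from typing import List


def backoff_delays_millis(max_retries: int = 8, min_delay_millis: int = 500) -> List[int]:
    """Return the approximate delay before each retry, assuming plain doubling."""
    return [min_delay_millis * 2 ** attempt for attempt in range(max_retries)]


# With the defaults, the first three delays are 500, 1000 and 2000 ms.
print(backoff_delays_millis()[:3])
```

The two arguments mirror the max_retries and min_delay_between_retries_millis constructor options, so the function shows roughly how long a failing request might be retried for a given configuration.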

Support for asynchronous usage

Starting with version 1.0.0, the package offers an asynchronous version of the client, ApifyClientAsync, which allows you to work with the Apify API in an asynchronous way, using the standard async/await syntax.
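As a rough sketch of asynchronous usage, the function below mirrors the Quick Start with await. The function name fetch_default_dataset_items is made up for this example, and the client is passed in as a parameter so that the snippet itself does not need the package or a token to be defined.

```python
from typing import Any, List


async def fetch_default_dataset_items(client: Any, actor_id: str) -> List[dict]:
    """Run an actor and return the items of its default dataset.

    `client` is expected to be an ApifyClientAsync instance.
    """
    # Start the actor and wait for the run to finish.
    run = await client.actor(actor_id).call()
    # Read the results from the run's default dataset.
    page = await client.dataset(run['defaultDatasetId']).list_items()
    return page.items


# Usage (requires the apify-client package and a valid token):
#
#     import asyncio
#     from apify_client import ApifyClientAsync
#
#     client = ApifyClientAsync('MY-APIFY-TOKEN')
#     items = asyncio.run(fetch_default_dataset_items(client, 'john-doe/my-cool-actor'))
```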

Convenience functions and options

Some actions can't be performed by the API itself, such as waiting indefinitely for an actor run to finish (because of network timeouts). The client provides the convenience functions call() and wait_for_finish() for that. Key-value store records can be retrieved as objects, buffers, or streams via the respective options, and dataset items can be fetched as individual objects or as serialized data. We plan to add better stream support and async iterators.
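One way to wait longer than a single HTTP request allows is to poll repeatedly until the run reaches a final state. The sketch below shows that idea only; it is not how the client implements wait_for_finish(). The helper poll_until_finished, its parameters, and the set of final statuses are assumptions made for this example.

```python
import time
from typing import Callable, Optional

# Run statuses treated as final in this sketch (an assumption).
TERMINAL_STATUSES = {'SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED'}


def poll_until_finished(
    fetch_run: Callable[[], dict],
    wait_secs: Optional[float] = None,
    poll_interval_secs: float = 1.0,
) -> dict:
    """Call `fetch_run` until the run is finished or `wait_secs` has elapsed.

    Returns the last fetched run, which may still be unfinished on timeout.
    With `wait_secs=None` the function waits indefinitely.
    """
    started = time.monotonic()
    while True:
        run = fetch_run()
        if run.get('status') in TERMINAL_STATUSES:
            return run
        if wait_secs is not None and time.monotonic() - started >= wait_secs:
            return run
        time.sleep(poll_interval_secs)
```

Because each call to fetch_run is a short, separate request, no single connection has to stay open for the whole duration of the run.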


apify-client-python's Issues

Creating Actor from Template missing package.json

Running apify create my-first-actor --template python-start

Error: ENOENT: no such file or directory, open \my-first-actor\package.json

As a workaround, you can manually add a package.json with a start script to run the actor.

Issue Type: LOW

Implement key-value store-related resource clients

Document how to add input to an actor

I'm trying to find out how to run an actor and specify the input for it. E.g., the LinkedIn Company URL needs something like {"queries": "Tesla\nmicrosoft.com"}

The API docs are clear that whatever data is passed to the POST request will be interpreted as input to the actor.

However, I can't find a proper alternative in the Python docs, e.g. in the Usage concepts. There's dataset's push_items(), but I don't think that's it. Is there an equivalent?
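For reference, another issue on this page passes input through the run_input argument of call(). A minimal sketch along those lines, with the client passed in as a parameter and a hypothetical function name, could look like this:

```python
from typing import Any, List


def run_actor_with_input(client: Any, actor_id: str, run_input: dict) -> List[dict]:
    """Run an actor with the given input and return its default dataset items.

    `client` is expected to be an ApifyClient instance.
    """
    # The input is handed to the actor run through the `run_input` argument.
    run = client.actor(actor_id).call(run_input=run_input)
    return client.dataset(run['defaultDatasetId']).list_items().items


# Usage (requires the apify-client package and a valid token):
#
#     from apify_client import ApifyClient
#
#     client = ApifyClient('MY-APIFY-TOKEN')
#     items = run_actor_with_input(
#         client, 'john-doe/my-cool-actor', {'queries': 'Tesla\nmicrosoft.com'}
#     )
```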

Review documentation

Check the documentation for completeness, typos and grammar errors

Write design document

Write a design document for the Apify Python Client in Notion

Replace the usage of `Any` with generic types

Do not use Any type as suggested in the ANN401.

Instead of

from typing import Any

def get_first(container: list[Any]) -> Any:
    return container[0]

use the following

from typing import TypeVar

T = TypeVar('T')

def get_first(container: list[T]) -> T:
    return container[0]

Any can probably still make sense for the *args / **kwargs.

Use the ACTOR_XXX env vars that replaced some APIFY_ACTOR_XXX ones

Use the new ACTOR_XXX env vars in place of the APIFY_ACTOR_XXX ones they replaced.

See the list in the actor spec.

Implement actor-related resource clients

Unify indentation in configuration files

Some of our configuration files currently use 2 space indent and others use 4 space indent.

Let's unify this and use the same indent (2 spaces) for all configuration files (yaml, toml, ini/cfg, ...).

Migrate to Ruff (linter & formatter)

Ruff is a new extremely fast Python linter written in Rust, which supports many rules from the flake8 & pylint world (700+).

They recently released a formatter, where single quotes are an option :).

Implement pre-commit

We should have the linting, type-checking, unit testing and documentation checking in a pre-commit hook, automatically installed when you run make install-dev.

Python Client v1

Write announcement blogpost

Write a nice blogpost announcing the Python Client availability (maybe this should be done only after app supports Docker images with the client preinstalled).

Set up package deployment to PyPI

Set up deployment of the client package to PyPI (possibly automated through GitHub actions)

Add `gracefully` parameter to `abort run` method

See https://github.com/apify/apify-client-js/pull/178/files for reference implementation in the JS client.

Set up documentation building

Set up building of the documentation from docstrings

ideally to markdown
upload the built docs to S3 so they can be shown on the web
possibly through GitHub actions

Implement log-related resource clients

Implement request queue-related resource clients

Python client: Change passing token from parameter into HTTP header

This depends on the API server first implementing https://github.com/apify/apify-core/issues/2082.
Once it does, the client should pass the API token in an HTTP header instead of a query parameter. This will improve the security of the API token.
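A minimal sketch of the idea, using only the standard library: the token travels in a header, so it never appears in the URL. The header name and the Bearer scheme are assumptions of this example (the issue only says "HTTP header"), and build_authorized_request is a hypothetical helper.

```python
import urllib.request

API_BASE_URL = 'https://api.apify.com/v2'


def build_authorized_request(path: str, token: str) -> urllib.request.Request:
    """Build a request that carries the token in a header, not in the URL."""
    url = API_BASE_URL + '/' + path.lstrip('/')
    # Assumed header format; the request is only built here, not sent.
    return urllib.request.Request(url, headers={'Authorization': 'Bearer ' + token})
```

Keeping the token out of the query string means it does not end up in server access logs, proxies, or copied URLs.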

Start using apify-shared for general consts and utils

In apify/apify-shared-python#1 we introduced apify-shared library for sharing general constants and utils across our Python projects. Let's start using it.
- https://github.com/apify/apify-shared-python
- https://pypi.org/project/apify-shared/

Implement request queue v2 methods

Remove underscore prefix from objects that are not private

Currently, we use the underscore prefix for objects that are imported from other modules (e.g. all objects in https://github.com/apify/apify-client-python/blob/master/src/apify_client/_utils.py) and are therefore not private.

This was intended to let users of the library know that these objects are for internal use only.

However, this does not correspond to how underscore prefixes are used in the Python world. We should remove these prefixes from the non-private objects.

We can still use the underscore prefix in module names to let users know that a module is for internal use only and should not be imported by library users.

Cannot use Apify Python Client at Coegil platform

From email conversation:

We are using Coegil platform to run apify-client library and getting this issue.

Pip install passed fine, installing all dependencies.

from apify_client import ApifyClient

ModuleNotFoundError: No module named 'apify_client'
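One possible cause, which is only a guess since nothing is known here about how the Coegil platform runs code, is that pip installed the package into a different Python environment than the one executing the script. A small diagnostic sketch like the following prints which interpreter is running and whether it can locate the module; locate_module is a hypothetical helper.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Compare this interpreter with the one that `pip install` reported using.
print('Interpreter:', sys.executable)
print('apify_client found at:', locate_module('apify_client'))
```

If the second line prints None, running the installation through the same interpreter (python -m pip install apify-client) is a common thing to try.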

Implement run-related resource clients

Implement build-related resource clients

Implement task-related resource clients

Remove `__all__` from all `__init__.py`

Using star imports, such as from apify_client import *, is generally considered bad practice in Python because it can lead to namespace conflicts. While there may be specific scenarios where star imports are useful, I don't see one in the context of our packages.

I suggest removing the `__all__` lists from our codebase so that we don't encourage users to adopt this practice. We also won't have to maintain these lists anymore.

Implement Store API endpoints

Write integration tests

Write integration tests against production / staging API

Implement user-related resource clients

Add API endpoint for validating Actor input

We have an endpoint /acts/ACTOR-ID/validate-input which is not implemented in the client, we should implement it.

It takes the input to validate as POST payload, and optionally a build query parameter to specify the build tag against which to validate.

It returns a response with:

HTTP status 200 and body { "valid": true }
HTTP status 400 and body with the validation error

We should first add it to the documentation, so that we can refer to it in the docstrings. apify/apify-docs#722
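The request and response handling described above could be sketched like this. Nothing here is an existing client method: both helper names are hypothetical, authentication is left out, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

API_BASE_URL = 'https://api.apify.com/v2'


def build_validate_input_request(
    actor_id: str, run_input: dict, build: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request for /acts/ACTOR-ID/validate-input."""
    url = API_BASE_URL + '/acts/' + urllib.parse.quote(actor_id, safe='~') + '/validate-input'
    if build is not None:
        # Optional build tag against which the input is validated.
        url += '?' + urllib.parse.urlencode({'build': build})
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def interpret_validation_response(status: int, body: dict) -> Tuple[bool, Optional[dict]]:
    """Map the two documented outcomes to (is_valid, validation_error)."""
    if status == 200:
        return bool(body.get('valid')), None
    if status == 400:
        return False, body
    raise ValueError('Unexpected HTTP status: ' + str(status))
```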

Implement dataset-related resource clients

Add actor reboot method to the `RunClient`

Add a .reboot() method to the RunClient class. It will invoke the endpoint located at /v2/actor-runs/{actorRunId}/reboot via a POST request. The endpoint has no parameters.
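A rough sketch of what such a method might look like is shown below. RunClientSketch and its injected http_post callable are stand-ins invented for this example; the real RunClient has its own base class and HTTP client.

```python
from typing import Callable


class RunClientSketch:
    """Stand-in for RunClient that only models the proposed reboot() method."""

    def __init__(self, run_id: str, http_post: Callable[[str], dict]) -> None:
        self.run_id = run_id
        # Hypothetical transport: takes a URL path, sends a POST, returns parsed JSON.
        self._http_post = http_post

    def reboot(self) -> dict:
        """Reboot the run. The endpoint takes no parameters."""
        return self._http_post('/v2/actor-runs/' + self.run_id + '/reboot')
```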

Save image cache for a client

I want to speed up consecutive calls that use the same actor. There are places on the website that describe a build cache for the docker containers but say they are only available on the API.

Create project structure

Create a shell of the project, including:

setting up a dependency installer (Pipfile or requirements.txt)
setting up linting
setting up a test framework

Catch up to JS client

There were a few changes to the JS client in the past that were not propagated to the Python client, we need to catch up.

We need to add these features:

actor run env vars client (apify/apify-client-js#202)
x-apify-workflow-key header (apify/apify-client-js#212)
schema parameter in get_or_create methods for key-value store & dataset (apify/apify-client-js#233)
origin param for last actor/task run endpoints (apify/apify-client-js#248)
view parameter in dataset items methods (apify/apify-client-js#226)
flatten parameter in dataset items methods (apify/apify-client-js#264)
optional title field in task, schedule, key-value store, dataset, and request queue client methods (apify/apify-client-js#271)

Sanity test

Manually test all the endpoints to verify that they work (or return proper errors), and add automated tests if you find something that should have been tested but is not.

Add .test() method to WebhookClient

See apify/apify-client-js#181 for a reference implementation in the JS client.

Implement webhook-related resource clients

Delete the re-imports in the `consts.py` module from the `apify-shared`

We have introduced apify-shared-python to consolidate general constants for both the Client and SDK.

To facilitate a seamless transition to the new package, we have implemented re-imports with DeprecationWarning in the existing codebase - consts.py.

This is only a temporary state and after a few new releases, we should get rid of it.

Implement schedule-related resource clients

Not possible to add maxItems parameter.

Hello,
Judging from the source code, I believe it is not possible to specify the maxItems parameter like this:

https://api.apify.com/v2/acts/actor-name/runs?token=YOUR_TOKEN&maxItems=10

Something like this:
actor.start(run_input={some input}, memory_mbytes=32768, build='latest', timeout_secs=100, max_items=100)

Please correct me if I'm wrong, and if so what's the way of accomplishing this?

Move `apify_client._errors` to `apify_client.errors`

We have some error subclasses like ApifyApiError defined in https://github.com/apify/apify-client-python/blob/master/src/apify_client/_errors.py, with the underscore suggesting it's a private submodule.

We have them documented in the docs, though, suggesting people should use them in their isinstance checks etc, which they should be able to, since the thrown errors should be a part of the public API of a module.

We should move them out of the private _errors submodule to a public errors submodule, to make it clear that these are OK to use by end users.

ListPage should be generic

ListPage should be generic, i.e. it should allow specifying a type for the data inside items, so that items is a List[T] instead of just a List.
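A generic ListPage might be sketched as follows. The fields other than items are illustrative and are not meant to mirror the real class exactly.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar('T')


@dataclass
class ListPage(Generic[T]):
    """A page of results whose item type is known to the type checker."""

    items: List[T]
    total: int
    offset: int
    limit: int


def first_item(page: ListPage[T]) -> T:
    """Return the first item; its type follows from the page's type parameter."""
    return page.items[0]
```

With this shape, a type checker can infer that first_item called on a ListPage[str] returns a str, instead of falling back to an untyped value.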

Rename identifiers which shadow a builtin attribute

Rename identifiers flagged with # noqa: A003 for built-in attribute shadowing.

In the case of methods named list, it even breaks mypy in some places, namely when you use the list type hint in the same module, e.g. here - https://github.com/apify/apify-client-python/blob/master/src/apify_client/clients/resource_clients/actor_collection.py#L50.

Address accessing of non-existing field `_maybe_parsed_body` within `httpx.Response` object

It seems that the following code:

response: httpx.Response = await self.http_client.call(
    url=self._url(f'records/{key}'),
    method='GET',
    params=self._params(),
)

returns a response of type httpx.Response. Later the field _maybe_parsed_body is accessed:

return {
    'value': response._maybe_parsed_body,
    '...': '...',
}

There are 2 occurrences of this:

Check if http_client.call really returns a httpx.Response object and in such case fix the accessing of non-existing field _maybe_parsed_body.

Release version 1.0.0

Since this project has matured quite a bit, and we're launching the Apify SDK for Python soon, and since the 0.7.0 beta has so many changes, many of which are breaking, it would be worth it to change the 0.7.0 version to 1.0.0.

Get max_no_of_posts using insta_posts_scraper

We can't retrieve all posts for a user, it only returns posts from the first page:
Here is our script:


from apify_client import ApifyClient

apify_client = ApifyClient('token')

# Start the actor and wait for it to finish
actor_call = apify_client.actor('apify/instagram-post-scraper').call(
    run_input={'username': ['username'], 'limit': 100},
)

# Get the dataset and the posts
dataset_items = apify_client.dataset(actor_call['defaultDatasetId']).list_items().items

Is there any workaround for this?

Migrate to Poetry for packaging and dependency management #156

I believe we can use just poetry instead of pip, virtualenv, setuptools and twine (although, it uses some of them under the hood).

https://python-poetry.org/

Implement client base

Implement the base of the client, including

class structure
http client
base client resource classes
utilities (response parsing etc)
