apify-client-python's Introduction

Apify API client for Python

The Apify API Client for Python is the official library to access the Apify API from your Python applications. It provides useful features like automatic retries and convenience functions to improve your experience with the Apify API.

If you want to develop Apify Actors in Python, check out the Apify SDK for Python instead.

Installation

Requires Python 3.8+

You can install the package from its PyPI listing. To do that, simply run pip install apify-client in your terminal.

Usage

For usage instructions, check the documentation on Apify Docs.

Quick Start

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an actor and wait for it to finish
actor_call = apify_client.actor('john-doe/my-cool-actor').call()

# Fetch results from the actor's default dataset
dataset_items = apify_client.dataset(actor_call['defaultDatasetId']).list_items().items

Features

Besides greatly simplifying the process of querying the Apify API, the client provides other useful features.

Automatic parsing and error handling

Based on the endpoint, the client automatically extracts the relevant data and returns it in the expected format. Date strings are automatically converted to datetime.datetime objects. When a request fails, the client raises an ApifyApiError, which wraps the plain JSON error returned by the API and enriches it with additional context for easier debugging.
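To illustrate the date conversion, here is a minimal sketch of how ISO date strings in an API response could be turned into datetime.datetime objects. It is not the client's actual code: the function name parse_date_fields, the rule that date fields have keys ending in "At", and the exact timestamp format are all assumptions made for this example.

```python
from datetime import datetime, timezone


def parse_date_fields(data):
    """Recursively convert date strings into timezone-aware datetime objects.

    Assumption: a date field is any string value whose key ends in 'At'
    (for example 'startedAt'). The real client may use a different rule.
    """
    if isinstance(data, list):
        return [parse_date_fields(item) for item in data]
    if isinstance(data, dict):
        parsed = {}
        for key, value in data.items():
            if key.endswith('At') and isinstance(value, str):
                try:
                    parsed[key] = datetime.strptime(
                        value, '%Y-%m-%dT%H:%M:%S.%fZ'
                    ).replace(tzinfo=timezone.utc)
                except ValueError:
                    # Not a date in the assumed format, keep the raw string
                    parsed[key] = value
            else:
                parsed[key] = parse_date_fields(value)
        return parsed
    return data
```

Values that do not match the assumed format are left untouched, so a malformed field does not break parsing of the rest of the response.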

Retries with exponential backoff

Network communication sometimes fails. The client automatically retries requests that fail due to a network error, an internal error of the Apify API (HTTP 500+), or a rate limit error (HTTP 429). By default, it retries up to 8 times. The first retry is attempted after ~500 ms, the second after ~1000 ms, and so on. You can configure these parameters using the max_retries and min_delay_between_retries_millis options of the ApifyClient constructor.
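The retry schedule can be sketched in a few lines. This is an illustration only, not the client's implementation: RetryableError and retry_with_backoff are hypothetical names, and the sketch assumes the delay simply doubles after each failed attempt, without the small random variation that the "~" in the figures above hints at.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a network error, HTTP 429 or HTTP 500+."""


def retry_with_backoff(func, max_retries=8, min_delay_between_retries_millis=500, sleep=time.sleep):
    """Call func(), retrying on RetryableError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_retries:
                raise  # out of retries, surface the last error
            # Assumed schedule: 500 ms, 1000 ms, 2000 ms, ...
            sleep(min_delay_between_retries_millis * 2 ** attempt / 1000)
```

The sleep parameter exists only so that the schedule can be inspected without actually waiting.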

Support for asynchronous usage

Starting with version 1.0.0, the package offers an asynchronous version of the client, ApifyClientAsync, which allows you to work with the Apify API in an asynchronous way, using the standard async/await syntax.
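As a rough sketch, the Quick Start flow could look like this with the asynchronous client. The helper name run_actor_and_fetch_items is made up for this example, and it assumes that ApifyClientAsync mirrors the synchronous interface with awaitable call() and list_items() methods.

```python
import asyncio


async def run_actor_and_fetch_items(client, actor_id):
    """Start an actor, wait for it to finish and return its dataset items.

    `client` is assumed to be an ApifyClientAsync instance.
    """
    actor_call = await client.actor(actor_id).call()
    dataset_page = await client.dataset(actor_call['defaultDatasetId']).list_items()
    return dataset_page.items


# Usage (requires the apify-client package and a valid token):
# from apify_client import ApifyClientAsync
# client = ApifyClientAsync('MY-APIFY-TOKEN')
# items = asyncio.run(run_actor_and_fetch_items(client, 'john-doe/my-cool-actor'))
```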

Convenience functions and options

Some actions can't be performed by the API itself, such as waiting indefinitely for an actor run to finish (because of network timeouts). The client provides the convenient call() and wait_for_finish() functions for that. Key-value store records can be retrieved as objects, buffers or streams via the respective options, and dataset items can be fetched as individual objects or as serialized data. We plan to add better stream support and async iterators.
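One way to wait for a long run without holding a single request open is to poll the run repeatedly. The sketch below shows that idea under stated assumptions; it is not the client's wait_for_finish() code. The name poll_until_finished, the fetch_run callable and the set of terminal statuses are all assumptions made for this example.

```python
import time

# Assumed set of statuses after which a run no longer changes
TERMINAL_STATUSES = {'SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED'}


def poll_until_finished(fetch_run, wait_secs=None, poll_interval_secs=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run() until the run finishes or wait_secs has elapsed.

    fetch_run is a hypothetical callable returning the run as a dict with
    a 'status' key. With wait_secs=None the function waits indefinitely.
    Returns the last fetched run.
    """
    started = clock()
    while True:
        run = fetch_run()
        if run['status'] in TERMINAL_STATUSES:
            return run
        if wait_secs is not None and clock() - started >= wait_secs:
            return run  # still running, the caller decides what to do next
        sleep(poll_interval_secs)
```

Each poll is a short request, so no individual request has to outlive a network timeout.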

apify-client-python's People

Contributors

b4nan, barjin, dragonraid, drobnikj, equidem, fnesveda, github-actions[bot], janbuchar, jancurn, jirimoravcik, jkuzz, mhamas, mtrunkat, mvolfik, tc-mo, tobice, valekjo, vdusek

apify-client-python's Issues

Creating Actor from Template missing package.json

Running apify create my-first-actor --template python-start fails with:

Error: ENOENT: no such file or directory, open \my-first-actor\package.json

As a workaround, you can manually add a package.json with a start script to run the actor.

Issue Type: LOW

Document how to add input to an actor

I'm trying to find out how to run an actor and specify the input for it. E.g., the LinkedIn Company URL needs something like {"queries": "Tesla\nmicrosoft.com"}

The API docs are clear that whatever data is passed to the POST request will be interpreted as input to the actor.

However, I can't find a proper alternative in the Python docs, e.g. in the Usage concepts. There's dataset's push_items(), but I don't think that's it. Is there an equivalent?
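A later report in this list passes input through the run_input argument of call(), which suggests the pattern below. This is a sketch rather than an official answer: the helper name call_actor_with_input is hypothetical, and the client argument is assumed to be an ApifyClient instance.

```python
def call_actor_with_input(client, actor_id, run_input):
    """Run an actor with the given input and return the finished run.

    `client` is assumed to be an ApifyClient instance; run_input is any
    JSON-serializable value expected by the actor.
    """
    return client.actor(actor_id).call(run_input=run_input)


# Usage (requires the apify-client package and a valid token):
# from apify_client import ApifyClient
# client = ApifyClient('MY-APIFY-TOKEN')
# run = call_actor_with_input(client, 'john-doe/my-cool-actor', {'queries': 'Tesla\nmicrosoft.com'})
```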

Replace the usage of `Any` with generic types

Do not use the Any type, as suggested by the ANN401 rule.

Instead of

from typing import Any

def get_first(container: list[Any]) -> Any:
    return container[0]

use the following

from typing import TypeVar

T = TypeVar('T')

def get_first(container: list[T]) -> T:
    return container[0]

Any can probably still make sense for the *args / **kwargs.

Unify indentation in configuration files

Some of our configuration files currently use 2 space indent and others use 4 space indent.

Let's unify this and use the same indent (2 spaces) for all configuration files (yaml, toml, ini/cfg, ...).

Migrate to Ruff (linter & formatter)

Ruff is a new extremely fast Python linter written in Rust, which supports many rules from the flake8 & pylint world (700+).

They recently released a formatter, where single quotes are an option :).

Implement pre-commit

We should have the linting, type-checking, unit testing and documentation checking in a pre-commit hook, automatically installed when you run make install-dev.

Write announcement blogpost

Write a nice blogpost announcing the availability of the Python client (maybe this should be done only after the app supports Docker images with the client preinstalled).

Set up documentation building

Set up building of the documentation from docstrings

  • ideally to markdown
  • upload the built docs to S3 so they can be shown on the web
  • possibly through GitHub actions

Remove underscore prefix from objects that are not private

Currently, we use the underscore prefix for objects that are imported from other modules (e.g. all objects in https://github.com/apify/apify-client-python/blob/master/src/apify_client/_utils.py), even though they are not private.

This was intended to let users of the library know that these objects are for internal usage only.

However, this does not correspond to the usage of underscore prefixes in the Python world. We should remove these prefixes from the non-private objects.

We can still use the underscore prefix in module names to let users know that a module is for internal usage only and should not be imported by library users.

Cannot use Apify Python Client at Coegil platform

From email conversation:

We are using the Coegil platform to run the apify-client library and are getting this issue.

Pip install passed fine, installing all dependencies.

from apify_client import ApifyClient

ModuleNotFoundError: No module named 'apify_client'

Remove `__all__` from all `__init__.py`

Using star imports, such as from apify_client import *, is generally considered bad practice in Python because it can lead to namespace conflicts. While there may be specific scenarios where star imports are useful, I don't see one in the context of our packages.

I suggest removing them from our codebase so that we don't incentivize users to adopt this practice. Also, we won't have to maintain these lists anymore.

Add API endpoint for validating Actor input

We have an endpoint /acts/ACTOR-ID/validate-input which is not implemented in the client. We should implement it.

It takes the input to validate as POST payload, and optionally a build query parameter to specify the build tag against which to validate.

It returns a response with:

  • HTTP status 200 and body { "valid": true }
  • HTTP status 400 and body with the validation error

We should first add it to the documentation, so that we can refer to it in the docstrings. apify/apify-docs#722
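The two documented outcomes could be mapped to a return value along these lines. This is only a sketch of the response handling, not a proposed client method: the function name interpret_validation_response is hypothetical, and it makes no assumption about the shape of the error body beyond passing it through.

```python
def interpret_validation_response(status_code, body):
    """Map the endpoint's response to an (is_valid, error) tuple.

    body is the parsed JSON response. On HTTP 400 it is returned as-is,
    because its exact shape is not specified here.
    """
    if status_code == 200 and body.get('valid') is True:
        return True, None
    if status_code == 400:
        return False, body
    raise ValueError(f'Unexpected response: HTTP {status_code}')
```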

Add actor reboot method to the `RunClient`

Add a .reboot() method to the RunClient class. It will invoke the endpoint located at /v2/actor-runs/{actorRunId}/reboot via a POST request. The endpoint has no parameters.
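A minimal sketch of what such a method might look like, assuming an HTTP client with a call(url=..., method=...) interface like the one quoted in another issue on this page. The class name RunClientSketch and its constructor arguments are hypothetical and do not reflect the real RunClient.

```python
class RunClientSketch:
    """Hypothetical, simplified stand-in for the RunClient class."""

    def __init__(self, http_client, base_url, run_id):
        self.http_client = http_client
        self.base_url = base_url.rstrip('/')
        self.run_id = run_id

    def reboot(self):
        """Reboot the run via a POST request; the endpoint takes no parameters."""
        return self.http_client.call(
            url=f'{self.base_url}/v2/actor-runs/{self.run_id}/reboot',
            method='POST',
        )
```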

Save image cache for a client

I want to speed up consecutive calls that use the same actor. There are places on the website that describe a build cache for the docker containers but say they are only available on the API.

Create project structure

Create a shell of the project, including:

  • setting up a dependency installer (Pipfile or requirements.txt)
  • setting up linting
  • setting up a test framework

Catch up to JS client

A few past changes to the JS client were not propagated to the Python client, so we need to catch up.

We need to add these features:

Sanity test

Manually test all the endpoints to verify that they work (or return proper errors), and add automated tests if you find something that should have been tested but is not.

Move `apify_client._errors` to `apify_client.errors`

We have some error subclasses like ApifyApiError defined in https://github.com/apify/apify-client-python/blob/master/src/apify_client/_errors.py, with the underscore suggesting it's a private submodule.

However, they are documented in the docs, which suggests that people should use them in their isinstance checks and similar code. They should be able to do that, since the errors a module raises should be part of its public API.

We should move them out of the private _errors submodule to a public errors submodule, to make it clear that these are OK to use by end users.

ListPage should be generic

ListPage should be generic, i.e. it should allow specifying a type for the data inside items, so that items is a List[T] instead of just a List.
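A generic page class could be declared roughly as follows. The class name ListPageSketch and its reduced set of fields are assumptions made for this example; the real ListPage has its own attributes.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')


class ListPageSketch(Generic[T]):
    """Hypothetical, simplified page of results whose items all have type T."""

    def __init__(self, items: List[T], total: int) -> None:
        self.items: List[T] = items
        self.total = total


def first_item(page: ListPageSketch[T]) -> T:
    """Return the first item; a type checker infers T from the page."""
    return page.items[0]
```

With this declaration, a type checker treats the result of first_item(ListPageSketch[str]([...], total=1)) as a str rather than Any.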

Address accessing of non-existing field `_maybe_parsed_body` within `httpx.Response` object

It seems that the following code:

response: httpx.Response = await self.http_client.call(
    url=self._url(f'records/{key}'),
    method='GET',
    params=self._params(),
)

returns a response of type httpx.Response. Later the field _maybe_parsed_body is accessed:

return {
    'value': response._maybe_parsed_body,
    '...': '...',
}

There are 2 occurrences of this:

Check whether http_client.call really returns an httpx.Response object and, if it does, fix the access to the non-existent field _maybe_parsed_body.

Release version 1.0.0

Since this project has matured quite a bit, and we're launching the Apify SDK for Python soon, and since the 0.7.0 beta has so many changes, many of which are breaking, it would be worth it to change the 0.7.0 version to 1.0.0.

Get max_no_of_posts using insta_posts_scraper

We can't retrieve all posts for a user; the actor only returns posts from the first page.
Here is our script:

from apify_client import ApifyClient

apify_client = ApifyClient('token')

# Run the actor with the username and the post limit as input
actor_call = apify_client.actor('apify/instagram-post-scraper').call(run_input={'username': ['username'], 'limit': 100})

# Get the dataset and the posts
dataset_items = apify_client.dataset(actor_call['defaultDatasetId']).list_items().items

Is there any workaround for this?

Implement client base

Implement the base of the client, including

  • class structure
  • http client
  • base client resource classes
  • utilities (response parsing etc)
