ctypesgen's Introduction

ctypesgen (pypdfium2-team fork)

ctypesgen is a ctypes wrapper generator for Python.

This is a fork that aims to better suit the needs of pypdfium2, and to address some of the technical debt and (in our opinion) design issues that have accumulated due to highly conservative maintenance.

Here are some notes on our development intents:

  • We do not mind API-breaking changes at this time.
  • We endeavor to use plain ctypes as much as possible and keep the template lean.
  • For now, we only intend to work on ctypesgen's higher-level parts. The parser backend may be out of our scope.

System Dependencies

ctypesgen depends on the presence of an external C pre-processor, by default gcc or clang, as available. Alternatively, you may specify a custom pre-processor command using the --cpp option (e.g. --cpp "clang -E" to always use clang).
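The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not ctypesgen's actual implementation: the helper name find_preprocessor and the appended -E flag are assumptions made for the example.

```python
import shlex
import shutil

def find_preprocessor(custom=None, candidates=("gcc", "clang")):
    """Hypothetical helper: return a pre-processor command as an argument list.

    A caller-given command (as one would pass to --cpp) takes precedence;
    otherwise the first candidate found on PATH is used with -E appended.
    """
    if custom:
        # Split the command string the way a POSIX shell would
        return shlex.split(custom)
    for name in candidates:
        path = shutil.which(name)
        if path:
            return [path, "-E"]
    return None  # no pre-processor available

print(find_preprocessor("clang -E"))
print(find_preprocessor())
```

The first call returns the custom command unchanged; the second depends on what is installed on the machine.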

Tips & Tricks

  • If you have multiple libraries that are supposed to interoperate with shared symbols, first create bindings to any shared headers and then use the -m / --link-modules option on dependants. (Otherwise, you'd create duplicate symbols that are formally different types and would need casting between them.) If the module is not installed separately, you may prefix the module name with . for a relative import, and share boilerplate code using --no-embed-templates. Relative modules are expected to be present in the output directory at compile time. Note that this strategy can also be used to bind to same-library headers separately; however, you'll need to resolve the dependency tree on your own.
  • Extra include search paths can be provided using the -I option or by setting $CPATH/$C_INCLUDE_PATH. You could use this to add a header spoofing an external symbol via typedef void* SYMBOL; (c_void_p) that may be provided by a third-party binding at runtime.
  • If building with --no-macro-guards and you encounter broken macros, you may use --symbol-rules (see below) or replace them manually. This can be necessary for C constructs like #define NAN (0.0f / 0.0f) that don't play well with Python. In particular, you are likely to run into this with --all-headers.
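To illustrate why a macro like NAN is problematic: the C expression relies on IEEE float division by zero, whereas the same expression written literally in Python raises an exception. The snippet below is a hand-written rendering for demonstration; the exact code ctypesgen emits for this macro is not shown here.

```python
import math

# Literal Python rendering of: #define NAN (0.0f / 0.0f)
try:
    NAN = (0.0 / 0.0)
except ZeroDivisionError as exc:
    naive_error = exc  # Python refuses float division by zero
else:
    naive_error = None

# Manual replacement that yields the intended value
NAN = float("nan")

print(repr(naive_error))
print(math.isnan(NAN))
```

Without a guard around the first assignment, importing a module containing it would fail, which is why such macros need to be excluded or replaced.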

Notes on symbol inclusion

  • ctypesgen works with the following symbol rules:
    • yes: The symbol is eagerly included.
    • if_needed: The symbol is included if other included symbols depend on it (e.g. a type used in a function signature).
    • never: The symbol is always excluded, and implicitly all its dependants.
  • Roughly speaking, symbols from caller-given headers get assigned the include rule yes, and any others if_needed. When building with --all-headers, all symbols default to yes regardless of their origin.
  • --no-macros sets the include rule of all macro objects to never.
  • Finally, the --symbol-rules option is applied, which can be used to assign symbol rules by regex fullmatch expressions, providing callers with powerful means of control over symbol inclusion.
  • To filter out excess symbols, you'll usually want to use if_needed rather than never to avoid accidental exclusion of dependants. Use never only where this side effect is actually wanted, e.g. to exclude a broken symbol.
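The interplay of the three rules can be modelled with a toy resolver. This is a simplified sketch under stated assumptions, not ctypesgen's code: the function names, the symbol names and the dependency table are invented, and the RULE=REGEX spec format is inferred from the yes=.+ form.

```python
import re

RULES = ("yes", "if_needed", "never")

def apply_symbol_rules(rules, specs):
    """Return a copy of `rules` updated by 'RULE=REGEX' specs, applied in order."""
    rules = dict(rules)
    for spec in specs:
        rule, _, pattern = spec.partition("=")
        if rule not in RULES:
            raise ValueError(f"unknown rule {rule!r}")
        for name in rules:
            if re.fullmatch(pattern, name):  # the whole name must match
                rules[name] = rule
    return rules

def resolve(deps, rules):
    """Return the set of symbols that end up in the output."""
    # 'never' excludes a symbol and, transitively, everything depending on it
    excluded = {name for name, rule in rules.items() if rule == "never"}
    changed = True
    while changed:
        changed = False
        for name, needs in deps.items():
            if name not in excluded and needs & excluded:
                excluded.add(name)
                changed = True
    # Start from the 'yes' symbols, then pull in what they depend on
    included = set()
    stack = [n for n, r in rules.items() if r == "yes" and n not in excluded]
    while stack:
        name = stack.pop()
        if name not in included:
            included.add(name)
            stack.extend(deps.get(name, ()))
    return included

# Hypothetical symbols: name -> set of symbols it depends on
deps = {
    "open_doc": {"doc_t"}, "close_doc": {"doc_t"}, "doc_t": set(),
    "unused_t": set(), "BROKEN_MACRO": set(), "uses_broken": {"BROKEN_MACRO"},
}
defaults = {
    "open_doc": "yes", "close_doc": "yes", "doc_t": "if_needed",
    "unused_t": "if_needed", "BROKEN_MACRO": "yes", "uses_broken": "yes",
}
rules = apply_symbol_rules(defaults, ["never=BROKEN_.*", "if_needed=close_.*"])
print(sorted(resolve(deps, rules)))
```

Here never removes BROKEN_MACRO together with its dependant uses_broken, while if_needed drops close_doc without touching doc_t, which open_doc still pulls in.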

Binding against the Python API

cat >"overrides.py" <<END
import ctypes

class PyTypeObject (ctypes.Structure): pass
class PyObject (ctypes.Structure): pass

def POINTER(obj):
    if obj is PyObject: return ctypes.py_object
    return ctypes.POINTER(obj)
END

ctypesgen -l python --dllclass pythonapi --system-headers python3.X/Python.h --all-headers -m .overrides --linkage-anchor . -o ctypes_python.py

Replace 3.X with your system's Python version.

Small test:

import sys
from ctypes import *
from ctypes_python import *

# Get a string from a Python C API function
v = Py_GetVersion()
v = cast(v, c_char_p).value.decode("utf-8")
print(v)
print(v == sys.version)  # True

# Convert back and forth between Native vs. C view of an object
class Test:
    def __init__(self, a):
        self.a = a

t = Test(a=123)
tc_ptr = cast(id(t), POINTER(PyObject_))
tc = tc_ptr.contents
print(tc.ob_refcnt)  # 1
Py_IncRef(t)
print(tc.ob_refcnt)  # 2 (incremented)
Py_DecRef(t)
print(tc.ob_refcnt)  # 1 (decremented)
t_back = cast(tc_ptr, py_object).value
print(t_back.a)
print(tc.ob_refcnt)  # 2 (new reference from t_back)

It should yield something like

3.11.6 (main, Oct  3 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)]
True
1
2
1
123
2
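For comparison, the py_object mapping used in overrides.py can be tried with plain ctypes, without any generated bindings. The sketch below sets the prototypes by hand on ctypes.pythonapi; it assumes CPython and uses sys.getrefcount instead of reading ob_refcnt directly.

```python
import ctypes
import sys

api = ctypes.pythonapi  # handle to the running interpreter

# const char *Py_GetVersion(void)
api.Py_GetVersion.argtypes = []
api.Py_GetVersion.restype = ctypes.c_char_p
version = api.Py_GetVersion().decode("utf-8")

# void Py_IncRef(PyObject *) and void Py_DecRef(PyObject *):
# py_object lets us pass the Python object itself as PyObject*
for func in (api.Py_IncRef, api.Py_DecRef):
    func.argtypes = [ctypes.py_object]
    func.restype = None

class Test:
    def __init__(self, a):
        self.a = a

t = Test(a=123)
before = sys.getrefcount(t)
api.Py_IncRef(t)
during = sys.getrefcount(t)
api.Py_DecRef(t)
after = sys.getrefcount(t)

print(version == sys.version)
print(during - before, after - before)
```

Only the differences between the counts are meaningful here, since sys.getrefcount itself holds a temporary reference while it runs.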

Known Limitations

ctypes

  • Rare calling conventions other than cdecl or stdcall are not supported.
  • Non-primitive return types in callbacks are not supported. An affected prototype would not allow for the creation of a function instance, but would not break the output as a whole.

pypdfium2-ctypesgen

  • The DLL class is assumed to be CDLL, otherwise it needs to be given by the caller. We do not support mixed calling conventions, because it does not match the API layer of ctypes.
  • We do not support binding to multiple binaries in the same output file. Instead, you'll want to create separate output files sharing the loader template, and possibly use module linking, as described above.

ctypesgen

  • ctypesgen's parser was originally written for C99. Support for later standards (C11 etc.) is probably incomplete.
  • The conflicting names resolver is largely untested, in particular the handling of dependants. Please report success or failure.
  • Linked modules are naively prioritized in the dependency resolver and the conflicting names handler, i.e. intentional overrides are ignored. The position of includes is not honored; ctypesgen always imports linked modules at top level.

Fork rationale

Getting changes accepted upstream is tedious, with unclear outcome, and often not feasible due to mismatched intents (e.g. regarding backwards compatibility). Also consider that isolating commits in separate branches is no longer feasible, as merge conflicts arise (e.g. due to code cleanups and interfering changes).

Contrast this to a fork, which allows us to keep focused and effect improvements quickly, so as to invest developer time rationally.

However, we would be glad if our work could eventually be merged back upstream once the change set has matured, provided upstream can come to terms with the radical changes. See ctypesgen#195 for discussion.

Syncing with upstream

  • First, sync the fork's master branch using GitHub's web interface.
  • View changes on GitHub's compare page.
  • Pull and merge locally, then push the result.

Last time we had to do this, git merge origin/master -Xours did a good job. Changes to files we haven't really modified can usually just be pulled in as-is. Otherwise, you'll have to manually look through the changes and pick what you consider worthwhile on a case by case basis.

Note, it is important to verify the resulting merge commit for correctness - automatic merge strategies might produce mistakes!

Bugs

Oversights or unintentional breakage can happen at times. Feel free to file a bug report if you think a change introduces logical issues. However, please note our response policy below.

Contributions

We may accept contributions, but only if our code quality expectations are met.

Policy:

  • We may not respond to your issue or PR.
  • We may close an issue or PR without much feedback.
  • We may lock discussions or contributions if our attention is getting DDoSed.
  • We may not provide much usage support.

ctypesgen's People

Contributors

mara004, olsonse, nilason, davidjamesca, fgrie, alan-r, kanzure, dependabot[bot], djandries, echoix, kolanich, thecodeartist, greenbender, raminou, scorp08

ctypesgen's Issues

Position-precise imports of linked modules?

Consider passing -dI to the C pre-processor to preserve include statements, in order to later honor the position when translating to an import of a linked module, intending to address the following known limitation (quote from Readme):

Linked modules are naively prioritized in the dependency resolver and the conflicting names handler, i.e. intentional overrides are ignored. The position of includes is not honored; ctypesgen always imports linked modules at top level.

Ideas

  • Improved support for binding to python API.
  • Inline-define preamble helpers separately and embed on demand; make c_ptrdiff_t a dependency node and include only as needed
  • Add opt-in autostrings helper with configurable codec, see draft
  • Merge preamble helpers and loader into a single file (to simplify printer) ? -> problem: headers-only build does not need libraryloader
  • Restore multilib support through CLI groups, see draft
  • Consider replacing --all-headers with --deepen [N]. Behave like --all-headers if the flag is specified without a value, but allow e.g. --deepen 1 to eagerly add members from one nesting level only, and not the whole include tree (which might be the most common reason to currently use --all-headers).
    Discourage full-depth inclusion (as this tends to spam the bindings and pull in problematic macros) -- instead suggest explicit --symbol-rules yes=....
    (Fun fact: --symbol-rules yes=.+ should lead to the same result as --all-headers)
  • Add second CLI entrypoint allowing to bind to each input header individually, as a higher-level layer around current capabilities, i.e. automatic header dependency tree resolution and module linking (within one library). Advantages: namespace separation, closer to original C library.
  • Add PYI printer; allow specifying multiple printers at once to avoid repeating common pipeline steps

Fork overview, and thoughts to improve basis for chance of upstreaming

See below for an overview of this fork. Note, this writeup is a non-exhaustive work in progress.

This information may be valuable for working towards a basis that could be merged back into upstream at some point, though this seems fairly hypothetical for the near term, given time constraints, and mismatched design intents (e.g. relating to backwards compatibility).

However, this fork of ctypesgen may be a good starting point for any active future development, with a significantly overhauled code base that should be nicer to work with.

Selection of improvements from this fork

  • Removal of bloated old string classes that scream technical debt.
    Enforcement of explicit string encoding/decoding. (We might want to add back implicit string handling as opt-in in the future, see below.)
    Note, the old string classes are incompatible with some python releases of the 3.7/3.8 branches.
    See also pypdfium2-team/pypdfium2#76, ctypesgen#77, python/cpython#16799, ctypesgen#177
  • Bloated old library loader replaced with new lean library loader that is more explicit/controllable.
    See also ctypesgen#176, and 569dc4b for some oversights/peculiarities in the old library loader.
    Resolve . to the module directory, not the caller's CWD. Don't add compile libdirs to runtime.
  • Preventing the assignment of invalid/non-existent struct fields by correction of __slots__ declaration. This fix should be fairly easy for upstream to pick. See also ctypesgen#183
  • Implemented relative imports with --link-modules, and library handle sharing with --no-embed-preamble. Removed incorrect POINTER override. This properly fixes ctypesgen#86 (shared headers), and allows dividing the bindings to a library into multiple outputs (e.g. translating each header to a separate Python file).
  • More powerful/flexible means of control over symbol inclusion via --symbol-rules.
  • Pre-processor auto-detection and significant improvements to call style (see 7559e81).
  • Removed questionable UNCHECKED wrapper from preamble.
  • Do not bypass c_void_p -> int auto-conversion (see Readme or commit for background).
  • Propagate exception if no output members were found. (Previously would have been a warning, but the if-check was defunct.)
  • New style-related printer options that allow disabling symbol if-guards [1] and macro guards.
  • Proper newline concept for the python printer, see a538742.
  • Free library handles after use, to allow for in-session deletion of DLLs. This makes it possible to enable a formerly skipped test case on Windows.
  • Internal code cleanups and test suite improvements.

Small, self-contained fixes have usually been submitted upstream and may have been merged.
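Two of the items above concern plain ctypes behaviour and can be shown in isolation. The structs below are hand-written stand-ins with made-up names, not ctypesgen output: the first pair shows what __slots__ in the class body changes, the last one shows the c_void_p auto-conversion.

```python
import ctypes

class Strict(ctypes.Structure):
    # Declared in the class body, so instances get no __dict__
    __slots__ = ["x", "y"]
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

class Loose(ctypes.Structure):
    # No __slots__: arbitrary attributes are silently accepted
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

strict = Strict(x=1, y=2)
strict.x = 10  # a real field, fine
try:
    strict.z = 3  # not a field of the struct
except AttributeError:
    strict_rejects = True
else:
    strict_rejects = False

loose = Loose(x=1, y=2)
loose.z = 3  # a typo like this would go unnoticed

class Holder(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_void_p)]

# ctypes hands back c_void_p values as int, and NULL as None
filled = Holder(ptr=0x1234).ptr
empty = Holder().ptr

print(strict_rejects, loose.z, filled, empty)
```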

Points to consider

  • Restoring implicit UTF-8 string encoding/decoding as optional?

    ctypesgen originally did implicit UTF-8 encoding/decoding of in/out strings.
    While that tends to be bad practice and callers had better handle strings explicitly instead, it would seem reasonable to retain an optional backward compatibility layer for existing callers.
    I also imagine it might be convenient for a library that consistently uses UTF-8 for everything.

    Adding the old string classes back is certainly not an option for us. However, it may be possible to create a lean replacement. See ctypesgen#177 for a suggestion (copy below), or ctypesgen#77 (comment) for an alternative draft by @olsonse.
    Note that in/out must be handled in a single class.

  • The windows-specific stdcall convention

    Our fork lost it for simplicity while rewriting the library loader. It should be fairly easy to add back, just wondering how to test (as this lies beyond our use case), and how to integrate it nicely.

    Does the calling convention really have to be decided on function level with two library handles for cdecl/stdcall, or would it be sufficient to decide at library level, with a single handle? Is there any example of a single library actually exporting functions with different calling conventions?
    Note that the ctypes API is designed around deciding at library handle level, not at function level, which suggests the expected use case is a library ABI with homogeneous calling convention.

    Possible resolution: Added an option to take a caller-given dll class. It requires a small user interaction and does not support mixed calling conventions, but seems like a nice bloatless way to support a pure stdcall binary.
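In ctypes, the calling convention is tied to the library class that creates the handle, which is what makes a caller-given DLL class workable. The helper below is a hypothetical sketch of looking up such a class by name; it does not reproduce how ctypesgen handles the option.

```python
import ctypes
import os

def pick_dll_class(name="CDLL"):
    """Hypothetical helper: look up a ctypes library class by name."""
    cls = getattr(ctypes, name, None)
    if cls is None:
        raise ValueError(f"ctypes has no DLL class {name!r} on this platform")
    return cls

# CDLL uses cdecl; WinDLL (stdcall) is only defined on Windows
has_stdcall = hasattr(ctypes, "WinDLL")

print(pick_dll_class("CDLL").__name__)
print(type(ctypes.pythonapi).__name__)
print(has_stdcall, os.name)
```

Every function obtained from one handle shares that handle's convention, so a pure stdcall binary only needs a different class, whereas mixed conventions would need two handles.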

  • Removal of support for multiple libraries in one bindings file

    This feature was a significant complexity burden in some code areas, including pollution around symbols in printer/output code. For now we decided to remove it - callers can use --no-embed-preamble and --link-modules to create separate bindings files. This also encourages individual/explicit rather than unified loader config.

    However, see ctypesgen#86 (comment) for some interesting considerations regarding a possible cleaner re-implementation.

Other notes

  • Shifts in design intent: We would prefer to stick with plain ctypes as much as possible and avoid cluttering the bindings with custom wrappers.

  • CLI: We changed the command-line interface from action=append to action=extend and nargs=+/*. This implied switching headers from positional to flag argument to avoid confusion/interference with flags that take multiple arguments. There are more CLI changes not listed here, see diff for details.
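The effect of action=extend with nargs=+/* can be seen in a small argparse sketch. The flag names --headers and --include-dirs are placeholders chosen for this example and are not necessarily the names ctypesgen uses.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Placeholder flag names, for illustration only.
    # extend + nargs lets one flag take several values and be repeated.
    parser.add_argument("--headers", nargs="+", action="extend", default=[])
    parser.add_argument("--include-dirs", nargs="*", action="extend", default=[])
    parser.add_argument("-o", "--output")
    return parser

args = build_parser().parse_args(
    ["--headers", "a.h", "b.h", "--include-dirs", "inc",
     "--headers", "c.h", "-o", "out.py"]
)
print(args.headers)
print(args.include_dirs)
print(args.output)
```

With action=append, each value would need its own flag occurrence. With multi-value flags, a trailing positional argument could be swallowed by the preceding flag, which is the interference the switch to a headers flag avoids.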

Done tasks

  • Restored test suite usability by adapting to fork changes.
  • Restored macro guards as opt-out

Footnotes

  1. Note, this is meant for use with inherently ABI correct packaging only.