gnh1201 / caterpillar

Caterpillar Proxy - The simple web debugging proxy (formerly, php-httpproxy)

Home Page: https://catswords.social/@catswords_oss

License: MIT License

Python 48.41% PHP 18.18% Dockerfile 0.19% Java 9.14% Perl 5.88% Ruby 4.92% JavaScript 5.11% Shell 0.17% HTML 7.99%
ssl network-filtering hijacking http-proxy https-proxy web-debugging-proxy parasitic-computing tls mitm k-anonymity

caterpillar's Issues

SMTP relay

I need to implement an SMTP relay. Caterpillar will receive SMTP requests first and then forward them to a relay server written in PHP. The PHP relay server will then deliver the email to the actual recipient.

I propose the following flow:

SMTP Client (Sender) <-> Caterpillar (Python) <-> Relay Server (PHP) <-> Actual recipient
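
A minimal sketch of how the Python side might wrap a received message before handing it to the PHP relay. The method name `relay_sendmail` and the parameter names `mail_from` and `rcpt_to` are assumptions made for illustration and are not part of the project; only the JSON-RPC 2.0 envelope mirrors the style of the relay proposals further down.

```python
import base64
import json
from email.message import EmailMessage


def build_smtp_relay_request(mail_from, rcpt_to, message_bytes, request_id=1):
    """Wrap one SMTP message in a JSON-RPC 2.0 envelope (hypothetical schema)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "relay_sendmail",  # hypothetical method name
        "params": {
            "mail_from": mail_from,
            "rcpt_to": list(rcpt_to),
            # Raw message bytes, base64-encoded so they survive JSON transport
            "data": base64.b64encode(message_bytes).decode("ascii"),
            "length": len(message_bytes),
        },
        "id": request_id,
    })


# Example message built with the standard library
msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "recipient@example.org"
msg["Subject"] = "Hello"
msg.set_content("Sent through the relay.")

payload = build_smtp_relay_request(
    "sender@example.org", ["recipient@example.org"], msg.as_bytes()
)
```

The PHP side would decode `data` and hand the message to whatever mail function it has available; that part is left open here.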

Case Study in Real World: Ambiguities in TCP and HTTP

Summary

(Here's a story from Canada.)

There is an issue in infrastructure management tools (e.g., OpenShift or similar software) where TCP and HTTP are not clearly distinguished. This has led to minor conflicts between infrastructure engineers and developers.

As a result, there is a need for a solution that keeps the business logic unchanged while allowing the communication method to be switched flexibly between TCP (socket) and HTTP.

My Opinion

TCP and HTTP are distinct concepts, but because HTTP is the best-known application protocol running on top of TCP, the two are often conflated. The OSI 7-layer model makes the distinction clear: TCP is a transport-layer protocol (layer 4), while HTTP is an application-layer protocol (layer 7).

Prompted by this story, the Caterpillar Proxy project has decided to work on letting the endpoint communication method be switched flexibly between TCP (socket) and HTTP.

Currently, we support existing plugins through the web.py file.
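
One way to picture the goal is a single business-logic function with a thin adapter per transport. The sketch below uses only the Python standard library; `handle_payload`, `TcpAdapter`, `HttpAdapter`, and `make_server` are hypothetical names and do not describe how web.py is actually organized.

```python
import http.server
import socketserver


def handle_payload(data: bytes) -> bytes:
    """Transport-independent business logic (placeholder: upper-cases input)."""
    return data.upper()


class TcpAdapter(socketserver.StreamRequestHandler):
    """Raw TCP: one newline-terminated request per connection."""

    def handle(self):
        line = self.rfile.readline().rstrip(b"\r\n")
        self.wfile.write(handle_payload(line) + b"\n")


class HttpAdapter(http.server.BaseHTTPRequestHandler):
    """HTTP: the body of a POST request carries the same payload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


ADAPTERS = {"tcp": TcpAdapter, "http": HttpAdapter}


def make_server(mode: str, host: str = "127.0.0.1", port: int = 0):
    """Build a threaded server for the chosen transport; port 0 picks a free port."""
    return socketserver.ThreadingTCPServer((host, port), ADAPTERS[mode])
```

Switching the communication method then comes down to the `mode` argument, for example `make_server("http").serve_forever()`, while `handle_payload` stays untouched.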

Serial (or Bluetooth) gateway support

Gateways that do not rely on well-known protocols such as WiFi or 3G/LTE provide data transmission only through serial or Bluetooth. For instance, gateways supporting LoRaWAN technology or satellite communication rarely, if ever, directly support Ethernet or WiFi, and even if they do, they are usually expensive.

We need to assess whether we can support serial or Bluetooth in this project.

Add WebAssembly support

Summary

In 2022, I conducted a PoC on executing WASM (WebAssembly) binaries using PHP and WAMR (WebAssembly Micro Runtime) on top of the LAMP stack. Now that the Caterpillar project is underway, it seems feasible to integrate that earlier work into it.

In the future, the Caterpillar project will provide a method for injecting WASM runtimes into shared hosting servers.

HTTP Basic Authentication

Some web hosting providers require HTTP Basic Authentication when a site is accessed through one of their subdomains rather than through a standalone domain.

Moreover, even where it is not required, HTTP Basic Authentication may still be useful.
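
For reference, the header a client would have to send is simple to build. This is a minimal sketch; the function name `basic_auth_header` is hypothetical, and the credentials are the well-known example pair from RFC 7617.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Headers to attach to each request sent to the protected endpoint
headers = {"Authorization": basic_auth_header("Aladdin", "open sesame")}
```

Basic Authentication only encodes the credentials and does not encrypt them, so it should be used over HTTPS.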

Stateful relay

Implementing a stateful relay would improve speed and reliability, and would bypass the web server's capacity limits (e.g., max_upload_size) so that large transfers become possible.

The proposals below describe messages that the client (e.g., server.py) will send to the server (e.g., index.php).

Proposal: Stateless relay

This is the case where the client cannot expose a port to the outside. It is the same as what is already implemented.

{
    "jsonrpc": "2.0",
    "method": "relay_request",
    "params": {
        "data": <base64 encoded data>,
        "compressed": <e.g. deflate, none>,    // proposal
        "client": <address of the client>,
        "server": <address of the remote server>,
        "port": <port number of the remote server>,
        "scheme": <scheme (e.g. http, https, ssl, tls)>,
        "url": <URL>,
        "length": <length of data>,
        "chunksize": <size of buffer (e.g. 8192)>,
        "datetime": <datetime (e.g. %Y-%m-%d %H:%M:%S.%f)>
    },
    "id": 3
}
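
A sketch of how a client could assemble this message in Python. The field names come from the proposal above; the function name `build_relay_request` is hypothetical. Two points are assumptions, because the proposal leaves them open: `deflate` is treated as a zlib stream, and `length` counts the original, uncompressed bytes.

```python
import base64
import json
import zlib
from datetime import datetime


def build_relay_request(data, client, server, port, scheme, url,
                        chunksize=8192, compressed="none", request_id=3):
    """Build a stateless relay_request message from the proposed fields."""
    # Assumption: "deflate" means zlib-compressed bytes before base64 encoding.
    body = zlib.compress(data) if compressed == "deflate" else data
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "relay_request",
        "params": {
            "data": base64.b64encode(body).decode("ascii"),
            "compressed": compressed,
            "client": client,
            "server": server,
            "port": port,
            "scheme": scheme,
            "url": url,
            "length": len(data),  # assumption: length of the uncompressed data
            "chunksize": chunksize,
            "datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
        },
        "id": request_id,
    })


raw = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
message = build_relay_request(raw, "127.0.0.1", "example.org", 80, "http",
                              "http://example.org/", compressed="deflate")
```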

Proposal: Stateful relay

This is a case where the client can expose the port to the outside, which works similarly to tunneling.

{
    "jsonrpc": "2.0",
    "method": "relay_connect",
    "params": {
        "client": <address of the client>,
        "port": <port number of the client>,
        "chunksize": <size of buffer (e.g. 8192)>,
        "datetime": <datetime (e.g. %Y-%m-%d %H:%M:%S.%f)>
    },
    "id": 3
}
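
The connect message is smaller because no payload travels inside it. A minimal sketch follows; `build_relay_connect` is a hypothetical name, and the address and port are placeholder values.

```python
import json
from datetime import datetime


def build_relay_connect(client, port, chunksize=8192, request_id=3):
    """Build a stateful relay_connect message from the proposed fields."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "relay_connect",
        "params": {
            "client": client,  # address the client is reachable at
            "port": port,      # port the client has exposed
            "chunksize": chunksize,
            "datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
        },
        "id": request_id,
    })


# 203.0.113.10 is a documentation-only address used as a placeholder
message = build_relay_connect("203.0.113.10", 5555)
```

Presumably the server then opens a socket back to the advertised address and port, and the actual data moves over that connection in `chunksize` pieces rather than inside HTTP request bodies. The proposal does not spell out that step, so this reading is an assumption.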

Decoupling the fediverse features

The Caterpillar project aims to be a universal web debugging proxy. Currently, Fediverse-related features (such as spam filters) are embedded within the code without being modularized.

These features need to be separated out into their own module.

Bypassing HSTS policy

HSTS is enforced only by software that fully implements web browser behavior. Therefore, in communications where no web browser is involved, a typical SSL MITM poses no issue.

However, if you intend to use a web browser, HSTS policies can cause inconvenience. Here are some alternatives:

  1. Removing HSTS-related headers.
  2. Proxying with an actual web browser.

These alternatives assume that the web browser's settings are left unchanged. Disabling HSTS in the browser settings can resolve the issue more easily than expected.

I'll add more ideas if they come up in the future.
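
The first alternative, removing HSTS-related headers, can be sketched as a small response filter. The function name `strip_hsts` and the list-of-pairs header representation are assumptions; the proxy's real header handling may look different.

```python
def strip_hsts(headers):
    """Drop Strict-Transport-Security from a list of (name, value) header pairs."""
    return [
        (name, value)
        for name, value in headers
        # Header names are case-insensitive, so compare in lower case
        if name.lower() != "strict-transport-security"
    ]


response_headers = [
    ("Content-Type", "text/html"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
]
filtered = strip_hsts(response_headers)
```

Stripping the header only prevents a browser from learning a new policy. It has no effect on hosts in the browser's preloaded HSTS list or on a policy the browser has already cached.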

Cache services

This improvement retrieves the requested URL from cache services (such as Wayback Machine, Google Search Cache, Bing Search Cache, etc.) and delivers the response through a proxy.
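
As an illustration for the Wayback Machine case, the sketch below builds a query for the Internet Archive's availability endpoint and extracts the closest snapshot from its JSON response. The function names are hypothetical, no network request is made here, and the other cache services would need their own lookups.

```python
import json
from urllib.parse import urlencode

WAYBACK_AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_lookup_url(url, timestamp=None):
    """Build an availability query for the Wayback Machine."""
    query = {"url": url}
    if timestamp:
        query["timestamp"] = timestamp  # digits in YYYYMMDDhhmmss order
    return WAYBACK_AVAILABILITY_API + "?" + urlencode(query)


def closest_snapshot(response_body):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    snapshots = json.loads(response_body).get("archived_snapshots", {})
    closest = snapshots.get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


lookup = wayback_lookup_url("https://example.com/a?b=1", "20240101")
```

The proxy would fetch `lookup`, pass the body to `closest_snapshot`, and then serve the snapshot URL in place of the original request when one exists.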

Alternative methods for bypassing the request body limit

Caterpillar Proxy supports a stateful tunneling method over network sockets to bypass capacity limits on HTTP requests.

However, there is also an idea for working around capacity limits in stateless mode.

It involves storing the data to be requested in a separate object storage.
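
A rough sketch of the idea: small payloads stay inline, large ones are parked in storage and only a key travels in the request. Everything here is hypothetical, including the `data_ref` field, the 8192-byte threshold, and the dictionary standing in for a real object store. The server side is shown in Python for brevity even though the actual server is PHP.

```python
import base64
import uuid

MAX_INLINE_BYTES = 8192  # hypothetical limit accepted by the web server


def prepare_params(data, storage):
    """Inline small payloads; store large ones and send only a key."""
    if len(data) <= MAX_INLINE_BYTES:
        return {"data": base64.b64encode(data).decode("ascii"), "length": len(data)}
    key = uuid.uuid4().hex
    storage[key] = data  # stand-in for an upload to a real object store
    return {"data_ref": key, "length": len(data)}  # "data_ref" is hypothetical


def resolve_params(params, storage):
    """Receiving side: recover the payload from either form."""
    if "data_ref" in params:
        return storage[params["data_ref"]]
    return base64.b64decode(params["data"])


storage = {}
small = prepare_params(b"x" * 100, storage)
large = prepare_params(b"y" * 100_000, storage)
```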

ThreadPool

This project did not prioritize a ThreadPool implementation because it was focused on the Proof of Concept (PoC). A ThreadPool is needed to resolve thread hell when a large number of connections arrive.
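
A minimal sketch of what this could look like with the standard library's `concurrent.futures.ThreadPoolExecutor`. The echo handler is a placeholder for the real per-connection proxy logic, and the names `handle_connection` and `serve` are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn):
    """Placeholder per-connection work: echo bytes until the peer closes."""
    with conn:
        while chunk := conn.recv(8192):
            conn.sendall(chunk)


def serve(listener, max_workers=32):
    """Accept connections and hand each one to a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:  # listener was closed; stop accepting
                break
            pool.submit(handle_connection, conn)
```

With a pool, the number of threads never exceeds `max_workers`. When all workers are busy, newly accepted connections wait in the executor's queue instead of each spawning a thread.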
