
castlelemongrab / parlance


A minimum-dependency ECMAScript client library and CLI tool for Parler – a "free speech" social network that accepts real money to buy "influence" points to boost organic non-advertising content

License: Other

JavaScript 100.00%
data-science datascience datascraping disinformation es7 hatespeech javascript law-enforcement misinformation node nodejs osint parlance parler social-media social-networks speech twitter

parlance's People

Contributors

castle-lemongrab, dependabot[bot], milesmcc


parlance's Issues

Set-cookie headers on response

Hello,

Sorry for having to contact you this way, but I have been stuck on this for a while. I'm creating my own implementation that makes calls from a Node.js server, running on https://localhost:3000. When I pass the jst and mst from the frontend to the backend, the requests go through without any problems. That is, until the jst expires: I never get any Set-Cookie header back in the response headers. Honestly, I feel like I'm missing something really fundamental. I would really appreciate it if you had an answer for this. I will try and see if I can help with a few open issues.

Thanks for reading.

Add reparenting/UUID expansion options, raw output option

Code is in master that expands UUID references to posts (i.e. root/parent in an echo context), creators, and links – among other things.

Provide command-line arguments that can enable/disable this expansion. It's useful in most cases, but some cases (e.g. posts for a fixed user), expansion of the creator profile just bloats the output.

Finally, add an option to just dump the raw JSON API output, without any transformation or reduction whatsoever.

Be Node v8+ compatible (was: Credentials / init keeps failing)

Is it possible to provide an example of how to connect parlance to Parler?

I've tried supplying both URI-encoded and decoded values directly to the --mst and --jst options, as well as putting them into the auth.json file. I've tried including entire cookies as well as just the content/value strings within the cookies. Not sure what else to try.

Also, I am a bit confused why it would want both a -c option that presumably points to the JSON file with the mst and jst, while also requiring the same data as --mst and --jst sub-options to -c.

Investigate and fix "Received zero-length result; stopping" issue in paged API endpoints

Parler appears to have made some changes since moving – or, alternatively, managed to add some bugs in the process. It's hard to know for sure. In any case, Parlance is not functioning properly at this time.

Figure out why and fix it, so that Parlance users are once again able to extract, research, and examine some of the worst content humanity has to offer.

Make it easier to implement a subcommand

It'd be nice to modularize the codebase such that the major functions required to implement a paged-fetch subcommand were all in one place. Right now, they're in multiple places – src/arguments.js, src/cli.js, and two functions in src/client.js. This could be cleaned up.

Balance this with the possibility that the API changes over time and becomes less generic.

Add a simple end-to-end testing framework

Parler is an erratically-moving target. In the face of this reality, it'd be good to have a simple end-to-end testing framework. The purpose of these tests would be to ensure all major CLI subcommands still work prior to release, and to detect regressions caused by changes to Parler's squirrel-infested API.

Add restart-at-key option, allow number of retries to be specified

As a follow-up to #38 and as suggested via Twitter: add a CLI option to control the number of retries for server-side errors, and add a new client and CLI option to force Parlance to start at a specific time-series key.

The former will allow the user to increase runtime resilience to server-side errors/flakiness; the latter will provide users with a way to start or restart paged queries at any point in time – whether to recover from a previous failure, or to skip over more recent results.
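
The retry half of this request could look roughly like the sketch below. It is only an illustration: `with_retries` is a hypothetical helper, not a function in this codebase, and it assumes that failed requests throw an error carrying the HTTP status in a `statusCode` property.

```javascript
// Hypothetical helper: retry an async operation when it fails with a
// server-side (5xx) error, up to a caller-specified number of retries.
async function with_retries (_fn, _max_retries = 3, _delay_ms = 1000) {

  for (let attempt = 0; ; ++attempt) {
    try {
      return await _fn();
    } catch (_e) {
      // Assumption: the error exposes the HTTP status as `statusCode`
      let is_server_error = (_e.statusCode >= 500 && _e.statusCode < 600);

      // Client-side errors and exhausted retries are passed to the caller
      if (!is_server_error || attempt >= _max_retries) {
        throw _e;
      }

      // Wait before trying again
      await new Promise((_resolve) => setTimeout(_resolve, _delay_ms));
    }
  }
}
```

A CLI option would then only need to pass its value through as `_max_retries`; anything that is not a 5xx error is rethrown immediately.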

New Authentication

Hi!

Wondered if you're aware of any changes to Parler recently that would affect the cookie-based authentication process.

There no longer appear to be MST or JST tokens, just PHPSESSID and PV* cookies. Is there potential for this codebase to make use of these instead, or does this change require a different strategy?

Cheers!

Paging: harden against infinite loops

For many sequences, it's likely necessary to determine if a key is monotonically increasing or decreasing to avoid infinite loops secondary to concurrent server-side modifications or server API bugs. Add a way to do this. This'll likely involve – at a minimum for the current endpoints – the ability to parse CouchDB ISO 8601 timestamps.
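
One possible shape for such a check is sketched below. It assumes that each paging key is (or has been reduced to) a plain ISO 8601 timestamp string and that pages are walked from newest to oldest; `is_strictly_decreasing` and `find_first_stall` are hypothetical names, and the built-in `Date.parse` stands in for whatever timestamp parser is eventually chosen.

```javascript
// Hypothetical guard: true only if the next key is strictly older than
// the previous one, i.e. the paged fetch is still making progress.
function is_strictly_decreasing (_prev_key, _next_key) {

  let prev = Date.parse(_prev_key), next = Date.parse(_next_key);

  if (Number.isNaN(prev) || Number.isNaN(next)) {
    throw new Error('Unparseable timestamp key');
  }

  return (next < prev);
}

// Returns the index of the first key that fails to move backward in
// time, or -1 if the whole sequence is strictly decreasing.
function find_first_stall (_keys) {

  for (let i = 1; i < _keys.length; ++i) {
    if (!is_strictly_decreasing(_keys[i - 1], _keys[i])) {
      return i;
    }
  }

  return -1;
}
```

A paging loop could stop, or warn and stop, as soon as the guard returns false, rather than requesting the same key again.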

Implement command-line option parsing

Currently, command-line argument parsing is rather... baroque. Find a good option parsing library and use it to process command-line arguments and generate usage information.

Feature Request: Comment on post given post Id

This would be a great feature and would work much the same as write but with a post Id as an input.

I guess it would look something very roughly like this...

cli.js

  case 'comment':
    if (args.i) {
      await this._ensure_post_exists(client, args.i);
      await client.write_comment(args.i, args.t);
    } else if (args.r) {
      /* Something goes here if you want to handle reply comments on comments? */
      /* await client.write_reply_comment(args.r, args.t); */
    }
    break;

client.js

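  async write_comment (_id, _text) {

    let url = 'v1/post';

    let body = {
      body: _text,
      parent: null, /* Open question: should parent be the post id (_id)? */
      links: [], state: 4
    };

    /* The original sketch ends here; sending the request is left open */
  }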

Add support for fetching "echoes"

Currently, this tool doesn't appear to capture "echoes" (Parler's equivalent of retweets). Figure out how to do this, and what changes may be required to either (i) fetch them along with posts, or (ii) fetch them as a separate command/operation.

Add support for result filtering

This tool should support, at a minimum, hashtag-based filtering and regular expression filtering based upon post/comment/feed content. This should allow for more efficient detection of illegal libel, threats, and content that stands in violation of Title VI of the Civil Rights Act of 1964.
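
As a rough illustration of the two filters named above, the sketch below filters an array of already-fetched results. It assumes each result carries its text in a `body` string property; `filter_posts` and its option names are hypothetical and do not exist in this codebase.

```javascript
// Hypothetical filter: keep only results that match an optional hashtag
// and an optional regular expression. Both tests must pass if both are set.
function filter_posts (_posts, _options = {}) {

  let hashtag = (_options.hashtag || '').replace(/^#/, '').toLowerCase();
  let regex = (_options.regex || null);

  return _posts.filter((_post) => {

    // Assumption: the text of a post or comment lives in `body`
    let body = String(_post.body || '');

    if (hashtag) {
      let tags = (body.match(/#\w+/g) || []).map(
        (_t) => _t.slice(1).toLowerCase()
      );
      if (!tags.includes(hashtag)) {
        return false;
      }
    }

    if (regex) {
      regex.lastIndex = 0; /* Global/sticky regexes keep state between calls */
      if (!regex.test(body)) {
        return false;
      }
    }

    return true;
  });
}
```

Hashtags are compared whole and case-insensitively, so a filter on `maga` does not match `#magazine`.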

Some accounts do not return full paged results

There's a bug in the paging code or a misunderstanding of the undocumented API. Some account fetch tasks (ask for details) are terminating early, falsely believing there are no more results when there are in fact more. Figure out why and fix it.

Add strong typing and a type checker

At some point this code should have strong types and a type checker added to it. Marking low priority for now; if anyone sets up CI and a build process in the future, that might be a good time to do this.

Provide a command to fetch posts and echoes simultaneously

The API returns posts and echoes (postRefs) alongside each other in two parallel subarrays. The frontend code then appears to merge-join these two arrays together based upon timestamp. Provide a command that merge-joins these two arrays together based upon each post or echo's CouchDB ISO 8601 timestamp.

This issue, like #9, will require parsing those timestamps. Find a library to do... the thing, and it'll make both of these issues actionable.
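
The merge-join itself is small once the timestamps can be compared. The sketch below assumes both subarrays are already sorted newest-first and that each item carries its timestamp in a property named `createdAt`; that property name is a placeholder, not a confirmed API field, and the built-in `Date.parse` is used here in place of a dedicated parsing library.

```javascript
// Hypothetical merge-join: interleave two newest-first arrays into one
// newest-first array, comparing items by their ISO 8601 timestamps.
function merge_by_timestamp (_posts, _echoes, _field = 'createdAt') {

  let rv = [], i = 0, j = 0;

  while (i < _posts.length && j < _echoes.length) {
    let a = Date.parse(_posts[i][_field]);
    let b = Date.parse(_echoes[j][_field]);

    // On a tie, the post is emitted before the echo
    if (a >= b) {
      rv.push(_posts[i++]);
    } else {
      rv.push(_echoes[j++]);
    }
  }

  // At most one of the two arrays still has items left
  return rv.concat(_posts.slice(i), _echoes.slice(j));
}
```

Because each input is consumed in order, the merge runs in a single pass and never needs to sort the combined result.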

Changing Page Limits Does Not Work

When trying to change the page limit, I get the following message:

❯ /usr/local/Cellar/node/15.2.1/bin/parlance followers -u --no-delay -p 100 -g 100
[warn] You are responsible for deciding if this is allowed
[warn] The authors bear no responsibility for your actions
[fatal] You have been warned; refusing to continue as-is

Investigate ratelimiting errors

It's unclear whose fault this is, but Parler appears to be rejecting requests that comply with their ratelimiting headers. I don't know if this is them being dishonest or an issue in this codebase's ratelimiting implementation. The random delay default effectively counters this – however, we should investigate the issue and determine what's going on. Parler's frontend now reports version 1.4.5.

Regression in `init` subcommand: IOH.Base does not implement write_file

Heyhey, noob here. I wonder why I get error 127:

My command, in /usr/local/lib/node_modules/@castlelemongrab/parlance (there is no config folder there btw):

parlance init --mst "$mst" --jst "$jst" -o config/auth.json

(node:11054) UnhandledPromiseRejectionWarning: Error: Process exited with status 127

at IOH.fatal (/usr/local/lib/node_modules/@castlelemongrab/parlance/node_modules/@castlelemongrab/ioh/src/ioh.js:57:11)
at Session.write_credentials (/usr/local/lib/node_modules/@castlelemongrab/parlance/src/session.js:84:18)

(Use node --trace-warnings ... to show where the warning was created)
(node:11054) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag --unhandled-rejections=strict (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:11054) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

HTTP 502 from Parler while fetching tags

I continually receive the following error when I try to pull a larger number of parlays (e.g. #thegreatawakening). Does anyone know if this is easy to fix? I'm relatively new to all this. Thank you!

(node:12854) UnhandledPromiseRejectionWarning: StatusError: Bad Gateway
    at ClientRequest.<anonymous> (/usr/local/lib/node_modules/@castlelemongrab/parlance/node_modules/bent/src/nodejs.js:133:23)
    at Object.onceWrapper (events.js:422:26)
    at ClientRequest.emit (events.js:315:20)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (_http_client.js:634:27)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:117:17)
    at TLSSocket.socketOnData (_http_client.js:503:22)
    at TLSSocket.emit (events.js:315:20)
    at addChunk (_stream_readable.js:302:12)
    at readableAddChunk (_stream_readable.js:278:9)
    at TLSSocket.Readable.push (_stream_readable.js:217:10)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:12854) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:12854) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Implement full support for post echoes

Support for fetching feed data and feed echo data landed in 5427ff2, but the latter appears to be incomplete – either including inappropriate data or excluding relevant data. Investigate this issue and correct it if necessary. Twelve units. Unacceptable.

--confirm-page-size not accepted

When I set -p to anything (even just 11), I get the warning about --confirm-page-size followed by a fatal error, even when using --confirm-page-size.

Refactor console I/O

If console I/O is wrapped in a thin abstraction, this library will work in a web browser (the bent HTTPS library has both a Node and browser backend implementation). Do... the thing.

Provide initial developer-mode browser compatibility

Web browsers are ubiquitous and convenient, and there's nothing preventing someone from implementing a browser-specific version of the Arguments class and packaging this for in-browser use. The bent HTTPS client library already has a browser backend in addition to the Node backend.

Add support for session credential rekeying

If any Set-Cookie headers are sent in response to a request that contain an MST or JST, update the credential store automatically. Provide an option to write these new values out to a configuration file (not necessarily the same one the application was started with). Handle urlencoded MST and JST tokens in configuration files more gracefully (this isn't possible in general, but we can attempt to use decodeURIComponent on failure, try again if we get a different string back, and print a warning).
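
A minimal sketch of both halves of this idea follows. It assumes the cookies are literally named `mst` and `jst`, and that the Set-Cookie values arrive as an array of strings (as Node's HTTP client exposes them); `extract_credentials` and `decode_if_urlencoded` are hypothetical helper names.

```javascript
// Hypothetical helper: pull new mst/jst values out of Set-Cookie headers.
function extract_credentials (_set_cookie_headers) {

  let rv = {};

  for (let header of (_set_cookie_headers || [])) {

    // The name=value pair comes first; attributes follow the first ';'
    let pair = String(header).split(';')[0];
    let eq = pair.indexOf('=');

    if (eq < 0) {
      continue;
    }

    let name = pair.slice(0, eq).trim().toLowerCase();
    let value = pair.slice(eq + 1).trim();

    if (name === 'mst' || name === 'jst') {
      rv[name] = value;
    }
  }

  return rv;
}

// Hypothetical helper: try decodeURIComponent on a token and report
// whether that produced a different string, so the caller can retry
// with the decoded value and print a warning.
function decode_if_urlencoded (_token) {

  try {
    let decoded = decodeURIComponent(_token);
    return { value: decoded, changed: (decoded !== _token) };
  } catch (_e) {
    // Malformed escape sequence: keep the token exactly as given
    return { value: _token, changed: false };
  }
}
```

As the issue notes, the decoding step is a heuristic: a token that legitimately contains a percent sign cannot be told apart from an encoded one, which is why it is only attempted after a failure.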

Call for contributions and feature requests

If you have any ideas, small or large, for how this codebase could be improved, please comment in this thread or file a new issue and reference it here so it can be added to the kanban board. No idea is too small or too big. If we're going to do free speech, let's go all the way.

Add JSONL support

As part of multi-format output support, Parlance should support JSONL, also known as "not outputting those commas and square brackets". This is the preferred format for feeding large search/query engines, and will save people time who do not want to spend their time chopping off commas and omitting square brackets.

Use this as an opportunity to set up sane flags/options for multiple output backends, as this one is near-trivial.
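
For reference, the format difference is small enough to show in a few lines. `to_jsonl` is a hypothetical name used for illustration only.

```javascript
// Hypothetical emitter: one JSON document per line, with no enclosing
// square brackets and no separating commas.
function to_jsonl (_records) {

  if (_records.length <= 0) {
    return '';
  }

  return _records.map((_r) => JSON.stringify(_r)).join('\n') + '\n';
}
```

Since `JSON.stringify` escapes newlines inside strings, each record is guaranteed to occupy exactly one line.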

Add user follow/unfollow

In addition to the post and delete API endpoints, we should support programmatically following and unfollowing users.

Dramatically improve test coverage

This codebase currently has no tests. A castle with no unit tests, integration tests, or functional tests is in... unacceptable condition. Split the main client out into separate files, pick a testing framework, and at least get some basic unit tests added, with coverage metrics (N.B. should probably just use nyc and istanbul here).

Support CSV-formatted output via -f using dotted column names

We've discussed this a bit, and it seems the consensus is that we should use an existing json2csv module to handle JSON to CSV conversion prior to output. This would also provide us some user-accessible mechanisms (i.e. options and/or configuration files) to alter the CSV output (i.e. by pushing options down to the JSON/CSV converter).

We've identified a suitable library on NPM with minimal dependencies and excellent test coverage.
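
To make "dotted column names" concrete, here is a dependency-free sketch of the idea. It is not the library referred to above and does not reflect its API; `flatten`, `csv_escape`, and `to_csv` are hypothetical names used only for illustration.

```javascript
// Flatten nested objects into a single level, joining keys with dots,
// e.g. { creator: { name: 'x' } } becomes { 'creator.name': 'x' }.
function flatten (_obj, _prefix = '', _rv = {}) {

  for (let [ k, v ] of Object.entries(_obj)) {
    let key = (_prefix ? `${_prefix}.${k}` : k);

    if (v !== null && typeof v === 'object' && !Array.isArray(v)) {
      flatten(v, key, _rv);
    } else {
      _rv[key] = v;
    }
  }

  return _rv;
}

// Quote a field if it contains a comma, quote, or line break
function csv_escape (_v) {

  let s = (_v === null || _v === undefined ? '' : String(_v));
  return (/[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s);
}

// Emit a header row followed by one row per record
function to_csv (_records, _columns) {

  let rows = _records.map((_r) => {
    let flat = flatten(_r);
    return _columns.map((_c) => csv_escape(flat[_c])).join(',');
  });

  return [ _columns.join(',') ].concat(rows).join('\n') + '\n';
}
```

For example, `to_csv(results, [ 'id', 'creator.name' ])` would select one top-level and one nested field per row; columns missing from a record come out as empty fields.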

Clean up HTTP header generation, provide request-minimization option

Currently, this repository of free speech faithfully implements the Parler API per the frontend specification, all the way down to the production of appropriate Referer: headers. This often results in extra network requests. In some cases or for some studies of the API, this may be undesirable.

Clean up the _create_extra_headers function, and provide an option to be less strictly API-conformant in order to minimize request count / server load.

Save json output to file

I was just wondering if there is a command to save the response from Parler to a given file, instead of printing data to console?

Add support for ZSON JSONB PostgreSQL Output

Instead of writing JSON files, it'd be great if this utility could persist ZSON JSONB data directly to a PostgreSQL database. This is straightforward, would be extremely resilient to result schema changes, and would provide for some relatively powerful downstream search, indexing, and filtering options.

Feature request: Ability to specify a date to scrape back from

Currently, when fetching Parlies based on hashtag / user, the results are returned from that moment in time, backwards. It would be very helpful, for research purposes, if a user would be able to specify the date to crawl back from.

For example, if one is investigating activity around election time on the #maga hashtag, right now it would take days of scraping to get to election-time data, as this is a high-activity hashtag.

I am not sure if this is easy to do or not, although the fact that the system message seems to include a date is a positive sign (e.g. the startkey part of Fetching v1/post/hashtag?tag=maga&limit=10&startkey=2020-11-25T21%3A24%3A27.365Z_250881603 )

Fully implement ratelimiting

In addition to the already-present delay logic, the library should use the Session object and X-Ratelimit headers to ensure that requests are never issued in violation of ratelimiting guidelines.
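
A header-driven delay could be computed along these lines. The exact header names and their semantics are assumptions here: the sketch supposes an `x-ratelimit-remaining` request count and an `x-ratelimit-reset` time in epoch seconds, which would need to be checked against what the server actually sends. `compute_delay_ms` is a hypothetical name.

```javascript
// Hypothetical helper: given lower-cased response headers, return how
// long to wait (in milliseconds) before issuing the next request.
function compute_delay_ms (_headers, _now_ms = Date.now()) {

  // Assumed header names and units; verify against real responses
  let remaining = parseInt(_headers['x-ratelimit-remaining'], 10);
  let reset = parseInt(_headers['x-ratelimit-reset'], 10);

  // Missing or unparseable headers: impose no extra delay here
  if (Number.isNaN(remaining) || Number.isNaN(reset)) {
    return 0;
  }

  // Requests remain in the current window
  if (remaining > 0) {
    return 0;
  }

  // Window exhausted: wait until the reset time, never a negative amount
  return Math.max(0, (reset * 1000) - _now_ms);
}
```

This would sit alongside the existing delay logic rather than replace it, for instance by waiting for the larger of the two values.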

Add support for raw JSON output

Splitting this off from issue #25. This may be more useful as a debug output channel (e.g. stderr or a separate file). Rethink the interface a bit and then implement.

Figure out why CouchDB is accepting invalid comment identifiers when fetching post comments

On both our local simulated API server and as evident from browser debug output, the Parler API is returning an empty result set instead of an error when fetching comments for an invalid post identifier. Figure out what's going on here, if it's an issue or could potentially deny service, and either work around it or detect the condition and block it.

Note that Parler's security contact information page is itself broken, so I am unable to report potential issues. They simply do not have a legitimate information security contact available. The page doesn't even render properly – it's bright red, because someone can't write HTML competently.

Authorization error on Mac OS

Literally my first issue, so may be a user issue.

Running Mac OS 11.1, Node v. 14.15.3, npm 6.14.9, parlance v. 1.1210.1. Getting a similar error as in Issue #22.

When I installed Parlance, it installed to /usr/local/lib/node_modules/@castlelemongrab/parlance. It did not create a config folder, but I created one manually. I also manually created a cookies folder, into which I put the jst and mst files.

Doing "parlance init -o" with the various jst/mst files seems to work ("New file has been written to disk"), but when I "parlance profile" I get:

[fatal] Unable to read authorization data from config/auth.json

Any thoughts on what I'm doing wrong?

Modularize JSON output engine

There should be a properly modular way to introduce new output formats in addition to the current JSON array on standard output. Formats such as JSONL, a directory full of JSON files, PostgreSQL and/or other database output support (see issue #23) could be incredibly useful.

In addition, these output emitters could apply transforms for compatibility with other tools (e.g. visualization and/or data processing tools) on the way out. Do this, but avoid file-based runtime module loading for portability.

feature to return "discover" page results?

Hello,

Is there, or will there be, a feature to return posts as seen on the "discover" page here: https://parler.com/discover ?

I've been using Selenium to scrape the posts in the "Parleys" section, as they seem representative of the site.

I'm assuming parlance feed returns my own feed, which is empty because I refuse to engage with anyone on Parler.

Thanks!
