word-cluster-matrix's Introduction

Grammar Fields

Grammar Fields are a psychosecurity construct for analyzing grammar behavior and language usage over time, with a focus on deviations from the norm, the acquisition of new words, and the evolution of grammar over the course of linguistic development.

To analyze language, we collect and organize all the text that a person has written, arranging it chronologically. We then determine the frequency of singularized "go words" (words that are actively used) and arrange them in ascending order. This compilation forms the basis of the "grammar field," which is then examined for changes in patterns and the speed at which these changes occur. The goal is to make it compatible with neural networks to detect how various factors such as influential individuals, trends, cults, artificial intelligence, and advertising campaigns impact language usage.

This presentation about Grammar Fields will help you understand their underlying principles.

Grammar Fields

You can review the slides in this presentation here

Technical Definition

The grammar field consists of two axes: Time Grouping (x) and Brevity Rank (y).

Time Grouping is hourly, daily, weekly, or monthly. (Custom grouping coming soon)

Brevity Rank is the number of times a unique word is used.

Where x and y intersect is the Intensity. For example, if you have a Brevity Rank of 14 and a monthly Time Grouping at Month 1 (January), and the number where these two intersect is 163, then Brevity Rank 14 for January has an Intensity of 163: that month, 163 unique words were each used 14 times.

This is NOT a Natural Language Processing approach. We do not analyze the words or the grammar connections themselves. (Detailed reasoning for this will come later.) We are NOT doing sentiment analysis. At a high level, we assume that a lower Brevity Rank means an increased likelihood of relying on more intricate grammar, while a higher Brevity Rank means repetition of preferred grammar.

For example: if a person uses the words "cat", "jail", and "linger" five times each in January, then their Brevity Rank 5 for that Time Grouping will have an Intensity of 3. (They used three unique words five times each in January.)
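
To make the mapping concrete, here is a minimal sketch in plain JavaScript (independent of the library itself) that computes the Intensity for every Brevity Rank within a single Time Grouping:

// Count how often each unique word appears in one Time Grouping,
// then count how many unique words share each usage count (the Intensity).
function brevityIntensities(words) {
  const usage = {};
  for (const word of words) {
    usage[word] = (usage[word] || 0) + 1;
  }

  const intensities = {};
  for (const count of Object.values(usage)) {
    intensities[count] = (intensities[count] || 0) + 1;
  }

  return intensities; // { brevityRank: intensity }
}

// "cat", "jail", and "linger" each appear 5 times in January
const january = [].concat(...['cat', 'jail', 'linger'].map(w => Array(5).fill(w)));
console.log(brevityIntensities(january)); // { '5': 3 }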

Example

The following image highlights these points:

images/example.png

These are the annual grammar fields of Elon Musk between 2015 and 2017.

We see a slurring effect happening to his grammar fields after Trump was elected.

We also see that by 2017, his usage of rare words had increased significantly compared to 2015.

Additional forensics will reveal how these clusters can help us identify which egregores Elon was influenced by.

Installation

npm install github:HyperCrowd/word-cluster-matrix

Usage

Google Colab

You can test Grammar Fields and upload custom CSVs right now by clicking on the button below:

Open In Colab

Google Cloud Shell

Start tinkering with Grammar Fields at the CLI level in Google Cloud Shell

Open in Cloud Shell

Docker

The Docker implementation creates a data directory with csvs and fields folders in it. Each of these folders has a folder for each field mode (absolute, differential, and normalized). In each mode folder is a folder for each time grouping (hourly, daily, weekly, and monthly).

When you add a CSV to a mode and time grouping folder in the csvs directory, the Docker instance will create a field from that CSV file in the corresponding mode and time grouping folders in the fields folder. For example, if you create a CSV at /data/csvs/absolute/daily/test.csv, the Docker instance will create a field output at /data/fields/absolute/daily/test.csv
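
Putting this together, the data directory has the following shape (only one branch is expanded; every mode contains the same time grouping folders):

data/
├── csvs/
│   ├── absolute/
│   │   ├── hourly/
│   │   ├── daily/
│   │   │   └── test.csv   <- drop input CSVs here
│   │   ├── weekly/
│   │   └── monthly/
│   ├── differential/
│   └── normalized/
└── fields/
    └── absolute/
        └── daily/
            └── test.csv   <- the generated grammar field appears here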

The Docker instance will poll for new files and changes in the csvs folders every ten seconds.

Please make sure the CSVs you add to the appropriate csvs folder have a header row as their first row, where the time column has a header named time and the column containing the text you wish to analyze has a header named text.
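
For example, a minimal input CSV could look like the following (millisecond timestamps in the time column are an assumption based on the JavaScript example below; quote any text values that contain commas):

time,text
1322751311000,"what is the deal?"
1322753344000,"wow, this is not the cringe I was looking for"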

To run Grammar Fields in Docker mode, simply run:

./docker.sh

You can test this Docker implementation by clicking here and typing ./docker.sh in the console:

Open in Cloud Shell

Local CLI

For command line processing of a CSV, this example will cover most cases. Replace source.csv with the CSV you want to process, and field.csv with the file name of the grammar field output:

cat source.csv | src/cli.js field.csv

In this example, we utilize every flag:

cat source.csv | src/cli.js -t "timestamp" -x "messages" -b "daily" -m "normalized" field.csv
  • --time, -t: The name of the time column in the CSV.
  • --text, -x: The name of the text column in the CSV.
  • --mode, -m: The type of grammar field you want. (absolute, differential, normalized)
  • --breakdown, -b: What time groupings the grammar field is structured as. (hourly, daily, weekly, monthly)

JavaScript

const { GrammarField } = require('grammar-field');

// Possible tweets
const tweets = [
  'what is the deal?',
  'wow, this is not the cringe I was looking for',
  'gotta pay rent or else i will be broke AND without protection from the elements',
  'AI is going to replace my job, but it will never replace my heart uwu'  
]

// JavaScript timestamps (one per tweet, in milliseconds)
const times = [
  1322751311000,
  1322753344000,
  1322918428000,
  1322918527000
]

// Generates a grammar field where the times, the brevity ranks, and values are actual values 
const a = new GrammarField(tweets, times) 

// Generates a grammar field where the times, the brevity ranks, and values are actual values and the x axis label (time) is divided by 1000 
const b = new GrammarField(tweets, times, 'ABSOLUTE', x => x / 1000) 

// Generates a grammar field where the times, the brevity ranks, and values are differentials based on their respective prior values
const c = new GrammarField(tweets, times, 'RELATIVE')

// Generates a grammar field where the times, the brevity ranks, and values are differentials based on their respective prior values and the x axis label (time) has 's' appended to it
const d = new GrammarField(tweets, times, 'RELATIVE', x => x + 's')

// Generates a grammar field where there are 100 times, 100 ranks, and 100x100 values, and all values have been normalized to this scale of 100
const e = new GrammarField(tweets, times, 'NORMALIZED')

Examples

Please see the following tests for examples on how to programmatically use Grammar Fields:

  • Grammar fields: How to use the basic features of a Grammar Field
  • Data loaders: How to load data into a Grammar Field
  • Features: How to perform mathematical analysis on both the Brevity Ranks and the Time Groupings of a Grammar Field.

Contribution

You can edit and toy around with the code very quickly in StackBlitz by clicking on this button:

Open in StackBlitz

word-cluster-matrix's Issues

Fix ends on time windows

The time groupings need to have their respective time window endings represented in the field.

Meaning, if fields are generated based on hour-of-day usage, then the far left needs to be 00:00 and the far right needs to be 23:59. The way to do this is to position every hour + minutes decimal based on a modulus of the end timestamp.
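
A minimal sketch of the modulus idea (assuming millisecond timestamps and UTC; the real implementation would follow the field's configured time grouping):

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const MS_PER_HOUR = 60 * 60 * 1000;

// Position a timestamp on the hourly axis as a fractional hour of day,
// so the axis always spans 00:00 on the far left to 23:59 on the far right.
function hourOfDayPosition(timestamp) {
  const msIntoDay = timestamp % MS_PER_DAY; // offset into the current UTC day
  return msIntoDay / MS_PER_HOUR;           // e.g. 13.5 means 13:30
}

console.log(hourOfDayPosition(1322751311000)); // ≈ 14.92 (14:55 UTC)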

Consider using tf.stack to simplify aggregate features

When aggregating heatmaps, use tf.stack to get the features of that grouping. Example:

const fs = require('fs');
const tf = require('@tensorflow/tfjs-node-gpu');

async function runImageSimilarity() {
  // Load the pre-trained model
  const model = await tf.loadLayersModel('file://path/to/model/model.json');

  // Load grayscale images and resize them to a common shape so they can be stacked
  const imagePaths = ['image1.jpg', 'image2.jpg']; // Replace with your image paths
  const images = [];
  for (const path of imagePaths) {
    const imgTensor = tf.node.decodeImage(fs.readFileSync(path), 1).toFloat();
    // 224x224 is a placeholder; use whatever input size the model expects
    images.push(tf.image.resizeBilinear(imgTensor, [224, 224]));
  }

  // Convert images to a single tensor
  const inputTensor = tf.stack(images);

  // Extract features from the images
  const features = model.predict(inputTensor);

  // Calculate similarities between images (a smaller feature distance means more similar)
  const similarityThreshold = 0.8;
  for (let i = 0; i < features.shape[0]; i++) {
    for (let j = i + 1; j < features.shape[0]; j++) {
      const feature1 = features.slice([i, 0], [1, features.shape[1]]);
      const feature2 = features.slice([j, 0], [1, features.shape[1]]);

      const similarityScore = tf.norm(feature1.sub(feature2)).arraySync();
      if (similarityScore < similarityThreshold) {
        console.log(`Images ${i} and ${j} are similar with a similarity score of ${similarityScore}.`);
      }
    }
  }
}

runImageSimilarity();

Vector similarity?

Look into ways to group together stemmed go words via vector similarities and populate the field based on that
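
A minimal sketch of the similarity math (assuming each stemmed go word already has an embedding vector; where those vectors come from is left open by this issue):

// Cosine similarity between two embedding vectors: values near 1 suggest the
// two words could share a cluster, values near 0 suggest they are unrelated.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical 3-dimensional embeddings for two stemmed go words
console.log(cosineSimilarity([0.9, 0.1, 0.3], [0.8, 0.2, 0.4])); // ≈ 0.98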

Get log view

Logarithms exist everywhere, find out if they exist here too lulz

Allow for word filtering

When feeding sentences into a field, allow for conditional filtering to either accept or reject a sentence based on custom conditions involving word presence.
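
A minimal sketch of what a predicate-based filter could look like (the function name and signature are hypothetical, not the library's actual API):

// Keep only the sentences whose words satisfy a caller-supplied predicate
function filterSentences(sentences, predicate) {
  return sentences.filter(sentence => predicate(sentence.toLowerCase().split(/\s+/)));
}

const sentences = [
  'gotta pay rent or else i will be broke',
  'AI is going to replace my job'
];

// Accept only sentences that mention "rent"
const kept = filterSentences(sentences, words => words.includes('rent'));
console.log(kept); // [ 'gotta pay rent or else i will be broke' ]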

Allow "refreshing focus" mode

We have features with different kinds of aggregate modes being represented.

We should add a "refreshing focus" mode in which words that are not used again after a certain point decay in their word frequency over time.

EXAMPLE:

If I use the word "pond", then it counts towards its word frequency cluster. Based on a custom decay rate function, the word frequency count of "pond" should then drop (as a constant over time, as a distance across all words, etc.)
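
A minimal sketch of one possible decay function (exponential decay with a half-life is an assumption; the issue deliberately leaves the rate function open):

// Decay a word's stored frequency based on how many time groupings have passed
// since it was last used: every `halfLife` groupings without a new use halve the count.
function decayedCount(count, groupingsSinceLastUse, halfLife = 4) {
  return count * Math.pow(0.5, groupingsSinceLastUse / halfLife);
}

// "pond" was counted 6 times, but has not appeared for 8 groupings
console.log(decayedCount(6, 8)); // 1.5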

Incorporate stemming

Stemming will help make sure we are dealing with the same word during unique counts
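
A minimal sketch of how stemming could normalize words before the unique counts (using the natural package's Porter stemmer as one option; the project may settle on a different stemmer):

const natural = require('natural');

// Map each word to its stem so that inflected forms count toward the same unique word
function stemWords(words) {
  return words.map(word => natural.PorterStemmer.stem(word.toLowerCase()));
}

console.log(stemWords(['lingers', 'lingered', 'lingering'])); // [ 'linger', 'linger', 'linger' ]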
