
bitic's Introduction

Markov character n-grams with n = 4. The source texts are a description of large language models and The Book of Genesis.

live at: https://greggelong.github.io/bitic

A table of next characters is built by scanning the source texts: for every four-character sequence, record the characters that follow it.
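The table-building step can be sketched in Python (an illustrative sketch, not the project's actual code; the function name is my own):

```python
def build_table(text, n=4):
    """Map each n-character window to the list of characters that follow it."""
    table = {}
    for i in range(len(text) - n):
        gram = text[i:i + n]
        table.setdefault(gram, []).append(text[i + n])
    return table

table = build_table("Hello, how are you?")
print(table["Hell"])  # ['o']
```

Because every occurrence of a follower is appended (duplicates included), picking uniformly at random from a list later reproduces the observed frequencies.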

  1. Choose your text: Start by selecting a piece of text that you want to analyze and generate new text from. It could be a sentence, a paragraph, or a longer piece of writing. This project uses text from the Wikipedia article on large language models and the Book of Genesis.

  2. Divide the text into n-grams: In this case, we'll use n-grams of four characters. Divide your text into consecutive four-character sequences. For example, if your text is "Hello, how are you?", the four-character n-grams would be "Hell", "ello", "llo,", "lo, ", "o, h", and so on.

  3. Record the next character: For each unique four-character n-gram, you could calculate the probability of each character that follows it by dividing its frequency by the total occurrences of that n-gram, giving a probability distribution for the next character. This project skips the explicit math and simply appends every observed next character to a list; sampling uniformly from that list reproduces the same distribution. For example, here are the next characters recorded for the n-gram "code":

{"code": [' ', 'd', '.', ',', ' ', 'd', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r', 'r']}
 
  4. Generate new text: To generate new text, start with an initial four-character n-gram. Use the Markov chain to sample the next character based on its probability distribution. Append the chosen character to the generated text and slide the n-gram window by one character. Repeat this process, selecting the next character based on the probability distribution, until you have the desired length of generated text.

  5. Repeat and experiment: You can repeat the generation process with different initial n-grams or adjust the probabilities to get different variations of the generated text.

By following these steps, you can break a text down into Markov n-grams of four characters and generate new text from the distribution of next characters. The result mimics the patterns and style of the original text.
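The generation loop described in steps 4 and 5 can be sketched in Python (again an illustrative sketch under the assumptions above, with hypothetical function names, not the project's code):

```python
import random

def build_table(text, n=4):
    """Map each n-character window to the list of characters that follow it."""
    table = {}
    for i in range(len(text) - n):
        table.setdefault(text[i:i + n], []).append(text[i + n])
    return table

def generate(table, seed, length, n=4):
    """Start from a seed n-gram and repeatedly sample the next character."""
    out = seed
    for _ in range(length):
        options = table.get(out[-n:])
        if not options:  # dead end: this window never occurs in the source text
            break
        out += random.choice(options)
    return out
```

Because the table stores duplicate followers, `random.choice` over the list samples each next character in proportion to how often it followed that n-gram in the source.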

bitic's People

Contributors: greggelong
