Roman Urdu-poetry-generation-using-generative-modeling

This repository contains a project on Poetry Generation in Roman Urdu using n-gram language modeling. The project utilizes the spaCy pipeline for text processing and aims to generate a ghazal consisting of seven stanzas, each containing two verses.

Introduction

The task of this project is to generate poetry by training n-gram models on a poetry corpus. The generated poetry will follow the structure of a ghazal, with each stanza consisting of two verses. Each verse should be between 6 and 8 words long. The poetry corpus can be augmented by scraping online sources to obtain a better representation of Urdu poetry. The models can be trained on either Roman Urdu or Urdu text; output generated in Urdu script can then be converted into Roman Urdu using the implementation from Assignment 1.

Assignment Task

The task involves generating a ghazal using different models. The implementation will generate one verse at a time until all stanzas have been generated. The following algorithm can be used to solve the poetry generation problem:

Load the Poetry Corpus
Tokenize the corpus to obtain a list of words
Generate n-gram models (bigram and trigram)
For each stanza:
  For each verse:
    Generate a random number between 6 and 8 (inclusive) to determine the verse length
    Select the first word intelligently
    Select subsequent words until the end of the verse, using the most probable next word based on the chosen word
    [bonus] Attempt to rhyme the last words of the stanza with the last word of the first stanza
    Print the verse
  Print an empty line after each stanza

Implementation Challenges

The challenges of this assignment include selecting subsequent words intelligently after choosing the first word of the verse. To predict the next word, it is essential to compute the most probable next word among all the possible options. This can be achieved by using a Conditional Frequency Distribution (CFD) that provides the likelihood of each possible outcome given a condition. Additionally, rhyming the generated verses can be a challenge, requiring the construction of a rhyming dictionary.
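The algorithm and the CFD idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the toy corpus, the function names (build_bigram_cfd, generate_verse, generate_ghazal), the whitespace tokenization, and the restart-on-dead-end rule are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

# Toy placeholder corpus, made up for illustration (not the project's data).
CORPUS = [
    "dil ki baat labon par aa kar ruk gayi",
    "dil ki raah mein shaam dhal kar ruk gayi",
    "raat ki baat dil mein aa kar reh gayi",
]

def build_bigram_cfd(lines):
    """Map each word to a Counter of the words that follow it."""
    cfd = defaultdict(Counter)
    for line in lines:
        words = line.split()  # assumption: whitespace tokenization
        for prev, nxt in zip(words, words[1:]):
            cfd[prev][nxt] += 1
    return cfd

def generate_verse(cfd, starts, rng, min_len=6, max_len=8):
    """Generate one verse whose length is drawn from min_len..max_len."""
    length = rng.randint(min_len, max_len)  # inclusive on both ends
    word = rng.choice(starts)
    verse = [word]
    while len(verse) < length:
        followers = cfd.get(word)
        if followers:
            # Most probable next word given the current word.
            word = followers.most_common(1)[0][0]
        else:
            # Dead end (word only seen at line end): restart from a start word.
            word = rng.choice(starts)
        verse.append(word)
    return verse

def generate_ghazal(lines, stanzas=7, seed=0):
    """Return a list of stanzas, each a list of two verses (word lists)."""
    rng = random.Random(seed)
    cfd = build_bigram_cfd(lines)
    starts = [line.split()[0] for line in lines]
    return [[generate_verse(cfd, starts, rng) for _ in range(2)]
            for _ in range(stanzas)]

if __name__ == "__main__":
    for stanza in generate_ghazal(CORPUS):
        for verse in stanza:
            print(" ".join(verse))
        print()  # empty line after each stanza
```

Always taking the single most probable follower makes the output repetitive on a small corpus; sampling followers in proportion to their counts is a common alternative.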

Standard n-gram Models

The project utilizes the Conditional Frequency Distribution (CFD) approach to develop both the Bigram model and Trigram model. The first word of each line is randomly selected from the starting words in the vocabulary, and the bigram model is used to generate the next word until the verse is complete. The same steps are followed for the trigram model, and the results of both n-gram models are compared. [bonus] Additionally, the possibility of making the ghazal rhyme is explored by building a pronunciation dictionary using the most probable rhyming endings.
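For the trigram case, the condition of the CFD becomes the pair of preceding words instead of a single word. The following is a hedged sketch under that reading; build_trigram_cfd, next_word, and the toy lines are hypothetical names and data introduced only for illustration.

```python
from collections import Counter, defaultdict

def build_trigram_cfd(lines):
    """Map each (w1, w2) pair to a Counter of the words that follow it."""
    cfd = defaultdict(Counter)
    for line in lines:
        words = line.split()  # assumption: whitespace tokenization
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            cfd[(w1, w2)][w3] += 1
    return cfd

def next_word(cfd, w1, w2):
    """Most frequent follower of (w1, w2), or None if the pair is unseen."""
    followers = cfd.get((w1, w2))
    return followers.most_common(1)[0][0] if followers else None

# Toy placeholder lines, made up for illustration.
toy_lines = [
    "dil ki baat labon par aa kar ruk gayi",
    "dil ki baat ankhon mein aa kar reh gayi",
    "dil ki raah mein shaam dhal gayi",
]
trigram_cfd = build_trigram_cfd(toy_lines)
# ("dil", "ki") is followed by "baat" twice and "raah" once.
print(next_word(trigram_cfd, "dil", "ki"))  # baat
```

Because a two-word condition is seen less often than a one-word condition, unseen pairs are more common; returning None here leaves the choice of fallback (for example, backing off to the bigram model) to the caller.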

Backward Bigram Model

In some cases, words may be better predicted from their right context rather than their left context. To address this, a Backward Bigram model is implemented, which models the generation of a sentence from right to left. The Backward Bigram model is created by modifying the Bigram model to change the modeling direction. The results of the backward bigram model are compared with previous implementations.
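One way to read "changing the modeling direction" is to condition each word on the word to its right, generate from the last word backwards, and reverse the result for printing. The sketch below assumes that reading; build_backward_cfd, generate_backward, the fallback rule, and the toy lines are illustrative and not taken from the repository.

```python
import random
from collections import Counter, defaultdict

def build_backward_cfd(lines):
    """Map each word to a Counter of the words that precede it."""
    cfd = defaultdict(Counter)
    for line in lines:
        words = line.split()  # assumption: whitespace tokenization
        for prev, nxt in zip(words, words[1:]):
            cfd[nxt][prev] += 1  # condition on the right-hand word
    return cfd

def generate_backward(cfd, last_word, length, fallback, rng):
    """Build a verse right to left, then return it in reading order."""
    reversed_verse = [last_word]
    word = last_word
    while len(reversed_verse) < length:
        preceders = cfd.get(word)
        if preceders:
            word = preceders.most_common(1)[0][0]
        else:
            # Dead end (word only seen at line start): pick a fallback word.
            word = rng.choice(fallback)
        reversed_verse.append(word)
    return reversed_verse[::-1]

# Toy placeholder lines, made up for illustration.
toy_lines = [
    "dil ki baat labon par aa kar ruk gayi",
    "raat ki baat dil mein aa kar ruk gayi",
]
backward_cfd = build_backward_cfd(toy_lines)
line_endings = [line.split()[-1] for line in toy_lines]
verse = generate_backward(backward_cfd, "gayi", 6, line_endings, random.Random(0))
print(" ".join(verse))
```

A side effect of this direction is that the final word is chosen first, which makes it straightforward to fix a rhyming ending before the rest of the verse is generated.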

Bidirectional Bigram Model

The Bidirectional Bigram model combines the forward and backward models to generate output. Both the Backward Bigram model and Bidirectional Bigram model take the same input and produce the same style of output as the Bigram model. The output of the Bidirectional Bigram model is compared with the Trigram model.
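The text does not say how the two directions are combined. One plausible scheme, shown here purely as an assumption, is to score a candidate word by a weighted sum of its forward probability given the left neighbour and its backward probability given the right neighbour. The names build_cfds, prob, best_between, and the weight parameter are hypothetical.

```python
from collections import Counter, defaultdict

def build_cfds(lines):
    """Return (forward, backward) bigram count tables."""
    forward = defaultdict(Counter)   # forward[prev][next]
    backward = defaultdict(Counter)  # backward[next][prev]
    for line in lines:
        words = line.split()  # assumption: whitespace tokenization
        for prev, nxt in zip(words, words[1:]):
            forward[prev][nxt] += 1
            backward[nxt][prev] += 1
    return forward, backward

def prob(cfd, condition, word):
    """Relative frequency of `word` given `condition`; 0.0 if unseen."""
    total = sum(cfd[condition].values()) if condition in cfd else 0
    return cfd[condition][word] / total if total else 0.0

def best_between(forward, backward, left, right, vocab, weight=0.5):
    """Pick the word that best fits between `left` and `right`.

    Assumed combination rule: linear interpolation of the two directions.
    """
    def score(word):
        return (weight * prob(forward, left, word)
                + (1 - weight) * prob(backward, right, word))
    # Sorting the vocabulary makes tie-breaking deterministic.
    return max(sorted(vocab), key=score)

# Toy placeholder lines, made up for illustration.
toy_lines = [
    "dil ki baat labon par aa kar ruk gayi",
    "raat ki baat dil mein aa kar ruk gayi",
]
forward, backward = build_cfds(toy_lines)
vocab = {word for line in toy_lines for word in line.split()}
# "baat" always follows "ki" and is the only word seen before "labon".
print(best_between(forward, backward, "ki", "labon", vocab))  # baat
```

Setting weight to 1.0 reduces this to the forward bigram model and 0.0 to the backward one, which gives a simple way to compare the three variants on the same input.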

Usage

Clone the repository:

Install the required dependencies mentioned in the project documentation.

Prepare the poetry corpus: This can involve collecting existing poetry datasets and scraping online sources for additional data.

Tokenize the corpus: Use the provided scripts or modules to tokenize the corpus and obtain a list of words.

Generate n-gram models: Implement the algorithms for generating the Bigram and Trigram models using the tokenized corpus.

Run the project: Execute the main script or launch the application to generate the ghazal.

Explore the generated poetry: The project will print the ghazal, consisting of seven stanzas with two verses each, following the specified constraints.

Customize and enhance: Modify the codebase to experiment with different models, incorporate additional datasets, or improve the poetry generation algorithms as desired.

Contributors

fisa712
