Preranathm's Projects

crawler-for-news-website

Developed a simple web crawler that measures aspects of a crawl, studies the characteristics of the crawl, downloads the crawled web pages, and gathers page metadata from the C-SPAN website.
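
As a rough illustration of the kind of per-page measurement such a crawler might collect, the sketch below extracts links from one HTML page and counts total, unique, internal, and external URLs. It uses only the Python standard library; the class name `LinkExtractor`, the function `crawl_stats`, and the `example.org` URLs are hypothetical and are not taken from the repository.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def crawl_stats(base_url, html):
    """Return simple link counts for one fetched page (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    internal = [u for u in parser.links if urlparse(u).netloc == host]
    return {
        "total": len(parser.links),
        "unique": len(set(parser.links)),
        "internal": len(internal),
        "external": len(parser.links) - len(internal),
    }


if __name__ == "__main__":
    page = '<a href="/a">A</a><a href="https://other.example/x">X</a>'
    print(crawl_stats("https://example.org/", page))
```

A real crawl would fetch each page over HTTP and aggregate these counts across pages; that part is left out here to keep the sketch self-contained.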

database-systems-assignments

CSCI 585 assignments: 1. EER diagram for E-Learn. 2. SQL. 3. KML: nearest neighbors and convex hull code. 4. TinkerPop Gremlin. 5. Running the Weka, RapidMiner, and KNIME tools.

deepsentirank

Uses an AlexNet CNN to classify images into one of the classes defined in caffe_classes.py. Images with similar classes can be grouped together and used for image similarity search. To test the model, run testModel.py.

imagecat

ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika, and Apache OODT to ingest tens of millions of files in place (images, but it could be extended to other file types), and to extract metadata and OCR information from those files using Tika and Tesseract OCR.

img2text

Models and associated helper code for the GSoC 2017 project "TensorFlow Image to Text in Apache Tika".

inverted-index-using-gcp-and-hadoop-cluster

Created an inverted index of the words occurring in a set of web pages, using a subset of 74 files out of a total of 408 files (text extracted from HTML tags) derived from the Stanford WebBase project (https://ebiquity.umbc.edu/resource/html/id/351). Placed these files in a bucket on Google Cloud Storage and ran a Hadoop job that reads its input from this bucket.
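
To make the idea concrete, here is a minimal sketch of an inverted index built with a map step and a reduce step in plain Python. It only mirrors the shape of a MapReduce job and is not the repository's Hadoop code; the function names, document IDs, and sample text are hypothetical.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit a (word, doc_id) pair for every word in one document."""
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, doc_id


def reduce_phase(pairs):
    """Group pairs by word and count occurrences per document."""
    index = defaultdict(lambda: defaultdict(int))
    for word, doc_id in pairs:
        index[word][doc_id] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(counts) for word, counts in index.items()}


def build_inverted_index(docs):
    """docs maps a document ID to its text; returns word -> {doc_id: count}."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


if __name__ == "__main__":
    docs = {"doc1": "the cat sat", "doc2": "the cat and the dog"}
    print(build_inverted_index(docs)["the"])
```

On a cluster, the map and reduce steps would run as separate distributed tasks, with the framework handling the grouping by word between them.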

polar.usc.edu

Polar USC activities related to the NSF Polar CyberInfrastructure program at the University of Southern California.

polarpostprocessing

This code connects to the Solr database created for Sparkler-crawled data and performs further data extraction, classification, filtering, and insight generation using various machine learning models. The models can take a keyword list from the user, extract features from URL content, classify (score) the output, and update the corresponding Solr parameter. Apache Sparkler link: https://github.com/USCDataScience/sparkler

pollapp

Polling app for the Windows Phone OS, used for surveys. Allows users to post their own questions and to vote for their favourite option on questions posted by others.

search-engine-enhancement

Adds spell checking, autocomplete, and snippet functionality to the Solr search engine. Enhanced Solr with spelling correction and an autocomplete (suggest) function. Also used an external spelling correction program, Norvig's spell corrector, in conjunction with Solr to enhance Solr's autocomplete functionality. Norvig's spell corrector uses a text file ("big.txt") to obtain the set of words against which edit distance is calculated. Here, Apache Tika is used for this purpose.
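
The following is a condensed sketch in the style of Norvig's spell corrector, shown only to illustrate how a word list and edit distance combine to pick a correction. It takes the corpus as a string instead of reading "big.txt", and the `make_corrector` wrapper and the sample corpus are assumptions made for this example.

```python
import re
from collections import Counter


def words(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """All strings that are one edit (delete, swap, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def make_corrector(corpus_text):
    """Build a correction function from a corpus (hypothetical wrapper)."""
    counts = Counter(words(corpus_text))

    def known(candidates):
        return {w for w in candidates if w in counts}

    def correction(word):
        # Prefer the word itself, then words one edit away, then two edits away.
        candidates = (
            known([word])
            or known(edits1(word))
            or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
            or [word]
        )
        # Among candidates, pick the one seen most often in the corpus.
        return max(candidates, key=lambda w: counts[w])

    return correction


if __name__ == "__main__":
    correct = make_corrector("spelling is hard and spelling matters")
    print(correct("speling"))
```

The quality of the corrections depends heavily on the corpus, which is why the source of the word list matters.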

solr-ranking-algos-comparison

Imported a set of pages into Apache Solr and compared two ranking algorithms, Lucene's default ranking and PageRank. Solr indexes the documents, and Tika and the TagSoup library extract text from any kind of HTML found on the web. A PHP client accepts user input from an HTML form and sends the request to the Solr server. Solr processes the query and returns results, which the PHP program parses and displays. To switch Solr's ranking to PageRank, the app loops through each fetched web page and extracts its outgoing links. A mapping file that maps web page files to their actual URLs is used to filter out URLs not present in the file. The NetworkX library then builds a network graph with web pages as vertices and links as edges between two files. Finally, a list of keywords is searched and the results of the two algorithms are compared.
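
The link-graph step can be sketched as follows. The project description names NetworkX for building the graph; this example instead uses a plain power-iteration PageRank so that it runs without third-party packages. The function name, the damping factor of 0.85, the iteration count, and the sample graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        for page, targets in links.items():
            if not targets:
                continue
            # Each outgoing link passes on an equal share of the page's rank.
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

The resulting scores could then be written to a file that the search engine reads as an external ranking signal, which is one common way to compare a link-based ranking against a text-based one.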

sparkler

Spark-Crawler: Evolving Apache Nutch to run on Spark.

tika

Fork of Apache Tika with specific customizations for textual content extraction and enrichment.

tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

workshop_management

Allows faculty to set their own slots and manage their assigned slots for workshops in various college courses.
