A Specialized Image Model For Financial Charts

chart-recognizer is an image model trained specifically to recognize financial charts posted on social media. It determines whether an image posted on a platform such as Twitter is a financial chart or something else.

Introduction

Social media users post a lot of useful financial information, including their predictions for financial assets. However, it is often hard to tell whether the images they post also contain useful information. This model was developed to fill that gap by recognizing whether an image is a financial chart.

I use this model in combination with my two other projects, FinTwit-bot and FinTwitBERT, to track market sentiment across Twitter.

Datasets

chart-recognizer has been trained on three of my datasets. So far, I have not found any other image dataset of financial charts. The datasets used to train this model are as follows:

I have implemented two approaches for training the model on these datasets. The first loads all images into memory; however, this does not work for more than 10k images on a machine with 48 GB of RAM. The second unpacks all downloaded images to disk, which puts much less strain on RAM but requires extra storage.
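The second, disk-backed approach can be sketched as a dataset that keeps only file paths and labels in memory and reads each image only when it is requested. This is a minimal illustration, not the project's actual loader: it assumes a hypothetical layout where each label has its own sub-folder under a directory such as `downloaded-images`, and the class name `LazyImageFolder` is made up for this example.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


class LazyImageFolder:
    """Index images stored on disk and read each file only when it is requested.

    Assumed (hypothetical) layout: <root>/<label>/<image file>.
    """

    def __init__(self, root: str, loader: Optional[Callable[[Path], object]] = None):
        self.root = Path(root)
        # Each sub-directory name is treated as one class label.
        self.classes: List[str] = sorted(p.name for p in self.root.iterdir() if p.is_dir())
        # Only file paths and label indices are kept in memory, not pixel data.
        self.samples: List[Tuple[Path, int]] = [
            (path, idx)
            for idx, label in enumerate(self.classes)
            for path in sorted((self.root / label).iterdir())
            if path.suffix.lower() in IMAGE_EXTENSIONS
        ]
        # Default loader returns raw bytes; pass e.g. PIL.Image.open to decode instead.
        self.loader = loader or (lambda path: path.read_bytes())

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        # The file is only opened here, so RAM usage stays flat as the dataset grows.
        path, label = self.samples[index]
        return self.loader(path), label
```

The trade-off matches the description above: memory holds only the index, while every image must already exist as a file on disk.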

Model Details

The model is fine-tuned from an EfficientNet model available in timm and reaches an accuracy of 97.8% on the test set.

Model Results

These are the latest results on the 10% test set.

  • Accuracy: 97.8%
  • F1-score: 96.9%
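For reference, both metrics can be derived from the confusion-matrix counts (TP, TN, FP, FN). The sketch below assumes a binary setup with "financial chart" as the positive class; the README does not state how its F1-score was averaged, and the counts in the example call are illustrative only, not the project's results.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall, equivalently 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only, not the project's actual test-set results.
print(classification_metrics(tp=8, tn=9, fp=1, fn=2))
```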

Installation

# Clone this repository
git clone https://github.com/StephanAkkerman/chart-recognizer
# Move into the project directory
cd chart-recognizer
# Install required packages
pip install -r requirements.txt

Usage

The model is available on the Hugging Face Hub and can be loaded with the timm library.

import timm
import torch
from PIL import Image
from timm.data import resolve_data_config, create_transform

# Load and set model to eval mode
model = timm.create_model("hf_hub:StephanAkkerman/chart-recognizer", pretrained=True)
model.eval()

# Create transform and get labels
transform = create_transform(**resolve_data_config(model.pretrained_cfg, model=model))
labels = model.pretrained_cfg["label_names"]

# Load and preprocess image
image = Image.open("img/examples/tweet_example.png").convert("RGB")
x = transform(image).unsqueeze(0)

# Get model output (no gradients are needed for inference) and apply softmax
with torch.no_grad():
    probabilities = torch.nn.functional.softmax(model(x)[0], dim=0)

# Map probabilities to labels
output = {label: prob.item() for label, prob in zip(labels, probabilities)}

# Print the predicted probabilities
print(output)
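To turn the `output` dictionary into a single decision, one option is to take the most probable label and require it to clear a confidence threshold. This is a sketch, not part of the project: `top_prediction` is a hypothetical helper, the 0.5 threshold is an arbitrary default, and the label names in the example are placeholders for the ones stored in `model.pretrained_cfg["label_names"]`.

```python
from typing import Dict, Tuple


def top_prediction(output: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Return the most probable label, its probability, and whether it clears the threshold."""
    label = max(output, key=output.get)
    prob = output[label]
    return label, prob, prob >= threshold


# Placeholder labels and probabilities; use the labels from the model config in practice.
print(top_prediction({"chart": 0.93, "other": 0.07}))
```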

Citation

If you use chart-recognizer in your research, please cite as follows:

@misc{chart-recognizer,
  author = {Stephan Akkerman},
  title = {chart-recognizer: A Specialized Image Model for Financial Charts},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/StephanAkkerman/chart-recognizer}}
}

Contributing

Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.

License

This project is licensed under the GPL-3.0 License. See the LICENSE file for details.

chart-recognizer's Issues

Add deletion / addition of new images in downloaded-images

If the images on the Hub are updated, the local images should be updated as well.

  • Get the complete dataset from HF
  • Compare its image ids with the images that are currently saved
    • If an image with that id already exists locally, skip it
    • If an image with that id does not yet exist locally, save it
    • If a local image has an id that does not exist in the HF dataset, delete it
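The comparison step above reduces to three set operations. A minimal sketch, assuming the remote and local image ids are already available as strings; how they are fetched from the HF dataset is not shown, and `plan_sync` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def plan_sync(remote_ids: Iterable[str], local_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Split image ids into those to skip, download and delete so local storage matches the Hub."""
    remote, local = set(remote_ids), set(local_ids)
    return {
        "skip": remote & local,      # already saved locally
        "download": remote - local,  # on the Hub but missing locally
        "delete": local - remote,    # saved locally but no longer on the Hub
    }
```

Working on sets keeps each comparison linear in the number of ids instead of checking every local file against every remote entry.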

Perform analysis on the test set

Add a notebook that analyzes the test set so we know which images are difficult to classify. It should cover:

  • The test-set images that were misclassified
  • Others?
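The first item could start from a simple filter over the test predictions. This sketch assumes the notebook already has, for each test image, its path, true label, and predicted label as parallel sequences; `misclassified` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def misclassified(
    paths: Sequence[str], y_true: Sequence[str], y_pred: Sequence[str]
) -> List[Tuple[str, str, str]]:
    """Return (path, true label, predicted label) for every test image the model got wrong."""
    if not (len(paths) == len(y_true) == len(y_pred)):
        raise ValueError("paths, y_true and y_pred must have the same length")
    return [(p, t, q) for p, t, q in zip(paths, y_true, y_pred) if t != q]
```

The returned paths can then be opened in the notebook to inspect the failing images visually.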

Keep track of prior model results using a results.json file

Track each run's results together with the parameters used to obtain them, so we can see how performance changes when parameters change. We could also add a config.json for changing the model settings.

The parameters we should track are:

  • timm model name
  • Batch size
  • DataBlock parameters
  • Datasets used
  • Number of images used
  • Validation percentage
  • Metric results on the validation and test sets
  • Loss function
  • Callbacks
  • Number of epochs
  • Learning rate
  • Confusion matrix results (TP, TN, FP, FN)
  • Add config.json for changing parameters
  • Write model output and the parameters used to results.json
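The last item could be as simple as appending one record per run to a JSON list. A minimal sketch, assuming a hypothetical schema where results.json holds a list of `{"params": ..., "metrics": ...}` objects; the function name and field names are illustrative, not the project's format.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def log_run(params: Dict[str, Any], metrics: Dict[str, Any], path: str = "results.json") -> List[dict]:
    """Append one run (parameters + metrics) to a JSON list stored at `path`."""
    file = Path(path)
    # Start a new list if this is the first logged run.
    runs = json.loads(file.read_text()) if file.exists() else []
    runs.append({"params": params, "metrics": metrics})
    file.write_text(json.dumps(runs, indent=2))
    return runs


# Hypothetical usage; every value here is a placeholder, not a real run:
# log_run({"timm_model": "<model name>", "batch_size": 32, "lr": 1e-3}, {"val_f1": 0.0})
```

Keeping all runs in one list makes it easy to load the file later and compare parameters side by side.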

Add test set for analysis

All data is currently used for training and validation. We should hold out some data so we can analyze which images are hard to classify.
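A held-out test set can be carved out with a seeded shuffle so the split is reproducible across runs. This sketch assumes a 10% test fraction (matching the test set mentioned under Model Results) and a hypothetical 10% validation fraction; `split_dataset` and its defaults are illustrative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T], val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed and split into train, validation and test subsets."""
    if val_frac + test_frac >= 1:
        raise ValueError("val_frac + test_frac must be below 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed, same split
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Using a dedicated `random.Random(seed)` avoids touching the global random state, so other code cannot change which images end up in the test set.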

Change id of image to hash

Image ids are currently arbitrary. Using a content hash as the id would make #8 easier to implement, and the same hash could also be used to find duplicate images.
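A content hash such as SHA-256 over the image bytes gives a stable id and makes duplicate detection a grouping step. A minimal sketch with hypothetical helper names; note that it only catches byte-identical files, so re-encoded or resized copies of the same chart would need a perceptual hash instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def image_id(data: bytes) -> str:
    """Content-based id: identical image bytes always produce the same id."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group files by content hash and keep only hashes shared by more than one file."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        path = Path(path)
        groups[image_id(path.read_bytes())].append(path)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```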