
sportscardtoollib's Introduction

Hi, I'm Travis 🐸

Pronouns: He/Him

Hi, I am a newly minted Software Engineer! I recently graduated from Columbia University (SEAS) with a degree in Computer Science with a focus on intelligent systems (AI Track).

I was previously at Meta/Facebook for 3 summer internships. Prior to that, I interned at Rockwell Collins (now Collins Aerospace).

I am currently working on mobile (iOS and Android) at Epic Systems. In my spare time, you can find me working on some of my passion projects and open-source libraries. I am always looking for collaborators, opportunities to learn, and chances to create impact. Feel free to shoot me an email or connect on LinkedIn to talk more.

:)


sportscardtoollib's People

Contributors

travisgibbs


sportscardtoollib's Issues

Explore Other Possible Data Opportunities

  • Use the link between our data and BREF to gather more info
  • Use PSA's collection to harvest descriptions about groupings/sets
  • Use eBay or 130point for pricing info or to represent cards available in entity view

Process multi-player, team, checklist, and unknown cards + Improve BREF Collection and standardize names

Notes for listing parsing:
Error Key words:
“UER” -> uncorrected error
“ERR: No Copyright”
“COR: Copyright”
“CO” -> Coach
“ERR” -> Error
“/POR” concat onto last name
“/66” concat onto last name
“RS” -> Rookie Stars
“RP” -> Relief Pitcher
“CL” -> Closer
“Grey Backs”, “White Backs” in 1956
“DP” -> Double Print
“LL” -> League Leaders

Other Problems:
Multiple player cards “fn1 ln1/fn2 ln2”
“yellow”/green/blue parallels (after name)
All Star “AS”
“Coaches” in listing to manager card
“Team” or team nickname in listing to team card -> A’s to Athletics & Cards to Cardinals
“Checklist” in listing to checklist card
“UMP” umpire card
#book then number “1 Ted Williams” remove first word if number?
Return from bref what name works to override?
“Leaders” in card new cat?
“Rookies” in card
Checklist pattern Checklist 1/Jim Kaat

Player Name:
“Hank Aaron” to “Henry Aaron”
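
The notes above could be turned into a first-pass parser along the following lines. This is only a sketch: the function names, the tag strings, and the returned dictionary shape are assumptions made for illustration, not the project's actual schema, and only abbreviations the notes map explicitly are included.

```python
# Abbreviations from the notes above; the tag strings are illustrative only.
KEYWORD_TAGS = {
    "UER": "uncorrected error",
    "ERR": "error",
    "CO": "coach",
    "RS": "rookie stars",
    "RP": "relief pitcher",
    "CL": "closer",
    "DP": "double print",
    "LL": "league leaders",
    "AS": "all star",
    "UMP": "umpire",
}

# Standardization examples taken from the notes.
TEAM_NICKNAMES = {"A's": "Athletics", "Cards": "Cardinals"}
NAME_OVERRIDES = {"Hank Aaron": "Henry Aaron"}


def card_type(listing: str) -> str:
    """Guess the card type from words in the listing (hypothetical categories)."""
    words = listing.split()
    lowered = [word.lower() for word in words]
    if any(word.startswith("checklist") for word in lowered):
        return "checklist"
    if "coaches" in lowered:
        return "manager"
    if "team" in lowered or any(word in TEAM_NICKNAMES for word in words):
        return "team"
    return "player"


def parse_listing(listing: str) -> dict:
    """Split a listing into a type, a standardized name, and keyword tags."""
    tokens = listing.split()
    # "1 Ted Williams": drop the first word when it is a card number.
    if len(tokens) > 1 and tokens[0].isdigit():
        tokens = tokens[1:]
    tags, name_tokens = [], []
    for token in tokens:
        tag = KEYWORD_TAGS.get(token.strip(":,()"))
        if tag:
            tags.append(tag)
        else:
            name_tokens.append(token)
    name = " ".join(name_tokens)
    return {
        "type": card_type(listing),
        "name": NAME_OVERRIDES.get(name, name),
        "tags": tags,
    }
```

For example, `parse_listing("Hank Aaron UER")` yields the name "Henry Aaron" with the tag "uncorrected error". Matching is case-sensitive on purpose so that ordinary words are not mistaken for abbreviations; parallels, the 1956 back colors, and the "/POR" and "/66" suffixes are left unhandled here.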

Scrape and Store 2004-2023 and 1880-1940

Currently the master data collection holds 1940-2004 baseball cards. We would like to store more of the cards released in recent years, but have run into significant storage problems. The number of unique cards, sets, groups, and variations increases year over year, so we are approaching the hard cap on Atlas, which currently hosts the database.

It would be worthwhile to estimate the month-over-month cost of completing the data set, or to consider alternate hosting.

Build Upload Feature

There are a few essential items that we hope users will be able to upload to the data server. Users uploading their own collections, including relevant information like grading and price, would allow for a crowd-sourced means of assessing value. Several sites for shoes list a best buy price (the lowest price someone would sell for) and a best sale price (the highest price someone would pay). Though I don't view this tool as a marketplace, a similar listing could help people assess the value of a card even if they don't own it.

That being said, there are a few hard-to-source pieces of information that we should collect from the user when possible:

  • Images (hard to gather for some rare variations and in the abundance of modern cards)
  • Price (130point is flaky and eBay rate limits, which makes pricing millions of cards difficult)
  • Population/Availability (Knowing what sort of cards are being tracked by the tooling could be useful for optimization)

In the long term, it would be great to have a robust image collection that could enable a CV model to quickly scan in cards, but tracking variations to this degree seems too difficult.

Link Card Variations via identifier

Context: Cards often have several serial-numbered, autograph (auto), and memorabilia (mem) variations while maintaining the same look and listing information.

The idea is that these cards should be linked via some central identifier. Tools like the collection tool could then quickly identify a card and let the user pick the variation that matches what is in their collection.

Not sure how to approach this yet but I think it is possible to match parts of the listing information selectively or look for identical listings with differing details by using a hashmap or some kind of encoding.
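
One way the hashmap idea could look is sketched below: hash only the listing fields that variations share, then group cards by that hash. The field names (`year`, `set`, `number`, `name`) and the `serial` detail are hypothetical stand-ins for whatever the stored card documents actually contain.

```python
import hashlib
from collections import defaultdict

# Hypothetical names for the fields that all variations of a card share.
BASE_FIELDS = ("year", "set", "number", "name")


def base_key(card: dict) -> str:
    """Build a short identifier from the shared listing fields only."""
    parts = [str(card.get(field, "")).strip().lower() for field in BASE_FIELDS]
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:12]


def group_variations(cards: list) -> dict:
    """Map each base key to every card (variation) that shares it."""
    groups = defaultdict(list)
    for card in cards:
        groups[base_key(card)].append(card)
    return dict(groups)
```

Two listings that differ only in a detail outside `BASE_FIELDS`, such as a serial number, land under the same key, so a tool could look up the base card first and then offer its variations. Listings whose names are spelled differently would still need normalizing before hashing.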

Improve Name Parsing and Hierarchy of Cards

Cards often contain multiple players, making it difficult to identify who is a player and how many players there are. There are a few possible fixes, but this raises a bigger question about storing player data along with card data and about the overall hierarchy of data. The name could likely be converted into a comma-separated string once an accurate parsing method is found.
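
As a starting point, a heuristic like the one below could split a name field on "/" and treat only parts that look like "first last" as separate players. The function names are hypothetical, and the two-word rule is an assumption that will misfire on single-name players.

```python
from typing import List


def split_players(name_field: str) -> List[str]:
    """Split "fn1 ln1/fn2 ln2" into players, keeping short suffixes attached."""
    players: List[str] = []
    for part in name_field.split("/"):
        part = part.strip()
        if not part:
            continue
        # Two or more words looks like a full name; anything shorter
        # (for example "POR" or "66") is glued back onto the previous entry.
        if len(part.split()) >= 2 or not players:
            players.append(part)
        else:
            players[-1] = f"{players[-1]}/{part}"
    return players


def to_name_string(name_field: str) -> str:
    """Render the parsed players as a comma-separated string."""
    return ", ".join(split_players(name_field))
```

With this rule, `to_name_string("Mickey Mantle/Willie Mays")` gives "Mickey Mantle, Willie Mays", while a field ending in "/POR" stays a single entry.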

Add Collections Tool

The Collection Tool will allow for offline tracking of a user's cards, with the eventual goal of building a user-friendly UI and online tracking.

The initial version of this will be a basic class that allows for some streamlined input of cards, probably using the already developed search functionality. Since there are no universal card numbers (UUIDs), it will be difficult to quickly read in large collections of cards by a single value without further user input. Alternatives to a search method could include image scanning, especially if the image part of the database is fleshed out.
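
A minimal sketch of such a basic class is shown below. The `OwnedCard` fields and the method names are assumptions for illustration; the search-based input mentioned above is not modeled, and the JSON string returned by `dumps` is meant to be written to a local file for offline use.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OwnedCard:
    # Hypothetical fields; the real card schema may differ.
    year: int
    set_name: str
    number: str
    name: str
    grade: str = ""
    quantity: int = 1


@dataclass
class Collection:
    cards: List[OwnedCard] = field(default_factory=list)

    def add(self, card: OwnedCard) -> None:
        """Add a card, merging quantities when the same card and grade exist."""
        for owned in self.cards:
            same_card = (owned.year, owned.set_name, owned.number, owned.grade) == (
                card.year, card.set_name, card.number, card.grade)
            if same_card:
                owned.quantity += card.quantity
                return
        self.cards.append(card)

    def dumps(self) -> str:
        """Serialize the collection to a JSON string for offline storage."""
        return json.dumps([asdict(card) for card in self.cards], indent=2)

    @classmethod
    def loads(cls, text: str) -> "Collection":
        """Rebuild a collection from the JSON produced by dumps()."""
        return cls([OwnedCard(**row) for row in json.loads(text)])
```

Because there is no universal card identifier, the merge in `add` compares year, set, number, and grade; a shared identifier for variations would make that comparison simpler.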

Product Scraping

Scraping product details from Beckett or another source where possible. This would possibly include images and price data, but the MVP is pack size in terms of cards and number of packs, along with any other descriptive info. This is a companion to #50.

Project Roadmap

This issue will serve as a living document representing the current roadmap for SportsCardTool until it is migrated to a more permanent place.

The long term goal of SportsCardTool is to build a suite of tools that provide insight for the sports card community.

Phase 0 Exploration

  • Build scraping tool for sports cards
  • Build API server for scraped data
  • Build basic React app with table for API data

Phase 1 MVP

  • Build scraping tool for products and breaks
  • Link card data with product/release data via identifying key
  • Build release based API endpoint with odds
  • Build new portion of React app to link card content with release data to show odds of chase cards or user’s defined goals
  • Build pipeline for automatic data upload for various data sources
  • Identify potential contributors

Phase 2 Public Release

  • Bug fix and adjust to potential weight

Phase 3 Future Use Cases

  • Build price functionality for cards via 130point and eBay access points (premium feature due to cost of eBay API)
  • Build pack/box opening simulation tool
  • Add community requested features

Add "release" info scraped from various sources

We would like to host new entities built around the units that are purchased at a store, i.e. a new database where an entry might be Bowman Draft, with its contents being the various forms it was sold in and the odds for each of those forms. Linking this data with the already rich card data could create a powerful tool that would do more than paywalled sources.

Build Price Scraping Tool

In the future, SportsCardTool could become the easiest way of finding and valuing cards. Currently we have stored many cards without any reference to price information, and linking those cards to price information can be easy or difficult depending on the rarity and volatility of the card itself. Sources like the eBay API or 130point could be used, as was established in earlier releases. Solving this problem or adding some sort of feature would become much easier if we are able to solve #20 and link cards by a common element.

Odds Scraping

Scraping odds where applicable from recent years' sets (~2018-2023) should be possible via Beckett odds PDFs (see here). Using Tabula and bs4 to crawl the release pages year by year should allow for a fairly complete dataset of odds that can be stored on a release basis along with each product (e.g. hobby, different retail sizes, etc.). From there, with a little text comprehension, we should be able to link each of the hit odds to different sets in the established card dataset, allowing for effective opening simulation given a product's details.
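
Once odds are extracted, the simulation side could be sketched roughly as follows. This assumes odds arrive as strings of the form "1:24" (one hit per 24 packs) and that each insert is rolled independently per pack; both are simplifying assumptions, and the function names are hypothetical. The PDF and page scraping step is not shown.

```python
import random
import re
from typing import Dict, Optional

# Matches odds such as "1:24" or "1:1,200" (assumed format).
ODDS_PATTERN = re.compile(r"^\s*(\d+)\s*:\s*(\d[\d,]*)\s*$")


def odds_to_probability(odds: str) -> float:
    """Convert "hits:packs" odds into a per-pack probability."""
    match = ODDS_PATTERN.match(odds)
    if not match:
        raise ValueError(f"unrecognized odds format: {odds!r}")
    hits = int(match.group(1))
    packs = int(match.group(2).replace(",", ""))
    if packs == 0:
        raise ValueError("odds cannot be stated per zero packs")
    return min(1.0, hits / packs)


def simulate_packs(odds_by_insert: Dict[str, str], packs: int,
                   seed: Optional[int] = None) -> Dict[str, int]:
    """Open `packs` packs and count how often each insert is pulled."""
    rng = random.Random(seed)
    probabilities = {name: odds_to_probability(odds)
                     for name, odds in odds_by_insert.items()}
    pulls = {name: 0 for name in odds_by_insert}
    for _ in range(packs):
        for name, probability in probabilities.items():
            if rng.random() < probability:
                pulls[name] += 1
    return pulls
```

Passing a `seed` makes a simulated break reproducible, which could be handy when comparing product formats side by side.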

Better RC Detection

Technical definitions of a rookie card are difficult in themselves, so it may be worth creating distinct categories to better capture what a rookie card is. The website that we are scraping seems to favor cards with the (RC) logo, but in recent years cards like 1st Bowman or even college cards have been accepted as another form of rookie card. In the past, these cards in some cases did not have these logos. Additionally, the presence of minor league cards complicates the matter.

Ideas for solutions:

  • Build in logic on a per-case basis, particularly for 1st Bowman
  • Build in logic to determine the first year of a player's career and compare it to the card year to build a new category (first year card) or (pre major card)
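
The two ideas above might combine into something like the sketch below. The "first year card" and "pre major card" labels come from the list; the "1st bowman", "veteran card", and "unknown" labels, the function name, and the assumption that a debut year is available per player are all illustrative.

```python
from typing import Optional


def rookie_category(listing: str, card_year: int,
                    debut_year: Optional[int]) -> str:
    """Assign a rookie-related category to a card (hypothetical labels)."""
    # Per-case rule: 1st Bowman cards get their own category.
    if "1st bowman" in listing.lower():
        return "1st bowman"
    # Without a known debut year there is nothing to compare against.
    if debut_year is None:
        return "unknown"
    if card_year < debut_year:
        return "pre major card"
    if card_year == debut_year:
        return "first year card"
    return "veteran card"
```

Minor league and college cards issued before a debut fall into "pre major card" under this rule; whether they should count as rookie cards is left as an open question.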
