
datascope-tools's Introduction

datascope-tools may eventually become a collection of scripts, tools, and ideas that we have gathered over the years at Datascope Analytics to help with our work. It is loosely inspired by bit.ly's data_hacks.

Scripts

  • pair-with: simple script to manipulate .hg/hgrc files when pairing with other people

  • decrapify: simple script to remove junk files that commonly occur when we're writing code or experimenting with scripts

Functions

  • decorators.cache: When developing scrapers, code that interacts with APIs, or other functions that take a while to run, I often want to cache the results of function calls for a short time during development, like this:

    import time
    
    from datascope_tools import decorators
    
    @decorators.cache(expire_after=30*60)
    def v_important():
        time.sleep(42)
        return 'vnice'
  • iterators.itergrams: Iterate over consecutive overlapping tuples in an iterable. It doesn't read the entire iterable into memory, so it is useful for BIG DATA.

    from datascope_tools import iterators
    
    data = ['you', 'like-a', 'da', 'juice', 'eh?']
    for ngram in iterators.itergrams(data, n=3):
        print(ngram)
    
    >> ('you', 'like-a', 'da')
    >> ('like-a', 'da', 'juice')
    >> ('da', 'juice', 'eh?')
  • iterators.rank_sorted: A combination of enumerate and sorted that also handles tied ranks. Say there is a game where players are scored and you want to rank them:

    import operator
    
    from datascope_tools.iterators import rank_sorted
    
    scores = [
        ('alex', 100),
        ('bo', 100),
        ('brian', 100),
        ('dean', 10),
        ('horatio', 90),
        ('irmak', 100),
        ('jess', 100),
        ('karlota', 90),
        ('melba', 90),
        ('meridith', 100),
        ('michael', 100),
        ('mike', 100),
        ('mollie', 100),
        ('vlad', 100),
    ]
    answer = list(rank_sorted(scores, key=operator.itemgetter(1)))
    >> [
        (1, ('alex', 100)),
        (1, ('bo', 100)),
        ...
        (11, ('horatio', 90)),
        (11, ('karlota', 90)),
        (11, ('melba', 90)),
        (14, ('dean', 10)),
    ]
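
To make the idea behind decorators.cache concrete, here is a minimal in-memory sketch of a time-expiring cache decorator. It is not the datascope_tools implementation (which may, for instance, persist results between runs); the dictionary store, the key construction, and the `slow_square` example function are all assumptions made for illustration.

```python
import functools
import time


def cache(expire_after):
    """Cache a function's results for `expire_after` seconds (in-memory sketch)."""
    def decorator(func):
        store = {}  # assumption: maps call arguments -> (timestamp, result)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Arguments must be hashable for this simple key to work.
            key = (args, tuple(sorted(kwargs.items())))
            now = time.monotonic()
            if key in store:
                stored_at, result = store[key]
                if now - stored_at < expire_after:
                    return result  # still fresh: skip the slow call
            result = func(*args, **kwargs)
            store[key] = (now, result)
            return result
        return wrapper
    return decorator


calls = []  # records every real invocation of the wrapped function


@cache(expire_after=30 * 60)
def slow_square(x):  # hypothetical stand-in for a slow scraper or API call
    calls.append(x)
    return x * x


first = slow_square(4)
second = slow_square(4)  # served from the cache, so `calls` stays at one entry
```

Because the store lives in process memory, this sketch only helps within a single interpreter session; caching across separate runs would need the results written to disk.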
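
The memory behaviour described for iterators.itergrams can be sketched with a fixed-size sliding window. This is one possible approach, not necessarily how the library does it; the default of `n=2` is an assumption.

```python
import collections
import itertools


def itergrams(iterable, n=2):
    """Yield consecutive overlapping n-tuples, holding only n items at a time."""
    iterator = iter(iterable)
    # Fill the window with the first n items; maxlen makes it slide afterwards.
    window = collections.deque(itertools.islice(iterator, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in iterator:
        window.append(item)  # the oldest item drops off automatically
        yield tuple(window)


data = ['you', 'like-a', 'da', 'juice', 'eh?']
trigrams = list(itergrams(data, n=3))
```

Since only the current window is kept, the input can be a generator or a file object of any length; an input shorter than `n` yields nothing.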
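
The tie handling shown for iterators.rank_sorted (ten players share rank 1, so the next rank is 11) is standard competition ranking. A minimal sketch follows, assuming the highest score ranks first by default, as the output above suggests; the `reverse` and `start` parameters are hypothetical and may not match the library's signature.

```python
import operator


def rank_sorted(iterable, key=None, reverse=True, start=1):
    """Yield (rank, item) pairs in sorted order; tied items share a rank."""
    ordered = sorted(iterable, key=key, reverse=reverse)
    previous_value = object()  # sentinel that equals no real sort value
    rank = start
    for position, item in enumerate(ordered, start):
        value = key(item) if key is not None else item
        if value != previous_value:
            rank = position  # new value: rank jumps to the current position
            previous_value = value
        yield rank, item


scores = [('alex', 100), ('dean', 10), ('horatio', 90), ('bo', 100), ('melba', 90)]
ranked = list(rank_sorted(scores, key=operator.itemgetter(1)))
```

With this shortened score list, the two players on 100 share rank 1, the two on 90 share rank 3, and the last player gets rank 5.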

Classes

  • mixins.UniqueKey: Sometimes it's nice to use instances of a class as keys in a dictionary or as members of a set. The UniqueKey mixin defines the __hash__ and __eq__ methods so that instances are hashed and compared using a "unique identifier" attribute.

    from datascope_tools.mixins import UniqueKey
    
    class Snowflake(UniqueKey):
        def __init__(self, snowflake_id):
            self.id = snowflake_id
    
    set([Snowflake('a'), Snowflake('a')])
    >> set([Snowflake('a')])
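
A mixin of this kind can be sketched in a few lines. The version below is an illustration only, not the datascope_tools source; in particular, the `unique_key` class attribute that names the identifying attribute is an assumption.

```python
class UniqueKey(object):
    """Mixin that hashes and compares instances by a unique-identifier attribute."""

    unique_key = 'id'  # hypothetical: name of the attribute that identifies an instance

    def _key(self):
        return getattr(self, self.unique_key)

    def __hash__(self):
        # Include the type so different classes with equal ids hash differently.
        return hash((type(self), self._key()))

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self._key() == other._key()


class Snowflake(UniqueKey):
    def __init__(self, snowflake_id):
        self.id = snowflake_id


# Two instances with the same id collapse into one set member.
flakes = {Snowflake('a'), Snowflake('a'), Snowflake('b')}
```

Defining __hash__ and __eq__ together matters: objects that compare equal must also hash equal, otherwise sets and dictionaries will treat them as distinct.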

datascope-tools's People

Contributors

  • stringertheory

datascope-tools's Issues

pair-with is busted when %include'ing in .hgrc

    Traceback (most recent call last):
      File "/usr/local/bin/pair-with", line 70, in <module>
        config_parser.read(hgrc_filename)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 297, in read
        self._read(fp, filename)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ConfigParser.py", line 538, in _read
        raise e
    ConfigParser.ParsingError: File contains parsing errors: /Users/bo/Projects/PG_Transformations/.hg/hgrc
        [line 4]: '%include ../.hgrc_global'
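
The error arises because `%include` is a Mercurial directive, not INI syntax, so Python's config parser rejects the line. One possible workaround, sketched here with Python 3's `configparser` rather than the Python 2.7 module in the traceback, is to filter such lines out before parsing. The `read_hgrc_text` helper and the sample contents are hypothetical, and this is not a fix taken from the pair-with script.

```python
import configparser


def read_hgrc_text(text):
    """Parse hgrc-style text, skipping Mercurial '%include' directive lines."""
    kept = [
        line for line in text.splitlines()
        if not line.startswith('%include')
    ]
    parser = configparser.RawConfigParser()
    parser.read_string('\n'.join(kept))
    return parser


# Sample contents, modeled on the failing file in the report above.
sample = "[ui]\nusername = bo\n\n%include ../.hgrc_global\n"
parser = read_hgrc_text(sample)
username = parser.get('ui', 'username')
```

This sketch ignores the included file instead of following it, and a script that writes the file back would also need to re-emit the `%include` line so it is not lost.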
