
parsons's People

Contributors

andrewrook, angloyna, austinweisgrau, bmos, bzupnick, chrisc, cmc333333, codygordon, crayolakat, dannyboy15, dependabot[bot], ianrferguson, jason94, jburchard, kasiahinkson, matthewkrausse, mkrausse-ggtx, mkwoods927, neverett, pjsier, rgriff23, schuyler1d, sharinetmc, shaunagm, sjwmoveon, sorenspicknall, talevy42, tomiiwa, tonywhittaker, ydamit

parsons's Issues

APIConnector get_request not passing params

On line 83 of utilities/api_connector.py:
r = self.request(url, 'GET', params=None)

But I think it should be:
r = self.request(url, 'GET', params=params)

Unless it intentionally isn't?
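
A minimal sketch of the proposed fix, using a stand-in class rather than the real APIConnector. Only the get_request/request names and the params=params forwarding come from the issue; the class name, the recording behaviour of request(), and the example URL are hypothetical.

```python
class ConnectorSketch:
    """Stand-in for an API connector; request() records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def request(self, url, req_type, params=None):
        # A real connector would issue the HTTP request here.
        self.calls.append((url, req_type, params))
        return {'url': url, 'method': req_type, 'params': params}

    def get_request(self, url, params=None):
        # Forward the caller's params instead of hard-coding params=None.
        return self.request(url, 'GET', params=params)


connector = ConnectorSketch()
response = connector.get_request('https://example.com/people', params={'page': 2})
```

With params=None hard-coded, the page filter in this example would be dropped silently before the request is made.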

permission denied error on Windows

Users are seeing a "Permission denied" error on Windows machines when running the Redshift.query method.

It looks like there is a bug in how Parsons manages temporary files in Windows.

Parsons uses the Python standard library's tempfile.NamedTemporaryFile to create and track temporary files. The documentation says:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

So, the problem comes down to Parsons opening a temporary file, and then attempting to open that same file again later on, which doesn't work on Windows.
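
One possible workaround, sketched with the standard library only: create the file with delete=False, close it right away, and reopen it by name afterwards. The helper name create_temp_file is hypothetical, and this is not necessarily how Parsons should track or clean up its files.

```python
import os
import tempfile


def create_temp_file(suffix=''):
    """Create a temp file, close it immediately, and return its path."""
    # delete=False keeps the file on disk after close(); the caller must remove it.
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    handle.close()
    return handle.name


path = create_temp_file(suffix='.csv')
try:
    # The file is closed, so reopening it by name works on Windows as well as Unix.
    with open(path, 'w') as f:
        f.write('id,name\n1,Ada\n')
    with open(path) as f:
        contents = f.read()
finally:
    os.remove(path)
```

The trade-off is that cleanup becomes the caller's job, since the file no longer disappears when the handle is closed.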

VAN: logger error

The error message TypeError: string indices must be integers comes up when using toggle_activist_code(), with the following:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/parsons/ngpvan/people.py in apply_response(self, id, response, id_type, contact_type_id, input_type_id, date_canvassed, result_code_id)
435 logger.info(f'{id_type.upper()} {id} updated.')
436 else:
--> 437 logger.info(f"{r[1]['errors'][0]['code']}: {r[1]['errors'][0]['text']}")
438 raise ValueError(f"{r[1]['errors'][0]['text']}")
439

It looks like an API error from VAN is coming in as a string, but the logger in Parsons assumes it is a dictionary.
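
A defensive sketch of how the log message could be built so that both shapes are tolerated. The {'errors': [{'code': ..., 'text': ...}]} shape is taken from the traceback above; the function name describe_api_error and the sample payloads are hypothetical.

```python
def describe_api_error(payload):
    """Build a log message from an error payload that may be a dict or a plain string."""
    if isinstance(payload, dict):
        errors = payload.get('errors') or []
        if errors and isinstance(errors[0], dict):
            first = errors[0]
            return f"{first.get('code', 'UNKNOWN')}: {first.get('text', '')}"
    # Fall back to the raw value when the payload is not the expected dictionary.
    return str(payload)


dict_message = describe_api_error({'errors': [{'code': 'INVALID_PARAMETER', 'text': 'Bad id'}]})
string_message = describe_api_error('Internal Server Error')
```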

Hustle Class

Hustle just released a beta API, and we're eager to be able to get this incorporated into Parsons.

Docs here: https://api.hustle.com/docs/

Categories of endpoints in priority order are:

  • Leads
  • Agents
  • Groups
  • Tags

Create Rock The Vote Class

We have code in the TMC private repo, and can probably easily create a class.

h/t to Gerard at ACRONYM who I originally stole said code from.

VAN apply_activist_code() logger text

When a script uses apply_activist_code, a message comes through saying “Method deprecated. Use apply_activist_code() or remove_activist_code().” The AC still gets applied, but it’s a lot of scary red text when it happens for every record.

MobileCommons Error Handling

When the credentials are incorrect, the script raises xml.parsers.expat.ExpatError: mismatched tag:. Instead, it should check for a <Response [401]> and raise a more descriptive error like Invalid credentials.
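
A rough sketch of the suggested check: look at the status code before handing the body to an XML parser. It uses xml.etree.ElementTree from the standard library as a stand-in for whatever parser the MobileCommons class actually uses, and the function name and sample bodies are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_xml_response(status_code, body):
    """Check the HTTP status before handing the body to the XML parser."""
    if status_code == 401:
        raise ValueError('Invalid credentials.')
    if status_code >= 400:
        raise ValueError(f'Request failed with status {status_code}.')
    return ET.fromstring(body)


root = parse_xml_response(200, '<response success="true"><profile id="1"/></response>')

try:
    # The malformed body is never parsed because the status check fails first.
    parse_xml_response(401, '<html><body>Unauthorized</p></html>')
    error_message = None
except ValueError as e:
    error_message = str(e)
```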

VAN: Approve Score Method

@elyse-weiss - This would be a great contribution for you to work on.

van.approve_scores(score_ids, raise_on_error=False)

Pass in a list of score ids. If they can be approved, then approve. Allow the user to specify if it should fail if it cannot be approved or just to log and return nothing.
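
A sketch of the control flow described above, not a working VAN method. The per-score API call is passed in as a hypothetical approve_one callable so the example runs without credentials; a real method would live on the VAN class and call the API itself.

```python
import logging

logger = logging.getLogger(__name__)


def approve_scores(score_ids, approve_one, raise_on_error=False):
    """Try to approve each score id; log failures unless raise_on_error is set."""
    for score_id in score_ids:
        try:
            approve_one(score_id)
        except Exception as e:
            if raise_on_error:
                raise
            logger.warning(f'Score {score_id} could not be approved: {e}')


approved_log = []


def fake_approve(score_id):
    # Hypothetical stand-in for the API call; score 2 cannot be approved.
    if score_id == 2:
        raise ValueError('score is not in an approvable state')
    approved_log.append(score_id)


result = approve_scores([1, 2, 3], fake_approve)
```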

Drop temp table if upsert fails

Right now, rs.upsert() creates a table with a timestamp. If the upsert fails, the timestamp table remains. Can we drop it?
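
One way to guarantee the cleanup is a try/finally around the staging steps. The sketch below uses sqlite3 from the standard library purely so it can run anywhere; the table layout, the staging-table naming, and the upsert function are hypothetical and are not the Parsons Redshift implementation.

```python
import sqlite3
import time


def upsert(conn, target, rows):
    """Replace rows in `target` by id via a timestamped staging table."""
    staging = f'{target}_stg_{int(time.time())}'
    try:
        conn.execute(f'CREATE TABLE {staging} (id INTEGER, name TEXT)')
        conn.executemany(f'INSERT INTO {staging} VALUES (?, ?)', rows)
        conn.execute(f'DELETE FROM {target} WHERE id IN (SELECT id FROM {staging})')
        conn.execute(f'INSERT INTO {target} SELECT id, name FROM {staging}')
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Runs on success and on failure, so the staging table never lingers.
        conn.execute(f'DROP TABLE IF EXISTS {staging}')


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (id INTEGER, name TEXT)')
conn.execute("INSERT INTO people VALUES (1, 'Old Name')")
conn.commit()

upsert(conn, 'people', [(1, 'New Name'), (2, 'Second Person')])

try:
    # Rows with too many values make the staging insert fail.
    upsert(conn, 'people', [(3, 'Too', 'Many', 'Values')])
except sqlite3.Error:
    pass

leftover = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'people_stg_%'"
).fetchall()
people = conn.execute('SELECT id, name FROM people ORDER BY id').fetchall()
```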

VAN Connector

The VAN connector works, but it was the first one that we built and needs some love.

  • Create scaffolding for standardized APIConnector class
  • Create API GET methods that paginate
  • Refactor all relevant VAN methods to use the new GET method.
  • Create API POST method.
  • Refactor all relevant VAN methods to use the new POST method.
  • Create API DELETE methods.
  • Refactor all relevant VAN methods to use the new DELETE method.
  • Create API PATCH methods.
  • Refactor all relevant VAN methods to use the new PATCH method.
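
As an illustration of the paginating GET item in the list above, here is a small sketch that keeps requesting pages until no next link is returned. The 'items' and 'nextPageLink' keys are an assumed response shape, and fetch_page is a hypothetical callable standing in for the connector's GET method.

```python
def get_all_items(fetch_page, first_url):
    """Follow next-page links until they run out and return every item."""
    items = []
    url = first_url
    while url:
        page = fetch_page(url)
        items.extend(page.get('items', []))
        # A missing or empty link ends the loop.
        url = page.get('nextPageLink')
    return items


# Fake two-page response used in place of real API calls.
pages = {
    '/people?page=1': {'items': [{'vanId': 1}, {'vanId': 2}], 'nextPageLink': '/people?page=2'},
    '/people?page=2': {'items': [{'vanId': 3}], 'nextPageLink': None},
}
people = get_all_items(pages.__getitem__, '/people?page=1')
```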

to_redshift() - max varchar length issues

I personally have a number of syncs that use to_redshift(if_exists='append') and the call often fails because of varchar length. I am proposing a few changes:

  • for any columns that are varchar, set the column to at least varchar(100)
  • for other varchar lengths, round up! so varchar(862) would become varchar(1000)
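
The two rules above could be sketched as follows. The issue gives only one example of rounding (862 becomes 1000), so rounding up to the next power of ten is an assumption here, as is the function name. The cap of 65535 is Redshift's maximum varchar width.

```python
def padded_varchar_width(observed, minimum=100, maximum=65535):
    """Round an observed string width up, with a floor of `minimum`."""
    width = minimum
    # Assumed rule: grow by powers of ten until the observed width fits.
    while width < observed:
        width *= 10
    return min(width, maximum)


widths = [padded_varchar_width(n) for n in (10, 100, 862, 70000)]
```

A finer ladder (for example 100, 250, 500, 1000) would waste less declared width while still leaving headroom for appends.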

Connection Error in the Person class

Hi there,

I found an error in the People class in the people.py file. I got the following error when trying to use the find_people method: 'VAN' object has no attribute 'post_request'. It looks like this is because it is calling the get_request and post_request methods from the VANConnector class on the VAN class.

I think the simplest fix may be to change the People constructor to:

class People(object):

    def __init__(self, van_object):

        self.connection = van_object.connection

So that self.connection in the People class refers to the VAN connection object rather than the VAN object.

Custom Fields endpoints in Parsons VAN

VAN charges to add custom fields into Pipeline. Meanwhile, custom fields can be crucial to EveryAction use-cases, and loading a value to a custom field on a person record requires not only the field's numeric ID, but also its group's. Rather than requiring a sync's end users to find and submit both IDs as parameters, a single call to get all custom field data would allow comparison to a submitted custom field ID (or even custom field name).

Merge Redshift and PostgresCore

The methods largely overlap, given that Redshift is an offshoot of Postgres, so we should merge the code bases where possible to reduce lines of code.

Redshift integration - "nullas" should be "null as"

The Redshift integration code creates a sql string with the copy function. However, one of the SQL data conversion parameters is incorrect on line 57 of the code.

if nullas: sql += f"nullas {nullas}"

According to the Redshift data conversion parameters documentation, the correct version of this SQL should be "NULL AS " not "NULLAS". Therefore, the Python that is using an f-string to create the SQL query should read as follows:

if nullas: sql += f"null as {nullas}"

@eliotst

MobileCommons Bugs

There are a few bugs I discovered in the existing MobileCommons class:

  • profiles filters do not seem to work beyond limit - page seems a little useless because there is no order by and pagination is already incorporated.
  • group members filters do not seem to work
  • We will want to add automatic pagination to groups and group members. Pagination exists in profiles, so the code can be lifted from there.

Please additionally test the endpoints that add and delete.

Postgres support

It would be great if you could write tables to postgres servers as well as redshift. Being able to run queries from them and transfer data out of them would be a bonus.

Email validation/parsing

Cleaning signup sheets and other human entered email addresses.

Here's the code I use that may be helpful to y'all. (It's a bit messy.) It uses email-validator and pydash (because I'm sad I don't get to write in node.js).

from email_validator import validate_email, EmailNotValidError, EmailSyntaxError, EmailUndeliverableError
import re
from pydash import predicates
from pydash.strings import trim, reg_exp_replace, clean, deburr
from typing import List
from pydash.collections import every, filter_
from pydash.arrays import flatten_deep


def empty_if_null(value: str) -> str:
    return value if value else ""

def trim_non_printing(value: str) -> str:
    value = trim(value)
    value = reg_exp_replace(value, r'[\u202a\u25a0\u00a0\s]+$', '')
    value = reg_exp_replace(value, r'^[\u202a\u25a0\u00a0\s]+', '')
    return value

def clean_email_string(value: str) -> str:
    if not predicates.is_string(value):
        return ""
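    # strip accents, collapse repeated spaces, and trim non-printing characters
    # (lowercasing happens later, in clean_emails)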
    value = trim_non_printing(clean(deburr(value)))
    # strip spaces in the middle of the address
    value = reg_exp_replace(value, r'\s+', '')
    return value

email_display_name_re = re.compile(r".+\<(?P<email>[^@]+@[^\>]+)\>")

def fix_common_email_problems(value: str) -> str:
    if email_display_name_re.match(value):
        components = email_display_name_re.search(value)
        value = components.group('email')
    value = clean_email_string(value)
    # trim off the start or end: ,  .  :  "  >  <  '
    # then trim whitespace again
    value = trim(trim(value, ',.:"><\''))
    # fix common suffix issues (could do a better job with this though...)
    value = reg_exp_replace(value, r',com$', '.com')
    return value

def clean_emails(email: str) -> List[str]:
    def _clean_emails(email: str, already_fixed: bool) -> list:
        try:
            return [validate_email(email)['email'].lower()]
        except EmailNotValidError as e:
            msg = str(e)
            if 'It must have exactly one @-sign' in msg:
                print(f'try splitting or {email} with {email.count("@")} @ signs')
                for delim in [';', '/', ',', '|']:
                    # if email is split by this delimiter, do we end up with one @ in each set?
                    # if so, split on that delimiter and treat each as their own address in need 
                    # of cleaning.
                    if every(email.split(delim), lambda x: x.count('@') == 1):
                        print(f'the delimiter is {delim}')
                        return list(map(lambda x: _clean_emails(x, False), email.split(delim)))
                print("Can't figure out what delimiter it is so let's just try cleaning in other ways")
            if not already_fixed:
                return _clean_emails(fix_common_email_problems(email), True)
            print(f'Giving up, {email} is probably just a really bad address due to {msg}')
            return []
    results = _clean_emails(email, False)
    return filter_(flatten_deep(results))

VAN POST Bulk Import Method

As the incidence of VAN usage along with other tools increases, the need to create syncs that move data into VAN will as well. The prospect of using the Bulk Import endpoint rather than individual upserts could streamline syncs into VAN and make them more feasible at scale.

MobileCommons - Addl Endpoints

MC API Docs

We'll want to add functions for the following endpoints, listed in priority order:

  • List mConnects
  • Count MConnect Calls
  • List Keywords
  • List Campaign Subscribers
  • List Incoming Messages
  • List Outgoing Messages
  • List Broadcast
  • List Broadcasts
  • List Calls
  • Donation Summary
  • List Web Clicks
  • List Tinyurls
  • Add Tag
  • Remove Tag
  • List Tags
  • Campaign Opted-In
  • Attachments
  • Profile Update Attribute
  • List mData
  • List mData Queries

VAN: validation should be optional for upsert_person

When you pass a record to upsert_person, it validates that "the minimum combination of fields were passed." The VAN API doesn't actually reject records without this info; it will just always create new records if there's insufficient info to match on.

I understand that some users may want this validation to avoid adding duplicates, but others may still want to add new records for people they have sparser information on.

docs: upsert_person doesn't return Parsons Table

The docs for the upsert_person method say it returns a Parsons Table object, but when I used it, it returned a dictionary of the vanid and other info about the person (if matched), or the vanid and a status (if unmatched/new).

Onboarding Process

Hey! I mentioned this on the previous week's webinar but am now just making an Issue about it.

Problem

Parsons has a very important goal of lowering the barrier for people who wouldn't typically contribute to a project or consider themselves technical enough to use parsons. I think that if we are trying to dramatically increase the number of parsons users and contributors, then we need to focus more on the problems people encounter before they even make it to using parsons in their code: things like (speaking from very specific personal experience) how do I set up a development environment?

Proposed solution

I need at least one other collaborator to help make a short guide that takes someone from knowing very little code (but having a lot of motivation to do something cool) to having an environment set up and ready to use parsons (as in pip installed and ready to REPL some .py). The guide would then walk through building a small "hello world" type script using a feature, maybe something like splitting a large file into specific chunks or turning json to csv: ideally something practical that real-world people are always trying to do but that is actually really straightforward in parsons.

I'm more than happy to take the lead on this but I would need help in splitting up this work for it to happen on a realistic timeline and be good enough to be useful.

Redshift: One Value Query

Often, when we run a query, we don't need a Parsons table; we just need a single value. This would be a convenience method that allows for this.
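
A sketch of what such a convenience method might look like, written against sqlite3 from the standard library so it is runnable here. The name query_value and its signature are hypothetical; a Redshift version would run the query through the existing connection handling instead.

```python
import sqlite3


def query_value(conn, sql, parameters=()):
    """Run a query and return the first column of the first row, or None if empty."""
    row = conn.execute(sql, parameters).fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE signups (id INTEGER, state TEXT)')
conn.executemany('INSERT INTO signups VALUES (?, ?)', [(1, 'PA'), (2, 'PA'), (3, 'OH')])

pa_count = query_value(conn, 'SELECT COUNT(*) FROM signups WHERE state = ?', ('PA',))
missing = query_value(conn, 'SELECT id FROM signups WHERE state = ?', ('TX',))
```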

Airtable.get_records() produces errors when returning zero rows

To reproduce:

from parsons import Table, Airtable

# Assuming all credentials and other data in env vars
at = Airtable()

# Successful call
all_rows = at.get_records()
print(all_rows.num_rows)

# Call with filters that would return zero rows will appear to succeed, but...
no_rows = at.get_records(formula="FIND('SOMETHING UNFINDABLE', {Some Column}) > 0")
# Error kicks in when you try to do anything with the results
print(no_rows.num_rows)

This returns an error ValueError: 'fields' is not in list, which suggests that there may be a problem with the use of unpack_dict() inside of get_records()
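
A possible guard, sketched with plain dicts rather than a Parsons Table: return early when there are no records, so nothing tries to unpack a 'fields' column that does not exist. The function flatten_records is hypothetical and is not the actual get_records() code; the record shape (id, fields, createdTime) follows the Airtable API.

```python
def flatten_records(records):
    """Flatten Airtable-style records, tolerating an empty result set."""
    if not records:
        # No rows means no 'fields' key to unpack; return an empty result.
        return []
    return [
        {'id': r['id'], 'createdTime': r.get('createdTime'), **r.get('fields', {})}
        for r in records
    ]


empty = flatten_records([])
flat = flatten_records([
    {'id': 'rec1', 'createdTime': '2020-01-01T00:00:00.000Z', 'fields': {'Some Column': 'x'}},
])
```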

Too many dependencies

The current pattern in parsons is to load every submodule into parsons/__init__.py.

This is neither sustainable nor feasible for systems that require smaller upload sizes/memory footprints (e.g. AWS Lambda). The top of that file makes the goal clear:
"Eg. This allows for: from parsons import VAN"
However, it means every install and deploy of parsons requires an ever-increasing number of dependencies.

Proposal:

  • Remove the pre-amble -- or at least stop adding new imports to it. Admittedly, this is backwards incompatible.
    • If we must keep backwards compatibility, then let's create a second lib called parsons_core where we move the sub-directories, and then import them in parsons/ from parsons_core.
  • Start documenting the best practice of importing with from parsons_core.ngpvan import VAN
  • Document that low-resource environments can install with pip install --no-deps parsons -- and maintain a requirements-core.txt for cross-source dependencies that can/should stay in core.
