🛰 Proximity

Proximity is an event aggregation server that enables you to stay informed about local happenings while avoiding web bloat. It leverages Selenium over BrowserMob Proxy to collect event data from websites that don't have a public (or free) API sufficient to locate events.

  • ✅ Scrape event data from eventbrite.com, meetup.com and allevents.in
  • ✅ Filter event data based on temporal distance, day of the week, provider, keywords, and other filters.
  • ✅ Transform event streams into RSS feeds and iCalendars.

📦 Installation

⚙️ Configuration

To use Proximity you must define a settings.json.

{
  "eventbrite_api_key" : "<your-eventbrite-api-key>",
  "eventbrite_max_pages": 5,
  "allevents_max_pages": 5,

  "routines" : [],
  "compilers" : [],
  "filters" : []

}

The settings.json can be divided into two parts. The first part defines policies for scraping data from providers. The second part declares what proximity will do.

The "what" is determined by 3 lists: routines, compilers, and filters.

  • routines : define how/where/when to scrape event data
  • compilers : package found events into another data format
  • filters : define custom event filters
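
As a rough illustration of the structure described above, the following Python sketch checks that a settings.json document contains the three lists. The function name check_settings and the rule that all three keys must be present are assumptions made for this example; proximity's own validation may behave differently.

```python
import json

# The three lists that declare what proximity will do.
REQUIRED_LISTS = ("routines", "compilers", "filters")


def check_settings(text: str) -> list:
    """Return a list of problems found in a settings.json document (empty if none)."""
    settings = json.loads(text)
    problems = []
    for key in REQUIRED_LISTS:
        # Assumption: each key must be present and hold a JSON array.
        if not isinstance(settings.get(key), list):
            problems.append(f"'{key}' must be a list")
    return problems


example = '{"eventbrite_max_pages": 5, "routines": [], "compilers": [], "filters": {}}'
print(check_settings(example))
```

Running a check like this before starting the server can catch a malformed file early, though it says nothing about whether the individual routines or compilers are valid.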

Routines

Routines define what data to collect, where to collect it from, and how often. A routine requires 4 components: a name, a delay, providers, and a geographical circle defined by a latitude, longitude, and radius.

Here is an example routine that scans for events in Philadelphia, PA every hour.

{
  "name" : "Philly scan",
  "radius" : 5.0,
  "latitude" : 39.9526,
  "longitude" : -75.1652,
  "delay" : 3600,
  "meetup" : true,
  "allevents" : true,
  "eventbrite" : false
}

Note: this routine only scans meetup and allevents because eventbrite is explicitly disabled. However, you can simply omit disabled providers from the configuration altogether.

There are some routine attributes not on display above.

  • "auto" : true Instead of specifying a geolocation, you can ask the server to infer it from your IP address.
    • ⚠️ This is an experimental feature that is not well tested. Explicit coordinates are preferred.
  • "disable" : true You can disable an entire scan, which may be useful for troubleshooting.
  • "run_on_restart" : true By default proximity waits delay seconds before running any configured scan routine, but you can force a routine to run on server startup.
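
The routine attributes above can be summarized with a small Python sketch. This is not proximity's scheduler: the helper names are hypothetical, and treating an omitted provider as disabled follows the note about leaving disabled providers out of the configuration.

```python
from typing import Optional

PROVIDERS = ("meetup", "allevents", "eventbrite")


def enabled_providers(routine: dict) -> list:
    """Providers switched on for a routine; an omitted provider counts as disabled."""
    return [p for p in PROVIDERS if routine.get(p, False)]


def first_run_offset(routine: dict) -> Optional[float]:
    """Seconds after server start before the first scan, or None if the routine is disabled."""
    if routine.get("disable", False):
        return None
    if routine.get("run_on_restart", False):
        return 0.0  # forced to run on server startup
    return float(routine["delay"])  # default: wait one full delay first


philly = {
    "name": "Philly scan",
    "radius": 5.0,
    "latitude": 39.9526,
    "longitude": -75.1652,
    "delay": 3600,
    "meetup": True,
    "allevents": True,
    "eventbrite": False,
}
print(enabled_providers(philly))
print(first_run_offset(philly))
```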

Compilers

Compilers produce export formats for found data, such as RSS or iCalendar. A compiler is defined by 5 key components: name, type, path, conjunction, and filters.

This compiler creates an RSS feed on the filesystem at ~/feeds/philly-jawns.rss. It uses a disk filter to exclude events outside of the geographical disk.

{
  "name" : "Philly Jawns",
  "type" : "rss",
  "path" : "~/feeds/philly-jawns.rss",
  "conjunction" : true,
  "filters" : [
    {
      "type" : "disk",
      "radius" : 3.0,
      "latitude" : 39.9526,
      "longitude" : -75.1652
    }
  ]
}
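
To make the idea of a geographical disk concrete, here is a minimal Python sketch of a disk test based on the haversine great-circle distance. It is an illustration rather than proximity's implementation. The unit of radius is not stated here, so kilometres are assumed, and the event field names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; kilometres are an assumption


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_disk(event: dict, disk: dict) -> bool:
    """True when the event lies within disk["radius"] of the disk's center."""
    distance = haversine_km(
        event["latitude"], event["longitude"], disk["latitude"], disk["longitude"]
    )
    return distance <= disk["radius"]


disk = {"type": "disk", "radius": 3.0, "latitude": 39.9526, "longitude": -75.1652}
center_event = {"latitude": 39.9526, "longitude": -75.1652}
new_york_event = {"latitude": 40.7128, "longitude": -74.0060}
print(in_disk(center_event, disk), in_disk(new_york_event, disk))
```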

Note: the conjunction attribute defines how to logically compose the filters. With a value of true, an event must satisfy every filter. With a value of false, the filters are disjunctive, meaning an event need only satisfy one of the defined filters to be included.
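
The difference between conjunctive and disjunctive composition can be sketched in a few lines of Python. The compose helper and the event fields below are hypothetical and exist only to illustrate the logic.

```python
from typing import Callable, Iterable

Event = dict
Predicate = Callable[[Event], bool]


def compose(predicates: Iterable[Predicate], conjunction: bool) -> Predicate:
    """Combine predicates with AND when conjunction is true, otherwise with OR."""
    predicates = list(predicates)

    def combined(event: Event) -> bool:
        results = (p(event) for p in predicates)
        return all(results) if conjunction else any(results)

    return combined


# Hypothetical event fields, used only for this illustration.
def is_free(event: Event) -> bool:
    return event["price"] == 0


def is_nearby(event: Event) -> bool:
    return event["distance"] <= 3.0


event = {"price": 0, "distance": 10.0}
print(compose([is_free, is_nearby], conjunction=True)(event))   # False
print(compose([is_free, is_nearby], conjunction=False)(event))  # True
```

In this sketch an empty filter list accepts everything under conjunction and rejects everything under disjunction; that is a property of Python's all and any, not a documented behavior of proximity.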

Filters

Proximity ships with a set of primitive filters that can be used as building blocks to create more specific filters. In the filters section of your settings.json you define custom filters that can be referenced in your compiler definitions or used to query the server (more on that below).

For more information on primitive filters see filters

"filters" : [
    {
      "name" : "nearPhillyInPersonOnTheWeekEnd",
      "conjunction" : true,
      "filters" : [
        {
          "type" : "disk",
          "radius" : 3.0,
          "latitude" : 39.9526,
          "longitude" : -75.1652
        },
        {
          "type" : "inPerson",
          "invert" : false
        },
        {
          "type" : "weekdays",
          "days" : ["friday","saturday","sunday"],
          "invert" : false
        }
      ]
    }
]

This filter (as the name implies) returns events that are near Philadelphia, are not online, and fall on the weekend. Here we combine 3 primitive filters, disk, inPerson, and weekdays, to create a custom filter. inPerson and weekdays both have an attribute called invert; when inverted, the filter does the opposite of what is expected. An inverted inPerson filter returns only online events. An inverted weekdays filter returns events that do not fall on the listed days.
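
The effect of invert can be sketched in Python as follows. The event fields start and online are hypothetical, and the functions are only an approximation of how the weekdays and inPerson primitives are described above.

```python
from datetime import datetime

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def weekdays_filter(days, invert=False):
    """Accept events that start on one of the listed days; invert flips the result."""
    wanted = {d.lower() for d in days}

    def test(event):
        on_listed_day = DAY_NAMES[event["start"].weekday()] in wanted
        return on_listed_day != invert

    return test


def in_person_filter(invert=False):
    """Accept events that are not online; when inverted, accept only online events."""

    def test(event):
        return (not event["online"]) != invert

    return test


weekend = ["friday", "saturday", "sunday"]
saturday_meetup = {"start": datetime(2023, 10, 28, 18, 0), "online": False}
print(weekdays_filter(weekend)(saturday_meetup))               # True
print(weekdays_filter(weekend, invert=True)(saturday_meetup))  # False
print(in_person_filter(invert=True)(saturday_meetup))          # False
```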


To reference this filter in a compiler definition, specify the custom type along with the filter's name.

{
    "type" : "custom",
    "name" : "nearPhillyInPersonOnTheWeekEnd"
}
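
One way to picture a custom reference is as a lookup by name into the filters list of settings.json. The Python sketch below assumes exactly that; resolve_custom is a hypothetical helper, and the error raised for an unknown name is a choice made for the example.

```python
def resolve_custom(reference: dict, defined_filters: list) -> dict:
    """Return the named definition that a {"type": "custom"} reference points to."""
    if reference.get("type") != "custom":
        return reference  # a primitive filter is already a complete definition
    by_name = {f["name"]: f for f in defined_filters}
    name = reference.get("name")
    if name not in by_name:
        raise ValueError(f"unknown custom filter: {name!r}")
    return by_name[name]


defined = [
    {"name": "nearPhillyInPersonOnTheWeekEnd", "conjunction": True, "filters": []},
]
reference = {"type": "custom", "name": "nearPhillyInPersonOnTheWeekEnd"}
print(resolve_custom(reference, defined)["conjunction"])
```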

Usage

proximity is distributed as an executable JAR. While you can invoke the server directly through proximity.jar, it is recommended that you use the companion client prxy to manage the proximity instance.

When you have a valid settings.json, launch proximity using prxy as shown below.

prxy --daemon

At this point, proximity will parse your configuration and perform scans and compilations as defined. You can check that your configuration was valid by issuing this command to the client.

prxy --status

If you want to stop or restart the server you can issue these commands.

prxy --kill
prxy --restart

Viewing events from the client

Note : proximity stores event data in a sqlite3 database named app.db so you can interact with that directly if desired.
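
If you do want to look at app.db directly, a cautious first step is to list its tables without assuming anything about the schema, which is not documented here. The Python sketch below opens the file read-only using the standard sqlite3 module; list_tables is a name chosen for this example.

```python
import sqlite3


def list_tables(db_path: str) -> list:
    """Return the table names in a SQLite database, opened read-only."""
    # mode=ro prevents accidental writes and fails if the file does not exist.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("app.db") from the directory that holds the database prints whatever tables proximity has created, which you can then explore with ordinary SQL.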

If you want to check event data interactively you have 2 views at your disposal.

JSON view

prxy --json # recommended that you use jq, i.e. prxy --json | jq '.'

will ask proximity for all events in JSON format.


table view

prxy

will print out an interactive table built with BubbleTea that allows you to filter and sort events. You can navigate directly to the source event on its website from here.


view flags

You can pass flags to the client to create different views into the event data.

To restrict your view by geography you can use the following options.

prxy --radius <radius> --latitude <latitude> --longitude <longitude> 

or

prxy --routine <routine-name> # use the same location settings as named routine

When using a routine, you can override routine parameters with the explicit radius, longitude, and latitude flags.

When no location filter is specified, prxy will not filter events by location.
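
The precedence between a named routine and the explicit flags can be sketched in Python as below. This is an assumption about how the override might work, not the client's actual code; resolve_location is a hypothetical helper, and a value of None stands for a flag that was not passed.

```python
from typing import Optional

LOCATION_KEYS = ("radius", "latitude", "longitude")


def resolve_location(
    routine: Optional[dict] = None,
    radius: Optional[float] = None,
    latitude: Optional[float] = None,
    longitude: Optional[float] = None,
) -> Optional[dict]:
    """Merge routine location settings with explicit flags; flags take precedence."""
    location = {}
    if routine is not None:
        location = {k: routine[k] for k in LOCATION_KEYS if k in routine}
    overrides = {"radius": radius, "latitude": latitude, "longitude": longitude}
    location.update({k: v for k, v in overrides.items() if v is not None})
    # No routine and no flags: no location filter is applied at all.
    return location or None


philly_routine = {"name": "Philly scan", "radius": 5.0, "latitude": 39.9526, "longitude": -75.1652}
print(resolve_location(philly_routine, radius=1.0))
print(resolve_location())
```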


custom filters

As mentioned earlier, we can use our custom filters to query data. That is done using the --filter flag. To use the custom filter defined earlier:

prxy --filter nearPhillyInPersonOnTheWeekEnd
# or
prxy --json --filter nearPhillyInPersonOnTheWeekEnd

proximity's People

Contributors

matthewkeville

proximity's Issues

Meetup Scanner breaks when using multi-word locality

When the meetup scanner is set up against a geocoordinate that maps to a multi-word locality, the scan fails.
The page generated by the query string is not found.

2023-10-26 14:51:39.690000[ScannerThread] WARN  keville.util.HarUtil - unable to find a valid response for the request url : https://www.meetup.com/find/?location=us--NJ--Wall Township&source=EVEN
2023-10-26 14:51:39.691000[ScannerThread] ERROR keville.scanner.EventScannerScheduler - scan failed, type : MEETUP
2023-10-26 14:51:39.691000[ScannerThread] ERROR keville.scanner.EventScannerScheduler - Cannot invoke "com.google.gson.JsonObject.get(String)" because "response" is null
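
The logged URL contains a literal space in the location parameter. Whether that is the actual cause of the failure is not established by the log, but one thing worth trying is percent-encoding the query string. The Python sketch below shows the idea with the standard library; only the location parameter is built, since the rest of the logged URL is truncated.

```python
from urllib.parse import quote, urlencode

location = "us--NJ--Wall Township"

# quote_via=quote encodes the space as %20 (the default, quote_plus, would emit "+").
query = urlencode({"location": location}, quote_via=quote)
print("https://www.meetup.com/find/?" + query)
```

The same idea applies in Java, where the scanner is implemented, through whichever URL-encoding facility the project already uses.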

One Off Scan

As a user who travels to new locations frequently, it would be convenient to perform a one-off scan without having to assemble a JSON configuration for it.

AllEvents scanner doesn't radially scan

Today the AllEvents scanner doesn't actually implement a radial scan. It finds events only in the locality its geocoordinate belongs to. The AllEvents scanner should be refactored to find the set of localities that fall within the scanning region and then scrape against those localities.

Update Protocol

Today, proximity finds event data and assumes that it is accurate forever. proximity should have a protocol that examines the known events to see if they are still up to date.

Eventbrite description is incomplete

The eventbrite API has two related fields for describing events. Today the non-deprecated one is used to build events. However, many events on their site provide a far more detailed explanation in the deprecated field. Can we integrate that into the event description?

AllEvents does not support all localities

AllEvents has event listings only for a limited set of localities. For example, there is a page for Belmar and for Asbury Park, but there is no page for Wall Township. The AllEvents scanner should be able to check whether the locality under investigation is valid for AllEvents. Today there is no check and the EventScanner fails.

2023-10-26 14:51:50.060000[ScannerThread] INFO  keville.providers.AllEvents.AllEventsScanner - targetting https://allevents.in/wall+township-nj/all
2023-10-26 14:51:51.812000[ScannerThread] WARN  keville.providers.AllEvents.AllEventsScanner - this search only had one page
2023-10-26 14:51:53.927000[ScannerThread] INFO  keville.providers.AllEvents.AllEventsHarProcessor - processing : https://allevents.in/pages/404?ref_url=https%3A%2F%2Fallevents.in%2Fwall%2Btownship
2023-10-26 14:51:53.935000[ScannerThread] WARN  keville.util.HarUtil - response was not encoded i.e. plain text
2023-10-26 14:51:53.938000[ScannerThread] WARN  keville.providers.AllEvents.AllEventsHarProcessor - unable to locate stub data in JSON-LD with regex
2023-10-26 14:51:53.938000[ScannerThread] WARN  keville.providers.AllEvents.AllEventsHarProcessor - no embedded schema events found
2023-10-26 14:51:53.945000[ScannerThread] ERROR keville.scanner.EventScannerScheduler - scan failed, type : ALLEVENTS
2023-10-26 14:51:53.945000[ScannerThread] ERROR keville.scanner.EventScannerScheduler - Cannot invoke "com.google.gson.JsonArray.iterator()" because "schemaEvents" is null
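
The log suggests two possible guards: recognizing the /pages/404 redirect before processing, and treating a missing event list as empty. The Python sketch below illustrates both under that reading of the log; the function names are hypothetical and the real fix would live in the Java scanner.

```python
from urllib.parse import urlparse


def is_allevents_not_found(final_url: str) -> bool:
    """True when the page that was processed is the AllEvents 404 page."""
    return urlparse(final_url).path.rstrip("/") == "/pages/404"


def safe_events(schema_events):
    """Treat a missing event list as empty instead of failing on a null value."""
    return schema_events if schema_events is not None else []


# URL prefix as it appears (truncated) in the log above.
redirected = "https://allevents.in/pages/404?ref_url=https%3A%2F%2Fallevents.in%2Fwall%2Btownship"
print(is_allevents_not_found(redirected))
print(safe_events(None))
```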

prxy restart

It would be convenient to have prxy --restart instead of executing prxy --kill and then prxy --daemon to reload configurations.
