
firetower's People

Contributors

dcolish, dukesam, gmcquillan, schmichael


firetower's Issues

Create Form Elements for Modifying Category Meta Data

We store metadata on the backend about each category, and some of it is meant to be modified by the user. We need to provide form elements so that the following fields can be edited:

  • human_name -- the human readable version of this category. We might consider making this default to the first 30 characters from the signature.
  • threshold -- the ratio of sameness required for a successful match for this category.
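A minimal sketch of how the two fields above might be defaulted and validated on submission. The function names are hypothetical, and two things are assumptions rather than anything stated in the issue: that whitespace in the signature is collapsed before truncating, and that the threshold ratio lives in the range 0 to 1.

```python
DEFAULT_NAME_LENGTH = 30  # "first 30 characters" from the issue text


def default_human_name(signature, length=DEFAULT_NAME_LENGTH):
    """Derive a fallback display name from a category signature."""
    # Assumption: collapse whitespace first so a multi-line traceback
    # yields a readable one-line name.
    return " ".join(signature.split())[:length]


def parse_threshold(raw):
    """Validate a user-submitted threshold value from a form field."""
    value = float(raw)  # raises ValueError for non-numeric input
    # Assumption: the similarity ratio is expressed between 0 and 1.
    if not 0.0 <= value <= 1.0:
        raise ValueError("threshold must be between 0 and 1, got %r" % value)
    return value
```

Rejecting out-of-range values at the form layer keeps a bad threshold from ever reaching the stored category metadata.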

Feature: group categories together

It'd be nice to group categories together which are related to the same failure. That way you can view them among the other categories as an aggregate, but also break them up within the group view to see which sub-category has how many errors.

For simplifying what's going on, it seems handy.
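One possible shape for the aggregate-plus-breakdown view, as a rough sketch. The `groups` mapping from category id to group name is hypothetical; nothing in the backend stores such a mapping today.

```python
from collections import defaultdict


def group_counts(category_counts, groups):
    """Roll per-category error counts up into groups.

    category_counts: {cat_id: number_of_errors}
    groups: {cat_id: group_name} (hypothetical mapping)
    """
    summary = defaultdict(lambda: {"total": 0, "categories": {}})
    for cat_id, count in category_counts.items():
        # Categories without a group are shown on their own.
        group = groups.get(cat_id, cat_id)
        summary[group]["total"] += count
        summary[group]["categories"][cat_id] = count
    return dict(summary)
```

The `total` field serves the aggregate view, while `categories` keeps the per-sub-category counts for the group view.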

Measure Efficacy of a Category

We need some method for determining how internally consistent a category is. My first thought is to compare every event within a category against the category signature and take the arithmetic mean of those similarity scores. Our efficacy measure could then be as simple as the standard deviation of the scores.

Ideally, this would be an offline process so that the classification efforts aren't encumbered. Results could be saved within the category_ids key, which already houses most of the other category metadata.
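A rough sketch of the measurement, suitable for running offline. It uses `difflib.SequenceMatcher` as a stand-in similarity function, which is an assumption; the real classifier's comparison may differ. The function names are hypothetical.

```python
import difflib
import statistics


def similarity(signature, event):
    """Stand-in similarity ratio in [0, 1] (assumed, not Firetower's own)."""
    return difflib.SequenceMatcher(None, signature, event).ratio()


def category_efficacy(signature, events):
    """Summarize how closely a category's events match its signature."""
    scores = [similarity(signature, event) for event in events]
    if not scores:
        return None  # nothing to measure for an empty category
    return {
        "mean": statistics.mean(scores),
        # Population standard deviation: low values mean a tight category.
        "stdev": statistics.pstdev(scores),
        "count": len(scores),
    }
```

The resulting dictionary is small enough to store alongside the other category metadata.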

Firetower logo/mockups

Put together some ideas for fancy Firetower logo & page design. (I've got some stuff in the works already, but it's still a little rough - will definitely have something to show you in the next couple days, though.)

archive_feature branch TimeRanges Broken

The latest set of changes to the archive_feature branch seems to be passing a Redis object where we expect a Redis.Connection() instance. At least, that's what I suspect. See below.

I did pass in a Redis.Connection instance directly into the static method, which is what the constructor comments lead me to believe it expects. List output of the categories does include three category objects (which is about average for this level of accuracy).

In [1]: from firetower import category

In [2]: import redis

In [3]: conn = redis.Redis()

In [4]: cats = category.Category.get_all_categories(conn)

In [5]: for cat in cats:
   ...:   cat.timeseries.range(0, -1)
   ...: 
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (31, 0))

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

/Users/gavin/src/firetower/<ipython console> in <module>()

/Users/gavin/src/firetower/firetower/category.pyc in range(self, start, end)
     12 
     13     def range(self, start, end):
---> 14         return self.redis_conn.zrevrangebyscore(
     15             "ts_%s" % self.cat_id, end, start, withscores=True
     16         )

AttributeError: 'Redis' object has no attribute 'zrevrangebyscore'

Flush Old Events from Eventstreams

This is a bit of a controversial proposal, but at some point we need to figure out how to move old events out of redis-server, since Redis requires that all of its data fit into memory.

We can go two routes with this:

  1. Just remove items older than X, or items older than the latest Y items.
  2. Use those same rules, but instead of deleting them, we serialize to disk.

I'm strongly leaning toward the second procedure, but I'm looking for comments.

Right now, Firetower at UA is taking up about 4 GB of memory, and generating about as much in logfiles every day.
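As a sketch of the second route (serialize instead of delete), assuming events are available as `(timestamp, payload)` pairs. The function names and the JSON-lines archive format are hypothetical choices, and writing the resulting text to disk is left to the caller.

```python
import json


def split_events(events, cutoff):
    """Partition (timestamp, payload) pairs around a cutoff timestamp."""
    recent, expired = [], []
    for timestamp, payload in events:
        # Events strictly older than the cutoff are candidates for archiving.
        (expired if timestamp < cutoff else recent).append((timestamp, payload))
    return recent, expired


def serialize_expired(expired):
    """Render expired events as JSON lines, one event per line."""
    return "".join(
        json.dumps({"ts": timestamp, "event": payload}) + "\n"
        for timestamp, payload in expired
    )
```

Only after the serialized text has been written successfully would the expired events be removed from Redis, so a failed write loses nothing.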

Create JSON APIs

Right now the error data is sent to the front end by populating a page template variable. I think it would be more flexible and powerful to make an AJAX call that returns the JSON instead. (If nothing else, it makes things easier for me, since this is how I'm used to pulling in chart data.) I'll take a stab at setting this up, if there are no objections.

Even Idle Firetower Server Spends All Its Time Archiving

I noticed something strange on my larger installation of Firetower (which has about 3380 categories) -- the server is consistently taking up about 16% of a CPU, even when no new events are coming in to be categorized.

This is the strace output for that process:

write(4, "[2012-01-25 20:42] DEBUG: Fireto"..., 108) = 108
sendto(3, "*2\r\n$7\r\nHGETALL\r\n$48\r\ncounter_9d"..., 72, 0, NULL, 0) = 72
recvfrom(3, "*0\r\n", 8192, 0, NULL, NULL) = 4
write(4, "[2012-01-25 20:42] DEBUG: Fireto"..., 108) = 108
sendto(3, "*2\r\n$7\r\nHGETALL\r\n$48\r\ncounter_bb"..., 72, 0, NULL, 0) = 72
recvfrom(3, "*0\r\n", 8192, 0, NULL, NULL) = 4
select(0, NULL, NULL, NULL, {1, 0}^C <unfinished ...>

Clearly, it's spending all its time doing HGETALL calls for the various categories to check and see if they have any data to archive. This is hardcoded to happen every 2 seconds. The problem now is that this process takes more than 2 seconds to complete, so it's pretty much constant.

Two-pronged solution:

  • Make the default interval a little longer (say 60 seconds), and make it configurable.
  • This many categories isn't ideal. We need to address the category dispersion.
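A sketch of the first prong: a configurable interval where the sleep accounts for how long the archive pass itself took. All names here are hypothetical, and `archive_once` stands in for whatever the server's archive pass is actually called.

```python
import time

DEFAULT_ARCHIVE_INTERVAL = 60.0  # seconds; the "say 60 seconds" suggestion


def seconds_until_next_run(started_at, finished_at,
                           interval=DEFAULT_ARCHIVE_INTERVAL):
    """How long to sleep so that passes start roughly `interval` apart."""
    elapsed = finished_at - started_at
    return max(0.0, interval - elapsed)


def run_archiver(archive_once, interval=DEFAULT_ARCHIVE_INTERVAL,
                 iterations=None, clock=time.monotonic, sleep=time.sleep):
    """Call archive_once repeatedly; iterations=None means run forever."""
    done = 0
    while iterations is None or done < iterations:
        started = clock()
        archive_once()
        sleep(seconds_until_next_run(started, clock(), interval))
        done += 1
```

If a pass takes longer than the interval, the sleep drops to zero and the loop is busy again, which is why the second prong still matters.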

Create Multi-Use Client Event Consumer

So far, we have a couple of input methods for inserting items into the firetower queue.

It'd be nice to have some sort of general purpose service to run on client machines which intercepts events emitted by various logging systems and sends them to Firetower.

An alternative approach would be to create a Firetower-specific logging option for each language we'd like to support. I'm not certain which would be more work, but this problem needs to get solved one way or another.
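For the per-language approach, a Python version could be a small `logging.Handler`. This is only a sketch: the class name is hypothetical, and `send` is a placeholder for whatever callable actually pushes text onto the Firetower queue.

```python
import logging


class FiretowerHandler(logging.Handler):
    """Forward formatted log records to a caller-supplied send function."""

    def __init__(self, send, level=logging.ERROR):
        super().__init__(level)
        # `send` is hypothetical: any callable that accepts the event text.
        self.send = send

    def emit(self, record):
        try:
            self.send(self.format(record))
        except Exception:
            # Standard logging practice: never let a handler crash the app.
            self.handleError(record)
```

An application would attach it with `logging.getLogger().addHandler(FiretowerHandler(send))`, leaving its existing logging calls unchanged.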

Fix Up Form Elements For Aggregate and Category Views

Right now the form values at the bottom of the page are just sitting there without any styling or any protective js for values (i.e. to warn the user about crashing her browser).

Feel free to add more items, but the things that could really use help are these:

  1. Visual delineation between view form elements and the rest of the page.
  2. A submit button. It's really distracting to have each letter you type get submitted, and then to have to wait for the page to reload.
  3. Some input checking to make sure someone doesn't try to select the last several years' worth of minutely data-points, etc. I'm not sure what sane defaults would look like.
  4. Category views should have the same form elements. Right now they only have a subset.
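The input checking in item 3 could be sketched as a cap on the number of data points a request may produce. The function name and the cap of 5000 points are placeholders, since sane defaults are still an open question.

```python
MAX_POINTS = 5000  # placeholder cap; the right default is undecided


def check_time_range(start, end, step_seconds=60, max_points=MAX_POINTS):
    """Return the number of points a range yields, or raise if it is too large.

    start and end are Unix timestamps; step_seconds=60 matches minutely data.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    points = (end - start) // step_seconds + 1
    if points > max_points:
        raise ValueError(
            "range yields %d points; the limit is %d" % (points, max_points)
        )
    return points
```

The same check can run server-side even if the form also warns the user in the browser.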

Automatically Increase Default Similarity Threshold If Using (Really) Quick Classification

Right now, we have some automatic reductions in accuracy depending on the size of a signature. We do this by using a faster, but less accurate algorithm. The result is that we maintain performance, but large signatures basically get randomly placed with large categories.

To fix this, I'm proposing that we also increase the default similarity threshold at each step of the size escalation. Default of 0.5 gets turned into 0.75 for quick, and 0.86 for really_quick.

To make sure we're not making things worse, I'd like to have the category efficacy issue completed first.
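The proposed escalation amounts to a small lookup table. In this sketch the `"default"` mode name and the function name are assumptions; the three threshold values are the ones proposed above.

```python
# Default similarity threshold per classification mode.
SIMILARITY_THRESHOLDS = {
    "default": 0.5,
    "quick": 0.75,
    "really_quick": 0.86,
}


def threshold_for(mode, category_threshold=None):
    """Pick the threshold for a mode, honoring a per-category override."""
    if category_threshold is not None:
        # A threshold set explicitly on the category wins over the defaults.
        return category_threshold
    return SIMILARITY_THRESHOLDS[mode]
```

Whether an explicit per-category threshold should also be raised for the faster modes is left open here.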

Display Meta Information

It'd be useful to know a few pieces of metadata about how the system is running:

  1. How much memory is being used by redis
  2. How much CPU time, or what current percentage of CPU, the firetower-server process(es) use.
  3. Current, max, mean, number of classifications/sec.
  4. Number of categories, categories/hour, categories/day, categories/week.
    ...

More?
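Item 3 could be computed from a list of per-second classification counts, as in this sketch. The function name and the input shape are assumptions; how the counts get collected is not covered.

```python
def classification_rate_stats(per_second_counts):
    """Summarize classifications/sec from a chronological list of counts."""
    if not per_second_counts:
        return None  # no samples yet
    return {
        "current": per_second_counts[-1],  # most recent second
        "max": max(per_second_counts),
        "mean": sum(per_second_counts) / len(per_second_counts),
        "samples": len(per_second_counts),
    }
```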

Hook up API to return JSON data

Here is a tentative overview of the data structures to be exposed by the firetower api, with corresponding URLs:

Firetower API URLs:

/api/
categories/ - dictionary of categories, indexed by category id, with a dictionary of metadata for each category

categories/timeseries/  -  all categories, last 5 minutes' worth of data for each category

categories/timeseries/<start/end get params>  -  all categories, number of instances returned per category over a specified time range

categories/timeseries/?all=1  -  all categories, all data  (computationally intensive, should not be default behavior)

category/<cat_id>  -  category metadata

category/<cat_id>/timeseries/  -  last 5 minutes' worth of data for specified category

category/<cat_id>/timeseries/<start/end get params>  -  specified category, specified time range

category/<cat_id>/events/  -  raw text of events/errors/tracebacks for specified category

Data to be exposed by API:
(returned as JSON objects)

/api/categories/  -  dictionary of categories, indexed by category id, with a dictionary of metadata for each category

{
    "cat1": {
        "human_readable": "Name of the Category",
        "signature": "Full text of most recent event/traceback classified into this category",
        "threshold": <threshold defined for this category (number)>
    },
    "cat2": {
        "human_readable": "Name of the Category",
        "signature": "Full text of most recent event/traceback classified into this category",
        "threshold": <threshold defined for this category (number)>
    }
}
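A sketch of how such a response body might be produced with the standard `json` module. The function name and the input dictionary are hypothetical; only the three field names come from the structure above.

```python
import json

FIELDS = ("human_readable", "signature", "threshold")


def categories_payload(categories):
    """Serialize {cat_id: metadata} to JSON, exposing only the API fields."""
    exposed = {
        cat_id: {field: meta[field] for field in FIELDS}
        for cat_id, meta in categories.items()
    }
    return json.dumps(exposed, sort_keys=True)
```

Filtering to a fixed field list keeps any internal metadata from leaking into the API by accident.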


/api/categories/timeseries/  -  all categories, last 5 minutes' worth of data for each category

Dictionary with cat id's as keys, listing number of events that were classified in that category over the last 5 minutes, 
indexed by timestamp (currently per second)

5 minutes is an arbitrary amount of time, to be defined as a default (could be 30 minutes, or any other value).


{
    "cat1": [ (1, 10), (2, 5), (3, 15) ],
    "cat2": [ (1, 7), (2, 14), (3, 8) ]
}

For category 1:

---
cat1: category id
1, 2, 3: timestamps (indexed by second) - will include 5 minutes' worth of data
10, 5, 15: number of instances of errors returned per second for this category
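A sketch of the default-window behavior, assuming each category's series is already available as `(timestamp, count)` pairs. The function name and the 300-second constant are illustrative. Note that `json.dumps` has no tuple type, so the pairs come out as two-element arrays.

```python
import json

DEFAULT_WINDOW_SECONDS = 300  # 5 minutes; arbitrary, as noted above


def recent_timeseries(series_by_category, now, window=DEFAULT_WINDOW_SECONDS):
    """Keep only the points from the last `window` seconds per category."""
    cutoff = now - window
    return {
        cat_id: [(ts, count) for ts, count in points if cutoff < ts <= now]
        for cat_id, points in series_by_category.items()
    }


def timeseries_payload(series_by_category, now):
    """Serialize the recent window; tuples become JSON arrays."""
    return json.dumps(recent_timeseries(series_by_category, now))
```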


/api/categories/timeseries/<start/end get params>  -  all categories, number of instances returned per category over a specified time range

Dictionary with cat id's as keys, listing number of events that were classified in that category between specified start and end points, 
indexed by timestamp (currently per second) - same format as categories/timeseries/, different time range

If time range specified included 5 seconds:

{
    "cat1": [ (1, 10), (2, 5), (3, 15), (4, 20), (5, 2) ],
    "cat2": [ (1, 7), (2, 14), (3, 8), (4, 7), (5, 12) ]
}

For category 1:

---
cat1: category id
1, 2, 3, 4, 5: timestamps (indexed by second)
10, 5, 15, 20, 2: number of instances of errors returned per second for this category


/api/categories/timeseries/?all=1  -  all categories, all data  (computationally intensive, should not be default behavior)

Same format as above, just with way more data. :)


/api/category/<cat_id>  -  dictionary of metadata for a single, specified category

{
    "human_readable": "Name of the Category",
    "signature": "Full text of most recent event/traceback classified into this category",
    "threshold": <threshold defined for this category (number)>
}


/api/category/<cat_id>/timeseries/  -  last 5 minutes' worth of data for specified category

For a single category, specified in url by category id, returns 5 minutes' worth of data as a series of tuples 
of timestamp (currently seconds) and number of instances returned in given category per timestamp.

[ (1, 10), (2, 5), (3, 15)  ]

1, 2, 3: timestamps (indexed by second)
10, 5, 15: number of instances of events in given category returned per second

5 minutes is an arbitrary amount of time, to be defined as a default (could be 30 minutes, or any other value).


/api/category/<cat_id>/timeseries/<start/end get params>  -  number of events returned in specified category over a specified time range

For a single category, specified in url by category id, returns a series of tuples of timestamp (currently seconds) 
and number of instances returned in given category per timestamp, between start and end points as defined by get params.

category x (specified in url with a cat_id)
5pm to 8pm (specified in url with start/end timestamp get params)
^^^ if no start/end time defined, returns some set default time range (e.g. 5min)
how many instances were returned per second

[ (1, 10), (2, 5), (3, 15)  ]

1, 2, 3: timestamps (indexed by second)
10, 5, 15: number of instances of errors of type x returned per second
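The start/end handling with a fallback to the default range could look like the following sketch. The parameter names `start` and `end`, Unix-timestamp values, and the function name are all assumptions; the spec above only says "start/end get params".

```python
from urllib.parse import parse_qs

DEFAULT_WINDOW_SECONDS = 300  # the "e.g. 5min" default range


def parse_time_range(query_string, now, default_window=DEFAULT_WINDOW_SECONDS):
    """Return (start, end) timestamps from a query string, with defaults."""
    params = parse_qs(query_string)
    # A missing end means "up to now"; a missing start means one window back.
    end = int(params["end"][0]) if "end" in params else now
    start = int(params["start"][0]) if "start" in params else end - default_window
    if start > end:
        raise ValueError("start must not be later than end")
    return start, end
```

Sharing one parser between the all-categories and single-category endpoints would keep their defaults consistent.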


/api/category/<cat_id>/events/  -  dictionary of raw text of all events/errors/tracebacks for specified category

{
    "cat1": [ "I am some error text for cat1", "I am some related error text in the same category" ],
    "cat2": [ "I am some error text for cat2", "I am some related error text in the same category" ],
    "cat3": [ "I am some error text for cat3", "I am some related error text in the same category" ]
}
