nava2 / cs9864-realtime-bluemix
Repository for CS 9864 project
Users subscribe to "triggers," e.g. a drop of 5% or a massive share liquidation.
Two types of triggers are more than enough.
Notifications for a user when a trigger fires: make several channels "available," but since this is a PoC, only one is needed.
Twilio? https://www.twilio.com/
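A minimal sketch of how a "drop by 5%" trigger might be evaluated. The function names, the subscription shape, and the idea of comparing against a reference price are assumptions for illustration, not something the issue specifies. Delivery (e.g. through Twilio) is left out.

```javascript
// Percentage by which currentPrice has fallen below referencePrice.
function percentDrop(referencePrice, currentPrice) {
  if (referencePrice <= 0) {
    throw new RangeError("referencePrice must be positive");
  }
  return ((referencePrice - currentPrice) / referencePrice) * 100;
}

// A drop trigger fires once the fall reaches the subscribed threshold.
function dropTriggerFired(referencePrice, currentPrice, thresholdPct) {
  return percentDrop(referencePrice, currentPrice) >= thresholdPct;
}

// Hypothetical subscriptions: one user, one symbol, one threshold each.
const subscriptions = [
  { user: "alice", symbol: "IBM", thresholdPct: 5 },
  { user: "bob", symbol: "IBM", thresholdPct: 10 },
];

// Returns the users whose trigger fired for this price movement.
function usersToNotify(subs, symbol, referencePrice, currentPrice) {
  return subs
    .filter((s) => s.symbol === symbol)
    .filter((s) => dropTriggerFired(referencePrice, currentPrice, s.thresholdPct))
    .map((s) => s.user);
}
```

For example, `usersToNotify(subscriptions, "IBM", 100, 94)` would select only the 5% subscriber, since the drop is about 6%.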
Implement a service that forwards data in "packages" of some time frame to clients that register with the service.
I'm undecided about this service. Leaving it unassigned currently.
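If this service does go ahead, the "packages of some time frame" idea could be sketched as grouping records into fixed time windows before forwarding. The record shape (`timestamp` in milliseconds) and the function name are assumptions.

```javascript
// Groups records into consecutive windows of windowMs milliseconds.
// Each package carries the start of its window and the records inside it.
function packageByWindow(records, windowMs) {
  const packages = new Map();
  for (const record of records) {
    const windowStart = Math.floor(record.timestamp / windowMs) * windowMs;
    if (!packages.has(windowStart)) {
      packages.set(windowStart, []);
    }
    packages.get(windowStart).push(record);
  }
  // Emit packages in chronological order.
  return [...packages.entries()]
    .sort(([a], [b]) => a - b)
    .map(([windowStart, items]) => ({ windowStart, items }));
}
```

Each resulting package could then be sent to every registered client in one request, rather than forwarding records one at a time.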
Implement a script to unrar -> process -> upload to cloudera.
Finalize the project report 1.
This will be implemented on CSD Infrastructure.
Specs information: CSDEmail.pdf
Des: The system shall process data originating from various sources and merge the data, providing services with real-time streaming access to the combined data. Therefore the system shall support the combination of multiple data sources in real time.
BIG DATA: variety, volume, velocity
Rationale: The combination of data sources is important in order for user services to extract better insights. Enabling the system to combine data sources will provide end users with easier access to the data.
Complete the sequence diagrams
Des: End-users shall transparently interact with “client services” through a single API entry-point for interaction with “application”.
q4: The API shall provide simultaneous connection to up to xxx users without any performance impediment.
Rationale: Simplicity in the client API lowers the barrier to entry for client application developers to use the service.
This is the service that runs the registrar.
All components are up for debate, this is a suggestion. This should utilize the component in #24.
It should have the following configurable parameters:
All responses must:
Register an end-point with the service
Remove an end-point
Des: The system shall support data from a variety of sources, so a means of connecting to various data sources should be provided. Through this interface, an administrator could add and remove data sources.
q3: The connection/disconnection to data sources should not affect the availability of the system.
Rationale: Having an easy way of deploying API connection services will make it easier to deploy new services that may utilize the data. Since the purpose of the system is to offer Data as a Service, the addition of new data sources should be facilitated.
Using Github's milestone software, create a project plan outlining when parts need to be done.
Services should be broken into sub-issues with a single over-arching issue. "Sub-issues" should follow the 8/80 rule leaning more towards 8h chunks.
All tasks must be aligned with Milestones. I've already "started" milestone 1, which is progress report 1.
@brogly could you please comment with expected deliverable dates discussed last Friday.
The client registrar needs a heartbeat to check on its registered services.
Suggested implementation: On each URL, every X seconds, do a HEAD /
request and if a non-network error or success is returned, then it's available.
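The suggested heartbeat can be sketched as follows, using Node's built-in `http` module. The function names, the outcome object shape, and the use of the interval as the request timeout are assumptions. The sketch only handles `http://` URLs.

```javascript
const http = require("http");

// Any HTTP response (even 4xx/5xx) means the service is reachable.
// Only a network-level failure, where no response arrives, counts as down.
function isAvailable(outcome) {
  return outcome.statusCode !== undefined;
}

// Sends HEAD to the URL and resolves with either a status or an error code.
function headRequest(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.request(url, { method: "HEAD", timeout: timeoutMs }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve({ statusCode: res.statusCode });
    });
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", (err) => resolve({ errorCode: err.code || err.message }));
    req.end();
  });
}

// Checks every URL each intervalMs; returns a function that stops the loop.
function startHeartbeat(urls, intervalMs, onResult) {
  const timer = setInterval(async () => {
    for (const url of urls) {
      const outcome = await headRequest(url, intervalMs);
      onResult(url, isAvailable(outcome));
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A registrar could call `startHeartbeat(registeredUrls, 5000, markService)` with its own callback; the 5000 ms value is only a placeholder for X.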
Create a gantt chart using the information from #18.
This should be done in a similar style to Microsoft Project.
Des: The system shall have a web service registry interface that enables users to search for and subscribe to various web services.
q1: The availability of this service shall be congruent with the Bluemix uptime guarantee of 99.95%.
Rationale: Having a centralized list of currently available services makes the system more usable and easier for users to work with.
Merge all requirements and changes into the report document.
Using #26, and the libraries #27, #28 and #29 implement a service that:
I recommend completing #27, #28 OR #29 before working on this. Ignore secondary sources until after this issue is completed.
Des: Data originating from data/processing services shall persist.
q9: Data entering the system shall be stored quickly and effectively, within 100ms of processing.
Rationale: Storing the data instead of simply forwarding will allow for historical data and data processing performance improvement.
Create a client API that has the following routes:
/list
Lists all client services
/:id/:rest
This forwards the request to the correct client service if it exists; if it does not exist, it will return 404 or a relevant error
Service that will clean up data on a timer (e.g. every day?) removing data that is older than x time.
This can be modified in cloudant.json.
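The age check at the core of the cleanup service might look like the sketch below. The `maxAgeMs` setting name, the `timestamp` field on each document, and the function name are assumptions; the actual deletion call against the database is deliberately omitted.

```javascript
// Hypothetical setting read from cloudant.json, e.g. { "maxAgeMs": 86400000 }.
// Splits documents into those to keep and those older than the cutoff.
function splitByAge(docs, maxAgeMs, now) {
  const cutoff = now - maxAgeMs;
  const keep = [];
  const expired = [];
  for (const doc of docs) {
    if (doc.timestamp < cutoff) {
      expired.push(doc);
    } else {
      keep.push(doc);
    }
  }
  return { keep, expired };
}
```

The timer would call this with `Date.now()` as `now` and delete whatever lands in `expired`. Passing `now` in as a parameter keeps the function easy to test.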
Des: The system shall provide client services access to real-time, historical, and external feeds upon request. The system shall provide services with data query access and return data results as large as requested. Clients may ask for assorted variations of data from client services, which may require processing.
BIG DATA: volume x velocity x variety
q6: The system shall handle sending query responses of up to 3GB in size. The data shall be sent with a response latency between 0.5 and 2 seconds.
Rationale: Larger data sets allow for deeper analysis and better insights and applications; the system shall therefore support sending such large amounts of varied data.
Starting point https://gist.github.com/unbracketed/3380407
Simple webpage: AJAX-based ticker line with news bubbles.
This should be "library" code that we can require
and reuse within services.
All components are up for debate, this is a suggestion.
It should have the following configurable parameters:
All responses must:
Register an end-point with the service
Remove an end-point
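As a starting point for discussion, the register/remove operations could be backed by a small in-memory class like the one below. The class name, method names, and the `{ ok, error }` response shape are all assumptions, since the configurable parameters and response requirements are not spelled out here.

```javascript
// In-memory registry of end-points keyed by an identifier.
class EndpointRegistry {
  constructor() {
    this.endpoints = new Map();
  }

  // Registers an end-point; refuses to overwrite an existing id.
  register(id, url) {
    if (this.endpoints.has(id)) {
      return { ok: false, error: "already registered" };
    }
    this.endpoints.set(id, url);
    return { ok: true, id, url };
  }

  // Removes an end-point; reports whether it was present.
  remove(id) {
    const existed = this.endpoints.delete(id);
    return existed ? { ok: true, id } : { ok: false, error: "not found" };
  }

  // Lists all registered end-points in insertion order.
  list() {
    return [...this.endpoints].map(([id, url]) => ({ id, url }));
  }
}
```

Exporting the class with `module.exports` would let each service `require` it, and an express route handler could simply return the object each method produces.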
Des: Each service shall be built independently of the others. No end-service shall make internal calls to another end-service. If services wish to use other services, they shall communicate with them in a manner similar to end-users.
q10: For maintainability, dependence upon a service shall be explicit and external. By having this dependence clearly indicated, the registry can control the availability of the system effectively.
Rationale: Having little dependence on specific instances of other services gives strong fault tolerance, scalability, and throughput when the system is balanced.
Currently, the stock server kills an endpoint on any error. This is not correct: Bluemix sometimes gives 503
errors that are fixed by resending.
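A more tolerant policy could be sketched as a per-endpoint counter of consecutive errors. The class name and the idea of treating a refused connection as immediately fatal while tolerating other errors up to a configurable limit are assumptions for illustration.

```javascript
// Tracks consecutive delivery errors for one end-point.
class EndpointHealth {
  constructor(threshold) {
    this.threshold = threshold; // consecutive errors tolerated before removal
    this.consecutiveErrors = 0;
    this.dead = false;
  }

  // A successful send resets the error streak.
  recordSuccess() {
    this.consecutiveErrors = 0;
  }

  // Returns true when the end-point should be dropped.
  recordError(code) {
    if (code === "ECONNREFUSED") {
      this.dead = true; // nothing is listening, so retrying will not help
      return this.dead;
    }
    this.consecutiveErrors += 1;
    if (this.consecutiveErrors >= this.threshold) {
      this.dead = true;
    }
    return this.dead;
  }
}
```

With a threshold of 3, a single 503 followed by a successful resend leaves the end-point registered, and in no case does an end-point failure stop the server itself.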
Tasks:
On a connection error (ECONNREFUSED), kill the EndPoint, never the server
If threshold errors happen in a row, kill the end point
Des: The system will support sending continuous data streams so that end-user clients may receive continuous data from client services. The system shall support the streaming of data originating from one or more data services.
BIG DATA: velocity, volume, variety
q7: The system shall use a stream-processing engine with a latency of 0.5 – 2.0 seconds to process data in real time. It is necessary to make sure our system runs quickly; a sluggish system may have a negative effect in a financial domain or on client satisfaction. So, a response time should be within 20ms.
Rationale: Services may be interested in plotting data in real time or having some real time analytical goal, therefore having the ability of sending continuous data without the need for constant request is important.
Team to do a short presentation on SOA and hosting services on a cloud
SOA - Exerpt.doc
Use the Issue IDs for the requirement IDs.
Des: The system shall have an administrator interface to enable real-time addition/removal of services. The system shall therefore maintain an up-to-date view of currently available services.
q2: The deployment and removal of client services should not cause any service interruption.
Rationale: Having the ability to change which services are available is very helpful. If a service is misbehaving, this will enable an administrator to handle the issue without disruption.
This is the service that registers client services. This will be very similar to #25.
All components are up for debate, this is a suggestion. This should utilize the component in #24.
It should have the following configurable parameters:
All responses must:
Register an end-point with the service
Remove an end-point
express
cf
This will read from the service implemented in #23. It will have a registry of end points to cast against.
Notes to consider:
POST
HTTP calls
Implementation specific?
All responses must:
Des: The system shall support sending push notifications to end-users upon subscription. The user shall be able to set up triggers and monitor those triggering events.
q8: Notifications shall be sent within a processing time frame of less than 1 second.
Rationale: The system shall have the ability to send notifications to the user when an event of interest is detected. Having the ability to communicate with the user without the user initiating communication is helpful in maintaining performance.
Create a service to obtain yahoo financial news data
Which is which?