better-read

Step-by-Step approach

  • Start with the UI/ UX.
    => At least for big data apps, one should first know which data the user actually needs. The relationships between the data tables are neither the main focus nor the driving factor.
  • Design the system around the UI/ UX.
    • Quality goals:
      Highly performant and reliable despite a large amount of data (more than several million entries)
    • Tech stack:
      Spring Boot, Spring MVC (web, server-generated HTML pages), Spring Security, Spring Data Cassandra, Apache Cassandra (NoSQL)

Apache Cassandra

Basics/ Concepts

  • Apache Cassandra is an open source, distributed NoSQL database trusted by thousands of companies for scalability (horizontal/ scale-out) and high availability (running on multiple nodes) without compromising performance.
    => Throughput scales close to linearly with the number of nodes, which has become one of Cassandra's key strengths (Want more power? Add more nodes!).

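  • Although it is a NoSQL database, Cassandra is neither a plain key-value store nor a document-based DB. It is a wide-column store for operational workloads that supports fully typed schemas including tables, rows and columns. Note that it does not provide the full ACID transactions of a typical relational database.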

  • "Distributed" means that Cassandra can run on multiple machines while appearing to users as a unified whole. It's possible to run Cassandra on a single node, but this gives up the benefits of distribution and should only be done for dev/ test purposes.

  • Since it is a distributed database, Cassandra can (and usually does) have multiple nodes. A node represents a single instance of Cassandra. The nodes exchange information about themselves and about each other through a peer-to-peer protocol called gossip.

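  • Cassandra has a masterless architecture: every node can accept reads and writes, so there is no single master that could become a bottleneck or a single point of failure.

  • In Cassandra, the data itself is automatically distributed across the nodes, which benefits performance. Cassandra accomplishes this using partitions.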

  • One piece of data can be replicated to multiple replica nodes, ensuring reliability and fault tolerance. Cassandra supports the notion of a replication factor (RF), which describes how many copies of your data should exist in the database.

  • When designing the data schema/ structures for Cassandra, one often approaches the task by thinking about the queries first. After a rough first data structure design (e.g. an ERM), one keeps the app usage in mind and optimizes the data structures for it.
    => e.g. fetch something by id, name, etc. when browsing (prio 1), then run follow-up queries as related lookups using the result of the first query, ...
    => e.g. normalization is not important here, so it's perfectly fine to violate normalization rules, for example by copying properties from other entities directly into a table instead of referencing them via a FK.

  • The primary keys of the entities may be used as partition keys. In this case single record partitions are used. Single record partitions make sense for cases where one always fetches the data of one particular record.
    When the use case actually fetches multiple records, one should try to store them in one partition (=> multi record partition). Different partitions may be stored on different nodes, so reading from a single partition avoids having to contact several nodes (=> performance).
    => partitions := distributed data buckets (on different nodes)
    => One needs to watch out for partitions that might get too big. A couple of hundred or even a few thousand records might still be ok, but more than that should be split in order to provide reasonable performance.
    This can easily be done by adding another partition key column such as a time-related column (year, month, ...), a category, or group info.
    => The tradeoff for good performance is to have not too many different partitions on the one hand and partitions that are not too big on the other hand.

  • Cassandra already takes the order of the entries into account when writing the data (whereas SQL DBs typically determine it at read time/ runtime). Clustering columns are used for this purpose: they define the sort order of the rows within a partition.
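
The interplay of partition key, replication factor and clustering column described above can be sketched with a small in-memory toy model. This is only an illustration under simplifying assumptions, not how Cassandra is implemented: the class `ToyCluster`, its methods, the node names and the sample data are all hypothetical, and MD5 is used merely as a stand-in hash (Cassandra's default partitioner is based on Murmur3 and works with token ranges).

```python
import hashlib


class ToyCluster:
    """Toy in-memory model of partitioning, replication and clustering order."""

    def __init__(self, node_names, replication_factor):
        if not 1 <= replication_factor <= len(node_names):
            raise ValueError("replication factor must be between 1 and the node count")
        self.nodes = list(node_names)
        self.rf = replication_factor
        # node name -> {partition key -> list of (clustering key, row)}
        self.storage = {name: {} for name in self.nodes}

    def replicas_for(self, partition_key):
        """Hash the partition key to a start node, then take the next rf nodes."""
        # Assumption: MD5 as a simple, deterministic stand-in for the real partitioner.
        digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

    def insert(self, partition_key, clustering_key, row):
        """Write the row to every replica and keep the partition sorted on write."""
        for node in self.replicas_for(partition_key):
            partition = self.storage[node].setdefault(partition_key, [])
            partition.append((clustering_key, row))
            partition.sort(key=lambda entry: entry[0])

    def read_partition(self, partition_key):
        """Read a whole partition from one replica; no sorting is needed at read time."""
        node = self.replicas_for(partition_key)[0]
        return [row for _, row in self.storage[node].get(partition_key, [])]


cluster = ToyCluster(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)

# Hypothetical multi record partition: all books of one author, bucketed by year.
key = ("author-1", 2021)
cluster.insert(key, "2021-09-01", "Book C")
cluster.insert(key, "2021-03-15", "Book A")
cluster.insert(key, "2021-06-30", "Book B")

print(cluster.replicas_for(key))    # three distinct nodes out of four
print(cluster.read_partition(key))  # ['Book A', 'Book B', 'Book C']
```

The composite key `("author-1", 2021)` mirrors the idea of adding a time-related column to the partition key so that a single author's partition does not grow without bound. The publication date plays the role of the clustering column.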

CQL - The Cassandra Query Language

  • CQL offers a model similar to SQL: the data is stored in tables containing rows of columns.

  • The tables themselves are organized in keyspaces, which are similar to the concept of a schema in SQL.

  • Commands:

    • describe keyspaces; - show all available keyspaces
    • describe <keyspace>; - show a particular keyspace and its config in the form of a CREATE KEYSPACE statement
    • use <keyspace>; - use/ connect to a particular keyspace
    • describe tables; - show all available tables in the used keyspace

DataStax Astra - Multi-cloud DBaaS built on Apache Cassandra

  • https://www.datastax.com/
  • https://astra.datastax.com/
  • Free tier available, at the time of writing with up to 80 GB of storage and/or 20 million operations per month.
  • Data access via REST and GraphQL APIs.
  • Deployed on a major public cloud provider of one's own choice (AWS, Azure, Google Cloud).
