
goto-eof / europea-library-server


A library web application that lets you index, edit and explore e-books, retrieve book information from file metadata and from the web (using multi-threading), search, sell and buy e-books through the Stripe platform (WIP), and download e-books. This is the back-end project.

Home Page: https://europea-library.eu

Java 99.87% HTML 0.13%
download ebook epub indexer library pdf buy ebooks sell store purchase

europea-library-server's Introduction

                            ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⡿⠏⠻⣿⣿⣷⢀⡀⣾⣿⣿⠟⠙⢿⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣠⣄⣿⣿⣿⣿⣿⣿⣿⣿⣠⣄⣿⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⡿⠏⠻⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠟⠹⢿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣟⣠⣄⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣠⣄⣻⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⡍⠀⢩⣿⣿⣿⣿⣿⣿⣿⣿⡍⠀⢩⣿⣿⣿⣿⣿⣿⣿⡍⠀⢩⣿⣿⣿
                            ⣿⣿⣿⣷⣶⣾⣿⣿⣿⣿⣿⣿⣿⣿⣷⣶⣾⣿⣿⣿⣿⣿⣿⣿⣷⣶⣾⣿⣿⣿
                            ⣿⣿⣿⣿⡿⠟⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠿⠻⢿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⡷⣀⡀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⢀⣀⢾⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⣿⠟⠻⣿⣿⣿⣿⣿⣿⣿⣿⠟⠻⣿⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⣷⣀⣀⣿⣿⣿⠋⠙⣿⣿⣿⣀⣀⣾⣿⣿⣿⣿⣿⣿⣿⣿
                            ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣴⣦⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
     ______                                     __    _ __
    / ____/_  ___________  ____  ___  ____ _   / /   (_) /_  _________ ________  __
   / __/ / / / / ___/ __ \/ __ \/ _ \/ __ `/  / /   / / __ \/ ___/ __ `/ ___/ / / /
  / /___/ /_/ / /  / /_/ / /_/ /  __/ /_/ /  / /___/ / /_/ / /  / /_/ / /  / /_/ /
 /_____/\__,_/_/   \____/ .___/\___/\__,_/  /_____/_/_.___/_/   \__,_/_/   \__, /
                       /_/                                                /____/
                                                                          SERVER

Introduction

Because I have many e-books, one day I got up and said: "Why not create a web application that lets me index, catalogue and search my e-books and provides information about them?" That is how Europea Library was born :)

What is Europea Library (server)

A library web application that lets you index, edit and explore e-books, retrieve book information from file metadata and from the web (using multi-threading), search, sell and buy e-books through the Stripe platform, and download e-books. The front-end project can be found here.

Demo

I have already bought a domain and deployed the application on my VPS, so a working demo is available at https://europea-library.eu. For now, the Stripe payment features are set to TEST MODE: when you buy an e-book on europea-library.eu, it is enough to use a test card such as 424242... and some random details to complete the purchase.

Development status

Also take a look at the development progress status.

How does the application work?

The core of the application is the indexer job. It collects information about the files and saves it to the DB. Indexing consists of file metadata extraction and web metadata retrieval (in particular from the Google Books API). The first run takes some time, because extracting metadata from files and retrieving it from the web are expensive in terms of resources, even though the job is multi-threaded. Subsequent runs are faster, because metadata has already been extracted for all the files (except for any new e-books added to the directory). Once the job has completed all its steps, the API becomes available for queries and the client application can interact with it (otherwise an HTTP 404 status is returned). The indexer job starts every night at midnight (00:00, configurable). If a job is already running, it continues to process files and no other job is started.

Once the job has finished its work, the e-book database is available for exploration. The front-end can list the files in different ways and orders, filter the results and search for an e-book. It is also possible to view e-book information, update it, change the cover image and, finally, download the e-book. Downloads are configurable: the administrator can choose whether only authenticated users, or also unauthenticated users, are allowed to download files.

There are currently 3 categories of users: ADMINISTRATOR, USER and unauthenticated users. ADMINISTRATOR can customize the Home Page, run the job and reload the application cache. USER can download e-books (as mentioned, this is configurable). All users, authenticated or not, can explore the library, search for an e-book and view book information.
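
The run-once guard and the readiness check described above can be sketched in plain Java. This is a minimal illustration under assumptions, not the project's implementation (the real job is built on Spring Batch): the class `IndexerJobGuard` and its methods are hypothetical, and the sketch assumes the API also answers 404 while a later run is in progress.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard for the nightly indexer job (not the project's actual class).
class IndexerJobGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private volatile boolean completedOnce = false;

    /** Atomically claims the job; returns false if a run is already in progress. */
    boolean tryStart() {
        return running.compareAndSet(false, true);
    }

    /** Marks the current run as finished, making the indexed data queryable. */
    void finish() {
        completedOnce = true;
        running.set(false);
    }

    boolean isRunning() {
        return running.get();
    }

    /** Status an API endpoint would return: 404 until a run has completed, and while one is running. */
    int statusForQuery() {
        return (completedOnce && !running.get()) ? 200 : 404;
    }
}
```

In a Spring application, a method annotated with `@Scheduled(cron = "0 0 0 * * *")` (midnight every day) could call `tryStart()` and simply return when it yields `false`, so an overlapping trigger never starts a second run.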
Since version 9 (WIP), Europea Library is also an e-book store (a feature the administrator can enable): it is now possible to sell and buy digital books through the Stripe payment platform, which supports payment through different payment circuits.

Features

  • index and catalog large collections of digital books quickly;
  • explore e-books by directory, top-rated, featured, top downloaded, just added, category, tags, file extension, language, publisher, published date, top-selling;
  • view book information;
  • download e-books;
  • search by title, author, publisher, ISBN and published date;
  • edit e-book information, including change book cover image (only administrator is able to do this);
  • generate e-book URL QR Code;
  • login, registration and password change (2 categories of user: ADMINISTRATOR and USER | XSS protection: JWT token + HttpOnly cookie | user session TTL: 24 hours);
  • password reset (reset link sent by e-mail);
  • bulk category/tag/language/publisher name change (only administrator is able to do this);
  • control panel (administration for admin, profile and security for all users)
    • user management page (the administrator can disable user accounts)
    • customize home page
      • enable/disable widgets
    • enable/disable protected downloads (choose whether only authenticated users, or also unauthenticated users, can download e-books)
    • start/stop job
    • reload application cache
  • sell/buy e-books and view transactions (Work In Progress)
  • included actuator for application monitoring and management
  • included Swagger UI for API documentation
  • protection system against XSS attacks
  • show/hide an e-book to prevent access by potential buyers (hidden e-books remain accessible to their owners and to the administrator)
  • Google reCAPTCHA v3 integration
  • site cloning slowdown feature
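
The login feature above keeps the JWT in an HttpOnly cookie with a 24-hour session TTL. A minimal sketch of the resulting `Set-Cookie` header value follows; the cookie name `jwt`, the `Path=/` and `SameSite=Strict` attributes, and the class `SessionCookie` are assumptions for illustration, not taken from the project.

```java
import java.time.Duration;

// Hypothetical sketch: builds a Set-Cookie header value for a session JWT.
// Cookie name, path and SameSite value are assumptions, not the project's settings.
class SessionCookie {
    static final Duration SESSION_TTL = Duration.ofHours(24);

    static String setCookieHeader(String jwt, boolean secure) {
        StringBuilder sb = new StringBuilder();
        sb.append("jwt=").append(jwt)
          .append("; Path=/")
          .append("; Max-Age=").append(SESSION_TTL.getSeconds()) // 24 h = 86400 s
          .append("; HttpOnly")        // not readable from JavaScript, limiting token theft via XSS
          .append("; SameSite=Strict"); // not sent on cross-site requests
        if (secure) {
            sb.append("; Secure");     // only sent over HTTPS
        }
        return sb.toString();
    }
}
```

In Spring, an equivalent header can be built with `ResponseCookie.from(name, value).httpOnly(true).secure(true).path("/").maxAge(Duration.ofHours(24)).build()`.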

Run the project (test environment)

Before running the software as a Spring Boot application, follow these steps:

  • edit the application.yml in the following way:
    • set the com.andreidodu.europea-library.google.books.api_key to your Google API key
      • Europea Library uses the Google Books API to retrieve information about books. This API has a daily limit of 1,000 requests. To get an API key, go to the Google Cloud Console and create one; remember also to enable the Google Books API. The indexer job runs once per day, so if you have, for example, 3,000 e-books, it will take about 3 days to retrieve all the information about your library.
    • edit the default username, email and password (on the first run, the application stores this information in the DB):
      default-admin-username: admin
      default-admin-email: [email protected]
      default-admin-password: password
      
    • edit the qr-code-path property to enable QR Code generation for each e-book;
    • generate the RSA key pair used for the JWT tokens in src/main/resources/certs by executing this set of commands:
      openssl genpkey -algorithm RSA -out private-key-old.pem && openssl rsa -pubout -in private-key-old.pem -out public-key.pem && openssl pkcs8 -topk8 -inform PEM -outform PEM -in private-key-old.pem -out private-key.pem -nocrypt && rm private-key-old.pem
      
      • Now you should have a private-key.pem and a public-key.pem file in src/main/resources/certs
    • start the DBMS from the project's root directory with the sudo docker-compose up -d command, or create a database named europea_library on your running PostgreSQL instance
    • run the project from your IDE
    • or execute the following command from the root of the project to build the jar file
      ./gradlew bootJar
      
    • then create a file called start.sh
      #!/bin/bash
      /bin/java -Dspring.config.location=application.yml -jar europea-library-X.X.X.jar
      
      where X.X.X is the application version
    • make it executable:
      chmod +x start.sh
      
    • run Europea Library jar
      ./start.sh
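
To illustrate the Google Books lookup and the quota arithmetic in the setup steps above, here is a hedged sketch. The endpoint `https://www.googleapis.com/books/v1/volumes` and the `isbn:`/`intitle:` query operators belong to the public Google Books API; the class `GoogleBooksQuery` is hypothetical (the project lists Feign among its technologies and may build requests differently), and the day estimate assumes one request per e-book.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the project's actual Google Books client.
class GoogleBooksQuery {
    static final String VOLUMES_ENDPOINT = "https://www.googleapis.com/books/v1/volumes";

    /** Builds a volumes search URL; the query may use operators such as isbn:, intitle:, inauthor:. */
    static String volumesUrl(String query, String apiKey) {
        return VOLUMES_ENDPOINT
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    /** Days needed to look up every book, assuming one request per book and a fixed daily quota. */
    static long daysNeeded(long books, long requestsPerDay) {
        if (requestsPerDay <= 0) {
            throw new IllegalArgumentException("requestsPerDay must be positive");
        }
        return (books + requestsPerDay - 1) / requestsPerDay; // ceiling division
    }
}
```

With the 1,000 requests/day limit, `daysNeeded(3000, 1000)` gives the 3 days mentioned above; a single extra book pushes the estimate to 4 days.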
      

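The openssl commands in the setup steps produce a PKCS#8 private key (`-----BEGIN PRIVATE KEY-----`) and an X.509 SubjectPublicKeyInfo public key (`-----BEGIN PUBLIC KEY-----`). The sketch below shows how such files can be parsed with standard JDK APIs; the class `PemKeys` is hypothetical, and the project may load the keys differently (Spring Security, for example, provides `RsaKeyConverters.pkcs8()` and `RsaKeyConverters.x509()`).

```java
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

// Hypothetical loader for the PEM files generated by the openssl commands.
class PemKeys {
    /** Strips the BEGIN/END armor lines and whitespace, then Base64-decodes the DER bytes. */
    static byte[] decodePem(String pem) {
        String base64 = pem.replaceAll("-----(BEGIN|END) [A-Z ]+-----", "")
                           .replaceAll("\\s", "");
        return Base64.getDecoder().decode(base64);
    }

    /** Parses private-key.pem (PKCS#8, produced by the openssl pkcs8 -topk8 step). */
    static PrivateKey privateKey(String pem) throws GeneralSecurityException {
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(decodePem(pem)));
    }

    /** Parses public-key.pem (X.509 SubjectPublicKeyInfo, produced by openssl rsa -pubout). */
    static PublicKey publicKey(String pem) throws GeneralSecurityException {
        return KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(decodePem(pem)));
    }
}
```

The `openssl pkcs8 -topk8` step matters for this approach: `PKCS8EncodedKeySpec` cannot parse a PKCS#1 key (`-----BEGIN RSA PRIVATE KEY-----`).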
Stripe payments

To test payments in your local environment, forward Stripe webhook events with the Stripe CLI:

stripe listen --forward-to localhost:8081/api/v1/stripe/webhook
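
When `stripe listen` forwards events, it prints a webhook signing secret (`whsec_...`), and every forwarded request carries a `Stripe-Signature` header. Stripe's documented scheme is an HMAC-SHA256 over `timestamp + "." + payload`, compared against the `v1` values in that header. The sketch below implements the check with the JDK only; the class `StripeSignature` and the 300-second tolerance are assumptions for illustration, and the official Stripe Java library already offers `Webhook.constructEvent(payload, sigHeader, secret)` for the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical verifier for the Stripe-Signature header ("t=<timestamp>,v1=<hex hmac>,...").
class StripeSignature {
    // Assumed tolerance against replayed events, in seconds.
    static final long TOLERANCE_SECONDS = 300;

    static boolean isValid(String payload, String sigHeader, String endpointSecret, long nowEpochSeconds) {
        long timestamp = -1;
        List<String> v1Signatures = new ArrayList<>();
        for (String part : sigHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length != 2) {
                continue;
            }
            if (kv[0].equals("t")) {
                try {
                    timestamp = Long.parseLong(kv[1]);
                } catch (NumberFormatException e) {
                    return false;
                }
            } else if (kv[0].equals("v1")) {
                v1Signatures.add(kv[1]);
            }
        }
        if (timestamp < 0 || v1Signatures.isEmpty()
                || Math.abs(nowEpochSeconds - timestamp) > TOLERANCE_SECONDS) {
            return false;
        }
        // The signed payload is "<timestamp>.<raw request body>".
        byte[] expected = hex(hmacSha256(endpointSecret, timestamp + "." + payload))
                .getBytes(StandardCharsets.UTF_8);
        for (String candidate : v1Signatures) {
            // Constant-time comparison to avoid leaking timing information.
            if (MessageDigest.isEqual(expected, candidate.getBytes(StandardCharsets.UTF_8))) {
                return true;
            }
        }
        return false;
    }

    static byte[] hmacSha256(String key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The payload must be the raw request body exactly as received; JSON that has been parsed and re-serialized will generally not produce the same signature.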

API documentation

The API documentation can be accessed here: http://localhost:8081/swagger-ui/index.html

Job steps

Because the indexer job is the core of the application, I have attached the job schema, which summarizes how it works. (image: job_schema)

DB schema - Tables

(image: db_schema)

DB schema - Views

(image: db_schema_views)

Technologies

Java • Spring Boot • Spring Batch • Spring Security • Spring Email Starter • Apache FreeMarker • Spring JPA • Querydsl • Hibernate • Feign • Liquibase • PostgreSQL • Swagger (OpenAPI) • Docker • epublib • pdfbox • Google ZXing • Google Books API • Stripe

More

  • During my tests (in debug mode), I noticed that the job takes about 1 hour to index and extract metadata from 8,850 files in a single-threaded context, on an Ubuntu notebook with an Intel i5 (2 cores, 2.40 GHz) and an SSD. Because I need to index about 100,000 e-books, I rewrote the job with a multi-threaded job processor. On the same notebook, the multi-threaded job indexed the 8,850 files in about 2 minutes and 45 seconds. I also ran the job on a set of about 110,000 e-books (~25,000 epub, ~25,000 pdf and ~60,000 other files, including some duplicates) on an i7-10750H (6 cores, 2.60 GHz) with an SSD, and it took 39 minutes to finish. Some steps (such as the FSI/FMI deleter) were skipped because I started the job on an empty database.
  • developed and tested on Linux.
  • if you have any suggestions or find a bug, please contact me here
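
The speed-up reported above comes from processing files concurrently. A minimal sketch of the idea with a plain `ExecutorService` follows; the class `ParallelExtractor` is hypothetical, and the actual job is built on Spring Batch, which offers its own multi-threaded step support through a `TaskExecutor`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: runs a per-file metadata extractor on a fixed-size thread pool.
class ParallelExtractor {
    static <T, R> List<R> processAll(List<T> files, Function<T, R> extractor, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per file; the pool runs up to `threads` of them concurrently.
            List<Future<R>> futures = new ArrayList<>();
            for (T file : files) {
                futures.add(pool.submit(() -> extractor.apply(file)));
            }
            // Collect results in input order; get() rethrows any extraction failure.
            List<R> results = new ArrayList<>(files.size());
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting results through the futures preserves the input order and surfaces any extraction error as an `ExecutionException`; the thread count would normally be tuned to the number of cores and the I/O characteristics of the storage.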

europea-library-server's People

Contributors

goto-eof


europea-library-server's Issues

front-end + back-end: Home page

Home page requirements:

  • featured books
  • new ebooks
  • popular downloads
  • title, subtitle, content

Only administrator is able to configure the home page.

performance test

Test application with 1.000.000 e-books, with duplicates, with many nested directories.

Implement email service

An email service is needed to recover the password, in order to send the unique link.

feature: back-end, admin shall be able to hide/unhide an e-book

The administrator shall be able to manually hide an e-book. A hidden e-book is not shown in the file explorer interface and no operations are allowed on it. This means that it

  • cannot be seen by potential customers
  • cannot be bought
  • cannot be modified (except by administrator)
  • is not visible (except by administrator)
  • is ignored by the job
  • can be viewed only by the owners who bought it

A hidden e-book has a special icon indicating that it is hidden. The hidden property is associated with the FileMetaInfo entity.
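
The visibility rules listed in this issue can be expressed as a small policy. The sketch below is hypothetical (the enum `UserRole`, the class `HiddenBookPolicy` and the buyer-set parameter are illustrative names, not the project's API) and covers only viewing, buying and job processing.

```java
import java.util.Set;

// Hypothetical roles mirroring the user categories described in the README.
enum UserRole { ADMINISTRATOR, USER, ANONYMOUS }

// Hypothetical policy for hidden e-books; not the project's actual API.
class HiddenBookPolicy {
    /** Hidden e-books are visible only to the administrator and to users who bought them. */
    static boolean canView(boolean hidden, UserRole role, String userId, Set<String> buyerIds) {
        if (!hidden || role == UserRole.ADMINISTRATOR) {
            return true;
        }
        // Null check first: immutable sets reject contains(null).
        return userId != null && buyerIds.contains(userId);
    }

    /** Hidden e-books cannot be bought. */
    static boolean canBuy(boolean hidden) {
        return !hidden;
    }

    /** Hidden e-books are ignored by the indexer job. */
    static boolean isProcessedByJob(boolean hidden) {
        return !hidden;
    }
}
```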

Improve epub meta-data extractor

Currently the epub metadata extractor extracts only about 3-4 fields from the epub. The extractor shall extract as much information as possible from the epub file (e.g. Europea currently does not save the book cover).

Improve pdf meta-data extractor

Currently the strategy retrieves metadata, but this information is usually incorrect: unlike epub files, PDF files often have incorrect metadata. Find the best way to retrieve correct data from a PDF file (search the file content for title, author and publisher with regex?). If the title, author and publisher are incorrect, the application cannot find the book on the web (Google Books API), so PDF files usually end up without a book information record.

remove material ui from the front-end

Remove Material UI from the front-end. Currently Material UI is used only to show the SnackBar, so use the Bootstrap Toast component instead.

create an installer

Configuring Europea Library has become a little complicated. An installer is needed to make installation and execution easier. Perhaps the best and fastest solution is a shell script that asks a set of questions with default answers.

ISBN extractor

Extract ISBN code from epub books and store it on DB.
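
A common approach to ISBN extraction is to find candidate strings with a regular expression and keep only those with a valid check digit (ISBN-13 uses alternating weights 1 and 3 modulo 10; ISBN-10 uses weights 10 down to 1 modulo 11, with `X` standing for 10). The sketch below applies that idea to text such as an epub `dc:identifier` value of the form `urn:isbn:...`; the class `IsbnExtractor` is hypothetical, and the pattern assumes ISBNs written with hyphens or without separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical ISBN extractor: regex for candidates, checksum for validation.
class IsbnExtractor {
    // "ISBN", optional "-10"/"-13", optional colon, then 10 to 17 digits/hyphens/X.
    private static final Pattern CANDIDATE = Pattern.compile(
            "ISBN(?:-1[03])?:?\\s*([0-9X][0-9X-]{8,15}[0-9X])", Pattern.CASE_INSENSITIVE);

    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            String isbn = m.group(1).replace("-", "").toUpperCase();
            if (isValid(isbn)) {
                found.add(isbn);
            }
        }
        return found;
    }

    static boolean isValid(String isbn) {
        if (isbn.matches("\\d{13}")) {
            int sum = 0;
            for (int i = 0; i < 13; i++) {
                int d = isbn.charAt(i) - '0';
                sum += (i % 2 == 0) ? d : 3 * d; // weights 1,3,1,3,...
            }
            return sum % 10 == 0;
        }
        if (isbn.matches("\\d{9}[\\dX]")) {
            int sum = 0;
            for (int i = 0; i < 10; i++) {
                char c = isbn.charAt(i);
                int d = (c == 'X') ? 10 : c - '0';
                sum += (10 - i) * d; // weights 10,9,...,1
            }
            return sum % 11 == 0;
        }
        return false;
    }
}
```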

back-end: user consent

During registration, the user should accept the web application's privacy protection agreements.

tokens table

Europea Library shall have a tokens table that allows tokens to be invalidated when the user logs out. To authenticate a user, Spring Security should then also check that the token is still valid in the tokens table.
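
The proposed tokens table acts as an allow-list: a JWT is accepted only while its entry exists and has not expired, and logout deletes the entry. The in-memory sketch below stands in for the table; the class `TokenRegistry` and keying by a token ID (such as the JWT `jti` claim) are assumptions, and a real implementation would back it with a database table queried from the Spring Security filter chain.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the proposed tokens table.
class TokenRegistry {
    private final Map<String, Instant> activeTokens = new ConcurrentHashMap<>();

    /** Called at login: records the token and its expiry. */
    void register(String tokenId, Instant expiresAt) {
        activeTokens.put(tokenId, expiresAt);
    }

    /** Called at logout: the token stops being accepted immediately. */
    void revoke(String tokenId) {
        activeTokens.remove(tokenId);
    }

    /** Checked on each authenticated request, in addition to the JWT signature check. */
    boolean isValid(String tokenId, Instant now) {
        Instant expiresAt = activeTokens.get(tokenId);
        return expiresAt != null && now.isBefore(expiresAt);
    }

    /** Housekeeping: drops entries whose expiry has passed. */
    void purgeExpired(Instant now) {
        activeTokens.values().removeIf(expiresAt -> !now.isBefore(expiresAt));
    }
}
```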

Password reset

Allow users to recover their account password via an email containing a link that expires and includes a pseudo-random string.
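
A reset link of this kind needs a token that is unguessable and time-limited. A minimal sketch using `SecureRandom` follows; the class `PasswordResetToken` and the 30-minute validity are assumptions, not values from the project.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.time.Duration;
import java.time.Instant;
import java.util.Base64;

// Hypothetical reset token: 256 random bits, URL-safe, with an expiry.
class PasswordResetToken {
    private static final SecureRandom RANDOM = new SecureRandom();
    // Assumed validity window; not taken from the project.
    static final Duration VALIDITY = Duration.ofMinutes(30);

    final String value;
    final Instant expiresAt;

    private PasswordResetToken(String value, Instant expiresAt) {
        this.value = value;
        this.expiresAt = expiresAt;
    }

    static PasswordResetToken issue(Instant now) {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        // URL-safe Base64 without padding, so the value can be embedded in a link.
        String value = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        return new PasswordResetToken(value, now.plus(VALIDITY));
    }

    /** True if the presented token matches (constant-time comparison) and has not expired. */
    boolean isUsable(String presented, Instant now) {
        boolean matches = MessageDigest.isEqual(
                value.getBytes(StandardCharsets.UTF_8), presented.getBytes(StandardCharsets.UTF_8));
        return matches && now.isBefore(expiresAt);
    }
}
```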

front-end + back-end: Edit book cover image

Allow the administrator to change the book cover image. There are 2 possible solutions:

  • allow the administrator to upload an image (the server then resizes it)
  • allow the administrator to modify only the image link
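
For the upload option, the server has to scale the image. A sketch with the JDK's `java.awt` imaging classes follows; the class `CoverResizer` and the fit-within-a-box rule (preserve aspect ratio, never upscale) are assumptions about how resizing could work, not the project's behavior.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical cover resizer: fits the image inside maxWidth x maxHeight.
class CoverResizer {
    /** Scales down to fit the box while preserving aspect ratio; smaller images are kept as-is. */
    static BufferedImage fitWithin(BufferedImage src, int maxWidth, int maxHeight) {
        double scale = Math.min(1.0, Math.min(
                (double) maxWidth / src.getWidth(), (double) maxHeight / src.getHeight()));
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));
        // TYPE_INT_RGB drops any alpha channel, which suits JPEG output.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The result can then be encoded with `ImageIO.write(image, "jpg", output)` before it is stored.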

adapt job flow with the stripe payments

The introduction of the Stripe payments feature prevents the job from working properly. In particular, if I remove a file that was already purchased, the file meta info delete step fails with a ConstraintViolationException. Implement soft deletes [#28] and adapt the job so that it always works, including when a file is removed (the meta info entity should be archived and not shown in the user interface).

PDF metadata extractor

Currently the application does not extract metadata from PDF files. PDF-related information needs to be loaded as well.

Refactor

indexer Job class names are not very clear -> refactor
