Giter VIP home page Giter VIP logo

commons-csv-clj's Introduction

Manipulate CSV raw-data to produce Ledger history

As a way to tackle the manual manipulation of excell tables, every month, let’s automate this process.

Overview

The creditor-debtor problem

A table is given to us, which, among the raw-data given, there are:

  • CNPJ (unique identifiers of debtors)
  • Credit used (negative flux)
    • Date in which it was used.
    • Expire date to pay the loan.

    Also, we have the data on the revenues:

  • CNPJ (unique identifiers of debtors)
  • Current amount collected (positive flux)

    We wish to organize what are the first loans that should be payed, so to better alleviate our clients formal debt.

Solution

  • First, given a CNPJ, we have to create a list of Credit use, ordered by increasing date of expiry.
  • Second, liquidate the values listed in the CNPJ’s credit use.
  • Generate a list of covered loans (payable).
  • Generate a list of to-pay loans, if the list is not totally covered.
  • Write a CSV file containing
    1. CNPJ, Payable loans
    2. CNPJ, To-pay loans
    3. CNPJ, Cover status

How to use it

Requirements

  1. OpenJDK
  2. Clojure
  3. Leiningen must be installed on your system.

Install

In the place you want to install the program,

git clone https://github.com/BuddhiLW/commons-csv-clj.git

Nativate to the local repository

cd commons-csv-clj

Use

There are two raw-data: billings (installments) and revenues

lein run "path-to-revenus.csv" "path-to-billings.csv"

Result

Navigate to the data folder

cd commons-csv-clj/data/

Look at the ledger.csv.

What libraries we are using:

:dependencies [[org.clojure/clojure "1.10.1"]
               [org.clojure/data.csv "1.0.0"]]

Specifically, we make use of clojure.java.io, which is part or org.clojure/clojure. This library makes possible opening, writing and copying data in files.

Also, org.clojure/data.csv parses csv files to acceptable I/O format in Clojure.

Some details to keep in mind

In the code, we infer that any path is taken in relation to the root folder of the project. So, equivalent to parting from:

cd commons-csv-clj/

So, reading =”data/Faturamento.csv”=

Means:

# (buddhilw '(~/PP/Clojure/csv-ledger)) λ tree
CSV-Ledger
.
├── CHANGELOG.md
├── data
│   ├── revenues.csv
│   └── installments.csv
(...)

Examples of use of CSV and I/O libraries in Clojure

Template file

In an example file,

(ns csv-ledger.example
  (:require [clojure.data.csv :as csv]
            [clojure.java.io :as io])
  (:gen-class))

Combining the libraries to look what’s inside a file

We can see the data inside a csv file by,

(csv/read-csv (io/reader "data/Faturamento.csv"))

Resuming the output for brevity’s sake,

CNPJFATURAMENTO
10042870001903187548654
13180230001011635212632
30220330001203376759793

(…)

Taking only a part of it (column names)

Let’s define a name for this input, raw-data,

(def raw-data (csv/read-csv (io/reader "data/revenues.csv"))) 
#'csv-ledger.example/raw-data

We can take the first row, by

(first (csv/read-csv (io/reader "data/revenues.csv")))

Out:

CNPJFATURAMENTO

Or,

(first raw-data)

Out:

CNPJFATURAMENTO

Also,

(first (csv/read-csv (io/reader "data/installments.csv")))

Out:

CODCLIENTECPFCNPJCLINOMECLIVENCIMENTONDOCCODOPERACAOVL_FACEATRASOJUROSMORAMULTATOTAL_RECEBIDOA_RECEBERTPBAIXAVL_DCPVL_DESVL_DESCONTO

TODO/NEXT/DONE

TODO:

NEXT NEXT:

DONE:

  • Convert csv data into a vector of maps (key-value pairs).
  • Filter the vector of maps, selecting a certain CNPJ key-value pair.
  • Add the revenue values of a given CNPJ.
  • Compare with the summed value with the credit-debt.
  • Filter those who the sum-credit-debt is greater than the sum-revenue value
  • Calculate which credit-debt(s) should have preference being paid.
  • Output the credit-debt(s) payable and how much is left-to-pay.

Reconsidering

  • Create a new map with the summed-revenue value.

Development

Interesting reads:

Preparring CSV (Opening/Closing)

(ns csv-ledger.core
  (:require [clojure.data.csv :as csv]
            [clojure.java.io :as io])
  (:gen-class))

(defn -main
  "Call on <<Faturamento.csv>> and <<VALORES_VIORC.csv>>;
  Return \"ledger.csv\", which informs the situation of the balance."
  [spenditure-table raw-table]
  (println "Hello, World!"))

(def opened-revenue (csv/read-csv (io/reader "data/Faturamento.csv")))
(def header-revenue (first opened-revenue))
(def raw-revenue (drop 1 opened-revenue))

;; The lazy way
(defn read-column [filename column-index]
  (with-open [reader (io/reader filename)]
    (let [data (csv/read-csv reader)]
      (map #(nth % column-index) data))))

Managing row operations

Example - Sum fixed collon

(defn sum-second-column [filename]
  (->> (read-column filename 1)
       (drop 1) ;; Drop header column
       (map #(Double/parseDouble %))
       (reduce + 0)))
(sum-second-column "data/Faturamento.csv")

Take 10 revenues

Take 10 first values of revenues

(take 10 (map #(nth % 1) raw-revenue))
3187163533762492272838661393113331392757

Transform raw-data into map

Index values with dictionaries (taken from https://github.com/clojure/data.csv/)

NOTE: This transform is meant to be done on the opened-revenue symbol.

(defn csv-data->maps [csv-data]
  (map zipmap
       (->> (first csv-data) ;; First row is the header
            (map keyword) ;; Drop if you want string keys instead
            repeat)
	  (rest csv-data)))

Transform in a vector-of-maps

Transform csv-data into a vector of maps

(def vec-revenues (vec (csv-data->maps opened-revenue)))

Separate the unique-values of CNPJ

Take the unique values (e.g., create a set)

(def unique-CNPJ (set (map :CNPJ (csv-data->maps opened-revenue))))

Define a boolean function on subclasses

The following function submap? takes a key-pair and a map and returns true if this key pair is contrained in the map.

(defn submap? [a b]
"return true or false"
  (every? (fn [[k v]] (= v (b k)))  a))

Test with filter

If we now take a filter on the vec-revenues, for a given :CNPJ key-pair.

(filter #(submap? {:CNPJ (first unique-CNPJ)} %) vec-revenues)

RESULT:

:CNPJ17194123000103:FATURAMENTO1247:468626

or,

csv-ledger.core> (filter #(submap? {:CNPJ (first unique-CNPJ)} %) vec-revenues)
=> ({:CNPJ "17194123000103", :FATURAMENTO "1247", : "468626"}

Consolidate the process with a Function

Creating a function which take a CNPJ-value and filter the list for us

(defn cnpj#
  "take a =CNPJ-value= and narrow down to the subclass which has this CNPJ"
  [CNPJ-value vec-revenues]
  (filter #(submap? {:CNPJ CNPJ-value} %) vec-revenues))
Example
(cnpj# "17194123000103" vec-revenues)
;; ({:CNPJ "17194123000103", :FATURAMENTO "1247", : "468626"}) 
:CNPJ17194123000103:FATURAMENTO1247:468626

Add the values of a given CNPJ.

Defining BIORC dataset,

(def opened-biorc (csv/read-csv (io/reader "data/VALORES_BIORC.csv")))
(def vec-biorc (vec (csv-data->maps opened-biorc)))

Generalize cnpj#

(defn submap-cond#
  "take a =CNPJ-value= and narrow down to the subclass which has this CNPJ"
  [key value vec-revenues]
  (filter #(submap? {(keyword key) value} %) vec-revenues))

Paticularize to CPFCNPJCLI

(defn cpf-cnpj#
  [value vec-revenues]
    (submap-cond# "CPFCNPJCLI" value vec-revenues))

Differences in performance

Looking for a key on a map is O(1) and in a vector O(n). https://www.infoq.com/articles/in-depth-look-clojure-collections/

(defn raw [n] (map keyword (map str (map char (range 97 (+ 97 n))))))
(defn mk-lin [n] (interleave (raw n) (range n)))
(defn mk-map [n] (apply hash-map (mk-lin n)))

(defn lhas [k s] (some #{k} s))
(defn mhas [k s] (s k))

(defn churn [lookup maker n]
  (let [ks (raw n)
         elems (maker n)]
   (dotimes [i 100000]
     (doseq [k ks] (lookup k elems)))))


(time (churn lhas mk-lin 5))
; "Elapsed time: 998.997 msecs"


(time (churn mhas mk-map 5))
; "Elapsed time: 133.133 msecs"

Selecting a non-unique value for a CNPJ on BIORC data-set

By try and error, untill finding a non-unique key-value,

(cpf-cnpj# (nth (vec unique-CNPJ) 6) vec-biorc)
:VL_DES0:VENCIMENTO16/06/2021:VL_FACE734.85:MULTA5.33:TPBAIXAB.PARCIAL:NDOC7:TOTAL_RECEBIDO201.18:ATRASO30:CPFCNPJCLI17946892000110:CODOPERACAO10158:VL_DCP219.53:CODCLIENTE450:A_RECEBER544.33:VL_DESCONTO0:JUROSMORA5.33:NOMECLIFIT 2 ACADEMIA LTDA
:VL_DES0:VENCIMENTO16/07/2021:VL_FACE734.85:MULTA0:TPBAIXANULL:NDOC8:TOTAL_RECEBIDO0:ATRASO0:CPFCNPJCLI17946892000110:CODOPERACAO10158:VL_DCP205.61:CODCLIENTE450:A_RECEBER734.85:VL_DESCONTO0:JUROSMORA0:NOMECLIFIT 2 ACADEMIA LTDA

Define a symbol

nu-value: non-unique value

(def nu-value (cpf-cnpj# (nth (vec unique-CNPJ) 6) vec-biorc))

Narrow to all non-unique elements

Pred: non-unique

A predicate for non-unique matches, for a given key.

;; (defn non-unique#
;;   [key coll]
;;   (let [cond (partial submap-cond# key)
;;         coll-recur coll
;;         narrow #{}]
;;     (if (empty? coll-recur)
;;       narrow)
;;     (if 
;;       )))
           
;; (> (count (cond (key (first coll-recur)) coll-recur)) 1)

Add the values of this key

(defn sum-a-numeric-key
  [key coll]
  (reduce + (map bigdec (map (keyword key) coll))))  

Example

(sum-a-numeric-key "A_RECEBER" nu-value)

RESULTS: 1279.18M

(sum-a-numeric-key "TOTAL_RECEBIDO" nu-value)

RESULTS: 201.18M

NOTE: The “M” in the end of the value has nothing to do with the measure system. It’s standard R$.

Compare revenue with credit-debt

Given a CNPJ-key, we can catch both the total revenue, and remaining debt

Example

Revenue Symbol

Let’s create a symbol for the non-unique-value, nu-value, for revenues also.

(def nu-value-revenue (cnpj# (nth (vec unique-CNPJ) 6) vec-revenues))

Take a look on what it looks like,

nu-value-revenue

RESULTS:

:CNPJ17946892000110:FATURAMENTO2633:741867

These are the revenue-data for the CNPJ 17946892000110.

Now, selecting only the “FATURAMENTO” value.

((keyword "FATURAMENTO") (first nu-value-revenue))

Refresher of where we are,

Remembering, his To-pay loan, and his Total-payed loan are:

(sum-a-numeric-key "A_RECEBER" nu-value)

RESULTS: 1279.18M

(sum-a-numeric-key "TOTAL_RECEBIDO" nu-value)

RESULTS: 201.18M

Making the math

Answering the age-old question: does he have what it takes?

(- (+ (bigdec ((keyword "FATURAMENTO") (first nu-value-revenue)))
      (sum-a-numeric-key "TOTAL_RECEBIDO" nu-value))
   (sum-a-numeric-key "A_RECEBER" nu-value)) 

The number resulting number has a positive value. Thus, his Cash-flow is positive. We can pay all bills, in whichever order we wish.

Wrapping the procedure in a function

(defn cash-flow
  "Calculate the cash-flow for a given cnpj-client"
  [cnpj vec-revenues vec-biorc]
  (let [nu-value-revenue (cnpj# cnpj vec-revenues)
        nu-value-spent (cpf-cnpj# cnpj vec-biorc)]
    (- (+ (bigdec ((keyword "FATURAMENTO") (first nu-value-revenue)))
          (sum-a-numeric-key "TOTAL_RECEBIDO" nu-value-spent))
       (sum-a-numeric-key "A_RECEBER" nu-value-spent))))

Example

(cash-flow (nth (vec unique-CNPJ) 6) vec-revenues vec-biorc)

RESULTS:

1555.00M

Mapping the function cash-flow into 10 fist values,

(take 10 (map #(cash-flow % vec-revenues vec-biorc) unique-CNPJ))

Take the first 10 CNPJ values mapped explicitly,

(map  #(nth (vec unique-CNPJ) %) (range 10))

Zip these, so in a map of key-pairs.

(zipmap
 (map keyword
      (map #(nth (vec unique-CNPJ) %)
           (range 10)))
 (take 10
       (map #(cash-flow % vec-revenues vec-biorc)
            unique-CNPJ)))

Zipping values and the CNPJ values

(defn cash-flow-zip
  "zip the CNPJ-key with the cash-flow-value in key-value pairs"
  [unique-CNPJ-set]
  (zipmap
   (map keyword
        (map #(nth (vec unique-CNPJ-set) %)
             (range (count unique-CNPJ-set))))
   (take (count unique-CNPJ-set)
         (map #(cash-flow % vec-revenues vec-biorc)
              unique-CNPJ-set))))

Example

Let’s take only the first 20 values

(take 20 (cash-flow-zip unique-CNPJ))
Results

The first 20:

:360300660001712823.39M
:319781020001001650.10M
:3022033000120-772.41M
:107524680001961186.54M
:33538566000120390.64M
:235921050001822985.88M
:17965748000121-189.66M
:92041860001752738.02M
:36446343000121170.42M
:51060730001862234.26M
:34439910000197-2669.56M
:28454897000142997.88M
:120249370001861927.06M
:31090953000111-3291.20M
:244486700001331103.6M
:110698880001352182.96M
:319489180001901645.27M
:200410140001601920.94M
:73792080001022058.82M
:309502900001031693.21M

All of them,

(take (count unique-CNPJ) (cash-flow-zip unique-CNPJ))

RESULTS:

:360300660001712823.39M
:319781020001001650.10M
:3022033000120-772.41M
:107524680001961186.54M
:33538566000120390.64M
:235921050001822985.88M
:17965748000121-189.66M
:92041860001752738.02M
:36446343000121170.42M
:51060730001862234.26M
:34439910000197-2669.56M
:28454897000142997.88M
:120249370001861927.06M
:31090953000111-3291.20M
:244486700001331103.6M
:110698880001352182.96M
:319489180001901645.27M
:200410140001601920.94M
:73792080001022058.82M
:309502900001031693.21M
:29540145000167-600.45M
:24508255000128-2082.18M
:26039709000194323.45M
:184876090001001886.64M
:187857120001291106.48M
:191230180001092547.74M
:396136720001082102.39M
:342686240001061527.34M
:20702307000141797.82M
:30817144000104-1290.06M
:226193790001552810.95M
:4050903000138299.01M
:10779573000119917.65M
:32814574000190809.29M
:94429760001991365.32M
:31456434000124-2160.28M
:195141700001121660.80M
:12243151000150158.14M
:26787657000134955.68M
:30532388000132-2993.73M
:33668333000142746.39M
:350274520001411493.71M
:261935720001281809.36M
:194140700001141154.80M
:42701390002971090.41M
:285310130001061931.8M
:243205230001833042.86M
:30927327000174-1743.96M
:24303621000102721.56M
:29128607000133-9104.64M
:339567040001913324.85M
:8833581000154-54.06M
:30073691000114-3829.88M
:31908491000105-1429.53M
:194616820001682606.78M
:139705140001111059.82M
:374628420001751178.28M
:62836140001042048.02M
:5143951000133-71.70M
:30724735000129212.05M
:27558575000180225.58M
:276949650001883631.92M
:25350032000148641.23M
:302752870001233100.78M
:215215580001921738.73M
:343364700001421909.92M
:33677849000153205.83M
:330119570001922403.70M
:31269972000100666.35M
:318912640001061249.11M
:34786751000105653.92M
:233063600001112135.34M
:277683800001652280.19M
:211488180001261097.58M
:247343710001651966.88M
:319658190001172090.26M
:283148430001812184.60M
:60288050001201639.21M
:344587210001611237.32M
:212383440001031234.1M
:290846010001011621.78M
:247808930001011628.8M
:311402570001721452.08M
:108370880001541190.20M
:9344871000105653.95M
:247321420001012631.01M
:210497910001141112.70M
:269889270001752081.00M
:25279612000196568.79M
:250802360001062637.2M
:292470640001731826.13M
:22639443000160919.65M
:334113420001531684.08M
:373153760001033143.22M
:34094560000174893.31M
:111259070001012090.28M
:180580460001271171.07M
:29344821000127-1019.03M
:10908817000116915.66M
:179468920001101555.00M
:230183550001031742.22M
:224272740001021499.82M
:25528983000164-2862.79M
:180407320001703165.95M
:38346620000150-1010.41M
:330942080001761366.07M
:110860750001532166.56M
:308458720001111834.57M
:219170860001912542.11M
:364972830001761183.31M
:17194123000103575.65M
:34252263000100402.48M
:19258542000197-281.41M
:35210834000106276.22M
:519774100012890.34M
:279264340001731427.26M
:329567330001902131.26M
:306424220001213360.8M
:289455180001171804.7M
:10042870001902060.38M
:326753120001912378.55M
:249555880001031361.53M
:292659540001082558.16M
:20506951000144-2650.97M
:202827930001961821.96M
:12259758000128-280.57M
:14052224000151605.86M
:336834330001481351.08M
:32809213000155-83.10M
:249413410001203587.75M
:307144830001571818.86M
:62277520000101312.06M
:235744360001901048.71M
:22975085000166-298.98M
:296425560001631221.23M
:1318023000101-357.45M
:246380000001801609.14M
:83184960002392622M
:311982920001422103.16M
:350696770001606.1M
:309352240001561204.26M
:154557740001843265.19M
:281811370001091025.18M
:355377550001043397.11M
:35190244000169658.50M
:143976780001641395.45M
:26797909000106-2378.72M
:261682690001751336.56M
:303460520001852984.52M
:312544840001291026.88M
:325204940001212112.63M
:29255601000127968.53M
:287910520001421463.08M
:32437216000105-1707.22M
:19514960000106-1950.83M
:17071839000113566.69M
:284703950001052684.2M
:30593007000125510.1M
:303028670001622601.94M
:309786280001272566.61M
:266899100001171365.74M
:20299257000101-932.93M
:3482178800011981.94M
:267127300001092912.58M
:336821090001052602.92M
:34577574000149387.78M
:654778200001141352.36M
:222094680001233187.85M
:115024440001412534.75M
:14842398000118476.07M
:209232030001671760.96M
:280653070001901298.26M
:26366223000160-231.14M
:133282210001352825.30M
:283266410001502506.56M
:32735848000155-328.43M
:30520255000146-2380.71M
:34862206000142-918.78M
:33764300000104-501.09M

Select the non-positive values (filter)

Take only negative cash-flow.

(filter #(< (second %) 0) table-CNPJ)

Results

Only negative values

:3022033000120-772.41M
:17965748000121-189.66M
:34439910000197-2669.56M
:31090953000111-3291.20M
:29540145000167-600.45M
:24508255000128-2082.18M
:30817144000104-1290.06M
:31456434000124-2160.28M
:30532388000132-2993.73M
:30927327000174-1743.96M
:29128607000133-9104.64M
:8833581000154-54.06M
:30073691000114-3829.88M
:31908491000105-1429.53M
:5143951000133-71.70M
:29344821000127-1019.03M
:25528983000164-2862.79M
:38346620000150-1010.41M
:19258542000197-281.41M
:20506951000144-2650.97M
:12259758000128-280.57M
:32809213000155-83.10M
:22975085000166-298.98M
:1318023000101-357.45M
:26797909000106-2378.72M
:32437216000105-1707.22M
:19514960000106-1950.83M
:20299257000101-932.93M
:26366223000160-231.14M
:32735848000155-328.43M
:30520255000146-2380.71M
:34862206000142-918.78M
:33764300000104-501.09M

Main call

(defn -main
  "Call on <<Faturamento.csv>> and <<VALORES_VIORC.csv>>;
  Return \"ledger.csv\", which informs the situation of the balance."
  [spenditure-table raw-table]
  (println "Hello, World!"))

commons-csv-clj's People

Contributors

buddhilw avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.