
clover's People

Contributors

alexballas, aswlaunchs, danilomarques1, dbgjerez, dimiro1, duckbrain, edwinwalela, gitter-badger, gpomykala, helias, jinzhongjia, jsgm, kastolars, letterbeezps, luhuaei, nbdy, ostafen, pricelessrabbit, segfault99, shane-xb-qian, univac490

clover's Issues

Getting error during bulk insert

I'm trying to load documents from JSON. I wrote the following code.

	docs := make([]*clover.Document, 0)
	for _, doc := range jsonObjects {
		docs = append(docs, clover.NewDocumentOf(*doc))
	}

	err = db.Insert(collection, docs...)
	if err != nil {
		log.Printf("Insert error: %s\n", err.Error())
		return errors.New("Write error")
	}

I got the following error back; it seems to be coming from badger DB. dgraph-io/badger#441

Txn is too big to fit into one request

They suggest using the WriteBatch API (dgraph-io/badger#1242 (comment)). This also seems to have been improved in newer versions of badger DB, so updating the version used in clover could work. In the meantime, chunking the inserts works around the limit, as in the sketch below.
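A workaround sketch until batching is handled internally: split the slice into fixed-size chunks so that each underlying badger transaction stays small. The batch size of 1000 is an assumption; tune it to your document sizes.

// Insert in chunks so no single transaction exceeds badger's size limit.
const batchSize = 1000

for i := 0; i < len(docs); i += batchSize {
	end := i + batchSize
	if end > len(docs) {
		end = len(docs)
	}
	if err := db.Insert(collection, docs[i:end]...); err != nil {
		log.Printf("Insert error: %s\n", err.Error())
		return errors.New("Write error")
	}
}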

Implement slice indexing when accessing a field

Hi, all. Currently, clover allows accessing nested document fields with the following syntax:

doc.Get("field1.field2.field3")

Now, suppose that field2 is a slice. It would be useful to support indexing its elements with the following syntax:

doc.Get("field1.field2.4.field3") // here, we are trying to access the fifth element of "field2"

Add a db.GetCollections() method

Hello all.

In some cases, especially for old projects, it would be useful to add a GetCollections() method that returns a slice of strings with the names of the existing collections.
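A usage sketch of the proposed method (the signature is an assumption):

collections, err := db.GetCollections() // proposed: returns ([]string, error)
if err != nil {
	log.Fatal(err)
}
for _, name := range collections {
	fmt.Println(name)
}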

V2: Sort() + Where() doesn't work

With the first alpha release of v2, I have noticed that combining a Where() and a Sort() simply returns all results (Where() gets ignored?).

q, err := db.FindAll(clover.NewQuery(mailbox).
    Skip(start).
    Limit(limit).
    Sort(clover.SortOption{Field: "Created", Direction: -1}).
    Where(clover.Field("SearchText").Contains("my search text")))

If I skip the Sort() then the query works as expected (except the results are not sorted of course). "Created" is indexed in my case, and appears to work as expected with all other queries, just not when Where() is involved.

Any ideas?

Inserts do not set up index correctly

This issue was in the pre-badger code (<= dc5b9e3). I'm putting it here for anyone else who wants to continue using the old storage. Note that if you're doing this, you should also cherry-pick the other indexing bug fix, e736362.

The behavior is that Insert() of documents after the first would result in an incorrect index since it always started with an offset of 0 regardless of how many documents were already in the collection. Attempts to access document N>0 would always get the wrong document. This usually resulted in bad JSON formatting errors, either unexpected end of JSON or invalid syntax. The patch is:

diff --git a/storage.go b/storage.go
--- a/storage.go
+++ b/storage.go
@@ -260,7 +260,7 @@ func (s *storageImpl) Insert(collection
                return err
        }

-       pointers, err := appendDocs(&collectionFile{File: tempFile, size: 0}, docs)
+       pointers, err := appendDocs(&collectionFile{File: tempFile, size: coll.file.size}, docs)
        if err != nil {
                return err
        }

@ostafen, I'm submitting and then closing; hopefully others searching for a fix will find this.

Example Request: Unmarshal/Update/Replace

I can't seem to figure out how to do this. I have a struct whose fields are tagged for clover. I would like to read in a document, Unmarshal it to my struct, make changes to my struct, then update the existing document using my struct.

import c "github.com/ostafen/clover/v2"

type Example struct {
  Foo string `clover:"foo"`
}

doc, _ := ... // fetch the document with a query (elided)

var example Example
doc.Unmarshal(&example)
example.Foo = example.Foo + " bar "

db.ReplaceById( .... ) // can't use NewDocumentOf() since objectId isn't set?

Allow discovering the available fields of a document

Currently one can only access the values of fields using Get(), requiring prior knowledge of available field names.

Given the schema-less nature of Clover this feature would be helpful in many ways.

Two possible options (non-exclusive):

  • access to a copy or read-only version of the internal map
  • a method to list available fields (including sub-documents, e.g. ["a", "b", "b.x", "b.y", ...]; for that matter, whether "b" should stand alone is questionable) - see the sketch below
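A sketch of the second option, assuming access to the document's underlying map (e.g. a map[string]interface{} copy): recursively collect dotted field paths.

func listFields(m map[string]interface{}, prefix string) []string {
	fields := make([]string, 0, len(m))
	for k, v := range m {
		name := k
		if prefix != "" {
			name = prefix + "." + k
		}
		// "b" is listed alongside "b.x"; whether to keep it is the open question above.
		fields = append(fields, name)
		if sub, ok := v.(map[string]interface{}); ok {
			fields = append(fields, listFields(sub, name)...)
		}
	}
	return fields
}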

SaveOrUpdate

clover/db.go, line 63 in 530c209:

	objectId := newObjectId()

Hi,

I'm not sure if this is an issue or a discussion. I'm very happy with this database and I'd like to help it if I can.

Some databases have a function that updates an existing document or saves a new one. In this case, the database always creates a new document.

I think that if the document ID is provided by the user, it would be easy to skip this line and update the existing document instead.

What do you think? If you agree, I can modify it or create a new function and submit a PR. A possible shape is sketched below.
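A sketch of what that could look like (not clover's actual code; ReplaceById is borrowed from another issue on this page, and newObjectId() is the internal helper referenced above):

func (db *DB) Save(collectionName string, doc *Document) error {
	// Reuse a user-provided _id when present: update the stored copy.
	if id, ok := doc.Get("_id").(string); ok && id != "" {
		return db.ReplaceById(collectionName, id, doc)
	}
	// Otherwise keep the current behavior and generate a new id.
	doc.Set("_id", newObjectId()) // the line referenced above
	return db.Insert(collectionName, doc)
}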

Thanks a lot

Empty database uses more than 2 GB of disk space

Hi all

First of all: thanks for your work - it addresses my needs and works pretty well :)
One quick question: is it intended for a nearly empty database to use > 2 GB of disk space?

Example (see 000006.vlog):

▶ ls -alth test-clover-db        
total 52K
drwxrwxr-x 2 thiko thiko 4,0K Apr 24 07:41 .
-rw------- 1 thiko thiko   58 Apr 24 07:41 MANIFEST
-rw-rw-r-- 1 thiko thiko  399 Apr 24 07:41 000003.sst
-rw-rw-r-- 1 thiko thiko   20 Apr 24 07:41 000005.vlog
-rw-rw-r-- 1 thiko thiko 2,0G Apr 24 07:41 000006.vlog
-rw-rw-r-- 1 thiko thiko 128M Apr 24 07:41 00006.mem
-rw-rw-r-- 1 thiko thiko    5 Apr 24 07:41 LOCK
-rw------- 1 thiko thiko   60 Apr 23 20:54 .directory
-rw-rw-r-- 1 thiko thiko  399 Apr 23 20:53 000002.sst
-rw-rw-r-- 1 thiko thiko  349 Apr 23 20:50 000001.sst
-rw-rw-r-- 1 thiko thiko 1,0M Apr 23 20:03 DISCARD
-rw------- 1 thiko thiko   28 Apr 23 20:03 KEYREGISTRY

Thanks in advance!

Need to update README

import (
  "log"
  "github.com/dgraph-io/badger/v3"
  c "github.com/ostafen/clover"
  badgerstore "github.com/ostafen/store/badger"
)

This example code in the README file causes errors: github.com/ostafen/store doesn't exist anymore. We can use the imports from the db_test file instead, as follows:

	"github.com/dgraph-io/badger/v3"
	c "github.com/ostafen/clover/v2"
	badgerstore "github.com/ostafen/clover/v2/store/badger"

Hope to support a semantic query-building function

	dataAll, err := a.FindAll(
		a.BuildQuery("names", "name=xiaobai or age=20"),
	)

	dataAll, err := a.FindAll(
		a.BuildQuery("names", "name=xiaobai and age=20"),
	)

I have a simple implementation here

func isPureNumber(s string) bool {
	_, err := strconv.Atoi(s)
	return err == nil
}

// buildCriteria parses a single "field<op>value" condition statement and
// returns the corresponding criteria; ok is false when no operator is found.
func buildCriteria(conditionStatement string) (crit query.Criteria, ok bool) {
	// Supported operators: = != > >= < <= like
	operatorList := []string{"!=", ">=", "<=", "like", "<", ">", "="}
	conditionStatement = strings.TrimSpace(conditionStatement)
	for _, op := range operatorList {
		if !strings.Contains(conditionStatement, op) {
			continue
		}
		parts := strings.SplitN(conditionStatement, op, 2)
		fieldName, raw := parts[0], parts[1]
		field := query.Field(fieldName)

		switch op {
		case "=", "!=":
			// Equality comparisons accept both numbers and strings.
			var value interface{} = raw
			if isPureNumber(raw) {
				value, _ = strconv.Atoi(raw)
			}
			if op == "=" {
				return field.Eq(value), true
			}
			return field.Neq(value), true
		case "like":
			return field.Like(raw), true
		default:
			// Ordering comparisons (> >= < <=) are treated as numeric.
			v, _ := strconv.Atoi(raw)
			switch op {
			case ">":
				return field.Gt(v), true
			case ">=":
				return field.GtEq(v), true
			case "<":
				return field.Lt(v), true
			default:
				return field.LtEq(v), true
			}
		}
	}
	return crit, false
}

// BuildQuery analyzes the query syntax and builds a query object.
// Example: name=xiaobai or age>=18
// Note: a query with a single condition (no "or"/"and") is not handled,
// as in the original sketch.
func (a *easyDB) BuildQuery(collectionName string, querySyntax string) *query.Query {
	q := query.NewQuery(collectionName)

	var crit query.Criteria
	first := true

	addConditions := func(statements []string, or bool) {
		for _, stmt := range statements {
			c, ok := buildCriteria(stmt)
			if !ok {
				break
			}
			switch {
			case first:
				crit = c
			case or:
				crit = crit.Or(c)
			default:
				crit = crit.And(c)
			}
			first = false
		}
	}

	if strings.Contains(querySyntax, "or") {
		addConditions(strings.Split(querySyntax, "or"), true)
	}
	if strings.Contains(querySyntax, "and") {
		addConditions(strings.Split(querySyntax, "and"), false)
	}

	return q.Where(crit)
}

Improve existing documentation

Several features and functions have been added for the v1.1.0 release. It is important to improve the structure of the existing documentation, as well as to add the missing sections.

DropCollection doesn't delete the collection from disk

using Clover v1.2.0

The DropCollection() method doesn't remove the data from disk: recreating the collection after dropping it still yields the documents previously stored in it.

Given the following code:

	type Foobar struct {
		Foo string
		Bar string
		Num int
	}
	var foo = Foobar{Foo: "foo", Bar: "bar", Num: 42}

	db, _ := clover.Open("clover-db")
	defer db.Close()
	db.CreateCollection("foobar")


	doc := clover.NewDocument()
	doc.Set("foobar", foo)

	docId, _ := db.InsertOne("foobar", doc)
	fmt.Println("id inserted: ", docId)

	if err := db.DropCollection("foobar"); err != nil {
		fmt.Print("error: ", err)
	}
	db.Close() // explicitly close just because

	// reopen the db, recreate the collection
	db, _ = clover.Open("clover-db")
	defer db.Close()
	db.CreateCollection("foobar")

	docs, _ := db.Query("foobar").FindAll()
	fmt.Println("after recreation, document count: ", len(docs))
	for _, d := range docs {
		fmt.Printf("docs:%+v\n", d)
	}

The output from the code:

id inserted:  b0a869b0-c8cf-4d61-91aa-98cc93808d0a
after recreation, document count:  1
docs:&{fields:map[_id:b0a869b0-c8cf-4d61-91aa-98cc93808d0a foobar:map[Bar:bar Foo:foo Num:42]]}

I would guess that this is possibly an issue with the Badger storage engine?

FindById() not working as expected

I may be doing something wrong; if so, please point it out. However, after closing and re-opening a database, FindById() does not find any documents.

package main

import (
	c "github.com/ostafen/clover"
	"log"
)

func main() {
        // This part is out of the README
	db, _ := c.Open("clover-db")
	db.CreateCollection("myCollection")

	doc := c.NewDocument()
	doc.Set("hello", "clover!")

	docId, _ := db.InsertOne("myCollection", doc)
	log.Printf("created document %s\n", docId)

	doc, _ = db.Query("myCollection").FindById(docId)
	log.Println(doc.Get("hello"))

        // Now, close the database, re-open it, and try FindById()
	db.Close()

	db, _ = c.Open("clover-db")
        // First, find by the ID created above
	doc, _ = db.Query("myCollection").FindById(docId)
	if doc == nil {
		log.Printf("didn't find the document %s\n", docId)
	}
        // That fails, so now find the document manually, get the ID from it, and try to find it
	ds, _ := db.Query("myCollection").FindAll()
	if len(ds) == 0 {
		log.Printf("didn't find any documents")
	}
	newDocId := ds[0].ObjectId()
	if docId != newDocId {
		log.Printf("the ids changed (%s != %s)\n", docId, newDocId)
	}
	doc, _ = db.Query("myCollection").FindById(newDocId)
	if doc == nil {
		log.Printf("didn't find the document (%s) I just loaded\n", newDocId)
	}
	db.Close()
}

What I do here is:

  1. Tutorial code from README
  2. Close the DB
  3. Re-open it
  4. Try to find the document previously inserted with FindById()
  5. Find the document with FindAll(), and get the ID from it
  6. Try to find the document using FindById() using the ID in step 5

On my end, once I close the DB no documents are able to be referenced by ID.

$ go run .
2022/03/12 09:39:59 created document ca7e8241-52ff-4f17-84b6-cb947acbe581
2022/03/12 09:39:59 clover!
2022/03/12 09:39:59 didn't find the document ca7e8241-52ff-4f17-84b6-cb947acbe581
2022/03/12 09:39:59 didn't find the document (ca7e8241-52ff-4f17-84b6-cb947acbe581) I just loaded

Clover: github.com/ostafen/clover v0.0.0-20220302164508-28d538d46bc1
Go: go version go1.17.8 linux/amd64

Edit

I believe this is because the collection index is never updated except when documents are inserted. Specifically, the index is never refreshed when the collection is loaded -- or, at least, I can't find where this happens.

Memory efficient `ExportCollection()`

At the current state, ExportCollection() loads all documents into memory before exporting them to a JSON file.

	result, err := db.FindAll(query.NewQuery(collectionName))
	if err != nil {
		return err
	}

	docs := make([]map[string]interface{}, 0)
	for _, doc := range result {
		docs = append(docs, doc.AsMap())
	}

	jsonString, err := json.Marshal(docs)
	if err != nil {
		return err
	}

This is far from optimal, since collections could contain thousands or millions of documents.
The export should instead be performed incrementally, writing documents to the output file one by one (the ForEach() method can be used to iterate over documents); see the sketch below.
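A sketch of such an incremental export, assuming the v2 package layout shown elsewhere on this page (exportCollection is a hypothetical helper, and error handling is simplified):

import (
	"encoding/json"
	"os"

	clover "github.com/ostafen/clover/v2"
	"github.com/ostafen/clover/v2/document"
	"github.com/ostafen/clover/v2/query"
)

func exportCollection(db *clover.DB, collectionName, filename string) error {
	f, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer f.Close()

	if _, err := f.WriteString("["); err != nil {
		return err
	}

	enc := json.NewEncoder(f)
	first := true
	var encErr error
	// Stream documents one by one instead of materializing the whole slice.
	err = db.ForEach(query.NewQuery(collectionName), func(doc *document.Document) bool {
		if !first {
			f.WriteString(",")
		}
		first = false
		encErr = enc.Encode(doc.AsMap())
		return encErr == nil
	})
	if err != nil {
		return err
	}
	if encErr != nil {
		return encErr
	}
	_, err = f.WriteString("]")
	return err
}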

About the size of database occupancy

Database initialization takes up a large amount of space. For example, after I save more than 400 databases, they take up 2 GB.
I don't know if this is a normal phenomenon.

Support struct in Save() method

As I understand it, clover provides a way to add any struct to a collection through NewDocumentOf(), which converts the struct to a map. It is very easy to use.

But I am unable to use such an object to make updates and deletes. If I want to do this, I have to let the collection find the object and then use document.Set(...) to make an update.

I think there should be a way to add _id to the struct, or to require the struct to have an "_id" field, so that the struct becomes a clover-supported structure. Then find-and-update would be easy to process.

Error while marshalling json InsertOne

The issue happens when JSON marshalling of the document's items fails: in that case the doc id is nil, and the next line,
doc.Get(objectIdField).(string), panics.

func (db *DB) InsertOne(collectionName string, doc *Document) (string, error) {
	err := db.Insert(collectionName, doc)
	//if error happens then  doc.Get(objectIdField) is nil
	return doc.Get(objectIdField).(string), err
}

//Should be something like

func (db *DB) InsertOne(collectionName string, doc *Document) (string, error) {
	err := db.Insert(collectionName, doc)
	if doc.Get(objectIdField) == nil {
		return "", err
	}
	return doc.Get(objectIdField).(string), err
}

V2: RunValueLogGC(): Value log GC attempt didn't result in any cleanup

Hi @ostafen - I'm noticing every 5 minutes a RunValueLogGC(): Value log GC attempt didn't result in any cleanup in the logs - I know where it comes from in the code, but I'm suspecting that the GC may not be working as expected. To test, I inserted 7.2GB of data yesterday (300k documents), and then deleted all the documents (so it's "empty" now). It's been > 24 hours now and I still have 7.2GB. Accessing the database is much slower than usual (probably because it now scans through 7GB of "empty" data to eventually return nothing, so the deleted data definitely appears to impact performance). I've inserted some more documents, deleted those, etc. (hoping some action would cause it to start working and prune the database), but it doesn't seem to be doing anything. All I see are those error messages on the 5 minute interval. I saw somewhere that you use a discard ratio of 0.5; however, I would think that this situation should be a 100% discard ratio (or maybe 99.99%).

Are you able to shed some light as to how this is supposed to work exactly, and whether there is anything I can do in Mailpit (or CloverDB) to "help" it reclaim space once a whole lot of data is deleted? I know you said "if you try to perform several insertion and deletion sequences, you will reach a point where disk space is reclaimed", however I can't seem to get that to work despite there being 300,000 less documents in two separate catalogues. I have considered closing the database and physically deleting the files, then recreating, but that seems like a very extreme solution.

Any ideas / suggestions? Thanks!

Lastly (a separate "issue"), that error output appears to be hard coded, in that any app using CloverDB will display those errors regardless. Ideally I would prefer to hide it (or see it only in debug/verbose mode) rather than always displaying it. I know the output is coming from a goroutine - but would you confider (at some stage) a clover "flag" to be able to turn that particular GC error message off?

Add support for criteria involving operations between fields

Currently, clover db only supports building criteria where you compare a field with a value, like the one in the following example:

db.Query("myCollection").Where(c.Field("myField").Eq(1))

Naturally, there are several contexts where one may need to express conditions that compare fields of the same document. For example, you may want to select documents satisfying a condition such as myField1 > myField2.

This could be integrated inside clover in the following way:

db.Query("myCollection").Where(c.Field("myField1").Gt("$myField2"))

where the special syntax $myField2 tells clover to extract the value to use for comparison from the document field named myField2.

Suggestions and feedback about alternative ways to accomplish this are welcome :=)

v2: db.Delete() deleting all rather than limited set from query

Note this is regarding the v2 branch: I am experiencing very slow deletions when dealing with thousands of documents, so I decided to try deleting smaller subsets to benchmark. This is when I noticed that when trying to Limit() deletions to any value, all documents appear to be deleted. For example, with a collection of 1000 documents:

if err := db.Delete(clover.NewQuery("collection).Limit(100)); err != nil {
    return err
}

all documents get deleted.

Small v2 package annoyance

Hey there, I'm playing around with clover which looks really nice, so thank you for open sourcing it! :)
This is just a very small annoyance when trying to use this library for which I don't think a PR can help.


tl;dr: I think that go is getting confused with the v2 branch name being the main branch?


trying to go get clover's v2 package tried to get the latest "alpha" tag instead of the v2 git branch.
the error seems to be because I'm trying to use the document package that didn't exist when the tag was created.

go get -u github.com/ostafen/clover/v2/document
go: module github.com/ostafen/clover/v2@upgrade found (v2.0.0-alpha.2), but does not contain package github.com/ostafen/clover/v2/document

same with latest, kinda expected this.

go get -u github.com/ostafen/clover/v2/document@latest
go: module github.com/ostafen/clover/v2@latest found (v2.0.0-alpha.2), but does not contain package github.com/ostafen/clover/v2/document

trying to explicitly get the v2 branch fails, this one is the weird one for me.

go get -u github.com/ostafen/clover/v2/document@v2
go: github.com/ostafen/clover/v2/document@v2: no matching versions for query "v2"

to get the latest version for v2 I need to actually use the latest commit sha.

go get -u github.com/ostafen/clover/v2/document@dce004e1cd8e1add291511b96bb3977036abcddb
go: added github.com/ostafen/clover/v2 v2.0.0-alpha.2.0.20221120132158-dce004e1cd8e

other branch names also work as expected.

go get -u github.com/ostafen/clover/v2/document@v2-store-merge
go: added github.com/ostafen/clover/v2 v2.0.0-alpha.2.0.20230203105032-c302b23db778

ps. It's getting a bit late here so I hope I'm not just missing something completely obvious here and wasting your time, I do apologise in advance though if that's the case :D

Question: Is there a way to search for a value in all fields?

I'm trying to evaluate if this DB serves my purpose. I have a use case where I need to search for a value in all the fields. Something like below:

db.FindAll(c.NewQuery("todos").Where(c.AnyField.Eq("seach string")))

I got my test code working by collecting all the fields from the document and dynamically building the criteria:

	for field := range fieldSet {
		criteria = criteria.Or(clover.Field(field).Eq(value))
	}

Wondering if there is a better way to search through all documents?
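For comparison, a sketch of the same scan using the MatchFunc criterion seen in another issue on this page (it doesn't use indexes either, but avoids pre-collecting field names; "todos" and the search value are placeholders):

docs, err := db.FindAll(clover.NewQuery("todos").MatchFunc(func(doc *clover.Document) bool {
	// Scan every top-level string field of the document for the value.
	for _, v := range doc.ToMap() {
		if s, ok := v.(string); ok && s == "search string" {
			return true
		}
	}
	return false
}))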

How could I get the top 10?

Hey, I'm new to NoSQL and really excited about it, but I couldn't find a way to select just the top 10.
I could make it with complex functions, but I was wondering if there is an easy way.

The type of the collection is down below; I just want to get a maximum of 10 playtimes.
Is there any query for that? (See the sketch after the struct.)

Userid   string `clover:"userid"`
PlayTime int    `clover:"playtime"`
Name     string `clover:"name"`
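A minimal sketch using the Sort and Limit options shown in other issues on this page ("players" is a hypothetical collection name): sort by playtime descending and keep only the first 10 documents.

docs, err := db.FindAll(
	clover.NewQuery("players").
		Sort(clover.SortOption{Field: "playtime", Direction: -1}).
		Limit(10),
)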

Add `OpenWithOptions()` to badger store

At the moment, the badger store provides an Open() function which takes a badger options parameter.
However, it would be better to add an OpenWithOptions() function for that purpose and make the default Open() take just a path parameter (default options would be used in that case); a sketch follows.
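A possible shape for the proposal (a sketch, not the actual implementation; Store stands in for the real badger store type):

import badger "github.com/dgraph-io/badger/v3"

type Store struct {
	db *badger.DB
}

// Open opens the store at path using badger's default options.
func Open(path string) (*Store, error) {
	return OpenWithOptions(badger.DefaultOptions(path))
}

// OpenWithOptions opens the store with caller-supplied badger options.
func OpenWithOptions(opts badger.Options) (*Store, error) {
	db, err := badger.Open(opts)
	if err != nil {
		return nil, err
	}
	return &Store{db: db}, nil
}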

Most efficient solution for paging query results with a big collection

Hi there. Firstly thank you for this cool project - it seems quite well-suited for what I am building! I am hitting some performance issues though, so thought I'd explain what I'm doing (to give some context) and where the two bottlenecks are, and hopefully there is a different approach which I have simply overlooked.

So first of all, I'm using CloverDB to store parsed emails. Each document contains some basic info about the email, including from, to, subject, the timestamp it was received (time.Time), as well as the entire raw email. Then I have a frontend which displays a paginated overview of the message "summaries" (ie: the basic data excluding the raw email itself), from newest to oldest (25 per page). The bottlenecks are caused by two things here, namely:

  1. Sorting by received timestamp (.Sort(clover.SortOption{Field: "Created", Direction: -1}))
  2. A count of all records (db.Count(mailbox))

With about 20k emails in a collection a typical request for 25 records (let's say the latest 25) takes about 9 seconds, which includes a Count() of all documents. Removing the Count() of the all documents in the collection reduces the request to about 5 seconds, and then when also removing the sort I get the results in about 0.7 seconds.

What is the most efficient manner to reverse sort the order that documents are stored in the collection?
What is the most efficient way to count the total number of documents in a collection?

In relation to the above, I am considering splitting the email "summary fields" (to, from, subject etc) from the actual raw email content (which can also contain attachments), and storing them in two separate collections. Before I refactor all my code, do you think this approach is better? Preliminary tests (which simply exclude storing the raw email data in my collection) appear to more than halve the execution time above, although I don't know whether storing the raw data in a separate collection would slow things down again (I don't know quite how badger handles collections).

Thank you for your time.

Feature: a cmdline tool to explore a cloverDB

Hello, and thanks for this excellent lib!

The title says it all: is there a cmdline tool to explore a cloverDB? Query collections and documents, insert, delete, create indexes...

If not, is it planned?

Thanks :)

Implement sorting

Hello,

It would be a good idea to have the possibility of sorting the different FindAll() results.

In order to keep the code simple, here are a few simple ideas for sorting results numerically and alphabetically that would be useful to implement:
db.Query("todos").Where(c.Field("completed").Eq(true).And(c.Field("userId").In(5, 8))).FindAll().SortAsc("userId") and SortDesc
db.Query("todos").Where(c.Field("completed").Eq(true).And(c.Field("userId").In(5, 8))).FindAll().Sort("userId", 1) or -1

GT LT operation very slow

A query like

query := clover.NewQuery("stat").
	Where(clover.Field("stat_time").GtEq("2022-11-14").
		And(clover.Field("stat_time").LtEq("2022-11-15")).
		And(clover.Field("ad_id").Eq("1749344214764583"))).
	Sort(clover.SortOption{Field: "stat_time", Direction: 1})

is very fast, but

query := clover.NewQuery("stat").
	Where(clover.Field("ad_id").Eq("1749344214764583")).
	And(clover.Field("stat_time").LtEq("2022-11-15")).
	And(clover.Field("stat_time").GtEq("2022-11-14")).
	Sort(clover.SortOption{Field: "stat_time", Direction: 1})

is very slow.
Why?

Add a Contains criteria

Hi, all! It could be useful to implement a Contains(elems ...interface{}) criterion to check whether a slice field contains one or more elements.
For example, assume we run the following query

db.Query("myCollection").Where(c.Field("myField").Contains(4)) 

on a `myCollection" collection which consists of the following three documents:

{
   ...,
   "myField": [1,2,4]
},
{
   ...,
   "myField": [5,6,7]
},
{
   ...,
   "myField": [4, 10, 20]
}

The query would return only the first and the third document.

Add Exists method to Query

Sometimes we want to simply check that the result set of a query is not empty, so it would be handy to add an Exists() method to the Query struct.
It can be implemented by simply checking whether the document returned by FindAny() is not nil; see the sketch below.
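A minimal sketch, assuming FindAny() returns the first matching document (or nil when the result set is empty), as described above:

func (q *Query) Exists() (bool, error) {
	doc, err := q.FindAny()
	if err != nil {
		return false, err
	}
	return doc != nil, nil
}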

Add support for geospatial queries/indexing

It would be nice to provide Near()/NearSphere()-like criteria for geo-spatial queries.
Moreover, it should be possible to create a geospatial-index on a specific location field

Implement an in-memory storage engine

The persistence layer of CloverDB is abstracted by the StorageEngine interface. The default implementation persists data on disk only and makes use of the badger kv-store.
It would be useful to also add an alternative StorageEngine implementation running completely in-memory, arranging documents in a map.

Gob encoding and internal data types

Discussed in #41

Originally posted by ostafen May 1, 2022
Hi, everyone, I created this discussion to collect opinions and suggestions, since this is a very sensitive topic.
Currently, CloverDB serializes documents to JSON before storing them on disk. This was done because early versions of the library used ".json" files directly to store data.
But since clover has evolved since that time (it now uses the badger kv-store), this solution is no longer acceptable, for the following reasons (demonstrated by the snippet after this list):

  • instances of the time.Time struct cannot be correctly recovered, because they are converted to strings when serialized; as a consequence, json.Unmarshal() deserializes them to plain strings. This affects queries involving dates or times (unless you decide to store them as timestamps during document insertion).
  • All numbers are silently converted to float64.
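A self-contained demonstration of both problems:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

func main() {
	in := map[string]interface{}{"n": int64(42), "t": time.Now()}
	data, _ := json.Marshal(in)

	var out map[string]interface{}
	_ = json.Unmarshal(data, &out)

	// Prints "float64 string": the int64 comes back as float64,
	// and the time.Time comes back as a plain string.
	fmt.Printf("%T %T\n", out["n"], out["t"])
}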

To fix these issues, I was thinking of switching to the gob encoding, which preserves the correct type of each document field.
This opens a new question about internal data types:

Which numeric types should be supported by clover? Should we preserve all of the types (int, uint8/int8, uint16/int16, and so on), or should we restrict them (using int64 for integer numbers and float64 for floating-point numbers, for example)?

What do you think about this?

JSON export/import function

Taking into account that Clover aims to implement different types of storage engines and is mainly used for small and embedded databases, adding the possibility to easily export and import your data as JSON, independently of the engine used, would be really helpful for working with test data, backing up, etc.

However, there are a few problems:

  • How to handle the conversion, especially for bigger databases. Possible bottlenecks?
  • Use just one single file or multiple files. Maybe both options?

v2: Constant CPU usage

Hi @ostafen. As you would probably have seen (in Mailpit), a user reported a constant CPU usage, which I believe I have traced back to CloverDB v2 (both 1 & 2 alpha).

A simple test:

package main

import (
	"time"
	"github.com/ostafen/clover/v2"
)

func main() {
	db, _ := clover.Open("", clover.InMemoryMode(true))
	defer db.Close()
	time.Sleep(600 * time.Second)
}

Run the program and check CPU usage with top/htop - the running binary will be using a constant 2-3% CPU. This does not happen with CloverDB v1.

Any ideas?

Cannot allocate initial memory

Hello,

I'm starting a project where, after some research, I would like to embed clover db. I've started to try it, but I'm getting a kernel panic due to memory allocation.

I was going through past issues and I read issue #35, so I made sure I'm using db.Close(); actually, my code is just your reply to that issue:

db, err := c.Open("./db")
defer db.Close()
if err != nil {
    panic(err)
}

After which I get a kernel panic saying:

cannot allocate memory while mmapping ./db2/000001.vlog with size: 2147483646

My question is: is there a way to configure the db so it doesn't reserve so much memory from the start?

Thank you for your work.

cannot install using go 1.18.1

It seems that the package does not build, perhaps because of a breaking change in the UUID package

 go get github.com/ostafen/clover
# github.com/ostafen/clover
../../../go/src/github.com/ostafen/clover/db.go:49:9: multiple-value uuid.NewV4() (value of type (uuid.UUID, error)) in single-value context

Yaml support

I have read that CloverDB stores data records as JSON documents.

You could replace the JSON format with YAML, because it's more streamlined:

in many cases, it takes up less disk space.

Question: Searching for all documents within time range

I'm searching for a field value in all documents within a given time range. Currently, my approach is to use MatchFunc to search all fields of the document, with a Where clause before it to restrict the time range. I realize I could check the timestamp range within the MatchFunc, but that wouldn't benefit from the indexing. Any suggestions to improve query performance?

My dataset is one million documents and the time field is an integer and is indexed.

	db.CreateIndex(eventCollection, "eventTimestamp")
	db.CreateIndex(eventCollection, "_id")

I'm trying the following query but it still takes 30 secs.

	log.Printf("Searching for %s within timestamp %d and %d", value, start, end)

	docs, err := db.FindAll(
		clover.NewQuery(eventCollection).Where(clover.Field("eventTimestamp").GtEq(start).
			And(clover.Field("eventTimestamp").LtEq(end))).MatchFunc(func(doc *clover.Document) bool {
			for _, v := range doc.ToMap() {

				val, ok := v.(string)
				if !ok {
					continue
				}

				if val == value {
					return true
				}
			}
			return false
		}).Sort(clover.SortOption{Field: "eventTimestamp", Direction: 1}))
	if err != nil {
		log.Printf("FindAll Error: %s\n", err.Error())
		return nil, errors.New("error finding documents")
	}

Cursor for running through big datasets.

Hi,

I'm trying out clover as an easy and portable way of storing stock market price data, and it works well enough for my use case in general. The only thing I was wondering is whether there is support for cursors.
Right now I'm doing a FindAll(), which takes pretty long and obviously causes the whole chunk of data to be stored in memory.

It would be awesome if something like this were possible:

cond := clover.Field("openTime").GtEq(tw.Start).And(clover.Field("openTime").LtEq(tw.End))
cursor, err := s.db.Query("candles").Where(cond).Sort(clover.SortOption{"openTime", 1}).Cursor()

for cursor.Next() {
  cursor.Unmarshal(&myStruct)
  
  // do something with myStruct
}

Plan for an awesome repository

It is recommended to create a new repository called awesome-clover or awesome-cloverdb.

CloverDB will attract more people in the future, but the program by itself is not enough; although this database is a very small and complete library, it also needs an ecosystem. Just like SQLite and MongoDB, it should have GUIs, practice projects, drivers, and so on.
I have already developed a COVID-19 health-code collector using CloverDB; it is being used in small communities in China and has served a lot of people.
I am very interested in contributing hands-on projects to the proposed awesome-clover repository, such as GUIs, development examples, more complete hands-on tutorials, and drivers. I will be the first to contribute, and I will keep project-related maintenance in step with CloverDB upgrades.
By building this new ecosystem, clover will get better and better.

Add code examples

Hi, all! In order to help new users get started with clover quickly, it would be good to add an examples folder containing
several basic code samples showing the main features of the library.

distributed?

Any suggestions on how to run a distributed clover db?

About the Count() function

Bro, when there is a lot of data, Count() takes a long time to return.
Looking at the source code, it is implemented with FindAll() and len() of the returned slice,
which takes a lot of time when the data is big.
For example, I now have about 40,000 pieces of data, and it takes more than a second...
Maybe we should change the implementation.

func (here *Db) SearchContent(names []string, num int, pg int) ([]*clover.Document, int) {
	var name string
	for i, v := range names {
		if i < len(names)-1 {
			name += "(.*" + regexp.QuoteMeta(v) + ".*)|"
		} else {
			name += "(.*" + regexp.QuoteMeta(v) + ".*)"
		}
	}
	
	query := here.content.Where(clover.Field("name").Like(name))

	startT := time.Now()

	docs, _ := query.Skip(num * pg).Limit(num).FindAll()
	fmt.Printf("time.Since(startT): %v\n", time.Since(startT))
	startU := time.Now()

	pgCount, _ := query.Count()
	fmt.Printf("time.Since(startU): %v\n", time.Since(startU))

	return docs, int(math.Floor(float64(pgCount/num) + 0.0/2.0))
}

It prints this:

time.Since(startT): 204.9991ms
time.Since(startU): 1.40169s

Skip/Limit criteria

It would be useful to have Skip and Limit criteria for the number of documents returned by FindAll().
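A usage sketch of the proposed criteria, in the v1 query style used elsewhere on this page:

// Page 2 of a result set, 25 documents per page.
docs, err := db.Query("todos").Skip(25).Limit(25).FindAll()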
