I am interested in anything :-).
Do you have a pointer to an existing pandas method?
Is this a vector->vector translation or is this a dataset->dataset translation?
I have needed this before to get smoother graphs of response times, for example when you have a set of responses and then no requests for 5 minutes. In that case it would be useful to duplicate the latest response time across the gap, adding values at a regular interval (one every second or so).
from tech.datatype.
API:
- DataFrame reindex: https://github.com/pandas-dev/pandas/blob/v1.0.5/pandas/core/frame.py#L3837-L3856
- Index reindex: https://github.com/pandas-dev/pandas/blob/v1.0.5/pandas/core/indexes/base.py#L3101-L3159
Low-level calls:
- _reindex_with_indexer: https://github.com/pandas-dev/pandas/blob/8ba9c627e2ad630d977ae504a3bf0e2ec5a9885a/pandas/core/generic.py#L4524
- reindex_indexer: https://github.com/pandas-dev/pandas/blob/8ba9c627e2ad630d977ae504a3bf0e2ec5a9885a/pandas/core/internals/managers.py#L1210
Wow, pandas code is a mess. I wouldn't know where to begin even with these pointers! It is definitely dataset to dataset. Yes, your example is a fairly common use case: propagating the last observation forward to fill gaps in the index (in pandas you would call reindex(method='ffill')).
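For readers less familiar with pandas, the semantics of reindex(method='ffill') on a sorted index can be sketched in plain Clojure: each label of the new index takes the value of the closest existing label at or before it. This is a minimal sketch over a map of label -> value, not a dataset implementation; reindex-ffill is a hypothetical name and the response-time numbers are made up for illustration.

```clojure
(defn reindex-ffill
  "Hypothetical helper: aligns old-series (a map of label -> value) to
  new-index, carrying forward the last observation at or before each
  label. Labels before the first observation map to nil."
  [old-series new-index]
  (let [sorted (into (sorted-map) old-series)]
    (mapv (fn [label]
            ;; rsubseq with <= walks the keys from label downward, so the
            ;; first entry is the closest observation at or before label
            (when-let [[_ v] (first (rsubseq sorted <= label))]
              v))
          new-index)))

;; made-up response times (ms) observed at seconds 0, 1 and 7,
;; resampled onto one point per second
(reindex-ffill {0 120, 1 135, 7 110} (range 9))
;; => [120 135 135 135 135 135 135 110 110]
```

The sorted-map lookup keeps each label's lookup logarithmic, which is why the sketch does not require the new index to line up with the old one.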
Hi! I've implemented filling missing values with the previous (or next) available value in tablecloth here: https://scicloj.github.io/tablecloth/index.html#replace (it's also a candidate to be moved into the tech.ml stack).
Great timing :-). Yes, then we could split this up into a 'reindex' step that sets the new rows to empty, followed by a replace function. Very nice.
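A minimal sketch of that two-step split, assuming a series is just a map of label -> value and that nil stands for an empty (missing) value. Both reindex and replace-missing-down are hypothetical names for illustration, not tech.ml.dataset or tablecloth functions.

```clojure
(defn reindex
  "Hypothetical step 1: aligns series (a map of label -> value) to
  new-index; labels absent from series become nil (missing)."
  [series new-index]
  (mapv #(get series %) new-index))

(defn replace-missing-down
  "Hypothetical step 2: replaces each nil with the most recent non-nil
  value before it. Leading nils stay nil since nothing precedes them."
  [values]
  (->> values
       ;; reductions emits the running 'last seen' value, seeded with nil
       (reductions (fn [prev v] (if (nil? v) prev v)) nil)
       rest   ; drop the initial nil seed
       vec))

(def reindexed (reindex {0 120, 1 135, 7 110} (range 9)))
;; => [120 135 nil nil nil nil nil 110 nil]

(replace-missing-down reindexed)
;; => [120 135 135 135 135 135 135 110 110]
```

Keeping the steps separate means the second pass can be swapped for another strategy (carrying the next value backward, interpolating, or a constant) without touching the reindex step.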
Implementation hint: RoaringBitmap has two nice functions, previousAbsentValue and nextAbsentValue, for finding a run of missing indexes as a contiguous range.
https://github.com/scicloj/tablecloth/blob/master/src/tablecloth/api/missing.clj#L55
Low-level attempt at this (the operation works in double space):
https://github.com/techascent/tech.datatype/blob/fill-range/src/tech/v2/datatype/functional.clj#L244
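The linked branch is the reference; as a rough illustration of the idea only, here is a sketch of a fill-range style operation on doubles. Wherever two consecutive sorted values are more than max-span apart, evenly spaced points are inserted so that no gap exceeds max-span, and the indexes of the inserted points are recorded so those rows can later be treated as missing. fill-gaps, its return map, and the even-spacing rule are assumptions, not the tech.datatype implementation.

```clojure
(defn fill-gaps
  "Hypothetical sketch: for sorted numeric xs, inserts evenly spaced
  points into every gap wider than max-span. Returns the new values
  and the set of output indexes that were inserted."
  [xs max-span]
  (let [xs (mapv double xs)]
    (reduce
     (fn [{:keys [values added]} [a b]]
       (let [gap     (- b a)
             ;; number of equal segments needed so each is <= max-span
             n       (max 1 (long (Math/ceil (/ gap max-span))))
             step    (/ gap n)
             new-pts (for [k (range 1 n)] (+ a (* k step)))
             start   (count values)]   ; output index of the first new point
         {:values (-> values (into new-pts) (conj b))
          :added  (into added (range start (+ start (count new-pts))))}))
     {:values (vec (take 1 xs)) :added #{}}
     (partition 2 1 xs))))

(fill-gaps [0 1 5] 2)
;; => {:values [0.0 1.0 3.0 5.0], :added #{2}}
```

Splitting each gap into equal steps (rather than stepping by exactly max-span from the left) is one reasonable choice; it avoids a short leftover step at the end of each gap.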
Chris, that looks great, thanks a lot! Waiting patiently for support for the datetime dtype!
They are sort of implicitly supported:
https://github.com/techascent/tech.datatype/blob/master/src/tech/v2/datatype/datetime/operations.clj#L945
All datetime objects have a conversion to 64-bit long milliseconds. When I implement this in dataset, I will take care of that conversion (and back) automatically.
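To make the millisecond round trip concrete, here is a sketch using java.time. LocalDateTime carries no time zone, so this assumes UTC; ldt->millis and millis->ldt are hypothetical helper names, and the dataset library may use a different convention internally.

```clojure
(import '(java.time Instant LocalDateTime ZoneOffset))

(defn ldt->millis
  "Converts a LocalDateTime to epoch milliseconds, assuming UTC."
  [^LocalDateTime ldt]
  (.toEpochMilli (.toInstant ldt ZoneOffset/UTC)))

(defn millis->ldt
  "Converts epoch milliseconds back to a LocalDateTime, assuming UTC."
  [ms]
  (LocalDateTime/ofInstant (Instant/ofEpochMilli ms) ZoneOffset/UTC))

(ldt->millis (LocalDateTime/of 2020 1 1 0 0 0))
;; => 1577836800000
```

Any numeric operation (such as filling gaps) can then run on the longs, with millis->ldt applied to the results.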
Marking this as fixed here and filing two new bugs in dataset:
techascent/tech.ml.dataset#116
techascent/tech.ml.dataset#115
@lccambiaghi - tech.ml.dataset version 3.04: when you have time, it would be great to hear whether tech.ml.dataset/fill-range-replace works for you.
@genmeblog - Copied your file over and exported the various functions into the tech.ml.dataset namespace. The difference here is a weaker column selection mechanism and no support for grouping. However, the implementation now exposes a public function that was previously private, replace-missing-with-strategy. I believe the grouping and such happen outside of that function. For now I wouldn't refactor; it may be better to have tablecloth work with more versions of tech.ml.dataset than just the absolute latest, but I did copy the code and it worked great. I may move your more sophisticated column selection criteria into tech.ml.dataset, as I believe that is a solid and unambiguous upgrade to select and select-columns.
Stupid question... how can I specify the 'span' when I want one day between entries?
BTW, interestingly, this code:
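(require '[tech.ml.dataset :as ds])

;; datetimes are handled as epoch milliseconds, so a span of 1 here means 1 ms
(-> (ds/->dataset {:dt [(java.time.LocalDateTime/of 2020 1 1 0 0 0)
                        (java.time.LocalDateTime/of 2020 1 5 0 0 0)]})
    (ds/fill-range-replace :dt 1))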
results in:
Unhandled java.lang.OutOfMemoryError: Java heap space
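A back-of-the-envelope check suggests why this runs out of heap, assuming datetime columns are filled in epoch-millisecond space (so a span of 1 means 1 ms) and that points are added until no step exceeds the span; the exact count in tech.ml.dataset may differ by one. Four days is 345,600,000 ms, so a 1 ms span asks for roughly 345.6 million new rows; at 8 bytes per 64-bit value that is already about 2.8 GB for a single column. A one-day span (86,400,000 ms, computed here with Duration/ofDays; the thread later uses dtype-dt/milliseconds-in-day) adds only three rows. rows-added is a hypothetical helper.

```clojure
(import '(java.time Duration LocalDateTime))

(defn rows-added
  "Hypothetical helper: points inserted between two values gap ms apart
  if no step may exceed span ms (ceil(gap/span) equal steps need
  ceil(gap/span) - 1 new interior points)."
  [gap span]
  (dec (long (Math/ceil (/ (double gap) span)))))

;; gap between the two rows in the example above, in milliseconds
(def gap-ms
  (.toMillis (Duration/between (LocalDateTime/of 2020 1 1 0 0 0)
                               (LocalDateTime/of 2020 1 5 0 0 0))))
;; => 345600000 (four days)

(rows-added gap-ms 1)                               ;; => 345599999
(rows-added gap-ms (.toMillis (Duration/ofDays 1))) ;; => 3
```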
Amazing, thank you for the pointer! I used the handy (dtype-dt/milliseconds-in-day). I am extremely happy with the solution, thank you so much!!
That is great and you are very welcome! Keep 'em coming :-)
Great! Thanks, @cnuernber, I will switch to the migrated code ASAP.
Regarding grouping and column selection: we can leave them in tablecloth for a while (or forever).
Hey Chris, why fill-range-replace and not just fill-range?
Because it runs the replace-missing operation right after the fill-range operation. Honestly, I would love a better name for fill-range in general; interpolate-spans didn't seem very good either. All these names seem bad to me, or at least extremely obtuse.
Ah, you're right. I don't know a better name either; pandas' reindex isn't a good name.