
datawranglingpy's Introduction

My research interests include machine learning, data aggregation and clustering, computational and applied statistics, and mathematical modelling (the science of science, sport, economics, social sciences, psychometrics, bibliometrics, etc.).

In my spare time, I write books for my students and develop open-source data analysis software.

Open-access textbooks

Software

Python packages

R packages

  • stringi – Fast and portable character string processing in R (one of the most frequently downloaded R packages) (GitHub) (CRAN) (paper)
  • genieclust – Fast and robust hierarchical clustering with noise point detection (GitHub) (CRAN) (paper)
  • stringx – Drop-in replacements for base R string functions powered by stringi (GitHub) (CRAN)
  • realtest – Where expectations meet reality: Realistic unit testing in R (GitHub) (CRAN)
  • TurtleGraphics – Learn computer programming in R while having a jolly time! (GitHub) (CRAN)

Data

datawranglingpy's People

Contributors

gagolews


datawranglingpy's Issues

equation ch 4

Thanks for the great book. The equation between Figures 4.10 and 4.11 is not rendered; it shows up as raw LaTeX: \[ \hat{F}_n(t) = \left\{ \begin{array}{ll} 0 & \text{for }t. I wanted to submit a PR, but couldn't find the book sources!
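The \hat{F}_n(t) notation suggests this is the empirical cumulative distribution function (ECDF). Under the standard definition, \hat{F}_n(t) is the proportion of the n observations that are less than or equal to t: 0 below the smallest value, rising in steps of 1/n at each observation, and 1 from the largest value onwards. Assuming the book uses this standard definition, a minimal NumPy sketch follows; the helper name `ecdf` is hypothetical.

```python
import numpy as np

def ecdf(x, t):
    """Hypothetical helper: empirical CDF of sample x evaluated at point(s) t,
    i.e., the proportion of observations x_i <= t."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    # side="right" returns how many sorted values are <= t
    return np.searchsorted(x_sorted, t, side="right") / len(x_sorted)

x = [3, 1, 4, 1, 5]
print(ecdf(x, [0, 1, 3.5, 5, 10]))
```

For this sample the printed values are 0, 0.4, 0.6, 1 and 1: nothing lies at or below 0, two of the five values are at most 1, and every value is at most 5.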

minor typos in 7.1.4

In section 7.1.4, there are two minor typos:

Exercise 7.2
Are they* worth taking note of ...

Exercise 7.3
Using numpy.insert, add* a new row/column ...
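For readers attempting Exercise 7.3, numpy.insert(arr, obj, values, axis) returns a new array with values inserted before index obj along the given axis; it does not modify arr in place. A minimal sketch with made-up data:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.insert(A, 1, [7, 8, 9], axis=0)  # new row before row index 1
C = np.insert(A, 1, [10, 20], axis=1)   # new column before column index 1

print(B)
print(C)
print(A)  # A itself is unchanged: numpy.insert returns a modified copy
```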

Typo in 3.5.4. Modify in place or return a modified copy?

I think there is a typo in section 3.5.4 (Modify in place or return a modified copy?):

The list.sorted method modifies the list it is applied on in place:

x = [5, 3, 2, 4, 1]
x.sort()  # modifies x in place and returns nothing

Shouldn't it say list.sort instead of list.sorted?
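Python lists have no sorted method. The in-place variant is the list.sort method, which returns None, while the copying variant is the built-in function sorted(), which returns a new list. A short sketch contrasting the two:

```python
x = [5, 3, 2, 4, 1]
result = x.sort()   # list.sort reorders x in place...
print(result)       # ...and returns None

y = [5, 3, 2, 4, 1]
z = sorted(y)       # built-in sorted() returns a new, sorted list
print(y, z)         # y keeps its original order

print(hasattr(list, "sorted"))  # False: lists have no .sorted method
```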

missing subject

In section 6.3.2 (Pareto Distribution), I think the sentence

This time, however, will be interested in not what is typical, but ...

is IMHO missing the subject.
I would write:

This time, however, we will be interested in not what is typical, but ...

example 7.8 - command soon to be deprecated

A warning comes up saying that the command
cmap=cm.get_cmap("copper"), # colour map
will soon be deprecated.

Substituting it with
cmap=plt.colormaps.get_cmap("copper"), # colour map
as suggested by the warning text, seems to give the same output.
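The warning concerns matplotlib.cm.get_cmap, which was deprecated in Matplotlib 3.7. Another replacement is a lookup in the colormap registry, matplotlib.colormaps, available since Matplotlib 3.5. The sketch below assumes such a version; the scatter data are made up and are not those of Example 7.8.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Registry lookup replacing the deprecated cm.get_cmap("copper")
cmap = matplotlib.colormaps["copper"]

rng = np.random.default_rng(42)
x, y, z = rng.random((3, 100))          # made-up data for illustration
sc = plt.scatter(x, y, c=z, cmap=cmap)  # colour map
plt.colorbar(sc)
plt.close("all")
```

Passing the name directly, as in cmap="copper", also works and avoids the lookup altogether.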

misleading code comment

In 5.4.3 (Slicing), the comment in the very first code snippet

...
x[::-1]  # every second element
## array([50, 40, 30, 20, 10])
...

should say something like

...
x[::-1]  # every element in reverse order
## array([50, 40, 30, 20, 10])
...
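The step in start:stop:step slicing determines which elements are taken, and a negative step walks the array backwards. A sketch, assuming x = np.array([10, 20, 30, 40, 50]), which is consistent with the output shown in the issue:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

every_second = x[::2]        # step 2: elements at indices 0, 2, 4
reversed_x = x[::-1]         # step -1: every element, in reverse order
every_second_rev = x[::-2]   # step -2: every second element, starting from the end

print(every_second, reversed_x, every_second_rev)
```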

backquoted operator names

In 5.4.3 (Slicing), it looks like the inline code *= has not been rendered correctly:

This did not modify the original vector, because we applied `*=` on a different object, which has not even been memorised after that operation took place.
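The quoted sentence relies on the fact that, in NumPy, basic slicing returns a view sharing memory with the original array, whereas advanced (fancy) indexing returns a copy. The issue does not reproduce the book's exact code, so the following is only a minimal sketch of that general mechanism, with made-up values:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

y = x[1:3]          # slicing returns a view on x's data
y *= 100            # ...so this also changes x[1] and x[2]

z = x[[0, 4]]       # fancy indexing returns a new array (a copy)
z *= 100            # x is not affected

x[[0, 4]][0] *= 100  # *= acts on a temporary copy that is discarded immediately

print(x)            # x[0] and x[4] are still 10 and 50
```

Writing x[[0, 4]] *= 100 directly, without the trailing [0], would modify x, because Python then assigns the result back through x.__setitem__.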

Exercise 8.3 and others: normalisation

I came across multiple instances where it was not clear what exactly was meant by normalisation.
For example, Exercise 8.3 asks for standardisation (calculating the Z-score), normalisation(?), and min-max scaling (min-max normalisation). I assume that normalisation means mean normalisation. However, min-max scaling is a normalisation technique as well.
I'd recommend spelling out exactly which kind of normalisation needs to be calculated in the exercises to prevent confusion.
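To make the ambiguity concrete, here are the common column-wise transformations that go by these names. The issue does not say which one the book intends, so this is a sketch of the usual definitions only. The population standard deviation (ddof=0) and the toy data are assumptions.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 60.0]])  # toy data, one feature per column

mean, std = X.mean(axis=0), X.std(axis=0)  # std uses ddof=0 here
xmin, xmax = X.min(axis=0), X.max(axis=0)

z_scores = (X - mean) / std               # standardisation (Z-scores)
min_max = (X - xmin) / (xmax - xmin)      # min-max scaling to [0, 1]
mean_norm = (X - mean) / (xmax - xmin)    # mean normalisation
unit_norm = X / np.linalg.norm(X, axis=0) # each column scaled to unit Euclidean length
```

Each variant rescales the columns differently, which is why the exercises would benefit from naming the formula explicitly.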

Missing figures

Four figures are missing in chapter 9:
Figures 9.2, 9.4, 9.6, and 9.8.

8.1.3 - Small typo

There is a small typo in the final paragraph of the Important box.
It should be corrected to:

"Generally, for two matrices, their column/row numbers must match or be equal to 1. Also, if one operand is a one-dimensional array, it will be promoted to a row vector."
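The broadcasting rule quoted above can be checked directly in NumPy: shapes are compatible when the sizes in each dimension match or one of them is 1, and a one-dimensional operand behaves like a row vector. A minimal sketch with made-up matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2 matrix
row = np.array([10, 20])                # 1D array, treated as a 1x2 row vector
col = np.array([[100], [200], [300]])   # 3x1 column vector

print(A + row)  # row added to each of the 3 rows
print(A + col)  # column added to each of the 2 columns

raised = False
try:
    A + np.array([1, 2, 3])  # treated as 1x3; 3 columns vs 2: incompatible
except ValueError:
    raised = True
print(raised)
```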

Formatting error in section 1.2.3, point 5

Thank you for this free and easy-to-follow introduction to Data Wrangling in Python.

In section 1.2.3, under point 5, code is presented as normal text.
To make this code work, I needed to change the double quote characters and reformat the text:

import matplotlib.pyplot as plt  # basic plotting library
plt.bar(
    ['Python', 'JavaScript', 'HTML', 'CSS'],  # a list of strings
    [80, 30, 10, 15]  # a list of integers (the corresponding bar heights)
)
plt.title('What makes you happy?')
plt.show()
