
data-science-interviews's Introduction

Photo by Waseem Farooq from PxHere

Data Science Interviews

Data science interview questions - with answers

The answers are given by the community

  • If you know how to answer a question — please create a PR with the answer
  • If there's already an answer, but you can improve it — please create a PR with improvement suggestion
  • If you see a mistake — please create a PR with a fix

For updates, follow me on Twitter (@Al_Grigor) and on LinkedIn (agrigorev)

Do you want to talk about data? Join DataTalks.Club

Questions by category

  • Theoretical questions: theory.md (linear models, trees, neural networks and others)
  • Technical questions: technical.md (SQL, Python, coding)
  • More to come

Contributed questions

The contrib folder contains contributed interview questions:

Other useful things

  • Awesome data science interview questions and other resources: awesome.md

This is a joint effort of many people. You can see the list of contributors here: contributors.md

License

This work is licensed under a Creative Commons Attribution 4.0 International License.


data-science-interviews's People

Contributors

aditya239233, akhilesh64, alexeygrigorev, anki08, averkij, azizsaya, bubblebooy, damiannmm, donaldonana, gabrielatrindade, hamzag95, jasonnor, mahdirahbar, manuel-lang, martinp7, mrsaeeddev, mruanova, mudittiwari255, octatour, patrickloeber, pedrogengo, pop2pop3, pymacbit, rahulmadanraju, rohanadagouda, samaritanhu, sanggusti, sash-ko, tranvohuy, vasugamdha

data-science-interviews's Issues

SQL query is incorrect

In the first case study, the query that is supposed to generate the ordered count of events for last week has incorrect syntax and doesn't yield the expected result. I want to submit a corrected query, but I am not able to create a branch in this repo to open a pull request.
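The issue doesn't include the original query or the case-study schema, so the following is only a sketch under assumed names: a hypothetical events(event_name, event_date) table with ISO "YYYY-MM-DD" dates, where "last week" is taken to mean the seven days before a reference date. It runs the query against an in-memory SQLite database from Python.

```python
import sqlite3

# Hypothetical schema: the case study's real table and column names may differ.
QUERY = """
SELECT event_name, COUNT(*) AS n_events
FROM events
WHERE event_date >= date(:ref, '-7 days')  -- start of the 7-day window
  AND event_date < :ref                     -- exclude the reference day itself
GROUP BY event_name
ORDER BY n_events DESC, event_name          -- most frequent events first
"""

def weekly_event_counts(conn, ref_date):
    """Return (event_name, count) pairs for the 7 days before ref_date."""
    return conn.execute(QUERY, {"ref": ref_date}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("click", "2021-01-02"), ("click", "2021-01-05"), ("view", "2021-01-03"),
     ("click", "2020-12-25"), ("view", "2021-01-08")],
)
print(weekly_event_counts(conn, "2021-01-08"))  # [('click', 2), ('view', 1)]
```

Using a parameter for the reference date, rather than date('now'), keeps the query deterministic and easy to test.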

Suggest answer for theoretical question

Q: What if we want to build a model for predicting prices? Are prices distributed normally? Do we need to do any pre-processing for prices?


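A: Most common models, including linear regression, do not require the independent or dependent variables to be normally distributed. Where a normality assumption is made (for example, for exact t-tests and confidence intervals in linear regression), it applies to the error terms: we assume that after fitting the model, the errors are i.i.d. N(0, σ²).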

Price prediction can be either a time-series forecast or a cross-sectional forecast. In a time-series forecast, we need to pre-process prices so that the series is stationary. We can test for stationarity with the augmented Dickey-Fuller (ADF) test. If the series is not stationary, we take first differences, then second differences, and so on, until it is. Taking the log of prices is also common, because differences of log prices approximate growth rates. In a cross-sectional forecast (such as house price prediction), we can apply multiple regression to predict the price. Here we can also log-transform the price, which reduces the right skew that prices usually have, and we still need to engineer and select features.
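A minimal sketch of the pre-processing step described above, using only the standard library. The helper names difference and log_returns are illustrative, not from the repo, and the stationarity test itself is not shown (in practice one might use adfuller from statsmodels.tsa.stattools and keep differencing while its p-value stays above a chosen threshold).

```python
import math

def difference(series, order=1):
    """Apply first differencing `order` times (order=2 gives second differences)."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def log_returns(prices):
    """First differences of log prices; for small changes these approximate growth rates."""
    return difference([math.log(p) for p in prices])

# Toy series growing by 10% per period
prices = [100.0, 110.0, 121.0, 133.1]
print(difference(prices))   # absolute changes keep growing with the price level
print(log_returns(prices))  # roughly constant, log(1.1) ≈ 0.0953 per period
```

On this toy series the raw differences still trend upward, while the log differences are flat, which is why the log transform is often applied before differencing prices.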

ROC definition

According to Wikipedia, the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR).

But theory.md says "The diagrammatic representation that shows the contrast between true positive rate vs true negative rate." I think this may be a small mistake.

Regards,
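To make the TPR-vs-FPR definition concrete, here is a small from-scratch sketch (the function name roc_points is illustrative) that computes one (FPR, TPR) point per distinct score threshold. It assumes 0/1 labels with at least one example of each class; in practice, sklearn.metrics.roc_curve does this job.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0) and lowering the threshold step by step."""
    pos = sum(y_true)            # number of actual positives
    neg = len(y_true) - pos      # number of actual negatives
    points = [(0.0, 0.0)]        # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive when score >= t
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))  # x = FPR, y = TPR
    return points

print(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The x-axis is the false positive rate FP / (FP + TN), not the true negative rate; the two are related by FPR = 1 - TNR.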

[Feature Request] Create a table of contents for the question markdown files
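One way to generate such a table of contents is a small script over the Markdown headings. This is a hypothetical sketch: make_toc and slugify are illustrative names, and slugify only approximates GitHub's anchor rules (it ignores, for example, the numeric suffixes GitHub adds to duplicate headings).

```python
import re

def slugify(heading):
    """Approximate GitHub's heading anchors: lowercase, drop punctuation, spaces -> hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")

def make_toc(markdown_text, max_level=3):
    """Build a nested Markdown list of links to headings up to max_level."""
    toc = []
    in_code = False
    for line in markdown_text.splitlines():
        if line.startswith("```"):
            in_code = not in_code  # skip "#" lines inside fenced code blocks
            continue
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if m and not in_code:
            level = len(m.group(1))
            if level <= max_level:
                title = m.group(2).strip()
                toc.append(f"{'  ' * (level - 1)}- [{title}](#{slugify(title)})")
    return "\n".join(toc)

sample = "# Theory\n## Linear models\n```\n# not a heading\n```\n## What is AUC?\n"
print(make_toc(sample))
```

The output could be pasted at the top of theory.md or technical.md, or regenerated by a script whenever questions change.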

Make answers to questions collapsed initially

It would be nice to have the answers collapsed initially, so that readers can think about the answer first and then expand it to check.
I would be happy to make a PR with this change. I am trying to understand how I can build the site locally before pushing changes to this repo.
Thanks!
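One possible approach, assuming the answers stay in plain Markdown files rendered by GitHub: wrap each answer in an HTML details element, which GitHub renders as a collapsed section. The helper below is hypothetical and only shows the wrapping; how answers would be located inside theory.md is left open.

```python
def collapsible_answer(answer_markdown, summary="Answer"):
    """Wrap an answer in an HTML <details> block so it starts collapsed.

    The blank lines around the body generally let GitHub render the Markdown inside it.
    """
    return (
        "<details>\n"
        f"<summary>{summary}</summary>\n"
        "\n"
        f"{answer_markdown.strip()}\n"
        "\n"
        "</details>\n"
    )

print(collapsible_answer("The ROC curve plots **TPR** against **FPR**."))
```

Because details/summary is plain HTML, this needs no site build step: the collapsed answers would show up directly in GitHub's Markdown view.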

Q4 standard deviation code error - technical.md (Python coding)

Q4 standard deviation code error

The error is in this line:

avg = mean(numbers)

mean is not in the math module; it has to be imported from statistics.
I added the following import to fix it:

from statistics import mean

A pull request is already generated.
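The full Q4 code isn't quoted here, so this is a minimal sketch of a population standard deviation function with the corrected import (the function name std_dev is an assumption). statistics.pstdev is included only as a cross-check; use statistics.stdev instead if the question expects the sample standard deviation (dividing by n - 1).

```python
import math
from statistics import mean, pstdev  # mean lives in statistics, not math

def std_dev(numbers):
    """Population standard deviation: square root of the mean squared deviation."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / len(numbers)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))  # 2.0
print(pstdev(data))   # 2.0, same result from the standard library
```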

SQL database source

I read your article on HackerNoon and came to this repo.
It was an interesting and very helpful post.

Could you also share the database?
If you don't have a table schema, you can post the CSV files instead: run

select * from table1

on each table, save the result as a CSV file, and share it on GitHub.
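The database engine behind the article isn't stated, so this sketch assumes SQLite via Python's sqlite3 module; table1 is the placeholder name from the issue, and export_query_to_csv is a hypothetical helper. It writes the query result, including a header row, to any file-like object.

```python
import csv
import io
import sqlite3

def export_query_to_csv(conn, query, out):
    """Run `query` and write the result, with a header row, as CSV to the file-like `out`."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow(col[0] for col in cursor.description)  # column names
    writer.writerows(cursor)                               # data rows

# Demo with a hypothetical table1 in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "google.com"), (2, "bing.com")])

buf = io.StringIO()
export_query_to_csv(conn, "select * from table1", buf)
print(buf.getvalue())
```

To write a real file instead, pass a handle opened with open("table1.csv", "w", newline="").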

No Pictures

Visualizations and images would make this repo a one-stop destination for interview prep.
Is there any reason they haven't been added?
I would be willing to help make it more visually appealing.

def not called

Most of the Python functions (def) in technical.md are never called. As a result, one has to copy the example from the Markdown and call the function to understand how it works.
In some cases, the Markdown example is not enough to understand the code.
For example:
9) Counter. We have a list with identifiers of the form “id-SITE”. Calculate how many ids we have per site.

This question is very brief: the Markdown example shows what we are trying to do, but not how the input is supplied to the function.
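As an illustration of supplying the input and actually calling the function, here is one possible solution to question 9 using collections.Counter. The input format (a list of strings like "1-google.com") and the function name are assumptions, since the question doesn't specify them.

```python
from collections import Counter

def count_ids_per_site(ids):
    """Count identifiers per site, assuming each string looks like "id-SITE"."""
    # Split on the first hyphen only, so sites such as "my-site.com" stay intact;
    # this assumes the id part itself never contains a hyphen.
    return Counter(identifier.split("-", 1)[1] for identifier in ids)

ids = ["1-google.com", "2-google.com", "3-bing.com", "4-my-site.com"]
print(count_ids_per_site(ids))
```

Here google.com is counted twice and the other two sites once each.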

Algorithmic - 2. Fibonacci: fibonacci4 does not have O(n) complexity

fibonacci4 is defined as follows:

def fibonacci4(n):
    '''Top down + memorization (dictionary), complexity = O(n)'''
    dic = {1:1, 2:2}
    if n not in dic:
        dic[n] = fibonacci4(n-1) + fibonacci4(n-2)
    return dic[n]

Since dic is a local variable, it is recreated on every call and not shared across the recursion, so the function still takes exponential time (you can check this by printing dic at each step).

One way to take advantage of memoization is to apply lru_cache from the functools module to the fibonacci1 implementation:

from functools import lru_cache

@lru_cache()
def fibonacci1(n):
    if n == 0 or n == 1:
        return n
    else:
        return fibonacci1(n - 1) + fibonacci1(n - 2)

You can find a quick comparison in the following image:

[Image: fibonacci]
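An alternative fix that keeps fibonacci4's structure is to create the dictionary once and pass it down the recursion, so every call shares it and each value is computed only once (O(n) time, though recursion depth still grows with n). The name fibonacci_memo is hypothetical. It keeps fibonacci4's base cases (1 → 1, 2 → 2), so fibonacci_memo(n) equals fibonacci1(n + 1).

```python
def fibonacci_memo(n, memo=None):
    """Top-down Fibonacci with one memo dict shared across the whole recursion, O(n).

    Uses the same base values as fibonacci4: f(1) = 1, f(2) = 2.
    """
    if memo is None:
        memo = {1: 1, 2: 2}  # created once per top-level call, then passed down
    if n not in memo:
        memo[n] = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    return memo[n]

print(fibonacci_memo(10))  # 89
```

Using None as the default avoids Python's shared-mutable-default pitfall while still sharing one dictionary within each top-level call.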

Add A/B testing questions

A/B testing comes up in almost every interview. It has many subtleties that reveal whether a candidate understands it deeply, so adding some of these questions would be very helpful.
