Choosing a Capstone Topic and Dataset

Unlike the previous projects, we are not providing you with any topic/dataset options. Topic selection is part of the challenge of the Capstone project!

Choosing a Topic

When choosing a topic, think through these questions:

  • What would I be motivated to work on?
  • What data could I use?
  • How could an individual or organization use my product or findings?
  • What will I be able to accomplish in the time I have available?
  • What challenges do I foresee with this project?

Sourcing Your Own Data

Sourcing new data is a valuable skill for data scientists, but it requires a great deal of care. An inappropriate dataset or an unclear business problem can lead you to spend a lot of time on a project that delivers underwhelming results. The guidelines below will help you complete a project that demonstrates your ability to engage in the full data science process.

Your data must be...

  1. Appropriate for supervised learning models. You may use unsupervised learning methods in your project (e.g. to generate cluster assignment labels), but there must be a substantial supervised learning component.

  2. Usable to solve a specific business problem. This solution must rely on your model(s).

  3. Somewhat complex. It should contain thousands of rows and features that require creativity to use. You can use a pre-existing clean dataset, but you should consider combining it with other datasets and/or engineering your own features.

  4. Unfamiliar. It can't be one we've already worked with during the course or that is commonly used for demonstration purposes (e.g. MNIST).

  5. Manageable. Stick to data that you can model with the knowledge and computational resources you have.
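
A few of these requirements can be roughly screened before you commit to a dataset. The sketch below is only an illustration, not an official checklist: the function name `screen_dataset`, the `MIN_ROWS` threshold of 1000, and the toy `sold` target column are all assumptions made for the example. It checks that the data has a reasonable number of rows and a usable target column for supervised learning.

```python
from collections import Counter

MIN_ROWS = 1000  # assumption: a rough floor for "thousands of rows"


def screen_dataset(rows, target, min_rows=MIN_ROWS):
    """Return quick suitability checks for a candidate dataset.

    rows   -- list of dicts, one per record (e.g. from csv.DictReader)
    target -- name of the column you intend to predict (hypothetical)
    """
    labels = [r.get(target) for r in rows]
    # Treat None and empty strings as missing target values.
    present = [v for v in labels if v not in (None, "")]
    counts = Counter(present)
    return {
        "enough_rows": len(rows) >= min_rows,
        "has_target": len(present) > 0,
        # A supervised target needs at least two distinct values.
        "target_varies": len(counts) >= 2,
        "missing_target_share": 1 - len(present) / len(rows) if rows else 1.0,
    }


# Toy records standing in for a real file; replace with your own data.
rows = [{"sqft": 800 + i, "sold": "yes" if i % 3 else "no"} for i in range(1500)]
report = screen_dataset(rows, target="sold")
print(report)
```

Passing these checks does not make a dataset appropriate on its own. Whether it is unfamiliar, complex enough, and tied to a specific business problem still requires your judgment and your instructor's approval.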

Once you've sourced your own data and identified the business problem you want to solve with it, you must run both by your instructor for approval.

Problem First, or Data First?

There are two ways that you can source your own dataset: Problem First or Data First. The less time you have to complete the project, the more strongly we recommend a Data First approach to this project.

Problem First: Start with a problem that interests you and that you could potentially solve using one of the four project models. Then look for data that you could use to solve that problem. This approach is high-risk, high-reward: it is very rewarding if you are able to solve a problem you are invested in, but frustrating if you sink a lot of time into the search without finding appropriate data. To mitigate the risk, set a firm limit on the amount of time you will allow yourself to look for data before moving on to the Data First approach.

Data First: Take a look at some of the most popular internet repositories of data sets we've listed below. If you find a data set that's particularly interesting to you, then it's totally okay to build your problem around that data set.

There are plenty of amazing places that you can get your data from. We recommend you start looking at data sets in some of these resources first:
