Index
- What is data science?
- What kind of problems can be solved with data science?
- The data science workflow
- Data science related roles
1. What is data science?
Data science is a set of methodologies for processing and studying data. The goal is to draw meaningful conclusions from that data. For instance, insights drawn from transaction data can help detect and prevent credit card fraud.
2. What kind of problems can be solved with data science?
More precisely, data scientists can use data to:
- Describe the current state of an organization or process
- Detect anomalous events, such as fraudulent transactions
- Diagnose the causes of observed events and behaviors
- Predict future events
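As an illustration of the "detect anomalous events" item, here is a minimal sketch that flags unusually large transaction amounts with a modified z-score based on the median. The function name `flag_anomalies`, the sample amounts, and the 3.5 threshold are assumptions chosen for the example; real fraud detection uses many more signals than the amount alone.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return []
    return [
        i
        for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical transaction amounts; the sixth one is far from the rest.
amounts = [12.5, 9.99, 14.2, 11.0, 13.75, 950.0, 10.4]
suspicious = flag_anomalies(amounts)
print(suspicious)
```

The median is used instead of the mean because a single extreme value would otherwise shift the baseline it is being compared against.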
Data science is about discovering and communicating insights from data. Because that data comes from very different sources, data scientists need to follow a specific workflow to make use of it.
3. The data science workflow
Problem statement. Establish a well-defined question.
Data collection and storage. First, we collect data from many sources, such as surveys and financial transactions. Then we store that data in a safe and accessible way.
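A minimal sketch of the storage half of this step, assuming the collected data is a handful of survey responses and using Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and sample rows are hypothetical; a real project would choose storage based on data volume and access needs.

```python
import sqlite3

# Hypothetical survey responses: (respondent, score).
survey_responses = [("alice", 4), ("bob", 5)]

# ":memory:" keeps the example self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (respondent TEXT PRIMARY KEY, score INTEGER NOT NULL)"
)

# Using the connection as a context manager commits on success
# and rolls back if an insert fails.
with conn:
    conn.executemany("INSERT INTO responses VALUES (?, ?)", survey_responses)

rows = conn.execute(
    "SELECT respondent, score FROM responses ORDER BY respondent"
).fetchall()
print(rows)
conn.close()
```

The `PRIMARY KEY` and `NOT NULL` constraints reject duplicate or incomplete rows at write time, which is one simple way to keep stored data trustworthy.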
Data preparation. This step includes cleaning the data, for instance by finding missing or duplicate values, and converting it into a more organized format.
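The preparation step can be sketched as follows, assuming the raw data arrives as a list of string-valued records. The field names, the sample rows, and the choice to drop (rather than fill in) rows with a missing amount are assumptions made for the example.

```python
# Hypothetical raw records, as they might come out of a CSV export.
raw_rows = [
    {"id": "1", "amount": "12.50", "country": "ES"},
    {"id": "2", "amount": "", "country": "FR"},       # missing amount
    {"id": "1", "amount": "12.50", "country": "ES"},  # duplicate of the first row
    {"id": "3", "amount": "7.25", "country": "es"},   # inconsistent casing
]


def prepare(rows):
    """Drop duplicate and incomplete rows, then convert fields to proper types."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # duplicate record
        seen_ids.add(row["id"])
        if not row["amount"].strip():
            continue  # missing value: dropped here, could be imputed instead
        cleaned.append(
            {
                "id": int(row["id"]),
                "amount": float(row["amount"]),
                "country": row["country"].upper(),
            }
        )
    return cleaned


clean_rows = prepare(raw_rows)
print(clean_rows)
```

Whether to drop or fill in missing values depends on the question being asked, so this choice is worth revisiting for each project.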
Exploration and visualization. This can involve building dashboards, tracking how the data changes over time, or performing comparisons.
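Tracking how data changes over time can be sketched as below: transactions are summed per month and each month is compared with the previous one. The sample records and the function names `monthly_totals` and `month_over_month` are hypothetical, and dates are assumed to be ISO-formatted strings (`YYYY-MM-DD`).

```python
from collections import defaultdict

# Hypothetical (date, amount) records.
transactions = [
    ("2024-01-05", 120.0),
    ("2024-01-20", 80.0),
    ("2024-02-03", 150.0),
    ("2024-02-17", 100.0),
    ("2024-03-09", 200.0),
]


def monthly_totals(records):
    """Sum amounts per month, keyed by 'YYYY-MM', in chronological order."""
    totals = defaultdict(float)
    for date, amount in records:
        totals[date[:7]] += amount  # the first 7 characters are 'YYYY-MM'
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Return the change in total between each month and the previous one."""
    months = list(totals)
    return {
        current: totals[current] - totals[previous]
        for previous, current in zip(months, months[1:])
    }


totals = monthly_totals(transactions)
changes = month_over_month(totals)
print(totals)
print(changes)
```

A table like `changes` is the kind of summary a dashboard would typically plot as a line or bar chart.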
Experimentation and predictions. This could include building forecasting systems and trying different machine learning algorithms to compare how well each performs on the data.
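The idea of comparing approaches on the same data can be sketched with two very simple forecasters, standing in for real machine learning models. Each one predicts the next value of a series from its history, and the mean absolute error over the later points decides which is better. The series, the forecaster names, and the evaluation setup are assumptions made for the example.

```python
from statistics import mean

# Hypothetical series with a steady upward trend.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]


def last_value(history):
    """Forecast the next point as the most recent observation."""
    return history[-1]


def moving_average(history, window=3):
    """Forecast the next point as the mean of the last `window` observations."""
    return mean(history[-window:])


def evaluate(forecaster, data, start=3):
    """Mean absolute error of one-step-ahead forecasts from index `start` on."""
    errors = [
        abs(forecaster(data[:t]) - data[t])  # only past values are visible
        for t in range(start, len(data))
    ]
    return mean(errors)


scores = {f.__name__: evaluate(f, series) for f in (last_value, moving_average)}
best = min(scores, key=scores.get)
print(scores, best)
```

Each forecast only sees values that come before the point being predicted, which mirrors how a forecasting system would be used in practice and avoids overstating accuracy.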