What Is Data Science, What Tasks Does It Involve, and Why Is There So Much Confusion?

 

What exactly is Data Science?


Before answering this question, it is important to understand why there is so much confusion around it.

Why are there so many confusing definitions?

One reason is that data science is an assortment of several tasks that together make up the data science pipeline. The importance of each task also varies from organization to organization and from application to application. Because tasks are distributed and prioritized differently, and because many people work on only part of the pipeline, different sources offer different opinions and definitions.

What are the tasks involved in Data Science?




1)      Collect the Data

2)      Store the Data

3)      Process the Data

4)      Describe the Data

5)      Model the Data

 

Data Science is the science of collecting, storing, processing, describing, and modeling data.

1)     Collecting Data

What is involved in data collection?

It depends on the question the data scientist is trying to answer and on the environment in which the data scientist is working.

 

Example 1: A data scientist working at an e-commerce company. In many cases an e-commerce company is a data-rich organization that already holds a lot of data about its customers. The company has the data, so the data scientist does not actually need to collect and store it; they only need to process, describe, and model it. The data scientist writes SQL queries to access the data, and may also need to write Python or Java code to run those queries.

 

One question a data scientist might be interested in here is: which items do customers buy together?
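The "bought together" question can be sketched as a SQL query run from Python. This is only a minimal illustration: the `order_items` table, its columns, and the order data are all assumptions invented for the example, and an in-memory SQLite database stands in for the company's real database.

```python
import sqlite3

# Hypothetical order data: (order_id, item). Table and column names are assumptions.
rows = [
    (1, "phone"), (1, "case"), (1, "charger"),
    (2, "phone"), (2, "case"),
    (3, "tv"), (3, "hdmi cable"),
    (4, "phone"), (4, "charger"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)

# Self-join: pair every item with every other item in the same order.
# a.item < b.item keeps each pair once and never pairs an item with itself.
query = """
    SELECT a.item, b.item, COUNT(*) AS times_together
    FROM order_items AS a
    JOIN order_items AS b
      ON a.order_id = b.order_id AND a.item < b.item
    GROUP BY a.item, b.item
    ORDER BY times_together DESC, a.item, b.item
"""
pairs = conn.execute(query).fetchall()
conn.close()

for first, second, count in pairs:
    print(f"{first} + {second}: bought together {count} time(s)")
```

Counting pairs this way is the simplest possible approach; on real data the same idea would likely need filtering by date range or a minimum count before the pairs become meaningful.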


Example 2: A data scientist working for a political party

 

The government has implemented a new policy.

 

What are people saying about the new policy? Is it good or bad, do people like it or not, what different opinions exist, and so on.

 

In this case the data exists, because people are discussing the policy on social media platforms and public forums, but it is not owned by the government or by us, and it is not stored in a structured way.

 

Here the data scientist needs some hacking skill to scrape the data from different sources on the web, which requires basic knowledge of Python, Java, or another programming language.
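As a rough sketch of what such scraping looks like, the snippet below pulls comment text out of a page using only Python's standard library. The HTML string is a stand-in for a page that would normally be downloaded, and the markup and the `comment` class name are assumptions made up for this example.

```python
from html.parser import HTMLParser

# Stand-in for a page fetched from a public forum; the markup and the
# "comment" class name are assumptions made for this sketch.
PAGE = """
<html><body>
  <div class="comment">The new policy helps small businesses.</div>
  <div class="ad">Buy now!</div>
  <div class="comment">I do not like the new policy.</div>
</body></html>
"""


class CommentExtractor(HTMLParser):
    """Collect the text inside <div class="comment"> elements."""

    def __init__(self):
        super().__init__()
        self.in_comment = False
        self.comments = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and ("class", "comment") in attrs:
            self.in_comment = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_comment = False

    def handle_data(self, data):
        if self.in_comment and data.strip():
            self.comments.append(data.strip())


parser = CommentExtractor()
parser.feed(PAGE)
comments = parser.comments

for text in comments:
    print(text)
```

In practice each site has its own markup, so the extraction rule would have to be adapted per source, and the site's terms of use should be checked before collecting anything.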

 

Example 3: A data scientist working with farmers

 

What is the effect of seed type, fertilizer, and irrigation on yield?

In this case the data is not available within the organization and is not readily available in public either, so the situation is quite different. Now you have to design an experiment and collect the data yourself.

For example, suppose a particular seed is not giving a good yield. Does that mean the seed is bad, or is the problem the irrigation method or the fertilizer? You design an experiment to separate these effects: take a piece of land and divide it into 9 parts, then use a different combination of seed, irrigation method, and fertilizer on each part. Once the experiment has produced enough sample data, you can analyze it to find which combination gives a good yield. This requires some statistical knowledge, so that conclusions can be drawn using hypothesis testing and analysis of variance.
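To give a feel for the analysis of variance mentioned above, here is a minimal sketch that computes the F statistic for a single factor (seed type) by hand. The yield figures are invented purely for illustration, and the sketch looks at one factor only; an experiment that varies seed, irrigation, and fertilizer together would need a multi-factor analysis.

```python
from statistics import mean

# Hypothetical yields per plot for three seed types, three plots each.
# These numbers are invented for illustration only.
yields = {
    "seed_A": [20.0, 22.0, 21.0],
    "seed_B": [25.0, 27.0, 26.0],
    "seed_C": [20.0, 21.0, 22.0],
}


def one_way_anova_f(groups):
    """Return the F statistic for a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations
    # Variation between the group means
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation inside each group
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


f_stat = one_way_anova_f(list(yields.values()))
print(f"F statistic: {f_stat:.2f}")
```

A large F value suggests that the difference between seed types is bigger than the plot-to-plot variation within each seed type. Whether it is statistically significant is decided by comparing it with the F distribution for the corresponding degrees of freedom.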

So what skills are required to collect data?

Intermediate level of programming

Knowledge of database

Knowledge of Statistics

2)      Store the Data:

Data such as customer records, employee records, and product inventory is typically stored in a relational database. Companies often keep data in several databases, so a data warehouse is used to bring the data from those databases together for analytics.

Unstructured data includes text, images, video, and speech. Big-data data lakes are used to store unstructured, semi-structured, and structured data.

3)      Processing Data

Extract the data from different sources, transform and clean it as the data science project requires, and load it.
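The extract, transform, and load steps can be sketched in a few lines of Python. Everything concrete here is an assumption for the sake of the example: the CSV text stands in for a raw export from some source system, and the column names, cleaning rules, and `sales` table are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a raw export from a source system; column names are assumptions.
RAW_CSV = """customer,amount,city
 Alice ,120.50,Pune
Bob,,Mumbai
alice,80,Pune
Carol,abc,Delhi
"""


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names and drop rows with a missing or invalid amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["customer"].strip().title(), amount, row["city"].strip()))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, city TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, amount, city FROM sales ORDER BY rowid"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to swap one of them, for example reading from a different source, without touching the others.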

4)      Describe the Data

After loading the clean data, visualize it and summarize it with plots such as bar graphs and grouped bar graphs.

Example: A graph of mobile phone or TV sales over the past 3 years, which makes the figures easy to understand and communicate.
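A small sketch of describing such sales data is shown below. The sales figures and years are invented for illustration, and a plain-text bar chart is used so that the example runs without extra packages; in a real project a plotting library would normally draw the bar graph.

```python
from statistics import mean

# Hypothetical unit sales per year; the figures are invented for illustration.
sales = {
    "mobile phones": {2021: 120, 2022: 150, 2023: 180},
    "TVs": {2021: 60, 2022: 55, 2023: 70},
}


def text_bar_chart(series, scale=10):
    """Return one line per year, with one '#' for every `scale` units sold."""
    return [
        f"{year} | {'#' * (units // scale)} {units}"
        for year, units in sorted(series.items())
    ]


summary = {}
for product, series in sales.items():
    values = list(series.values())
    summary[product] = {
        "total": sum(values),
        "mean": mean(values),
        "best_year": max(series, key=series.get),
    }
    print(product)
    print("\n".join(text_bar_chart(series)))
    print(summary[product])
```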

5)      Model the Data

Statistical Modeling:

Statistical modeling makes assumptions about the underlying data, such as a normal distribution of values or a linear relationship between variables.

In statistical modeling, we assume simple models that allow robust statistical analysis.

Example: Readings of a patient's blood sugar level, height, weight, etc.
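The two kinds of assumption can be sketched with Python's standard library. The patient readings below are invented for illustration only, and both the normal-distribution assumption and the straight-line assumption are simply taken as given here rather than checked.

```python
from statistics import NormalDist, mean

# Hypothetical patient readings, invented for illustration only.
blood_sugar = [90, 95, 100, 105, 110]   # mg/dL
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 66, 74, 82]

# Assumption 1: blood sugar readings follow a normal distribution.
sugar_model = NormalDist.from_samples(blood_sugar)
# Probability that a reading exceeds 110 under that assumption
p_high = 1 - sugar_model.cdf(110)

# Assumption 2: weight depends linearly on height,
# weight = slope * height + intercept, fitted by least squares.
mean_h, mean_w = mean(height_cm), mean(weight_kg)
slope = sum(
    (h - mean_h) * (w - mean_w) for h, w in zip(height_cm, weight_kg)
) / sum((h - mean_h) ** 2 for h in height_cm)
intercept = mean_w - slope * mean_h

print(f"blood sugar ~ Normal(mean={sugar_model.mean}, stdev={sugar_model.stdev:.2f})")
print(f"P(reading > 110) = {p_high:.3f}")
print(f"weight = {slope:.2f} * height + {intercept:.2f}")
```

Because the models are this simple, each fitted number has a direct interpretation, which is what makes the statistical analysis robust and easy to reason about.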

Algorithmic Modeling:

A model represents what was learned by a machine learning algorithm. The model is the “thing” that is saved after running a machine learning algorithm on training data and represents the rules, numbers, and any other algorithm-specific data structures required to make predictions.
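The idea of a model as the saved "thing" can be illustrated with a deliberately tiny learning algorithm. The training data, the labels, and the threshold rule are all assumptions made up for this sketch; real machine learning algorithms learn far richer structures, but the train, save, load, predict cycle is the same.

```python
import pickle

# Hypothetical training data: (blood sugar reading, label). Invented for illustration.
training_data = [
    (85, "normal"), (92, "normal"), (99, "normal"),
    (130, "high"), (145, "high"), (160, "high"),
]


def train_threshold_model(data):
    """Learn a single threshold: the midpoint between the two class means."""
    normal = [x for x, label in data if label == "normal"]
    high = [x for x, label in data if label == "high"]
    threshold = (sum(normal) / len(normal) + sum(high) / len(high)) / 2
    return {"threshold": threshold}  # the "model" is just the learned number


def predict(model, reading):
    """Use the learned threshold to label a new reading."""
    return "high" if reading > model["threshold"] else "normal"


model = train_threshold_model(training_data)

# Save the model and load it back, as one would between training and use.
saved_bytes = pickle.dumps(model)
restored = pickle.loads(saved_bytes)

print(restored, predict(restored, 150), predict(restored, 90))
```

Note that only the learned number is saved, not the training data: once training is done, the model alone is enough to make predictions.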

appsdbahelp

