Introduction to Data Collection


YouTube video ID: cd_jj0IRmaA

Source: YouTube video by Alex The Analyst


Data collection is the process of gathering data from different data sources to use in analysis, decision-making, and problem solving. It brings raw data together so that teams can act on information rather than guesswork. The process relies on knowing where data is created and how it will be used downstream.

Data Sources

Data sources are the places where data is created and stored, and they vary by context and system. Examples include EHR systems, bank accounts, APIs, websites, and CSV files. Knowing the specific sources is essential for designing how to collect and integrate the data.
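As a minimal sketch of why the specific sources matter, the example below reads records from two of the source types mentioned above (a CSV file and a JSON payload such as an API might return) and brings them into one common shape. The sample data, the field names `customer_id` and `amount`, and the function names are assumptions made for illustration, not something taken from the video.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two different sources.
CSV_EXPORT = "customer_id,amount\n1,19.99\n2,5.00\n"
API_RESPONSE = '[{"customer_id": 3, "amount": 42.5}]'


def read_csv_source(text):
    """Parse CSV text into a list of dicts with typed fields."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"customer_id": int(r["customer_id"]),
         "amount": float(r["amount"]),
         "source": "csv"}
        for r in rows
    ]


def read_api_source(payload):
    """Parse a JSON payload into the same record shape as the CSV reader."""
    return [
        {"customer_id": int(r["customer_id"]),
         "amount": float(r["amount"]),
         "source": "api"}
        for r in json.loads(payload)
    ]


def collect_all():
    """Combine records from both hypothetical sources into one list."""
    return read_csv_source(CSV_EXPORT) + read_api_source(API_RESPONSE)
```

Each source needs its own reader, but everything downstream can work with a single record format.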

Importance of Data Collection

Good data collection ensures the raw material needed for informed decisions and helps identify trends, patterns, and opportunities. It also lays the foundation for data quality. As the speaker emphasizes, "if you collect data poorly or if you do not process that data correctly that can lead to bad data which can give you incorrect results and then of course you're going to make bad decisions with bad data."

Data Collection Systems

Data collection does not just happen, and data does not magically appear: it is a deliberate, specific process that has to be set up before any data is available. Systems are set up to collect data and place it into databases for analysis, and those systems can be built in-house or purchased as paid solutions. An example given is tracking customer cart additions versus purchases for an online shop to measure behavior and outcomes.
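The cart example can be sketched in a few lines. The event log below, the event names `add_to_cart` and `purchase`, and the function name are all hypothetical; a real shop would record these events in a database rather than a Python list.

```python
# Hypothetical event log: (customer_id, event_type) pairs.
EVENTS = [
    (1, "add_to_cart"),
    (1, "purchase"),
    (2, "add_to_cart"),
    (3, "add_to_cart"),
    (3, "purchase"),
    (4, "add_to_cart"),
]


def cart_to_purchase_rate(events):
    """Return the share of customers who added to cart and also purchased."""
    added = {cid for cid, kind in events if kind == "add_to_cart"}
    purchased = {cid for cid, kind in events if kind == "purchase"}
    if not added:
        return 0.0  # No cart activity, so avoid dividing by zero.
    return len(added & purchased) / len(added)
```

None of this can be computed unless the system was deliberately set up to record both kinds of event in the first place.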

Data Pipelines (ETL)

A data pipeline automates the movement of data from one place to another while often transforming it along the way. ETL stands for Extract, Transform, Load and describes stages where raw data is pulled from sources, placed into a staging area, transformed to be more usable, and then loaded into a data warehouse, database, or even an Excel file. Data engineers, developers, analysts, and data scientists are among the roles that determine and perform the transformations needed for analysis.

ETL Stage | What happens
Extract | Raw data is taken from various sources such as parts of a website or other locations.
Transform | Data is placed in a staging area and then altered to become more usable for analysis.
Load | Transformed data is moved into a data warehouse, database, or Excel file for use.
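The three stages can be sketched as a small pipeline using only Python's standard library. This is an illustration under assumptions, not the speaker's implementation: the sample CSV, the `orders` table, the cleaning rules, and the function names are all made up, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy names and a row missing its customer.
RAW_CSV = "order_id,customer,total\n1, alice ,10.50\n2,BOB,3\n3,,7.25\n"


def extract(text):
    """Extract: pull raw rows out of the source unchanged (the staged copy)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(staged_rows):
    """Transform: tidy names, convert types, drop rows without a customer."""
    cleaned = []
    for row in staged_rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue
        cleaned.append((int(row["order_id"]), customer, float(row["total"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform, and load in order."""
    load(transform(extract(text)), conn)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` would leave two cleaned rows in the `orders` table. Keeping each stage in its own function makes it easier to change one step without touching the others.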

Continuous Data Collection

Data collection is not a one-time thing; it is always happening. ETL pipelines can break or need updates, and source data can change, requiring fixes to pipelines. This is a very active process: a pipeline is typically set up once, then adapted, changed, and often fixed over time.
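One common way to notice that a source has changed is to check incoming rows against the columns the pipeline expects before processing them. The sketch below is one possible approach and not something described in the video; the column names, the `SchemaChangedError` exception, and `check_schema` are hypothetical.

```python
# Columns this hypothetical pipeline expects from its source.
EXPECTED_COLUMNS = {"order_id", "customer", "total"}


class SchemaChangedError(Exception):
    """Raised when a source row no longer has the expected columns."""


def check_schema(rows, expected=EXPECTED_COLUMNS):
    """Return rows unchanged, or raise if any row is missing a column."""
    for i, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            raise SchemaChangedError(
                f"row {i} is missing columns: {sorted(missing)}"
            )
    return rows
```

Failing loudly at this point makes a renamed or dropped column visible right away, instead of letting bad data flow into the analysis.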

Practical Roles and Experience

Data professionals are involved across all stages: extracting, staging, transforming, and loading data. The speaker described working on a data collection team for over three years and taking part in every step, including helping clients understand how to collect new types of data. This hands-on involvement illustrates that collecting and maintaining data systems requires ongoing attention and collaboration.

  Takeaways

  • Data collection is the process of gathering data from different data sources to use in analysis, decision-making, and problem solving.
  • Effective data collection supplies the raw material for informed decisions, helps identify trends, and establishes data quality foundations.
  • Data collection systems are calculated processes that can be built manually or obtained as paid systems to place data into databases for analysis.
  • ETL pipelines automate data movement and transformation through Extract, Transform, and Load stages, involving engineers, developers, analysts, and scientists.
  • Data collection is continuous: pipelines can break, sources can change, and the process requires adaptation and maintenance over time.

