Data Engineering
The process of designing, building, and managing systems and workflows that move and transform raw data into usable insights.
ETL & ELT
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are the core workflows for preparing data for storage and analysis: ETL cleans and reshapes data before loading it into the target system, while ELT loads raw data first and transforms it inside the warehouse.
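To make the contrast concrete, here is a minimal sketch using pandas and the stdlib sqlite3 module as a stand-in target; the CSV source and the order_id/amount columns are hypothetical.

```python
import sqlite3
import pandas as pd

# --- ETL: transform in the pipeline, then load the cleaned result ---
def etl(csv_path: str, con: sqlite3.Connection) -> None:
    raw = pd.read_csv(csv_path)                      # Extract
    clean = raw.dropna(subset=["order_id"])          # Transform before loading
    clean["amount"] = clean["amount"].astype(float)
    clean.to_sql("orders", con, if_exists="replace", index=False)  # Load

# --- ELT: load raw data as-is, then transform inside the target system ---
def elt(csv_path: str, con: sqlite3.Connection) -> None:
    pd.read_csv(csv_path).to_sql(
        "raw_orders", con, if_exists="replace", index=False
    )  # Extract + Load
    con.execute("""
        CREATE TABLE IF NOT EXISTS orders AS
        SELECT order_id, CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE order_id IS NOT NULL
    """)  # Transform where the data lives
```

ELT pushes the transform into the warehouse's own engine, which is why it pairs naturally with scalable cloud warehouses and tools like dbt.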
Data Pipelines
Automated series of steps to ingest, process, validate, and store data. Tools include Apache Airflow, dbt, and Dagster.
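The stages of a pipeline compose naturally as functions. A plain-Python sketch of the ingest → process → validate → store chain (the source data and checks here are illustrative placeholders):

```python
Record = dict

def ingest() -> list[Record]:
    # Hypothetical source; real pipelines read from an API, queue, or file drop.
    return [{"id": 1, "value": "42"}, {"id": 2, "value": None}]

def process(records: list[Record]) -> list[Record]:
    # Drop incomplete records and coerce types.
    return [{**r, "value": int(r["value"])} for r in records if r["value"] is not None]

def validate(records: list[Record]) -> list[Record]:
    assert all(r["value"] >= 0 for r in records), "negative values found"
    return records

def store(records: list[Record]) -> None:
    print(f"writing {len(records)} records")  # stand-in for a database write

# Run the stages in order; orchestrators like Airflow or Dagster
# schedule, retry, and monitor exactly this kind of chain.
store(validate(process(ingest())))
```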
Batch vs Streaming
Batch processing handles large volumes of data periodically, while streaming deals with real-time or near-real-time data flow.
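A schematic contrast in plain Python (in practice the batch side might be a Spark job and the streaming side a Kafka or Flink consumer; the records and amounts are made up):

```python
from datetime import date
from typing import Iterable

# Batch: process everything accumulated over a period in one run (e.g. nightly).
def batch_job(records: list[dict]) -> None:
    total = sum(r["amount"] for r in records)
    print(f"{date.today()}: processed {len(records)} records, total={total}")

# Streaming: handle each event as it arrives, keeping running state.
def stream_loop(source: Iterable[dict]) -> None:
    running_total = 0.0
    for event in source:          # in a real consumer, this blocks on the next event
        running_total += event["amount"]
        print(f"running total: {running_total}")
```

The trade-off: batch amortizes overhead across large volumes, while streaming minimizes latency between an event occurring and its result being available.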
Data Lakes & Warehouses
Data Lakes store raw data in any format (structured, semi-structured, or unstructured); Data Warehouses store structured, query-optimized data for BI & reporting.
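A small sketch of the split, using a local directory as a stand-in for object storage (s3://, gs://) and sqlite3 as a stand-in warehouse; writing Parquet via pandas assumes pyarrow is installed.

```python
import sqlite3
from pathlib import Path
import pandas as pd

events = pd.DataFrame({"user_id": [1, 2], "payload": ['{"a": 1}', '{"b": 2}']})

# Lake: append raw, schema-light files to cheap storage, partitioned by date.
Path("lake/events").mkdir(parents=True, exist_ok=True)
events.to_parquet("lake/events/2024-01-01.parquet")

# Warehouse: load a curated, typed, query-optimized table for BI.
con = sqlite3.connect("warehouse.db")
events[["user_id"]].assign(event_count=1).to_sql(
    "fact_events", con, if_exists="append", index=False
)
```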
Schema Design
Designing efficient data models and table structures for OLAP and OLTP workloads. Includes star/snowflake schemas.
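A minimal star schema expressed as DDL, run here through sqlite3 for self-containment; the sales tables and columns are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- The fact table holds measures plus a foreign key to each dimension:
-- the 'star' shape typical of OLAP workloads.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER,
    amount     REAL
);
""")
```

A snowflake schema normalizes the dimensions further (e.g. splitting category out of dim_product into its own table), trading simpler storage for extra joins at query time.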
Data Quality
Ensuring accuracy, completeness, and consistency of data using tools like Great Expectations, Deequ, and custom checks.
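The idea behind these tools is to assert expectations against each batch before it ships. Rather than the Great Expectations API, here is a plain-pandas sketch of the same pattern with hypothetical column names:

```python
import pandas as pd

def check_quality(df: pd.DataFrame) -> list[str]:
    """Return a list of failed expectations (empty means the batch passes)."""
    failures = []
    if df["order_id"].isna().any():                   # completeness
        failures.append("order_id has nulls")
    if df["order_id"].duplicated().any():             # uniqueness
        failures.append("order_id has duplicates")
    if not df["amount"].between(0, 1_000_000).all():  # validity / range
        failures.append("amount out of expected range")
    return failures

batch = pd.DataFrame({"order_id": [1, 2, 2], "amount": [9.5, -1.0, 30.0]})
problems = check_quality(batch)
if problems:
    raise ValueError(f"data quality check failed: {problems}")
```

Wiring a check like this into a pipeline stage stops bad batches from propagating downstream instead of discovering them in a dashboard.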
Orchestration Tools
Manage workflow execution across stages using tools like Apache Airflow, Prefect, and Dagster.
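A minimal Airflow DAG showing how an orchestrator expresses stages and their dependencies; this sketch assumes Airflow 2.4+ (for the `schedule` parameter), and the task bodies are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():   print("pulling from source")
def transform(): print("cleaning and joining")
def load():      print("writing to warehouse")

with DAG(
    dag_id="daily_sales",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # cron expressions also work here
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load   # dependency order
```

The orchestrator's value is everything around this definition: scheduling, retries, backfills, alerting, and a UI for inspecting each run.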
Cloud Data Platforms
Platforms such as Amazon Redshift, Google BigQuery, and Azure Synapse offer scalable, managed (and often serverless) data infrastructure.
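For a sense of the programming model, a BigQuery query via the google-cloud-bigquery client; this assumes credentials and a default project are configured in the environment, and the dataset/table name is hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS

query = """
    SELECT category, SUM(amount) AS revenue
    FROM `my_project.sales.fact_sales`   -- hypothetical dataset and table
    GROUP BY category
"""
# Serverless model: no cluster to provision; the platform scales the scan.
for row in client.query(query).result():
    print(row["category"], row["revenue"])
```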
Data APIs
APIs provide real-time, programmatic access to data from services, databases, or external systems.
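A typical ingestion-from-API pattern using requests; the endpoint, parameters, and response shape below are hypothetical placeholders for whatever service you integrate.

```python
import requests

resp = requests.get(
    "https://api.example.com/v1/orders",          # hypothetical endpoint
    params={"since": "2024-01-01", "limit": 100},
    headers={"Authorization": "Bearer <token>"},  # substitute real auth
    timeout=10,
)
resp.raise_for_status()                            # fail loudly on HTTP errors
for order in resp.json():                          # assumes a JSON array response
    print(order["id"], order["amount"])
```

Production ingestion adds pagination, rate-limit handling, and retries on top of this basic call.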
Data Governance
Managing data availability, usability, integrity, and security across the lifecycle. Includes cataloging and access control.
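As a toy illustration of the cataloging and access-control side, a dictionary standing in for a data catalog with role-based checks; real governance platforms implement this with policies, audit logs, and integration into the query layer.

```python
# Hypothetical catalog entries: ownership, classification, allowed roles.
CATALOG = {
    "sales.orders": {
        "owner": "finance",
        "classification": "confidential",
        "allowed_roles": {"analyst", "finance"},
    },
}

def can_read(role: str, table: str) -> bool:
    entry = CATALOG.get(table)
    return entry is not None and role in entry["allowed_roles"]

assert can_read("analyst", "sales.orders")
assert not can_read("intern", "sales.orders")
```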
Data Lineage
Tracking the origin and flow of data across systems, transformations, and pipelines to ensure traceability and compliance.
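Lineage is naturally a graph of datasets and the transformations between them. A minimal sketch, with hypothetical dataset names, that answers the common audit question "what does this table ultimately depend on?":

```python
# Record each transformation as edges: output -> list of inputs.
LINEAGE: dict[str, list[str]] = {
    "warehouse.fact_sales": ["staging.orders", "staging.products"],
    "staging.orders": ["lake/raw/orders.parquet"],
    "staging.products": ["lake/raw/products.parquet"],
}

def upstream(dataset: str) -> set[str]:
    """Walk the graph to find every source a dataset depends on."""
    sources: set[str] = set()
    for parent in LINEAGE.get(dataset, []):
        sources.add(parent)
        sources |= upstream(parent)
    return sources

print(upstream("warehouse.fact_sales"))
# {'staging.orders', 'staging.products',
#  'lake/raw/orders.parquet', 'lake/raw/products.parquet'}
```

Tools like OpenLineage capture these edges automatically from pipeline runs rather than maintaining them by hand.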