Data Engineer Internship
Join our data engineering team to build and maintain the data infrastructure behind our clients' AI initiatives. Working alongside senior engineers, you'll contribute to production ETL pipelines, data quality frameworks, and cloud data platforms.
Who We're Looking For
We're seeking technically minded students passionate about building large-scale data infrastructure. This internship is ideal for those who enjoy solving complex engineering problems, optimizing system performance, and designing reliable data pipelines that support mission-critical AI applications across multiple industries.
What You'll Do
As a Data Engineer Intern, you will work on production-grade data infrastructure that powers our clients' AI initiatives:
Data Pipeline Development
- Design and implement scalable ETL/ELT pipelines for financial, healthcare, and insurance data
- Build real-time streaming data pipelines using Apache Kafka, Spark Streaming, and cloud services
- Develop data quality monitoring and automated validation frameworks (see the sketch after this list)
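To give a flavor of the work, here is a minimal sketch of a batch ETL step with a simple quality gate, assuming PySpark; the bucket paths, column names, and 5% threshold are illustrative, not our actual pipeline:

```python
# Minimal batch ETL sketch with a simple data-quality gate.
# Paths, column names, and the 5% threshold are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

# Extract: read raw claims data (hypothetical path and schema).
raw = spark.read.csv("s3://example-bucket/raw/claims.csv", header=True, inferSchema=True)

# Transform: drop rows missing required fields, parse the date column.
clean = (
    raw.filter(F.col("claim_id").isNotNull() & F.col("amount").isNotNull())
       .withColumn("claim_date", F.to_date("claim_date", "yyyy-MM-dd"))
)

# Quality gate: abort the load if too many rows were dropped.
total, kept = raw.count(), clean.count()
if total > 0 and (total - kept) / total > 0.05:
    raise ValueError(f"Dropped {total - kept} of {total} rows; aborting load")

# Load: write partitioned Parquet for downstream consumers.
clean.write.mode("overwrite").partitionBy("claim_date").parquet(
    "s3://example-bucket/curated/claims/"
)
```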
Cloud Infrastructure & Architecture
- Deploy and manage data infrastructure on AWS, Azure, and Google Cloud platforms
- Implement Infrastructure as Code using Terraform, CloudFormation, and Kubernetes
- Optimize cloud costs and performance for large-scale data processing workloads (see the sketch after this list)
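Storage lifecycle rules are one common cost lever. Below is a hedged boto3 sketch that archives cold raw data to Glacier; the bucket name, prefix, and retention windows are assumptions for illustration:

```python
# Sketch: archive cold raw data to cut S3 storage costs.
# Bucket name, prefix, and retention windows are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-after-90-days",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 730},  # delete after two years
            }
        ]
    },
)
```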
Data Platform Engineering
- Build and maintain data warehouses using Snowflake, BigQuery, and Redshift
- Implement data lake architectures with Delta Lake, Apache Iceberg, and Parquet formats
- Develop self-service data platforms and APIs for ML model consumption (see the sketch after this list)
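As one example of model-facing serving, here is a small FastAPI sketch of a feature endpoint; the in-memory lookup stands in for a real feature store, and the route and field names are hypothetical:

```python
# Sketch of a self-service feature API for ML consumers.
# The feature-store lookup is stubbed; endpoint and field names are assumptions.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="feature-service")

# Stand-in for a real feature store (e.g., a warehouse or Redis lookup).
_FEATURES = {
    "cust-001": {"avg_claim_amount": 1240.5, "claims_last_12m": 3},
}

@app.get("/features/{customer_id}")
def get_features(customer_id: str) -> dict:
    """Return precomputed model features for one customer."""
    features = _FEATURES.get(customer_id)
    if features is None:
        raise HTTPException(status_code=404, detail="unknown customer")
    return {"customer_id": customer_id, "features": features}
```

If this were saved as `features.py`, it could be run locally with `uvicorn features:app` and queried at `/features/cust-001`.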
DevOps & Data Operations
- Implement CI/CD pipelines for data pipeline deployment and testing (see the sketch after this list)
- Set up monitoring, alerting, and observability for data infrastructure
- Collaborate with ML teams to support model training and serving infrastructure
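Pipeline tests are what let CI gate a deployment. A minimal pytest-style sketch, with a transform and rules of our own invention for illustration:

```python
# Sketch: unit-testing a pipeline transform so CI can gate deployments.
# The transform and its rules are illustrative assumptions.
import pandas as pd

def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Drop negative amounts and round to cents."""
    out = df[df["amount"] >= 0].copy()
    out["amount"] = out["amount"].round(2)
    return out

def test_normalize_amounts_drops_negatives():
    df = pd.DataFrame({"amount": [10.504, -3.0, 0.0]})
    result = normalize_amounts(df)
    assert len(result) == 2
    assert (result["amount"] >= 0).all()
    assert result["amount"].tolist() == [10.5, 0.0]
```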
What's Required
We're looking for candidates who:
Education & Background
- Are pursuing a degree in Computer Science, Software Engineering, Data Engineering, or a related field
- Have strong programming fundamentals and an understanding of system design
- Demonstrate problem-solving skills and attention to detail
- Show passion for building scalable, reliable systems
Technical Skills
- Proficiency in Python, SQL, and at least one other programming language (e.g., Java, Scala, or Go)
- Experience with data processing frameworks (Apache Spark, Pandas, Dask)
- Knowledge of cloud platforms (AWS, Azure, GCP) and containerization (Docker, Kubernetes)
- Understanding of database systems, both SQL and NoSQL
Preferred Qualifications
- Experience with workflow orchestration tools (Apache Airflow, Prefect, Dagster); a minimal Airflow sketch follows this list
- Knowledge of streaming technologies (Kafka, Pulsar, Kinesis)
- Familiarity with Infrastructure as Code (Terraform, CloudFormation)
- Understanding of data modeling, warehousing, and lake architectures
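For candidates new to orchestration, this is roughly what a minimal DAG looks like, assuming Apache Airflow 2.x; the task bodies, DAG id, and schedule are placeholders:

```python
# Sketch of a daily orchestration DAG, assuming Apache Airflow 2.x.
# Task bodies are stubs; the DAG id and schedule are illustrative.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data from source systems")

def load():
    print("write curated tables to the warehouse")

with DAG(
    dag_id="daily_claims_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # 'schedule_interval' on Airflow < 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```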
Technologies You'll Work With
Our data engineering stack leverages industry-leading technologies to handle enterprise-scale data processing:
Programming & Frameworks
- Python, Java, Scala
- Apache Spark, PySpark
- Pandas, Dask, Ray
- FastAPI, Flask
Cloud & Infrastructure
- AWS, Azure, Google Cloud
- Kubernetes, Docker
- Terraform, CloudFormation
- Serverless computing
Data Storage & Processing
- Snowflake, BigQuery, Redshift
- Delta Lake, Apache Iceberg
- PostgreSQL, MongoDB
- Redis, Elasticsearch
Streaming & Orchestration
- Apache Kafka, Kinesis (see the consumer sketch after this list)
- Apache Airflow, Prefect
- dbt, Great Expectations
- Prometheus, Grafana
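To illustrate the streaming side of this stack, a hedged consumer sketch using the kafka-python client; the topic name, broker address, and message schema are assumptions:

```python
# Sketch of a streaming consumer, assuming the kafka-python client.
# Topic name, brokers, and message schema are illustrative assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "claims-events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="claims-etl",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Downstream: validate, enrich, and land the event in the lake.
    print(f"partition={message.partition} offset={message.offset} "
          f"claim={event.get('claim_id')}")
```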
Learning & Development Opportunities
Our Data Engineer Internship provides comprehensive exposure to modern data engineering practices and technologies:
Technical Skills Development
- Advanced SQL optimization and database performance tuning (see the sketch after this list)
- Distributed computing and parallel processing techniques
- Cloud-native architecture and microservices design
- Data security, privacy, and compliance best practices
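As a taste of query tuning, this sketch uses SQLite's EXPLAIN QUERY PLAN to show a full table scan becoming an index search once a covering index exists; the schema is illustrative, and the same idea carries over to warehouse engines:

```python
# Sketch of index-driven query tuning using SQLite's EXPLAIN QUERY PLAN.
# The schema and data are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims (customer_id, amount) VALUES (?, ?)",
    [(f"cust-{i % 100}", i * 1.5) for i in range(10_000)],
)

query = "SELECT SUM(amount) FROM claims WHERE customer_id = ?"

# Before indexing: the planner scans the whole table.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("cust-7",)).fetchall())

# Add a covering index, then re-check: the plan becomes an index search.
conn.execute("CREATE INDEX idx_claims_customer ON claims (customer_id, amount)")
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("cust-7",)).fetchall())
```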
Industry Knowledge
- Financial data processing and regulatory requirements
- Healthcare data standards (HIPAA, HL7, FHIR)
- Insurance claims processing and risk modeling data
- Real-time data streaming for AI/ML applications
Who We Are
Gateway is a leading data science and machine learning consulting company specializing in enterprise AI solutions. Our data engineering team builds the foundational infrastructure that enables AI transformation across banking, insurance, asset management, and pharmaceutical industries.
We design and implement data platforms that handle petabytes of enterprise data, ensuring reliability, scalability, and security. Our engineers work with cutting-edge technologies to solve complex data challenges and enable real-time decision making for Fortune 500 companies.
We're an Equal Opportunity Employer
Gateway is committed to building an inclusive workplace. We encourage applications from candidates of all backgrounds.