Data Engineer Internship
Join our data engineering team to build and maintain the data infrastructure behind our clients' AI initiatives. Working alongside senior engineers, you'll contribute to production ETL pipelines, data quality frameworks, and cloud data platforms.
Who We're Looking For
We're seeking technically minded students passionate about building large-scale data infrastructure. This internship is ideal for those who enjoy solving complex engineering problems, optimizing system performance, and designing reliable data pipelines that support mission-critical AI applications across multiple industries.
What You'll Do
As a Data Engineer Intern, you will work on production-grade data infrastructure that powers our clients' AI initiatives:
Data Pipeline Development
- Design and implement scalable ETL/ELT pipelines for financial, healthcare, and insurance data
- Build real-time streaming data pipelines using Apache Kafka, Spark Streaming, and cloud services
- Develop data quality monitoring and automated validation frameworks (see the sketch after this list)
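To give a flavor of the work, here is a minimal sketch of a batch ETL step with a simple quality gate, assuming PySpark; the bucket paths, column names, and 5% threshold are illustrative, not our actual pipeline:

```python
# Minimal batch ETL sketch with a simple data-quality gate.
# Paths, column names, and the 5% threshold are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_etl").getOrCreate()

# Extract: read raw claims data (hypothetical path and schema).
raw = spark.read.csv("s3://example-bucket/raw/claims.csv", header=True, inferSchema=True)

# Transform: drop rows missing required fields, parse the date column.
clean = (
    raw.filter(F.col("claim_id").isNotNull() & F.col("amount").isNotNull())
       .withColumn("claim_date", F.to_date("claim_date", "yyyy-MM-dd"))
)

# Quality gate: abort the load if too many rows were dropped.
total, kept = raw.count(), clean.count()
if total > 0 and (total - kept) / total > 0.05:
    raise ValueError(f"Dropped {total - kept} of {total} rows; aborting load")

# Load: write partitioned Parquet for downstream consumers.
clean.write.mode("overwrite").partitionBy("claim_date").parquet(
    "s3://example-bucket/curated/claims/"
)
```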
Cloud Infrastructure & Architecture
- Deploy and manage data infrastructure on AWS, Azure, and Google Cloud platforms
- Implement Infrastructure as Code using Terraform, CloudFormation, and Kubernetes
- Optimize cloud costs and performance for large-scale data processing workloads (see the sketch after this list)
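Storage lifecycle rules are one common cost lever. Below is a hedged boto3 sketch that archives cold raw data to Glacier; the bucket name, prefix, and retention windows are assumptions for illustration:

```python
# Sketch: archive cold raw data to cut S3 storage costs.
# Bucket name, prefix, and retention windows are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-after-90-days",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 730},  # delete after two years
            }
        ]
    },
)
```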
Data Platform Engineering
- Build and maintain data warehouses using Snowflake, BigQuery, and Redshift
- Implement data lake architectures with Delta Lake, Apache Iceberg, and Parquet formats
- Develop self-service data platforms and APIs for ML model consumption (see the sketch after this list)
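As one example of model-facing serving, here is a small FastAPI sketch of a feature endpoint; the in-memory lookup stands in for a real feature store, and the route and field names are hypothetical:

```python
# Sketch of a self-service feature API for ML consumers.
# The feature-store lookup is stubbed; endpoint and field names are assumptions.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="feature-service")

# Stand-in for a real feature store (e.g., a warehouse or Redis lookup).
_FEATURES = {
    "cust-001": {"avg_claim_amount": 1240.5, "claims_last_12m": 3},
}

@app.get("/features/{customer_id}")
def get_features(customer_id: str) -> dict:
    """Return precomputed model features for one customer."""
    features = _FEATURES.get(customer_id)
    if features is None:
        raise HTTPException(status_code=404, detail="unknown customer")
    return {"customer_id": customer_id, "features": features}
```

If this were saved as `features.py`, it could be run locally with `uvicorn features:app` and queried at `/features/cust-001`.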
DevOps & Data Operations
- Implement CI/CD pipelines for data pipeline deployment and testing (see the sketch after this list)
- Set up monitoring, alerting, and observability for data infrastructure
- Collaborate with ML teams to support model training and serving infrastructure
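Pipeline tests are what let CI gate a deployment. A minimal pytest-style sketch, with a transform and rules of our own invention for illustration:

```python
# Sketch: unit-testing a pipeline transform so CI can gate deployments.
# The transform and its rules are illustrative assumptions.
import pandas as pd

def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Drop negative amounts and round to cents."""
    out = df[df["amount"] >= 0].copy()
    out["amount"] = out["amount"].round(2)
    return out

def test_normalize_amounts_drops_negatives():
    df = pd.DataFrame({"amount": [10.504, -3.0, 0.0]})
    result = normalize_amounts(df)
    assert len(result) == 2
    assert (result["amount"] >= 0).all()
    assert result["amount"].tolist() == [10.5, 0.0]
```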
What's Required
We're looking for candidates who:
Education & Background
- Are pursuing a degree in Computer Science, Software Engineering, Data Engineering, or a related field
- Have strong programming fundamentals and an understanding of system design
- Demonstrate problem-solving skills and attention to detail
- Show passion for building scalable, reliable systems
Technical Skills
- Proficiency in Python, SQL, and at least one other programming language (e.g., Java, Scala, or Go)
- Experience with data processing frameworks (Apache Spark, Pandas, Dask)
- Knowledge of cloud platforms (AWS, Azure, GCP) and containerization (Docker, Kubernetes)
- Understanding of database systems, both SQL and NoSQL
Preferred Qualifications
- Experience with workflow orchestration tools (Apache Airflow, Prefect, Dagster); a minimal Airflow sketch follows this list
- Knowledge of streaming technologies (Kafka, Pulsar, Kinesis)
- Familiarity with Infrastructure as Code (Terraform, CloudFormation)
- Understanding of data modeling, warehousing, and lake architectures
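For candidates new to orchestration, this is roughly what a minimal DAG looks like, assuming Apache Airflow 2.x; the task bodies, DAG id, and schedule are placeholders:

```python
# Sketch of a daily orchestration DAG, assuming Apache Airflow 2.x.
# Task bodies are stubs; the DAG id and schedule are illustrative.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data from source systems")

def load():
    print("write curated tables to the warehouse")

with DAG(
    dag_id="daily_claims_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # 'schedule_interval' on Airflow < 2.4
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```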
Technologies You'll Work With
Our data engineering stack leverages industry-leading technologies to handle enterprise-scale data processing:
Programming & Frameworks
- Python, Java, Scala
- Apache Spark, PySpark
- Pandas, Dask, Ray
- FastAPI, Flask
Cloud & Infrastructure
- AWS, Azure, Google Cloud
- Kubernetes, Docker
- Terraform, CloudFormation
- Serverless computing
Data Storage & Processing
- Snowflake, BigQuery, Redshift
- Delta Lake, Apache Iceberg
- PostgreSQL, MongoDB
- Redis, Elasticsearch
Streaming & Orchestration
- Apache Kafka, Kinesis (see the consumer sketch after this list)
- Apache Airflow, Prefect
- dbt, Great Expectations
- Prometheus, Grafana
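To illustrate the streaming side of this stack, a hedged consumer sketch using the kafka-python client; the topic name, broker address, and message schema are assumptions:

```python
# Sketch of a streaming consumer, assuming the kafka-python client.
# Topic name, brokers, and message schema are illustrative assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "claims-events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="claims-etl",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Downstream: validate, enrich, and land the event in the lake.
    print(f"partition={message.partition} offset={message.offset} "
          f"claim={event.get('claim_id')}")
```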
Learning & Development Opportunities
Our Data Engineer Internship provides comprehensive exposure to modern data engineering practices and technologies:
Technical Skills Development
- Advanced SQL optimization and database performance tuning (see the sketch after this list)
- Distributed computing and parallel processing techniques
- Cloud-native architecture and microservices design
- Data security, privacy, and compliance best practices
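As a taste of query tuning, this sketch uses SQLite's EXPLAIN QUERY PLAN to show a full table scan becoming an index search once a covering index exists; the schema is illustrative, and the same idea carries over to warehouse engines:

```python
# Sketch of index-driven query tuning using SQLite's EXPLAIN QUERY PLAN.
# The schema and data are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims (customer_id, amount) VALUES (?, ?)",
    [(f"cust-{i % 100}", i * 1.5) for i in range(10_000)],
)

query = "SELECT SUM(amount) FROM claims WHERE customer_id = ?"

# Before indexing: the planner scans the whole table.
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("cust-7",)).fetchall())

# Add a covering index, then re-check: the plan becomes an index search.
conn.execute("CREATE INDEX idx_claims_customer ON claims (customer_id, amount)")
print(conn.execute(f"EXPLAIN QUERY PLAN {query}", ("cust-7",)).fetchall())
```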
Industry Knowledge
- Financial data processing and regulatory requirements
- Healthcare data standards (HIPAA, HL7, FHIR)
- Insurance claims processing and risk modeling data
- Real-time data streaming for AI/ML applications
Who We Are
Gateway is a leading data science and machine learning consulting company specializing in enterprise AI solutions. Our data engineering team builds the foundational infrastructure that enables AI transformation across banking, insurance, asset management, and pharmaceutical industries.
We design and implement data platforms that handle petabytes of enterprise data, ensuring reliability, scalability, and security. Our engineers work with cutting-edge technologies to solve complex data challenges and enable real-time decision making for Fortune 500 companies.
We're an Equal Opportunity Employer
Gateway is committed to building an inclusive workplace. We encourage applications from candidates of all backgrounds.