As a Data Engineer, you will be responsible for all things related to the collection, extraction, transformation, and correlation of business data across the Subsplash platform. You will report to the Site Reliability Engineering Manager. You will be an expert on our production data sources and on how to administer and tune data systems for optimal performance. You will also work regularly with our data warehousing/data lake environments to provide our business analysis and intelligence team with data marts. Data collection, extraction, and transformation are often achieved through Python code development and maintenance. In a remote-first, distributed team environment, you will work well with other team members to deliver working data engineering solutions early and often.
Key Outcomes in Year 1
- Work with the lead Data Engineer to assume primary responsibility for operating and monitoring Extract-Load-Transform (ELT) data pipelines, handling routine ELT pipeline add/change tasks, and driving continual improvement. Success in this outcome will be measured by the capacity freed up for the lead Data Engineer.
- Serve as a point of escalation for questions related to SQL query performance and the optimization of Subsplash data systems. This may include analyzing slow query reports and execution plans, as well as fielding related questions from software engineers.
- Enable product teams, business analysis teams, and other stakeholders to integrate data into the Snowflake data warehouse, and then access and analyze their data via Sigma and/or Tableau.
- Work with Data Engineering and Site Reliability Engineering (SRE) to continually improve observability and proactive alerting for ELT data pipelines.
Key Responsibilities
- Operate and maintain the Subsplash data warehousing environment, consisting of DBT, Python, Terraform, AWS DMS, Snowpipe, and related ELT tools, running on AWS Kubernetes infrastructure and maintained in GitLab SCM and CI/CD
- Ensure PII and other sensitive data are handled properly, both within the data warehouse and while being transformed and loaded into it
- Build and maintain our ELT pipelines from production data stores into the data warehouse
- Monitor and optimize production data stores
- Assist in building and maintaining data visualizations, both internal-facing and customer-facing
- Collaborate with business analysts, product managers, and software engineers to build and verify hypotheses related to business intelligence
Qualifications
- 2+ years of experience as a Data Engineer or in a similar role
- Experience with data modeling, data warehousing, and building ETL pipelines
- Extremely comfortable with SQL
- Excellent analytical abilities
- Comfortable with ambiguity in requirements; a self-starter
- Excellent communication (verbal and written) and interpersonal skills, including the ability to communicate with both business and technical teams
- Experience working with Snowflake or similar data platforms (e.g. AWS Redshift, BigQuery)
- Strong knowledge of relational databases (e.g. MySQL, MariaDB, PostgreSQL, Aurora) and document-oriented databases (e.g. MongoDB, DynamoDB)
- Strong organizational skills and the ability to learn new technologies quickly
Preferred Qualifications
- Knowledge of a programming language (Go, Python, JavaScript)
- Familiarity with ELT tools such as DBT, Fivetran, and Meltano
- Data Science experience (e.g. machine learning, artificial intelligence)
- Bachelor's degree in Computer Science, Mathematics, Statistics, or a related field