We are seeking a talented Junior Data Engineer to join our expanding engineering team. The ideal candidate combines strong Python and SQL programming skills with a passion for transforming raw data into insights that serve real business use cases. This role is client-facing: you will collaborate closely with customer success, operations, and data science teams to deliver actionable data solutions.
Primary Responsibilities
- Writing efficient server-side Python code, leveraging the Pandas and PySpark DataFrame APIs for scalable data transformations and aggregations (a brief sketch of this kind of work follows this list).
- Designing, developing, testing, and maintaining scalable data pipelines to support data aggregation, cleansing, processing, and validation tasks.
- Automating data onboarding and analysis processes so we can handle a greater variety of data at higher velocity.
- Integrating client data with various public and private sources.
- Ensuring data integrity and security throughout project lifecycles.
- Exploring data to surface potential quality issues throughout the data lifecycle.
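To give a concrete flavor of the transformation work described above, here is a minimal sketch of a Pandas aggregation; the column names and data are hypothetical, and equivalent logic would apply to a PySpark DataFrame.

```python
import pandas as pd

# Hypothetical raw event data; in practice this would come from a client feed.
events = pd.DataFrame({
    "client_id": [1, 1, 2, 2, 2],
    "event_type": ["view", "click", "view", "view", "click"],
    "value": [1.0, 3.5, 2.0, 4.0, 1.5],
})

# Aggregate per client and event type: event counts and total value.
summary = (
    events
    .groupby(["client_id", "event_type"], as_index=False)
    .agg(event_count=("value", "size"), total_value=("value", "sum"))
)
print(summary)
```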
Professional Requirements
- Bachelor's degree in a relevant field such as Data Engineering, Computer Science, Data Science, Math, Statistics, or Information Systems, or 1-2 years of relevant work experience.
- Strong knowledge of Python, with a focus on Pandas and/or PySpark.
- Proficiency in SQL for data analysis and manipulation, with experience in relational databases, preferably PostgreSQL (a sample query follows this list).
- Experience using Git for version control and repository management.
- Strong problem-solving skills and a proactive, can-do mentality, with a demonstrated ability to work independently in a fast-paced environment and manage multiple concurrent projects.
- Excellent communication skills with the ability to collaborate effectively in cross-functional teams and engage in constructive dialogue to find optimal solutions.
- Keen attention to detail and adaptability, with a willingness to learn new technologies and a critical approach to ensuring data quality.
- Authorized to work in the United States.
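As an example of the SQL proficiency this role calls for, the sketch below runs a simple aggregation query against PostgreSQL from Python. The connection string, table, and columns are all hypothetical placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical PostgreSQL connection string (requires a Postgres driver
# such as psycopg2 to be installed).
engine = create_engine("postgresql://user:password@localhost:5432/analytics")

# A typical analysis query: daily order counts and revenue per client.
query = """
    SELECT client_id,
           DATE(created_at) AS order_date,
           COUNT(*)         AS order_count,
           SUM(amount)      AS revenue
    FROM orders
    GROUP BY client_id, DATE(created_at)
    ORDER BY order_date;
"""

daily_revenue = pd.read_sql(query, engine)
print(daily_revenue.head())
```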
Nice to Haves
- Familiarity with working with Parquet datasets; experience with PyArrow is a plus (illustrated in the sketch after this list).
- Familiarity with aggregations in Spark or PySpark.
- Basic familiarity with AWS cloud services and their core functionalities.
- A working knowledge of common machine learning tools and techniques.
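For candidates unfamiliar with Parquet, the sketch below shows the kind of PyArrow usage mentioned above; the file path and column names are hypothetical.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small in-memory table and round-trip it through a Parquet file.
table = pa.table({
    "client_id": [1, 2, 3],
    "value": [10.5, 20.0, 7.25],
})
pq.write_table(table, "clients.parquet")

# Parquet is columnar, so individual columns can be read back efficiently.
values = pq.read_table("clients.parquet", columns=["value"])
print(values.to_pandas())
```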