Description:
The Data Platform team builds and manages reusable components and architectures that make it fast and easy to build robust, scalable, production-grade data products and services in the challenging biomedical data space.
In this role, you will:
- Be a technical individual contributor, building modern, cloud-native systems that standardize and templatize data engineering, such as:
  - Standardized physical storage and search/indexing systems
  - Schema management (data + metadata + versioning + provenance + governance)
  - API semantics and ontology management
  - Standard API architectures
  - Kafka and standard streaming semantics (see the first sketch below)
  - Standard components for publishing data to file-based, relational, and other kinds of data stores
  - Metadata systems
  - Tooling for QA/evaluation
- Know the metrics your tools and services should hit, and iterate in an agile fashion to deliver and improve on them.
- Given a well-specified data framework problem, implement end-to-end solutions using appropriate programming languages (e.g. Python, Java, Scala, Bash), open-source tools (e.g. Spark, Elasticsearch, ...), and cloud vendor-provided tools (e.g. AWS boto3, the gcloud CLI); see the second sketch below
- Leverage tools provided by Tech (e.g. infrastructure as code, cloud Ops, DevOps, logging/alerting, ...) when delivering solutions
- Write proper documentation, both in code and in wikis and other documentation systems
- Write fantastic code, along with proper unit, functional, and integration tests, to ensure the quality of code and services (see the final sketch below)
- Stay up to date with developments in the open-source community around data engineering, data science, and similar tooling
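
To give a flavor of the streaming work, here is a minimal sketch using the confluent-kafka client; the broker address, topic name, and payload are hypothetical, and the team's actual stack may differ. It illustrates two of the standard semantics in question: an idempotent producer and at-least-once consumption via manual offset commits.

```python
import json

from confluent_kafka import Consumer, Producer

# Idempotent producer: broker-side deduplication means retries cannot
# introduce duplicate records.
producer = Producer({
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "enable.idempotence": True,
    "acks": "all",  # wait for the full in-sync replica set
})
producer.produce("samples", key=b"sample-1",
                 value=json.dumps({"assay": "rna-seq"}).encode())
producer.flush()

# At-least-once consumer: commit offsets only after processing succeeds,
# so a crash mid-processing replays the record rather than losing it.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "enable.auto.commit": False,
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["samples"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    record = json.loads(msg.value())  # process, e.g. validate against a schema
    consumer.commit(message=msg)      # committed last: at-least-once delivery
consumer.close()
```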
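Along the same lines, a minimal end-to-end sketch of the implementation work: transform a local CSV extract with PySpark, then publish the artifacts to S3 with boto3. The paths, bucket name, and column names are invented for illustration, not part of the team's actual pipelines.

```python
import os

import boto3
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Transform: count records per gene in a hypothetical variants extract.
variants = spark.read.csv("data/variants.csv", header=True, inferSchema=True)
counts = variants.groupBy("gene").agg(F.count("*").alias("n_variants"))

# Publish: write a single parquet artifact locally, then copy it to S3.
out_dir = "out/variant_counts"
counts.coalesce(1).write.mode("overwrite").parquet(out_dir)

s3 = boto3.client("s3")
for name in os.listdir(out_dir):
    s3.upload_file(os.path.join(out_dir, name),
                   "example-bucket", f"variant_counts/{name}")
```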
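Finally, a small pytest sketch of the testing bar: the helper function is hypothetical, but it shows the pattern of factoring pipeline logic into pure functions that can be unit-tested without a Spark cluster or a broker.

```python
import pytest


def normalize_gene_symbol(raw: str) -> str:
    """Hypothetical pipeline helper: trim and upper-case a gene symbol."""
    if not raw or not raw.strip():
        raise ValueError("empty gene symbol")
    return raw.strip().upper()


def test_normalizes_case_and_whitespace():
    assert normalize_gene_symbol("  brca1 ") == "BRCA1"


def test_rejects_empty_input():
    with pytest.raises(ValueError):
        normalize_gene_symbol("   ")
```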