Purpose-built AI Solutions | Adam Selipsky

"AWS provides a centralized hub for innovation and collaboration on a global level, connecting you with the data and machine learning tools you need, and partners you can trust, all while keeping health and life sciences data secure and private"

Adam Selipsky



Organizations in the heavily-regulated healthcare and life sciences industries – from biopharmas to healthtechs to providers and payors – need to accelerate time to diagnosis and insights, increase the pace of innovation, and bring differentiated therapeutics to market faster with an end-to-end data strategy. AWS provides a centralized hub for innovation and collaboration on a global level, connecting you with the data and machine learning tools you need, and partners you can trust, all while keeping health and life sciences data secure and private.

AWS Health Data Portfolio aligns purpose-built AWS Services and AWS Partner solutions to business needs, ranging from secure data transfer, aggregation, and storage to data analytics, collaboration, sharing, and governance. With generative AI and purpose-built machine learning services, you can easily integrate cutting-edge technologies into your existing workflows to accelerate innovations and fuel new discoveries.

Traditional assessments of disease progression and intervention efficacy used in clinical trials are often limited. These assessments normally consist of clinical outcome assessments (COAs) and patient reported outcomes (PROs). COAs are physician-derived, often requiring patients to travel to a clinical site. They are typically subjective, episodic in nature, and provide limited insight into the fluctuations of symptoms experienced outside the clinic. PROs are multi-question surveys taken by patients, usually in their home environment. While PROs provide some insight into symptom severity from the patient’s perspective outside the clinic, they often suffer from recall bias, are impacted by mood or suggestion, and are qualitative in nature. Therefore, there is a need for more objective, continuous assessments of symptom severity outside the clinic to complement current assessment tools used today.

Wearable sensing technology has enabled many new approaches for objective, continuous assessments of patients during their daily lives. Over the past decade, researchers have leveraged wearable sensors to create digital biomarkers aimed at extracting clinically meaningful data about patients outside the clinic environment. Digital biomarkers are defined as objective, quantifiable physiological and behavioral data collected and measured using digital devices (e.g., smartphones and wearable devices). Recently, efforts have been made to use these digital biomarkers to measure disease progression and efficacy of interventions in clinical trials.

The Digital Medicine and Translational Imaging (DMTI) group at Pfizer (part of Early Clinical Development within Worldwide Research, Development, and Medical) is developing and deploying digital biomarkers, derived from data captured using wearable devices in several large clinical trials. Participants in these trials wear devices containing accelerometers and near-body temperature sensors for a pre-determined monitoring period. After completing the monitoring period, participants return the devices to their respective clinical sites where the sensor data from each device is extracted and sent to Pfizer for processing.

The Pfizer DMTI team needed an efficient, scalable, and automated way to run their custom-built digital biomarkers (comprising of machine learning and heuristic, rule-based algorithms, all built in Python) on participant sensor data, and looked to AWS to build a solution. The following requirements had to be met when designing and building this solution:

  • Scalability – The application must be scalable (Sample size ranging from dozens of participants (Phase 0/I) to thousands of participants (Phase III/IV)).
  • Flexibility – The system should allow for ease of swapping in and out different digital biomarker solutions based on the respective use case of interest.
  • Reproducibility – The digital biomarker solutions need to be immutable as they are run within the application to ensure consistent results.
  • Quality/Security – The system must adhere to key regulatory standards and research practices pertaining to data privacy and security. This includes, as applicable, GxP and FDA Title 21 CFR Part 11 compliance, among others.

They leveraged AWS CloudFormation to provision and configure all resources for this architecture. This provides the ability to easily add and remove services while going through the development phase, as well as version controlling the infrastructure-as-code in Pfizer’s local GitHub. Adding CloudFormation templates to version control was an integral part of the change management solution deployment and audit documentation.

AWS Command Line Interface (CLI) was used to upload new versions of the CloudFormation template as they iterated. (For instructions, see Installing AWS CLI and Configuring the AWS CLI). Using CLI and bash scripting removes manual steps, which reduces the time to deploy and potential for human error.

By using Docker, the Pfizer team can treat each digital biomarker solution as a module that can be easily swapped in and out based on the use case. With ECS and Docker, they can generalize the pipeline and swap the digital biomarkers as needed, allowing them to scale the pattern beyond this use case. The digital biomarker solution code and all of its dependencies are installed in the Docker file. The Docker file pulls directly from private GitHub repositories with pointers to tagged releases, ensuring proper versioning when deploying. (This requires having GitHub SSH setup; see Setting up GitHub SSH). This secure integration with GitHub protects any proprietary code or data while still allowing the automated retrieval and installation of all necessary software by the pipeline.

Using an iterative development strategy, the Pfizer DMTI team and AWS broke down each feature development process into small work packages with small sample datasets to test each new feature, with the goal of having a working solution after each iteration. This let them quickly isolate and fix errors, progressing forward in a methodical but fast manner.

Using Docker along with AWS ECS/ECR helped achieve target goals of flexibility and reproducibility. Having each digital biomarker solution wrapped in a Docker image with standardized input and output designs let the Pfizer DMTI team easily swap in and out different digital biomarker solutions. This approach also met reproducibility requirements: digital biomarker solutions were completely locked down, ensuring that the digital biomarker solution was run in the same manner every time the application ran.

With the use of event-based architecture powered by Lambda and ECS, results could be automatically processed. This lets the Pfizer DMTI team scale from small-scale studies of hundreds of participants to large-scale studies of thousands of participants without changing the architecture. Additionally, there would not be idle resources in between studies.

Lastly, as the goal was to process real clinical trial data, the team needed to meet various security, quality, and compliance standards. By following AWS best practices and the AWS Shared Security Model, it was easy to implement least privilege (users only access resources necessary for users’ purpose) within the application and meet security goals. Partnering with Pfizer’s Cloud Platform Team, who qualified AWS services for GxP compliance, ensured a path to validate the architecture for GxP validation. CloudWatch and CloudTrail were instrumental in capturing all events happening in the pipeline, satisfying key audit requirements around user roles and error monitoring.

Conclusions & next steps

With cloud computing on AWS, the Pfizer team was able to run digital biomarker solutions at scale across clinical trials with ease. Here are the three key benefits they found when using AWS cloud computing for this task:

  • High quality – By using services like Docker with Amazon ECS and Amazon ECR, they were able to maintain reproducibility, which is essential in regulated environments. Also, they were able to meet key regulatory standards and research best practices with this system.
  • Time saving – Using AWS robust parallel processing capabilities enables significantly faster processing times. For example, they processed ~36,000 hours of data in about 20 minutes, which is significantly faster than the seven to 12-day run-time estimated for on-premises processing.
  • Cost saving – AWS has pay-as-you-go pricing models, so processing tasks cost far less than managing internal servers and personnel to handle the processing loads.

By using AWS services, the Pfizer team could make updates quickly after the initial deployment. Next steps include expanding the number of digital biomarker solutions that are supported today and integrating services like AWS Step Functions so that they can further simplify the architecture.