Ananya A. Joshi

  • ✉️ : aa(last_name) at andrew dot cmu dot edu
  • CV

About me: I work on deployable and interpretable solutions for data-intensive research problems. In my Ph.D. thesis research, I developed a human-in-the-loop framework and related methods that are used daily by public health stakeholders to identify and diagnose data events from large volumes of public health data.

My expertise is in applied AI, data science, and computer systems. I also have extensive experience working directly with line-level and aggregated public health data. Outside of that, I have experience across the stack (e.g. research in caching, building NiChrome) and in applications like urban data, protein folding, and LLMs (automated prompt engineering and fine-tuning).

[➕ New] I'm starting to look for post-Ph.D. opportunities. Let's chat!



Current Roles


Carnegie Mellon University
Computer Science Ph.D. Student:
Aug 2020-exp. Feb 2025
The Delphi Group at Carnegie Mellon University
Lead and Project Manager for the FlaSH Project
Jan 2023-Present

Educational History


Recent Projects

* indicates first/co-first author

📌 Thesis Proposal

December 8th, 2023
Chairs: Roni Rosenfeld (CMU) and Bryan Wilder (CMU); committee members: Rayid Ghani (CMU), Matt Biggerstaff (Centers for Disease Control and Prevention, Flu Division)

Growing volumes of public health-related data render existing techniques for identifying changes in disease dynamics and data quality assurance, designed for smaller data volumes, obsolete. Accordingly, my thesis presents a practical framework for experts to monitor large-scale aggregate public health data. Our novel methods, which are simple, scalable, and shown to be accurate in real-world settings, identify data corresponding to quality issues or changes in disease dynamics. Coupled with our custom user interfaces, these methods have led to a 53x increase in monitoring efficiency for data experts at the Delphi Group at Carnegie Mellon University, who can now detect about 200 noteworthy data issues from 15 million new data points each week. As a final step, we are building communication pipelines to disseminate this expert-analyzed data to public health stakeholders, including state and local health departments, thereby enabling actionable intelligence from Delphi's vast volume of public health data.

[Slides Here] [YouTube Playlist Here]

📌 Enabling Steering Using Sparse Autoencoders

Jul 2024-Sep 2024
Ananya Joshi*, Celia Cintas, Skyler Speakman
IBM Nairobi, Ph.D. Research Intern, Human-Centered AI and Trustworthy ML Group

Given how well Sparse Autoencoders (SAEs) can steer text toward particular topics (e.g. [Golden Gate Bridge]), I developed methods that also enable SAEs to:

Our experiments show how this approach has unique advantages over current fine-tuning applications.
📣 [Open-sourced from IBM]
➕ Manuscript in submission!

📌 Outlier Ranking for Large Scale Public Health Data

Ananya Joshi*, Tina Townes, Nolan Gormley, Luke Neurieter, Roni Rosenfeld, Bryan Wilder
DELPHI Group, Carnegie Mellon University

In this paper, we developed and deployed new human-in-the-loop machine learning algorithms that identify and rank outliers in large volumes of geospatial-temporal public health data streams, enabling reviewers to identify noteworthy data 9.1x faster. [Code][arXiv]
📣 Accepted to appear at AAAI 2024 Presentation + Poster
📣 Accepted at the Addressing Socioethical Effects of AI Workshop at AAAI 2024

📌 Computationally Assisted Quality Control for Public Health Data Streams

Ananya Joshi*, Katie Mazaitis, Roni Rosenfeld, Bryan Wilder
DELPHI Group, Carnegie Mellon University

In this paper, we: [arXiv][Code]
📣 Presentation to the Flu Division at the CDC
📣 IJCAI-23. International Joint Conference on Artificial Intelligence. 2023. Oral Presentation + Poster + Paper
📣 Speaking Skills Presentation at Carnegie Mellon University

📌 BuzzNet: Data Driven Approaches to Preventing Vector-Borne Diseases in Singapore

Ananya Joshi*, Clayton Miller
Fulbright Urban Planning & Sustainable Design Award, Singapore, National University of Singapore

Towards a sensor-based feedback mechanism to prevent mosquito breeding grounds in Singapore, I developed a suite of micro/macro modules (including deep-learning models that use audio, visual, and spatiotemporal input data) designed to work together and assist users in designing urban spaces that are unappealing to mosquitoes.
📄 Fulbright Blog [U.S. Embassy]
📄 Fulbright Blog [NUS]
📄 Essay in Fulbright in a Time of COVID: Essays by US Fulbrighters in Asia, 2019-2020
📄 Datathon Overview [NUS], 2nd place team
📄 Presentation at the College of Alice and Peter Tan for Master's Tea

📌 Creating an Automated Ideological Transformer Using Moral Reframing [GPT2]

Ananya Joshi*, Christiane Fellbaum, Michael Guerzhoy
Princeton University

📄 Princeton Undergraduate Senior Thesis in Computer Science (A+)
🌟 Sigma Xi Award for Outstanding Undergraduate Research
🌟 Funding from the Peter and Rosalind Friedland Endowed Senior Thesis Fund
🌟 Funding from the Wilson Senior Thesis Fund

Engineering, Systems & Data Science Projects:

📄 Cooperative Rule Caching for SDN Switches. Ori Rottenstreich, Ariel Kulik, Ananya Joshi, Jennifer Rexford, Gábor Rétvári, Daniel Menasché, 2020 IEEE 9th International Conference on Cloud Networking (CloudNet).
📄 Data Plane Cooperative Caching With Dependencies. Ori Rottenstreich, Ariel Kulik, Ananya Joshi, Jennifer Rexford, Gábor Rétvári, Daniel Menasché, 2020 IEEE Transactions on Network and Service Management.
📄 An open repository of real-time COVID-19 indicators. Alex Reinhart, Logan Brooks… Ananya Joshi …, Roni Rosenfeld, Ryan J. Tibshirani, Proceedings of the National Academy of Sciences.

🛠️ NiChrome, Google: Intern project with Anthony Rolland, advised by Ron Minnich and Christopher Koch
🛠️ COVIDCast Engineering, Delphi: Contributor since 2020.
🛠️ Cases2Beds: Project lead for a tool to provision hospital beds, developed in collaboration with the Allegheny County Health Department and shared with several other county public health departments during the COVID-19 pandemic. Presented on 01/08/2020 at the COVID-19 Trends and Impact Surveys Data Users Meeting.
🛠️ Research at MIT Lincoln Laboratory on Probability of Cloud-Free Line of Sight
📄 Urban Data Mining in Switzerland, ETH Zurich

Prepared Lectures

📕 Time Series Data: Clarifying Practical Approaches
This was an 80-minute active-learning lecture for students in Machine Learning in Practice. In a post-class anonymous survey (85% completion rate), 83% found the material to be at the right level (1 student found it too easy and 1 found it too difficult). Across all tasks identified in the learning objectives, students reported increasing their skills along a scale that ran from being unfamiliar with the task, to being able to define the task, to being able to classify tasks that belong to the component, to knowing at least 2 ways to approach the task, to being familiar with some technical insights/nuances of the task.

📕 AI for Social Good in Public Health, which used a follow-along Colab notebook to demonstrate statistical properties of data streams, like nonstationarity, noisiness, and weekday effects, and to walk through methods for processing data with these properties.

Please contact me if you would like to use these activities!

📕 MLCommons Medical Working Group Presentation: Monitoring for Health Events: Bridging Healthcare and Public Health Approaches.
Motivated by the goal of enhancing collaboration between these fields, the lecture proposes a foundational step in scoping opportunities for integrating machine learning orchestration in healthcare and public health. Details available upon request.

Additional Highlights

🌟 Best Paper Award (SIGCSE ’23) [Group Award]
🌟 Carnegie Mellon University Graduate Student Service Award 2022 [Group Award]

🌟 Co-Organizer of the Delta Workshop for drift phenomena at KDD ’24! Link
🌟 Presenting at KDD Doctoral Consortium 2024

🌟 Presented a talk at CMU Artificial Intelligence Seminar Series (Apr 2024).
🌟 Presented a poster at the first annual InsightNet meeting (Apr 2024).

● Co-Instructor for 15-996 (Spring 2022/23) Article & TA for Machine Learning in Practice (Spring 2023)
● Completed the Eberly Center’s Future Faculty Program
● Selected Courses: Grad AI (A+), Mobile & Pervasive Computing [IoT] (A+)
● Partial travel grant to KDD, fee waiver for NU’s Future Faculty Workshop.

● Joined the Council of State and Territorial Epidemiologists’ (CSTE) Peer-to-Peer Technical Assistance network as a mentor for the Pennsylvania Dept. of Health and Santa Clara County Public Health Department. On a biweekly cadence I prepare lectures and hand-on Colab notebooks for concepts relevant to mentees! (2024+)
● Lead of SCS Ph.D. Wellness Group (Sep 2020-May 2022)
● Joined AI/Healthcare panel from the Coding School as part of a free machine learning course for high school students (2024).

● Mentored students in Pittsburgh Girls-of-Steel for data science basics weekly (Sep 2022-Jun 2023).
● Mentored my amazing undergraduate student Tara Lakdawala on a changepoint detection project (Sep 2022-Aug 2023) [Blog]. She’s now at Goldman Sachs!
● Congratulations to my incredible masters student, Richa Gadgil for her graduation from the Machine Learning Masters Degree at CMU! Richa was a core contributor for the FlaSH project and we are excited to keep working at Delphi with her this summer! (May 2024)

Hobbies: Pittsburgh is a great place to explore new hobbies. I’ve enjoyed starting rock climbing and martial arts here! My views and opinions are my own!

Blogs & News

Word Count

July 2, 2024

Bridging Gaps in Research and Public Health: CSTE Conference Takeaways

A few weeks ago, I had the opportunity to attend the Council of State and Territorial Epidemiologists (CSTE) Annual Conference right here in Pittsburgh! Getting involved with CSTE this past year, especially through the forecasting and modeling workgroup calls, has been incredibly rewarding, and it was great to see some familiar faces in person.


Word count: 1009

July 2, 2024

Information on Moving to Nairobi

I arrived in Nairobi last week to start my summer internship with the AI team at IBM Research! I’m enjoying my project, and the team has a welcoming culture.


Word count: 616

May 16, 2024

Comments on CMS proposed rule (CMS-1808-P)

The CMS proposed rule will impact how data related to public health will be collected. Two standout quotes related to monitoring large-scale systems are,


Word count: 492

April 30, 2024

Teaching Hands-On/Practical ML Classes

This semester, I had the opportunity to put together two practical, active learning lectures related to AI x Public Health!


Word count: 726

April 25, 2024

Takeaways from the first InsightNet Meeting

Last week, I attended the first annual InsightNet conference in North Carolina!


Word count: 1023

April 25, 2024

Getting Started with Public Health Monitoring Literature

Interesting Papers, Talks, Videos Related to Public Health Monitoring


Word count: 117

April 20, 2024

Public Health Opportunities

If you are a CS/AI/ML student interested either in applying your methods to public health or in developing methods for public health applications, these opportunities/insights may be interesting to you:


Word count: 152

April 10, 2024

AutoPrompt: Improving GPT Responses Using Personas and Synthetic Documents

Using Synthetic Persona Generation to Improve Prompt Responses


Word count: 192

November 1, 2020

Initial Impressions of Epidemic Forecasting

Epidemic forecasting, by its very nature, is a direct application of computer science, math, and biology.


Word count: 247

September 24, 2020

Shiftview, GPT2 for Moral Reframing

Deploying the Shiftview repo on Google Cloud creates a website that highlights my work on ShiftView, an ideological transformer based on the moral foundations theory. You can find a light demo of the work in the sidebar. Please be patient as the translations require time to generate. Additionally, because some of the data is uncensored, there may be cursing and confusing speech. I am not responsible for the translations generated. In its current state, ShiftView is simply a guide for what is possible with automated moral reframing.


Word count: 603

June 30, 2020

Fulbright 2019-2020 Research

The following is what I could salvage from the bitsvbytes website. While this project was put on hiatus during COVID, I learned a lot about designing public health interventions using a data-driven approach!


Word count: 4702

June 30, 2020

Fulbright 2019-2020 Recap & Notes

In 2019-2020, I was a Fulbright research student in Singapore. I kept a weekly blog (formerly called bitsvbytes because I was using ML techniques to reduce mosquito bites) that turned into a daily blog at the start of COVID. Unfortunately, I kept my blog (with all my writing, pictures, models, and demos for BuzzNet) on Heroku (with a Heroku database) with the hope of eventually restarting the service once I could shrink the footprint (and the cost) of hosting this resource.


Word count: 15131

Website made with Jekyll and Bootstrap.