Ananya A. Joshi

  • ✉️ : aa(last_name) at andrew dot cmu dot edu
  • CV

About me: I work on deployable and interpretable solutions for data-intensive research problems. In my Ph.D. thesis research, I developed a human-in-the-loop framework and related methods that are used daily by public health stakeholders to identify and diagnose data events from large volumes of public health data.

My expertise is in applied AI, data science, and computer systems. I also have extensive experience working directly with line-level and aggregated public health data. Outside of that, I have experience across the stack (e.g. research in caching, building NiChrome) and in applications like urban data, protein folding, and LLMs (automated prompt engineering and fine tuning).

[➕ New] I'm starting to look for post-Ph.D. opportunities. Let's chat!



Current Roles


Carnegie Mellon University
Computer Science Ph.D. Student:
Aug 2020-exp. Dec 2024
The Delphi Group at Carnegie Mellon University
Lead and Project Manager for the FlaSH Project
Jan 2023-Present

Education History


Recent Projects

* indicates first/co-first author

📌 Thesis Proposal

December 8, 2023
Chairs: Roni Rosenfeld (CMU) and Bryan Wilder (CMU), Rayid Ghani (CMU), Matt Biggerstaff (Centers for Disease Control and Prevention, Flu Division)

Public health data aggregators publish millions of data points across many data streams, like the daily number of influenza cases, hospitalizations, and deaths per county and state in the United States. So that their users, including public health experts, do not draw erroneous conclusions, these aggregators must identify noteworthy changes in their data, including those that result from data errors and outbreaks. However, given increasing data volumes and limited numbers of data reviewers, aggregators can only have some of their data inspected manually.

This thesis introduces a human-in-the-loop framework for public health data aggregators to inspect their data given their reviewer resources. Currently, an automated method ranks each new data point in context, with a correction for high-volume settings, so that reviewer attention is focused on the most noteworthy data. Still, reviewers need to investigate hundreds of data points from the ranked list before being able to understand the scope of the impacted data. Accordingly, we propose including an automated module that jointly identifies multiple data points that are noteworthy in context.

[Slides Here][YouTube Playlist Here]
📌 Outlier Ranking for Large Scale Public Health Data

Ananya Joshi*, Tina Townes, Nolan Gormley, Luke Neurieter, Roni Rosenfeld, Bryan Wilder
DELPHI Group, Carnegie Mellon University

In this paper, we developed and deployed new human-in-the-loop machine learning algorithms that identify and rank outliers in large volumes of geospatial-temporal public health data streams, enabling reviewers to identify noteworthy data 9.1x faster. [Code][arXiv]
📣 Accepted to appear at AAAI 2024 Presentation + Poster
📣 Accepted at the Addressing Socioethical Effects of AI Workshop at AAAI 2024

📌 Computationally Assisted Quality Control for Public Health Data Streams

Ananya Joshi*, Katie Mazaitis, Roni Rosenfeld, Bryan Wilder
DELPHI Group, Carnegie Mellon University

In this paper, we: [arXiv][Code]
📣 Presentation to the Flu Division at the CDC
📣 IJCAI-23. International Joint Conference on Artificial Intelligence. 2023. Oral Presentation + Poster + Paper
📣 Speaking Skills Presentation at Carnegie Mellon University

📌 BuzzNet: Data Driven Approaches to Preventing Vector-Borne Diseases in Singapore

Ananya Joshi*, Clayton Miller
Fulbright Urban Planning & Sustainable Design Award, Singapore, National University of Singapore

Towards a sensor-based feedback mechanism to prevent mosquito breeding grounds in Singapore, I developed a suite of micro/macro modules (including deep-learning models that use audio, visual, and spatiotemporal input data) designed to work together and assist users in designing urban spaces that are unappealing to mosquitoes.
📄 Fulbright Blog [U.S. Embassy]
📄 Fulbright Blog [NUS]
📄 Essay in Fulbright in a Time of COVID: Essays by US Fulbrighters in Asia, 2019-2020
📄 Datathon Overview [NUS], 2nd place team
📄 Presentation at the College of Alice and Peter Tan for Master's Tea

📌 Creating an Automated Ideological Transformer Using Moral Reframing [GPT2]

Ananya Joshi*, Christiane Fellbaum, Michael Guerzhoy
Princeton University

📄 Princeton Undergraduate Senior Thesis in Computer Science (A+)
🌟 Sigma Xi Award for Outstanding Undergraduate Research.
🌟 Funding from the Peter and Rosalind Friedland Endowed Senior Thesis Fund
🌟 Funding from the Wilson Senior Thesis Fund

Engineering, Systems & Data Science Projects:

📄 Cooperative Rule Caching for SDN Switches. Ori Rottenstreich, Ariel Kulik, Ananya Joshi, Jennifer Rexford, Gábor Rétvári, Daniel Menasché, 2020 IEEE 9th International Conference on Cloud Networking (CloudNet).
📄 Data Plane Cooperative Caching With Dependencies. Ori Rottenstreich, Ariel Kulik, Ananya Joshi, Jennifer Rexford, Gábor Rétvári, Daniel Menasché, 2020 IEEE Transactions on Network and Service Management.
📄 An open repository of real-time COVID-19 indicators. Alex Reinhart, Logan Brooks… Ananya Joshi …, Roni Rosenfeld, Ryan J. Tibshirani, Proceedings of the National Academy of Sciences.

🛠️ NiChrome, Google: Intern project with Anthony Rolland, advised by Ron Minnich and Christopher Koch
🛠️ COVIDCast Engineering, Delphi: Contributor
🛠️ Research at MIT Lincoln Laboratory on Probability of Cloud Free Line of Sight
📄 Urban Data Mining in Switzerland, ETH Zurich

Prepared Lectures

📕 Time Series Data: Clarifying Practical Approaches
This was an 80-minute active-learning lecture for students in Machine Learning in Practice. In an anonymous post-class survey (85% completion rate), 83% of respondents found the material to be at the right level (1 student found it too easy and 1 found it too difficult). Across all tasks identified in the learning objectives, students reported increasing their skills (e.g., along a scale from being unfamiliar with the task, to being able to define the task, to being able to classify tasks that belong to the component, to knowing at least 2 ways to approach the task, to being familiar with some technical insights and nuances of the task).

📕 AI for Social Good in Public Health
This lecture used a follow-along Colab notebook to demonstrate statistical properties of data streams, such as nonstationarity, noisiness, and weekday effects, and to walk through methods for processing data with these properties.

Please contact me if you would like to use this activity in your teaching.

Additional Highlights

🌟 Best Paper Award (SIGCSE ’23) [Group Award]
🌟 Carnegie Mellon University Graduate Student Service Award 2022 [Group Award]
🌟 Mentored an undergraduate student on changepoint detection (Sep 2022-Aug 2023) [Blog].
🌟 Mentored students in Pittsburgh Girls-of-Steel weekly on data science basics (Sep 2022-Jun 2023).
🌟 Lead of SCS Ph.D. Wellness Group (Sep 2020-May 2022)
🌟 Co-Instructor for 15-996 (Spring 2022/23) Article & TA for Machine Learning in Practice (Spring 2023)
🌟 Co-Organizer of the Delta Workshop for drift phenomena at KDD ’24! Link
🌟 Completed the Eberly Center’s Future Faculty Program
🌟 Joined the Council of State and Territorial Epidemiologists’ (CSTE) Peer-to-Peer Technical Assistance network as a mentor
🌟 Selected Courses: Grad AI (A+), Mobile & Pervasive Computing [IoT] (A+)
🌟 Joined AI/Healthcare panel from the Coding School as part of a free machine learning course for high school students.
🌟 Presented a talk at CMU Artificial Intelligence Seminar Series (Apr 2023).
🌟 Presented a poster at the first annual InsightNet meeting (Apr 2023).

Hobbies: Pittsburgh is a great place to explore new hobbies. I’ve enjoyed starting rock climbing and martial arts here!

My views and opinions are my own!

Blogs & News

Word Count

April 10, 2024

AutoPrompt: Improving GPT Responses Using Personas and Synthetic Documents

Using Synthetic Persona Generation to Improve Prompt Responses


1566

November 1, 2020

Initial Impressions of Epidemic Forecasting

Epidemic forecasting, by its very nature, is a direct application of computer science, math, and biology.


1802

September 24, 2020

Shiftview, GPT2 for Moral Reframing

Deploying the Shiftview repo on Google Cloud creates a website that highlights my work on ShiftView, an ideological transformer based on the moral foundations theory. You can find a light demo of the work in the sidebar. Please be patient as the translations require time to generate. Additionally, because some of the data is uncensored, there may be cursing and confusing speech. I am not responsible for the translations generated. In its current state, ShiftView is simply a guide for what is possible with automated moral reframing.


4306

June 30, 2020

Fulbright 2019-2020 Research

The following are from the bitsvbytes website that I could salvage - while this project was put on hiatus during COVID, I learned a lot about designing public health interventions using a data-driven approach!


34848

June 30, 2020

Fulbright 2019-2020 Recap & Notes

In 2019-2020, I was a Fulbright research student in Singapore. I kept a weekly blog (formerly called bitsvbytes because I was using ML techniques to reduce mosquito bites) that turned into a daily blog at the start of COVID. Unfortunately, I kept my blog (with all my writing, pictures, models, and demos for Buzznet) on Heroku (with a Heroku database) with the hope of eventually restarting the service once I could shrink the footprint (and the cost) of hosting this resource.


125075

Website made with Jekyll and Bootstrap.