Department of Computer Science

Summer 2021 Online REU Site at OSU


Students participated in specific research projects related to big data analytics within the overall cohort research experience. They were introduced to the fundamentals of big data analytics and worked on real-world projects. They gained hands-on experience with Python and MATLAB libraries for a range of machine learning and data mining models. They generated a hypothesis, designed and implemented a research plan to test it, and analyzed the results with the help of research mentors. Students who engaged in any of these tasks were given co-authorship on the resulting research papers.

Professional development talks



Brief descriptions of the projects follow.

 

Project 1: Quantifying Online Polarization (Dr. Bagavathi)

Social media has amplified the polarized views of society and has had catastrophic effects on both the individual and collective response to events like COVID-19. By tracing the propagation of polarized news from different media outlets to social media, we will be able to capture the footprint of political bias and quantify its impact on problems like rumor detection. Two pressing questions in computational social science research are how misinformation originates and why it becomes viral on social media. In this work, we aim to add a dimension to this line of research by mapping and quantifying polarization networks in social media. This project will establish the correspondence between news media content, social media user responses to that content, and user interaction dynamics in order to quantify online social media polarization.

 

REU students involved in this project will work under the supervision of Dr. Bagavathi and his graduate students to achieve these objectives. They will gain knowledge of state-of-the-art text and graph mining methods, learn to understand and develop methodologies for social media data, and be introduced to the computational social science domain. Furthermore, the project will give undergraduate students an arena in which to think about and tackle ongoing polarized and hateful situations in social media.
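The project description does not name a specific polarization measure. As one illustrative possibility only, the sketch below computes the E-I index (external minus internal ties, divided by all ties) on a toy user interaction graph. The user names, group labels, and edges are made up for illustration.

```python
def ei_index(edges, group_of):
    """Return (external - internal) / (external + internal) over all edges.

    -1.0 means every interaction stays inside one group;
    +1.0 means every interaction crosses group lines.
    """
    if not edges:
        raise ValueError("need at least one edge")
    internal = sum(1 for u, v in edges if group_of[u] == group_of[v])
    external = len(edges) - internal
    return (external - internal) / len(edges)


# Hypothetical users, labeled by the leaning of the outlets they share.
group_of = {"ann": "left", "bob": "left", "cat": "right", "dan": "right"}
# Hypothetical reply/share interactions between users.
edges = [("ann", "bob"), ("cat", "dan"), ("ann", "cat"), ("bob", "ann")]

score = ei_index(edges, group_of)
print(score)
```

A score near -1 indicates that users mostly interact within their own group, which is one rough signal of polarization; real studies would combine it with content and propagation features.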

 


 

Project 2: Knowledge Discovery on Biomedical data (Dr. Akbas)

The goal of this project is to apply machine learning to massive interaction data among biomedical entities such as drugs, diseases, proteins, and side effects. The project constructs different network structures from this biomedical relational data and develops network-based learning methods to address critical problems in biology and medicine.
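As a rough illustration of what a network-based method over relational data can look like, the sketch below builds a drug-to-disease network from (drug, disease) pairs and scores drug similarity by the Jaccard overlap of their disease neighborhoods. The records and the choice of Jaccard similarity are assumptions for illustration, not the project's actual data or method.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Map each drug to the set of diseases it is linked to."""
    neighbors = defaultdict(set)
    for drug, disease in pairs:
        neighbors[drug].add(disease)
    return neighbors


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical drug-disease associations (placeholder names).
pairs = [
    ("drug_a", "disease_1"), ("drug_a", "disease_2"),
    ("drug_b", "disease_2"), ("drug_b", "disease_3"),
    ("drug_c", "disease_4"),
]
net = build_bipartite(pairs)
sim_ab = jaccard(net["drug_a"], net["drug_b"])  # share one of three diseases
sim_ac = jaccard(net["drug_a"], net["drug_c"])  # share none
print(sim_ab, sim_ac)
```

Drugs with overlapping neighborhoods score higher, which is the basic intuition behind many link-prediction approaches on such networks.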

 


 

Project 3: Extracting Information from EHR data (Dr. Shamsuddin)

Electronic health records (EHR) are a valuable source of data for medical research. However, the quality and ease of use of EHR data are questionable. We therefore propose to develop a standard framework for converting EHR data into easily accessible patient profiles. In one of our previous works, we defined a patient profile as a low-dimensional data structure that contains a relevant summary of each patient. The quality of the patient profiles will be determined by how accurately various machine learning models can map a patient profile to the corresponding diagnosis. This will facilitate the use of objective decision-support systems that use EHR data in real-life clinical practice.

 

Students will measure the quality of the patient profiles by how accurately various machine learning models map each profile to the corresponding diagnosis. We will work with publicly available EHR data from the PhysioNet Challenges (e.g., Early Prediction of Sepsis from Clinical Data). Healthcare data also often has missing information or attribute values, since data collection is a lengthy, time-consuming process. Students will learn how to use statistical tools (such as mean, median, variance, and temporal relations) to fill in missing data.
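The gap-filling step can be sketched with two simple strategies: carrying the last observed value forward (a temporal relation) and falling back to the median of the observed values. This is a minimal sketch using only the Python standard library; the heart-rate series is hypothetical and the choice of these two strategies is an assumption, not the project's prescribed procedure.

```python
from statistics import median


def forward_fill(values):
    """Replace each None with the most recent observed value, if any."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def median_fill(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]


# Hypothetical hourly heart-rate readings; None marks a missing measurement.
heart_rate = [None, 80, None, 90, 100, None]

ffilled = forward_fill(heart_rate)   # the leading gap has no earlier value
combined = median_fill(ffilled)      # the median covers what is still missing
print(ffilled)
print(combined)
```

Forward fill respects the time order of a patient's measurements, while the median gives a value for gaps that have no earlier observation to carry forward.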

 


 

Project 4: Characterizing COVID-19 spread (Dr. Thieu)

The novel coronavirus pandemic has forced the world into a new normal. To join the fight against the pandemic, we set out to investigate the geospatial, societal, and infrastructure factors that affect the spread of SARS-CoV-2. We use Natural Language Processing (NLP) to extract entities and relations associated with the spread of the virus from the COVID-19 Open Research Dataset (CORD-19). We extract information not only about SARS-CoV-2 but also about the other coronavirus strains covered in CORD-19. Next, we aggregate the extracted information into a knowledge graph. This graph embodies factors that have affected the spread of past and present coronavirus outbreaks. We hypothesize that the knowledge graph will provide interpretable insights into the rapid spread of COVID-19. These insights can then be used for policy making, increased preparedness, and prevention of future pandemic outbreaks.

 

REU students involved in this project will work under the supervision of Dr. Thieu. They will construct a knowledge graph that aggregates all extracted entities and relations, then visualize the graph using Cytoscape. Throughout the project, students will learn NLP concepts, including named entity recognition (NER) and relation extraction (RE), and gain hands-on experience with the popular NLTK library and the state-of-the-art Transformers library.
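The aggregation step can be sketched as follows, assuming the NER and RE stages have already produced (head, relation, tail) triples. The triples here are hypothetical stand-ins for extracted output, and the helper names are illustrative. Repeated triples are merged into weighted edges and written out as a tab-separated edge table, a delimited format that network tools such as Cytoscape can import.

```python
from collections import Counter


def aggregate(triples):
    """Count how often each (head, relation, tail) triple was extracted."""
    return Counter(triples)


def to_edge_table(graph):
    """Render the weighted edges as tab-separated rows with a header."""
    rows = ["source\trelation\ttarget\tweight"]
    for (head, rel, tail), weight in sorted(graph.items()):
        rows.append(f"{head}\t{rel}\t{tail}\t{weight}")
    return "\n".join(rows)


# Hypothetical triples, standing in for NER + RE output over paper abstracts.
triples = [
    ("SARS-CoV-2", "spreads_via", "respiratory droplets"),
    ("SARS-CoV-2", "spreads_via", "respiratory droplets"),
    ("MERS-CoV", "hosted_by", "dromedary camels"),
]
graph = aggregate(triples)
table = to_edge_table(graph)
print(table)
```

Using the extraction count as an edge weight gives a simple way to surface relations that many papers agree on.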

 


 

Project 5: Self-supervised Predictive Learning for Visual Understanding (Dr. Aakur)

The visual spectrum is a rich source of information, encoding different modes of knowledge such as physics, geometry, and semantics into a concise representation of pixels. An image or a video often carries rich information about the scene, such as the background, activity, location, and context, that describes the primary semantic content of what was captured. In this project, we aim to apply state-of-the-art deep learning techniques like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in an unsupervised representation learning paradigm to extract meaningful visual representations from unlabeled internet data for better visual understanding. Traditional approaches to visual understanding require structured, labeled data for training and, as such, can be prone to labeling errors. The overall goal of this research is to create a real-time visual event understanding system for streaming videos that uses continuous-valued deep learning algorithms to build unsupervised representations of visual data.

 

Students will implement a predictive coding stack that will form the basis for learning representations in a self-supervised manner. They will integrate multiple modalities such as text and audio into the prediction framework to help ground the concepts in visual data. 

 

Project 6: Explainable and engineerable machine learning (Dr. Crick)

Machine learning has become a vast engine of the economy, owing to the availability of big data and developments in deep convolutional neural networks. However, CNNs have a serious problem: they consist of collections of hundreds of thousands of parameters that are completely opaque to human inspection. Even if CNNs do well in learning tasks, the fact that we cannot follow their decision procedure renders their decisions less trustworthy and impedes our efforts to engineer systems that could accomplish the same tasks with fewer resources than CNNs require. Investigations into the semantic structure of such networks - what information is contained where, how it can be intelligently adjusted, when decisions are made and on what basis - are urgently needed.

 

This data analysis project will involve constructing and analyzing deep neural networks to identify patterns of learning behavior, attempting to construct structures for specific purposes, and tracking patterns of performance in transfer learning from one context to another. Research projects will include defense against malicious attacks, using generative networks for training and introspection, and examining the interrelationship and performance of different network architectures such as convolutional, autoencoding, long short-term memory, and others.

 

Game Nights

Every Friday, we had a game night and played video games together. It was fun.

