

A dynamic and scalable data cleaning system for Watson analytics
Poor data quality is a serious and costly problem affecting organizations across all industries. Real data is often dirty, containing missing, erroneous, incomplete, and duplicate values. It is estimated that poor data quality costs organizations between 15% and 25% of their operating budget. Existing data cleaning solutions focus on identifying inconsistencies that do not conform to prescribed data formats, assuming the data remains relatively static. As modern applications move towards more dynamic search analytics and visualization, new data quality solutions that support dynamic data cleaning are needed. An increasing number of data analysis tools, such as Watson Analytics, provide flexible data browsing and querying abilities. To ensure reliable, trusted, and relevant data analysis, dynamic data cleaning solutions are required. In particular, current data quality tools fail to adapt to: (1) fast-changing data and data quality rules (for example, as new datasets are integrated); (2) new data governance rules that may be imposed for a particular industry; and (3) industry-specific terminology and concepts that can refine data quality recommendations for greater accuracy and relevance. In this project, we will develop a system for dynamic data cleaning that adapts to changing data and rules, and considers industry-specific models for improved data quality.
Industry Partner(s): IBM Canada Ltd.
Academic Institution: McMaster University
Academic Researcher: Fei Chiang
Platform: Cloud
Focus Areas: Cybersecurity, Digital Media

Change Your Game
Tracking athletic performance in basketball can require significant time and resources, but the resulting information can be extremely valuable for a developing player and their coach. Fortunately, advancements in computer vision provide an opportunity to track shooting performance automatically. Further, this information can be presented to players in an entertaining gaming atmosphere to encourage player development. The objective of this project is to advance deep learning algorithms to provide real-time biomechanical feedback that can be used to develop training and entertainment software for youth basketball players. This work by Pipeline Studios Ltd., in collaboration with McMaster University, uses deep learning-based approaches to track shooting mechanics and performance. Significant computational resources are initially required to test the limits of these deep learning models for image classification, object detection, event detection, and human pose estimation. Advancing these algorithms on large computational networks is required to build the training and entertainment software that can support basketball athlete development in Canada.
Industry Partner(s): Pipeline Studios INC.
Academic Institution: McMaster University
Academic Researcher: Dylan Kobsar
Platform: GPU
Focus Areas: Health


Dynamic microscopy image processing and analysis for infectious diseases, diagnosis and treatments
In vaccine and therapy development for infectious diseases, advanced optical imaging is used to measure the interaction between a virus and its host cell. Current microscopes are relatively slow and provide only a “snapshot” of the biological interactions. To develop diagnostic methods and effective treatments, it is necessary to image these dynamic interactions continuously, like a movie. The stream of images also needs to be processed and analyzed rapidly using new computational methods. To address this challenge, we plan to develop high-speed microscopic imaging instruments and related image processing and analysis technology. These technologies will enable us to build a customized microscope capable of high-speed quantitative imaging of virus-host interactions in live cells for infectious disease diagnosis and therapeutic treatment research. This project module will primarily focus on image processing and analysis algorithm development.
Industry Partner(s): McFocal
Academic Institution: McMaster University
Academic Researcher: Joseph Hayward
Focus Areas: Advanced Manufacturing, Health



HPC cloud analytics / machine learning support for Watson Pepper clinical study
Skin cancer is the most common type of cancer. 80,000 cases of skin cancer are diagnosed in Canada every year; 5,000 of these cases are melanoma, the deadliest form of skin cancer (Canadian Skin Cancer Foundation). Current prevention efforts focus on educating individuals about actions they can take to reduce their risk of this cancer. However, research has shown that both communication failure and information overload are significant problems affecting the quality of patient-centered care. Social robotics and artificial intelligence have been used effectively to communicate and positively influence behavior. This research therefore proposes to develop and test these combined technologies as an intervention for skin cancer prevention education.
The research team and collaborating research partner IBM will integrate IBM Watson cognitive computing applications with SoftBank Robotics' advanced robotics platform, the Pepper robot. The Watson Pepper prototype will be used as a controlled variable in a randomized controlled clinical trial (N = 200) to assess the efficacy of a socially assistive robotics intervention for behavioural change in skin cancer prevention knowledge and practices among medical patients. This will be the first clinically tested implementation of a Watson Pepper robot for healthcare communication. The research also proposes commercialization and business implementation of the integrated IBM Watson robot at an expanded scale and scope of healthcare communication applications. To support the achievement of this technology milestone, SOSCIP will provide the critical cloud data analytics and memory capacity needed for the analysis and modeling of the large multivariate data sets associated with this project.
Industry Partner(s): IBM Canada Ltd.
Academic Institution: McMaster University
Academic Researcher: David Harris Smith
Co-PI Names: Hermenio Lima, Frauke Zeller
Platform: Cloud
Focus Areas: Cybersecurity, Digital Media, Health


Personalized predictive risk for medical imaging radiation exposure
This project will build the expert team and create the tools required to understand the long-term effects of low-dose radiation exposure from medical imaging on populations, and to facilitate the adoption of best practices that decrease the impact of imaging-related radiation exposure. Although this project will focus on low-dose radiation exposure from medical imaging, the tools and approaches developed will be transferable to other specialties in the health care field.
To achieve these goals, we propose to:
(1) Develop a standards-based, extensible data model for a provincial platform to reconcile radiation dose with patient medical information;
(2) Develop generic data mining tools that can be customized to query Big Data repositories for use with major modalities, including Electronic Patient Records and Electronic Medical Records across the province;
(3) Investigate algorithmic approaches to extract and structure data from reports and text in DICOM headers, and from free text in notes and summaries in electronic patient records;
(4) Develop intelligent terminology mapping and quality assurance mechanisms to ensure data quality; and
(5) Develop a feedback mechanism for decision support at the point of care related to imaging procedures.
Industry Partner(s): Real Time Radiology d.b.a Real Time Medical
Academic Institution: McMaster University
Academic Researcher: David Koff
Co-PI Names: Thomas Doyle, Reza Samavi
Platform: Cloud
Focus Areas: Digital Media, Health


Virtual City Environment
The project goal is to develop a proprietary 3D city-modeling platform, the Virtual City Environment (VCE), based on the automated co-registration of LIDAR, GIS, and street view image datasets. The platform would integrate an interactive and interoperable online client that supports multi-user content and scripting modification, as well as the parsing of 3D object types such as terrain, buildings, transient objects (vehicles), and vegetation. The platform will feature a data interface allowing users to incorporate, visualize, and highlight different types of data, such as public health data, air quality, traffic, demographics, energy, commercial activity, and zoning. The resulting 3D visualization and data interface platform will satisfy a market opportunity for the planning visualization and communication services required by professional planning, architectural, and engineering businesses, as well as provincial and municipal governments, to meet contemporary civic engagement standards. The proposed platform advances SOSCIP strategic priorities by applying supercomputing in new ways to existing and emerging urban data sets to improve planning information systems.
Industry Partner(s): GeoDigital Canada, Rethink/ReNewal Urban Planners
Academic Institution: McMaster University
Academic Researcher: David Harris Smith
Platform: Cloud, Parallel CPU
Focus Areas: Cities, Digital Media