State-of-the-art data science finds hidden relationships in data through human-steerable AI

19 Aug 2020

Artificial and human intelligence work together in new research supported by the National Security Agency (NSA) and the University of Hawai‘i (UH). Researchers at the Hawai‘i Data Science Institute at the University of Hawai‘i are using state-of-the-art artificial intelligence (AI) of the kind common in visual analytic systems to find hidden relationships between data elements.

Analysts working to understand high-dimensional data sets, such as large document collections, rely not only on the data but also on specific user questions. Using an expanded design space of semantic interaction systems applied to state-of-the-art pipelines, analysts can gain insights using their domain knowledge and steer the AI toward a desired visualization.

Semantic interaction enables the direct manipulation of two-dimensional views of high-dimensional data, in which similar data are represented as clusters. The Zexplorer system, developed by Ph.D. candidate Alberto González Martínez, creates a visualization that lets users manually connect and categorize documents, and then embeds that user knowledge into the system models. The system is built atop Zotero, a widely used document organization system.

Zexplorer allows users to manually make data connections by manipulating the visualization, changing document positions and clusters.
(A) The user repositions documents whose placement does not agree with their own knowledge.
(B) The system interprets the new user representation.
(C) The system adds new documents about security (pink), mimicking the user-defined model.

This is how Zexplorer embeds the user's knowledge back into the original ML models without requiring the user to manipulate the algorithm parameters.
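The general idea behind steps (A) to (C) can be illustrated with a small Python sketch. It is not Zexplorer's code or algorithm, which the article does not describe in detail; it is a simplified stand-in that assumes each document is a short list of numeric features, and all names and values in it (`infer_weights`, `suggest_documents`, the two-feature `corpus`) are hypothetical. When the user drags documents into one group, the sketch upweights the features on which those documents agree, then uses the weighted distance to pull in the most similar remaining document.

```python
from statistics import pvariance

def infer_weights(grouped, eps=1e-6):
    """Upweight features on which the user-grouped documents agree (low variance)."""
    dims = len(grouped[0])
    raw = [1.0 / (pvariance([doc[d] for doc in grouped]) + eps) for d in range(dims)]
    total = sum(raw)
    return [r / total for r in raw]  # normalized so the weights sum to 1

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5

def suggest_documents(corpus, grouped_ids, k=1):
    """Return the k ungrouped documents closest to the user's group."""
    grouped = [corpus[i] for i in grouped_ids]
    weights = infer_weights(grouped)
    dims = len(grouped[0])
    centroid = [sum(doc[d] for doc in grouped) / len(grouped) for d in range(dims)]
    candidates = [i for i in corpus if i not in grouped_ids]
    candidates.sort(key=lambda i: weighted_distance(corpus[i], centroid, weights))
    return candidates[:k]

# Hypothetical features per document: [security score, relative length]
corpus = {
    "a": [0.90, 0.20],  # security, short
    "b": [0.80, 0.90],  # security, long
    "c": [0.85, 1.00],  # security, very long
    "d": [0.50, 0.55],  # mixed topic, medium length
}

# The user drags "a" and "b" together; they agree on security, not on length.
weights = infer_weights([corpus["a"], corpus["b"]])
suggestion = suggest_documents(corpus, ["a", "b"], k=1)
print(suggestion)  # ['c']
```

With equal feature weights, document "d" would be the closest to the group because its length sits near the group's average. After the user's grouping is interpreted, length matters far less than the security score, so "c" is suggested instead, and the user never touched a model parameter.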

“Traditional analytical pipelines are driven solely by algorithms or models, and without a human in the loop they can potentially limit sense-making by masking expected or known structure in the data. When doing data analysis, the insights are not solely based on the data but are mainly driven by the users’ specific questions,” said González, who will earn a degree in Computer Science from the University of Hawai‘i at Mānoa. “For example, the same document or corpus of text may hold numerous orthogonal pieces of information, each of which is valuable to different users with different degrees. This work provides a first step towards embedding the domain knowledge and questions into the system models, allowing analysts to steer the system model towards their own mental model.”

Imagine the points in these pictures represent documents on various topics. Depending on the reader’s point of view, they can be grouped in two or more different ways (bottom left and bottom right). AI may find groupings that the user did not intend. Zexplorer lets users see the AI’s groupings and correct them if necessary, without requiring them to be experts in AI, just as we are able to drive a car without having to be an expert auto mechanic.

While the work has been applied to textual data, the methods are general enough to be used in other high-dimensional domains such as astronomy or genomics. They can improve current research in visual analytics and drive the future discovery of more powerful pipelines that emphasize explainability in visual analytic systems.

González, Master’s candidate Billy Troy Wooten, Ph.D. candidates Nurit Kirshenbaum and Dylan Kobayashi, and Hawai‘i Data Science Institute Co-Director Jason Leigh were awarded ‘Best Paper in trending now – machine learning and artificial intelligence’ at the 2020 Practice and Experience in Advanced Research Computing (PEARC) national conference for this research.

González and the team of UH researchers presented this research at PEARC20, which was held as a virtual conference from July 27 to 31, 2020.