Data Science Workshops

The Hawaiʻi Data Science Institute offers free workshops to learn essential cyberinfrastructure (CI) and data science skills.

October 21, 2022

FAIR Data Management Security and Ethics

Synopsis: Introduces FAIR data management practices. FAIR stands for Findability, Accessibility, Interoperability, and Reusability.

CI Tools: Modern web browser, HydroShare account

October 28, 2022

Scientific Software Basics

Synopsis: Covers the foundations and essentials of scientific software. Attendees will learn how to use the Unix command line to navigate Linux environments such as the Mana high performance computing cluster, command-line Git and GitHub for software version control, and GitHub Pages for sharing their science.

CI Tools: Mana, Git/GitHub, Open OnDemand

November 4, 2022

High Performance Computing

Synopsis: Focuses on using high performance computing (HPC) clusters, such as Mana, for deep learning tasks. Also covers the benefits of understanding the different types of file systems available on HPC clusters, and basic ways to stage data from slower file systems to faster ones.

CI Tools: Mana, Open OnDemand, JupyterLab

December 2, 2022

Data Movement, Dissemination and Archiving

Synopsis: Introduction to the challenges and options in moving scientific data over the network. Learn about the network infrastructure and tools available, the use cases each one suits, and the disadvantages or drawbacks of each technology.

CI Tools: Mana, SFTP, GridFTP, LFTP, rsync, Globus

February 3, 2023

Data Wrangling with Computational Notebooks

Synopsis: Introduction to using Jupyter Notebooks to create reproducible computational workflows. Learn how to build notebooks that combine explanatory Markdown-formatted text with Python code.

CI Tools: Python, Pandas, Matplotlib, Jupyter Notebook

February 17, 2023

Machine Learning Approaches in Climate Science

Synopsis: Introduces the basics of time-series and geospatial data modeling using modern data science software tools: Jupyter notebooks, scikit-learn, Keras, and TensorFlow on high performance computing systems.

February 24, 2023

Data Visualization

Synopsis: Vision is the primary channel through which humans interpret most of the information in the world, so it isn’t surprising that half of the human brain is dedicated to processing visual imagery. Visualization is particularly important in the age of Big Data, Data Science, and Artificial Intelligence, because it keeps humans in the loop and helps ensure that discoveries made by these advanced data analytics methods are truly valid. This hands-on workshop will introduce participants to examples of good and bad visualizations, the process of creating effective data visualizations, and popular data visualization tools that can help researchers and instructors produce visualizations of their data.

CI Tools: Dash, Tableau, ParaView, SAGE3

March 24, 2023

Smart Data Collection for Sensor Networks

Synopsis: Covers using the Tapis Streams API to represent sensor networks and to stream and process real-time data.

CI Tools: Mana, Abaco, Jupyter notebooks, Python 3

March 31, 2023

Creative Thinking Workshop

Synopsis: Introduction to what science tells us about creativity. Drawing on the science and on the experience of creative practitioners such as animators and comedians, the workshop then describes conditions and activities that can help or hurt creativity.

CI Tools: CyberCANOE, SAGE3

April 21, 2023

Scientific Workflows and Gateways

Synopsis: Introduces gateways and workflows that foster a collaborative environment for research and scientific reproducibility. Learn to navigate HydroShare tools and resources to create data resources that can be used in collaboration. An existing workflow is used to demonstrate how easy community-developed tools are to use.

CI Tools: HydroShare, JupyterHub, Python