BIBC2025 workshop - Introduction
RSFAS, ANU
I am Patrick Li.
Education background
I hold a PhD in Statistics. My research focused on computer vision and data visualization, with an emphasis on developing visual analytics methods to assess residual plots.

Current work
I am currently working at ANU, primarily on machine learning, image analytics, and plant phenotyping projects.

Past experience in computer vision
My PhD research involved applying computer vision techniques to evaluate residual plots. I have worked on a food safety project where I helped collect human-subject data, trained vision language models to answer food safety–related questions, and trained object detection models for food safety area detection. I also contributed to a project analysing sperm–egg cell videos and images to understand factors related to successful IVF outcomes.
reticulate basics

Code and theory
This workshop is a hands-on introduction to modern computer vision in R. You’ll work through code examples and practical exercises. Theory will be covered only to the extent needed to understand the key ideas and statistical foundations behind the models.
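Under the hood, the R tooling used in this workshop reaches Python libraries through the reticulate package. As a rough illustration only (it assumes a Python installation with NumPy is available; NumPy itself is not needed for the workshop), reticulate imports Python modules so they can be called with ordinary R syntax:

```r
# A minimal reticulate sketch: import a Python module and call it from R.
library(reticulate)

np <- import("numpy")        # import the (assumed available) numpy module
x  <- np$array(c(1, 2, 3))   # R vectors are converted to Python objects
np$mean(x)                   # Python results come back to R (returns 2)
```

The same mechanism is what lets Keras and PyTorch models be driven from R.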
Reproducibility
Computer vision models can be only partly reproducible due to non-deterministic parallel computation, especially on GPUs. Don't be surprised if your results differ slightly from those shown in the materials.
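Fixing the random seeds narrows, but does not remove, this variability. A minimal sketch, assuming the keras3 package (used later in the workshop) and its set_random_seed() helper:

```r
# Set the R, Python, NumPy and backend random seeds in one call
# (assumes the keras3 package; GPU operations may still be
# non-deterministic, so small differences can remain).
library(keras3)
set_random_seed(2025)
```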
Hardware and time limits
Training time-consuming models will not be required during the workshop, as participants may have different hardware setups. Example training code is provided in the materials, and the exercises focus on exploring pre-trained models.
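As an indication of what exploring a pre-trained model looks like, the sketch below loads an ImageNet-pretrained classifier with keras3; the specific model (ResNet-50 here) is only an illustrative choice, not necessarily the one used in the materials.

```r
# Load a ResNet-50 pre-trained on ImageNet (weights are downloaded on
# first use) and inspect its architecture -- no training required.
library(keras3)

model <- application_resnet50(weights = "imagenet")
summary(model)   # layer-by-layer overview of the network
```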
How confident should I be coming into this workshop?
If you have a background in statistics, you are good to go!
There will be some new concepts and terminology, but almost everything can be understood using your statistical intuition.
Don’t be intimidated by CV jargon!
Computer vision is a broad field concerned with enabling machines to interpret and understand visual information from the world.
Early CV research relied heavily on geometric reasoning, algebraic formulations, and handcrafted rules.
Edge and line detection, Sobel operator (1968)

Digit recognition, N-tuple method (1959)

Image segmentation, recursive region splitting (1978)

Image classification, texture analysis (1973)

The first convolutional neural network (CNN), the Neocognitron, was introduced in 1980, but it was LeCun's LeNet for digit recognition (late 1980s – 1990s) that popularised the approach.
Document recognition, LeNet-5 (1998)

Face detection (1998)

Lung nodule detection (1995)

Image segmentation, Cresceptron (1992)

During the 2000s, due to hardware limitations, research largely shifted toward hand-crafted feature descriptors and training classifiers such as SVMs on top of them.
Scale Invariant Feature Transform (2004)

Histograms of Oriented Gradients (2005)

Bag of visual words (2004)

Speeded up robust features (2006)

Powerful GPUs and the introduction of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) sparked the rapid advancement of modern computer vision models.
Image classification, AlexNet (2012)

Object detection, R-CNN (2014)

Image segmentation, Fully Convolutional Network (2015)

Image generation, Generative Adversarial Nets (2014)

The 2020s are defined by transformer-based architectures, emphasising global context, scalability, and multi-modal learning.
Image classification, Vision Transformer (2020)

Image segmentation, Segment Anything Model (2023)

Text-to-Image generation, Latent Diffusion Model (2022)

Visual question answering, Visual Language Model (2022)

We don’t know if a breakthrough on the scale of CNNs or Transformers will occur.
Nor is it clear whether traditional statistical knowledge will remain relevant to future computer vision research.
Reading the literature from the 1950s to today, you will find less and less traditional statistics involved.
Research is relying more on higher-level conceptual thinking.
In the early days, computer vision tools were not packaged software as they are today, but collections of custom routines written mainly in assembly, C, or Fortran.
Image processing libraries began to emerge in the 1980s, such as:
With CNNs gaining popularity, new open-source deep learning frameworks appeared in the 2010s.
These frameworks all share common characteristics:
Theano was released by the MILA Lab, Montréal around 2010.
Caffe was developed by Yangqing Jia at UC Berkeley and released in December 2013.
CNTK (Microsoft Cognitive Toolkit) was released by Microsoft Research in 2015.
TensorFlow was released by Google Brain in November 2015.
Keras was released by François Chollet in March 2015.
PyTorch was released by Facebook AI Research (FAIR) in January 2017.
JAX was released by Google Research in December 2018.
We will use PyTorch as the backend and Keras as the high-level API; a minimal setup sketch is shown below.
Keras was originally tied to TensorFlow, but TensorFlow development has slowed and internal interest at Google appears to have declined, as has happened with other Google products in the past.
Keras 3.0 now supports multiple backends, so the same code can run on TensorFlow, PyTorch, or JAX.
Today, PyTorch is the main choice for research and fast experimentation, and most new models are built with it.
It matters less which framework you start with. Most share similar core designs, so once you understand the fundamentals, your experience is easily transferable.
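As a rough setup sketch (the config_backend() helper name is taken from keras3 and may differ across versions; the KERAS_BACKEND environment variable is the standard Keras 3 mechanism), the backend is selected before Keras is loaded:

```r
# Choose the PyTorch backend before Keras is initialised.
# Keras 3 reads the KERAS_BACKEND environment variable
# ("tensorflow", "torch" or "jax") the first time it loads.
Sys.setenv(KERAS_BACKEND = "torch")

library(keras3)
config_backend()   # should report "torch"
```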

Slides URL: https://ibsar-cv-workshop.patrickli.org/