Dr. Hansung Kim
Computer Vision Researcher
Recent Research Projects
Egocentric human interaction in an Extended Reality environment
- Programme: International Collaborative Research
- Funding body: ETRI, South Korea
- Duration: Jan 2024 - Dec 2028
- Grant Award: £360K
- Role: Principal Investigator
- Summary: This project aims to develop an immersive eXtended reality (XR) environment for realistic egocentric interaction between users in the XR space. The main research goals are: human interaction between remote users in the XR environment, AI-based egocentric photorealistic view generation using NeRF, free-viewpoint visualisation of the remote environment, and 3D environment analysis/visualisation (a small rendering sketch follows below).
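As a rough illustration of the NeRF-based view generation mentioned above, the sketch below composites colour along a single camera ray using the standard volume-rendering quadrature. The ray-marching parameters and the toy radiance field (a soft red sphere) are illustrative assumptions only, not the project's actual model.

```python
# Minimal sketch of NeRF-style volume rendering along one ray (illustrative only).
import numpy as np

def render_ray(origin, direction, radiance_field, near=0.1, far=4.0, n_samples=64):
    """Composite colour along one ray: c = sum_i w_i * rgb_i with NeRF weights."""
    t = np.linspace(near, far, n_samples)                   # sample depths along the ray
    pts = origin + t[:, None] * direction                   # 3D sample points
    sigma, rgb = radiance_field(pts)                        # densities and colours
    delta = np.diff(t, append=t[-1] + (far - near) / n_samples)
    alpha = 1.0 - np.exp(-sigma * delta)                    # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))
    weights = alpha * trans                                  # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)              # final pixel colour

def toy_field(pts):
    """Toy radiance field: a soft red sphere of radius 1 at the origin (made up)."""
    d = np.linalg.norm(pts, axis=1)
    sigma = np.where(d < 1.0, 5.0, 0.0)
    rgb = np.tile([0.8, 0.3, 0.3], (len(pts), 1))
    return sigma, rgb

print(render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]), toy_field))
```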
Real-time 3D Scene Understanding and Human Tracking
- Programme: International Collaborative Research
- Funding body: KIST, South Korea
- Duration: Jan 2022 - Dec 2024
- Grant Award: £105K
- Role: Principal Investigator
- Summary: This project aims to develop a 3D environment understanding and 3D human detection/pose estimation system using a single omni-directional camera and several depth sensors. The main target application is to provide integrated 3D scene understanding to a digital human or humanoid robot. The key techniques are 3D human tracking/visualisation using a camera and additional sensors, 3D pose estimation and understanding, and physical attribute estimation of the environmental scene (a projection sketch is given below).
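One basic operation behind fusing depth-sensor data with a single omni-directional camera is projecting a 3D point into the equirectangular image. The sketch below shows that mapping under an assumed camera-centred frame with the y-axis pointing up; the image resolution and the example point are illustrative values, not project data.

```python
# Hedged sketch: project a 3D point (camera frame, y up) onto an equirectangular image.
import numpy as np

def project_to_equirect(point, width=1920, height=960):
    """Map a 3D point (x, y, z) to (u, v) pixel coordinates in a 360 image."""
    x, y, z = point
    lon = np.arctan2(x, z)                        # azimuth in [-pi, pi]
    lat = np.arcsin(y / np.linalg.norm(point))    # elevation in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width         # horizontal pixel position
    v = (0.5 - lat / np.pi) * height              # vertical pixel position
    return u, v

# Example: a detected human joint 2 m in front of and 0.3 m above the camera centre.
print(project_to_equirect(np.array([0.0, 0.3, 2.0])))
```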
Immersive Audio-Visual 3D Scene Reproduction Using a Single 360 Camera
- Programme: New Investigator Award
- Funding body: EPSRC
- Duration: September 2021 - February 2024
- Grant Award: £263K
- Role: Principal Investigator
- Summary: The goal of this project is to develop a simple and practical solution to estimate the full 3D geometry and acoustic properties of a scene from a single 360 photo and reproduce it as a virtual space allowing real-time 3D interaction, with spatial audio adapted to the environment and listener locations. The idea is based upon two essential and complementary contributions: 1) 3D scene reconstruction and analysis at the capture side, and 2) user-adaptive rendering at the reproduction side. This addresses the challenge of inferring audio-visual properties of a scene from a single image (a simple acoustic example follows below). The project is geared towards developing the next generation of media and communication systems and will push the focus of research towards multiple senses for more realistic reproduction of the target environment. This research will bridge the gap between the audio and vision research communities by building strong cross-disciplinary links to fuse complementary information for improved user experience, and ultimately unlock the creative potential of joint audio-visual signal processing to deliver a step change in applications such as entertainment and communication.
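One standard, simple link between estimated room geometry/materials and an acoustic property is Sabine's reverberation-time formula, RT60 = 0.161 · V / Σ(Sᵢ·αᵢ). The sketch below is only an illustration of that relation; the room dimensions and absorption coefficients are invented example values, not the project's method or data.

```python
# Illustrative only: reverberation time from room geometry via Sabine's formula.

def sabine_rt60(volume_m3, surfaces):
    """surfaces: list of (area_m2, absorption_coefficient) pairs."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# Example: a 5 m x 4 m x 3 m room with plastered walls/ceiling and a carpeted floor.
room = [
    (2 * (5 * 3 + 4 * 3), 0.02),  # walls, plaster
    (5 * 4, 0.02),                # ceiling, plaster
    (5 * 4, 0.30),                # floor, carpet
]
print(f"RT60 ~ {sabine_rt60(5 * 4 * 3, room):.2f} s")
```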
COMFY: COntactless Mask Fit-testing sYstem
- Programme: Research Stimulus Fund
- Funding body: Institute for Life Sciences
- Duration: May 2021 - October 2021
- Grant Award: £6.8K
- Role: Principal Investigator
- Summary: This research evaluates a digital scanner to improve the efficiency of the fit-testing process for filtering face-piece respirators. The development work was first pump-primed through the AHSC Innovation competition for ideas to solve real-world problems related to Covid-19. We have now developed a low-cost system that assesses users' facial features and matches them to the most appropriate respirator (a toy matching example is sketched below). Our aim is to determine whether this system correctly predicts the respirator with which a user achieves an adequate fit.
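To make the matching idea concrete, the sketch below pairs a small vector of facial measurements with the nearest reference respirator. The respirator database, measurement names and all values are hypothetical; COMFY's actual matching criteria are not described here.

```python
# Hypothetical sketch: nearest-neighbour matching of facial measurements to a respirator.
import numpy as np

# (face_width_mm, face_length_mm, nose_protrusion_mm) -> respirator model (made-up data)
RESPIRATORS = {
    "Model A (small)":  np.array([128.0, 108.0, 20.0]),
    "Model B (medium)": np.array([140.0, 118.0, 24.0]),
    "Model C (large)":  np.array([152.0, 128.0, 28.0]),
}

def recommend(measurements):
    """Return the respirator whose reference measurements are closest to the scan."""
    return min(RESPIRATORS, key=lambda m: np.linalg.norm(RESPIRATORS[m] - measurements))

print(recommend(np.array([143.0, 120.0, 23.0])))  # -> "Model B (medium)"
```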
Audio-Visual Media Research Platform
- Programme: UK Research and Innovation
- Funding body: EPSRC
- Duration: Aug. 2017 - July 2022
- Grant Award: £1.5M
- Role: Researcher
- Summary: This research addresses the open challenge of machine understanding of complex dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. These research advances are of central interest to both the audio and vision research communities and will bring together advances in machine perception. Joint audio-visual processing is essential to overcome the inherent ambiguities of either sensing modality, such as occlusion, limited field of view and uniform appearance in visual sensing, which commonly occur and can cause visual understanding to fail. Audio cues can overcome these limitations by providing wide-area information and allowing continuous sensing of objects which are visually obscured. For example, both audio and visual cues are essential for non-contact monitoring of people in healthcare and assisted-living applications.
ADAMS - Autonomous Driven truck for Management Operations
- Programme: The EIT Digital – European entrepreneurs driving digital innovation & education
- Funding body: EIT, EU
- Duration: 2019 - 2020
- Role: Researcher
- Summary: ADAMS aims to develop an autonomously driven vehicle for the automated transportation of passengers between the terminal and the car parks at Southampton Airport. The business pain relates to the large investment needed to introduce autonomous driving solutions into business operations. In the case of ADAMS, the pain is solved by delivering a retrofitted solution to be applied in airports. The solution enables subcontractors to pilot the concept in a real scenario, checking its feasibility in technical, operational, financial and environmental terms.
Intelligent Virtual Reality: Deep Audio-Visual Representation Learning for Multimedia Perception and Reproduction
- Programme: BEIS Global Partnership Fund (BEIS GPF)
- Funding body: UK Science & Innovation Network
- Duration: Sep. 2017 - Mar. 2019
- Grant Award: £100,000 (UK £20,000, Korea £80,000)
- Role: Principal Investigator
- Summary: This project aims to unlock the creative potential of Audio-Visual Machine Perception (AI) to deliver a step change in immersive VR experiences for entertainment and training. This requires highly intelligent technologies, including machine learning (deep audio-visual data learning), computer vision (object and action recognition), and audio signal processing (audio/speech analysis). Together with building a UK-Korea research network in audio-visual machine intelligence, this project will build a strong cross-disciplinary link between audio processing and computer vision to fuse complementary information for improved scene understanding.
S3A: Future Spatial Audio for an Immersive Listener Experience at Home
- Programme: EPSRC Research Grant (EP/L000539/1)
- Funding body: Engineering and Physical Sciences Research Council (EPSRC)
- Duration: Dec. 2013 - June 2019
- Grant Award: £5.4M (Industry Support: £0.6M)
- Role: Leader of Computer Vision team
- Website: S3A website
- Summary: The goal of S3A is to deliver a step change in the quality of audio consumed by the general public, using novel audio-visual signal processing to enable immersive audio to work outside the research laboratory in everyone’s home. S3A aims to unlock the creative potential of 3D sound and deliver to listeners a step change in immersive experiences. To achieve this, S3A brings together leading experts and their research teams at the Universities of Surrey, Salford and Southampton and BBC Research & Development.
IMPART (Intelligent Management Platform for Advanced Real-Time media processes)
- Programme: EU FP7 ICT (FP7-ICT-2011-8)
- Funding body: European Commission
- Duration: Nov. 2012 - Oct. 2015
- Grant Award: €5M
- Role: Work Package Leader
- Website: IMPART website
- Video: IMPART video
- Summary: The European Commission-funded IMPART project focused on 'big (multimodal) data' problems in the field of digital cinema production. The tools produced have been integrated into the production software of Double Negative, and new products from FilmLight have resulted. Twenty journal papers and 70 conference publications demonstrate the research carried out. The project has created Open Data for research in this field and Open Source software (for acceleration and 3D web). The multimodal data have been unified through a 3D paradigm, leading to tools that speed up 3D reconstruction by orders of magnitude and improve it, and to tools for assessing quality (of capture environments, of 3D reconstructions, of focus, …). Video semantic analysis suitable for large-scale multi-view data has been provided, as well as 3D-2D (mainly web) integrated visualisations.
SyMMM (Synchronising Multimodal Movie Metadata)
- Programme: Technology Strategy Board (TSB 11702-76150)
- Funding body: Innovate UK
- Duration: Nov. 2011 - Apr. 2013
- Grant Award: £1.3M (TSB £630,000, Industry £500,000)
- Role: Researcher
- Summary: Movie making is changing from a two-dimensional process (in which scenes are shot on a camera and ‘composited’ as 2D layers) to one that combines digital video, computer-generated models, animations and effects in a three-dimensional world. This increases the director’s creative freedom, and supports the production of both 2D and 3D stereo versions, but is very technically demanding. SyMMM is developing ways to capture and process many kinds of metadata from video streams, photographs, laser scans and other measurements to support 3D approaches to movie making. The project will advance the state of the art in 3D video and automatic metadata extraction. It will lead to new methods and tools for blockbuster movie production, using multimodal metadata to control the way that scenes are put together and to help the creative team to visualise what is happening. The project leader is SME technology developer FilmLight, which won four ‘technical Oscars’ in 2010 and the Queen’s Award for Innovation in 2012. The project partners are Double Negative, Europe’s largest visual effects company (winner of the 2011 Oscar, and the 2011 and 2012 BAFTAs, for best VFX), and the University of Surrey.
i3DPost (Intelligent 3D content extraction and manipulation for film and games)
- Programme: EU FP7 ICT (FP7-ICT-211471)
- Funding body: European Commission
- Duration: Jan. 2008 - Dec. 2010
- Grant Award: €4.25M
- Role: Researcher
- Summary: i3DPost will develop new methods and intelligent technologies for the extraction of structured 3D content models from video, at a level of quality suitable for use in digital cinema and interactive games. The research will enable the increasingly automatic manipulation and re-use of characters, with changes of viewpoint and lighting. i3DPost will combine advances in 3D data capture, 3D motion estimation, post-production tools and media semantics. The result will be film-quality 3D content in a structured form, with semantic tagging, which can be manipulated in a graphics production pipeline and reused across different media platforms.
E-nightingale (in Japan)
- Funding body: NICT, Japan
- Duration: Jan. 2004 - Dec. 2008
- Grant Award: ¥1,800,000,000 (JPY)
- Role: Researcher
- Summary: E-nightingale is a research project on a knowledge sharing system that observes certain activities in our everyday life with ubiquitous sensors, interprets these activities, constructs knowledge such as the general tendencies of these activities from the interpretations, and provides useful knowledge to those developing improvements for the targeted activities.
Development of next generation 3D Imaging systems (in Japan)
- Funding body: Korea Research Foundation, Korea
- Duration: Nov. 2005 - Oct. 2006
- Grant Award: ₩30,000,000 (KRW)
- Role: Principal Investigator
- Summary: A 3D video system that uses environmental stereo cameras to display a target object from an arbitrary viewpoint was developed. To create 3D models from captured 2D image pairs, a real-time segmentation algorithm, a fast depth reconstruction algorithm and a simple and efficient shape reconstruction method were developed (the basic depth-from-disparity relation is sketched below). For viewpoint generation, the 3D surface model is rotated to the desired position and orientation, and the texture data extracted from the original camera images is projected onto this surface. Finally, a real-time system demonstrating the use of the above algorithms was implemented.
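The depth reconstruction step relies on the standard stereo relation Z = f·B/d for a rectified, calibrated camera pair (f: focal length in pixels, B: baseline in metres, d: disparity in pixels). The sketch below only illustrates that conversion; the focal length, baseline and disparity values are made-up examples, not the system's parameters.

```python
# Illustrative sketch: convert a disparity map from a rectified stereo pair to depth.
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Depth (metres) from disparity (pixels) via Z = f * B / d."""
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0                     # zero disparity = no match / infinitely far
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example: 700 px focal length, 12 cm baseline, a tiny 2x3 disparity map.
disp = np.array([[35.0, 70.0, 0.0],
                 [14.0, 28.0, 56.0]])
print(disparity_to_depth(disp, focal_px=700.0, baseline_m=0.12))
```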
Old Research Projects (in Korea)
- SOC development for processing 3D media (Ministry of Science and Technology, Aug. 2003 – Aug. 2005)
- Development of view-selectable multi-view video CODEC (ETRI, June 2002 – Nov. 2003)
- Research and development of next generation intelligent broadcasting technology (Ministry of Information and Communication, Nov. 2001 – Oct. 2005)
- Development of multi-view video CODEC using motion/disparity estimation (Korea Research Foundation, Oct. 2001 – Sep. 2002)
- Development of 3D SD level multi-view CODEC and preprocessing (Ministry of Commerce, Industry and Energy, Oct. 1999 – Sep. 2002)
- Development of object-based CODEC for 3D TV (ETRI, May 1998 – Nov. 2000)
- Development of 3D imaging display technology with high reality (Ministry of Commerce, Industry and Energy, Jan. 1998 – Dec. 2000)