SINBAD Consortium Meeting - Fall 2013

Whistler, Canada

Date

Dec 1 (6:00 PM) - Dec 4 (12:00 PM), 2013

Venue

Fairmont Chateau Whistler
4599 Chateau Boulevard
Whistler, British Columbia

Reserve through: https://resweb.passkey.com/go/slimresearch

Or call: 1-800-606-8244 (Group Code "1113UBCU")



Transportation

SLIM Shuttle

Pickup Sun Dec 1:

YVR International Terminal: 2:00 PM
UBC Bookstore 2125 East Mall: 3:00 PM
Downtown Fairmont Waterfront 900 Canada Place Way: 3:45 PM

Depart Whistler Wed Dec 4, 1:00 PM

3:00 PM: Dropoff Waterfront
3:45 PM: UBC
4:30 PM: YVR

Other Transport Options:

Bus: Pacificcoach YVR-Whistler-Skylynx
Limousine service: Star Limousine
Car rentals: Airport Rental Cars (use YVR for airport)
Driving directions: airport to Whistler



SLIM Contacts

Miranda Joyce
+1 (604) 822-5674 (office)
mjoyce@eos.ubc.ca
SLIM - EOAS Dept University of British Columbia
2020-2207 Main Mall,
Vancouver, B.C. CANADA V6T 1Z4

2013 Fall SINBAD Consortium meeting—HPC/Big Data Forum

Location: Empress Ballroom, Fairmont Chateau
Date: Monday 2nd December 2013, 05:30–08:00 PM

Chair: Ian Hanlon

Attendees

  1. Fusion-io - Daniel St-Germain, Thomas Armstrong
  2. Scalar - John Gardner, Neil Bunn
  3. Cray - Geert Wenes, Wulf Massel
  4. SGI - Paul Beswetherick, Angela Littrell
  5. IBM - Peter Madden, Josh Axelson, David Decastro
  6. Limit Point Systems - David M. Butler
  7. Maxeler - Jacob Bower, Richard Veitch
  8. CGG - Alan Dewar
  9. UBC - Steve Cundy
  10. BG Group - Hamish Macintyre
  11. ConocoPhillips - Larry Morley
  12. SINBAD Consortium members

Program—HPC/Big Data Forum

Monday December 2nd (Empress Ballroom)

05:30—07:45 PM HPC/Big Data Forum (Beer & pizza)
05:30—05:35 PM Ian Hanlon Welcome to SLIM HPC Forum
05:35—05:45 PM Neil Bunn The challenge of sustainable academic HPC in Canada
05:45—05:50 PM discussion open to floor
05:50—06:00 PM Steve Cundy Understanding the business of science and cyber infrastructure challenges within academia
06:00—06:05 PM discussion open to floor
06:05—06:15 PM Felix J. Herrmann The need for sustainable HPC in academia
06:15—06:20 PM discussion open to floor
06:20—06:30 PM Hamish Macintyre International Inversion Initiative in Brazil — an example of an enabling public-private partnership
06:30—06:35 PM discussion open to floor
06:35—06:45 PM David M. Butler Effective data management for cross-disciplinary scientific computing projects
06:45—06:50 PM discussion open to floor
06:50—07:00 PM Thomas Armstrong Accelerating data centric workloads
07:00—07:05 PM discussion open to floor
07:05—07:15 PM Paul Beswetherick SGI produces 3D images of oil & gas below Earth’s surface to enhance research using energy efficient hardware with lower energy costs
07:15—07:20 PM discussion open to floor
07:20—07:30 PM Richard Veitch Maxeler and seismic applications
07:30—07:35 PM discussion open to floor
07:35—07:40 PM Geert Wenes Use cases for Fused HPC and BigData in seismic processing
07:40—07:45 PM Felix & Ian Thank you to speakers & participants, Follow up actions and closing comments

Motivation—HPC/Big Data Forum

Resources are increasingly hard to find: the easy targets have been delineated and exploited with relatively simple processing flows which, whilst still not easy because of the large data volumes involved, are achievable on relatively vanilla cluster hardware. With the easy pickings gone and demand for oil & gas increasing, portfolios of proven reserves are more difficult to come by. We are therefore challenged to create models and images in increasingly complex geological settings that include sub-salt, sub-basalt, intra-carbonate, and unconventional (shale).

While traditional imaging workflows (typically ray-based, but also more modern wave-equation based ones) have served us well, they tend to fail in these complex areas. This has led to the emergence of iterative, optimization-driven, wave-equation based imaging and inversion technology that works on massive seismic data volumes. This recent move away from “one-data-pass-only” seismic data processing flows towards “multiple-pass” inversion flows is driving a step change in the industry, with an explosion in demand for data-intensive HPC to handle cycle-intensive algorithms on extremely large seismic data volumes.

Aside from having to move towards exascale compute to grind through petabytes of field data, the industry is challenged by

  • replacing (single-iteration) processing flows by iterative optimization schemes where data is touched multiple times. This strains the I/O and interconnects of even the largest and most state-of-the-art HPC systems

  • developing and maintaining complex code bases that are in sync with current developments in wave-equation based imaging (modelling, FWI, WEMVA, etc.), “big data” (Hadoop, MapReduce, etc.), and HPC technology (connection fabric, accelerators, etc.)

  • intake of new external technology developed in academia and governmental laboratories into their scale-up workflows

  • finding talented personnel who understand, are familiar with, and are productive in modern-day data-intensive HPC environments

Academia, on the other hand, is challenged by

  • having sustained access to HPC hardware and software capable of handling “big data” and “big models” (the latter refers to problems that involve many unknowns), which is quite different from “classical” big data and HPC. This challenge is compounded by “HPC fatigue” amongst funding agencies and by the lack, in most academic consortia, of true-cost-of-research models that cover the cost of even modestly sized HPC

  • developing codes that maximize uptake by industry. While groups like SLIM have greatly benefited from recent developments in high-level programming languages (e.g. MATLAB and the parallel MATLAB toolbox) that allow for parallel implementations of complex (iterative) algorithms, the industry has been struggling with testing and industrializing these codes in their systems. This leads to missed opportunities and hampers the ability to innovate

  • developing, QC-ing, and making reproducible parallel codes of increasing complexity in an academic environment where there is little to no support, training, or recognition for this sort of work. While we are seeing increased discussion of this important topic, there is no funding model in place to adequately support this type of activity

The goal of our HPC/Big Data Forum is to start a discussion on how to address some of these challenges by

  • having an informal discussion by asking interested parties (SINBAD member representatives and “HPC” guests) to prepare short 5-minute presentations highlighting how their organizations are approaching these challenges, with emphasis on the role industry-academic partnerships can play

  • shaping industry-academic collaboration in the form of targeted initiatives, such as the International Inversion Initiative (III) in Brazil, which includes a large (0.5 petaflop) HPC component and cost of research recovery, and a to-be-formed HPC Consortium with the aim to create a platform that

    • provides access for academic groups such as SLIM to the latest hard/software technology

    • supports training of the next generation of computational geoscientists

    • takes a leading role in engaging in joint-venture projects encouraging involvement of HPC vendors and joint-industry projects (JIPs)

To facilitate the discussion, we aim to seek answers to some of the following questions:

Oil & Gas Industry

  1. What are the main bottlenecks of industry to roll out HPC technologies?

  2. What are the main challenges to validate and incorporate technology from JIPs into your software systems?

  3. To what extent is industry hampered by legacy codes or by ingrained, possibly outdated workflows that could stifle innovation?

  4. What sort of software license models would you like to see as part of JIPs?

  5. What type of hardware & software developments would you like to see to address the challenges of making technologies such as (poro-elastic) FWI commercially viable?

Contractor companies

  1. How does your organization view the development of public-domain software, developed by many, as opposed to proprietary solutions, developed by few?

  2. How does your organization look at industry-academic partnerships, exclusivity, ownership of (to-be-developed) IP, etc.?

  3. What sort of support can your organization deliver to help meet the above challenges?

  4. What is the preferred license model for IP developed by academia and for IP developed as part of targeted projects?

HPC & software vendors

  1. How would your organization like to engage with the exploration seismological community and with SLIM in particular?

  2. How does your organization look at industry-academic partnerships, exclusivity, etc.?

  3. What sort of technologies do you envision as key innovations needed to meet the above challenges?

  4. What role do you see academia play in being the driving force behind these innovations?

Academia

  1. What form would a workable industry-academic partnership have to take to help you meet the above challenges?

  2. How do you envision training your students and PDFs in HPC/big-data related technology?

  3. How are you planning to fund HPC for big data sustainably in a climate of decreasing funding levels and consolidation [1] of HPC support in academia?

  4. What measures would you like to see industry, vendors, funding agencies, and universities take to maximize uptake of academic (e.g., SLIM’s) technology by industry and society as a whole?

  5. What sort of measures are you taking to manage the complexity and QC your code bases?

General

  1. What form would you like industry-academic partnerships to take? One option would be to invite specialized HPC hard- and software vendors with cutting-edge technology to join the SINBAD Consortium. In return, SINBAD expects in-kind and matchable [2] contributions.

  2. There is an increased interest in “big data” by the government and UBC, and there is also a need to build up capacity in British Columbia to steward, and mitigate the risks of, the exploration and production of natural gas (shale), which involves large amounts of (geophysical) data collection and processing. Would there be an interest to join such an initiative?

  3. HPC and “big data” will not be going away and the next generation of HPC/big-data aware researchers will need to be trained to meet demand from industry. At UBC, we are well positioned to start a professional program designed to train graduate students. Would there be an interest in your organization to support this type of program?

  4. Wave-equation based technology is complex and calls for substantial resources that are typically beyond what single academic groups can handle. SINBAD’s involvement in III, a partnership between Imperial College London, UFRN in Brazil, UBC, and BG Group, is an example of a joint effort that will make us more productive by giving us access to significant HPC in Brazil and to joint research capacity. What role could your organization play in this type of initiative, which entails commitments that are more in line with the financial involvement of the Foundation CMG in the field of reservoir engineering? Is there an interest to further exploit possibilities for joint research involving multiple groups and a cost model that covers the true cost of research, including support for HPC?

  5. HPC calls for domain-specific expertise and training. While SINBAD has strong ties with UBC’s Computer Science Department, meeting demands to train and supervise students in HPC-related areas calls for additional faculty involvement, possibly in the form of a Canadian Industrial Research Chair. Would there be an interest to support an Industry Chair in computational seismology?

Format

The meeting will commence with an introduction and then proceed with three-by-three 5-minute presentations, interspersed with workshop dialogue from the group. The 5-minute time constraint is designed to limit the talks to a single main message. Please send Ian Hanlon a title and a couple of lines of abstract so we can put the schedule together. Up to 6 HPC invitees and 3 SINBAD members will be allowed to speak.

Agenda

  1. Icebreaker: Pizza and Beverages (to be sponsored by HPC guests)

  2. Opening words

  3. Forum with 6 speakers in groups of 4, each giving a 5-minute talk, with 10 minutes of discussion

Follow-up plan

We will write up a short summary and an action plan that will be distributed amongst the attendees.


Abstracts—HPC/Big Data Forum

The challenge of sustainable academic HPC in Canada

Speaker: Neil Bunn, Scalar Architect

From the perspective of world-wide, large, sustainable HPC initiatives we see an increased focus on three key elements

  1. long term iterative systems development in vendor agnostic projects (NCSA, PRACE, etc.)
  2. challenge based procurements & evaluations (Grand Challenge Problems)
  3. domain / science area specific systems & collaborations (Bio-Pharma shared research clouds, CERN LHC, NOAA, etc.)

None of these approaches is particularly prevalent in either the funding model or the procurement model for HPC in Canada. As a result, despite well-intentioned funding (CFI), organizations (Compute Canada), and declarations of in-kind contributions, there is little progress towards modernizing our national view of HPC and limited true industry participation. To succeed in tackling these challenges and to encourage broader industry participation, we need to change our evaluation criteria for considering a system a ‘success’ and ‘productive’. Scalar would like to advocate for modifications to how proposals are evaluated and partnerships formed.

Understanding the business of science and cyber infrastructure challenges within academia

Speaker: Steve Cundy, UBC Research Infrastructure Manager

Awaiting Abstract

The need for sustainable HPC in academia

Speaker: Felix J. Herrmann, SLIM

As societal demand for answers to complex problems is increasing, funding for HPC/Big data is becoming more and more challenging. While public funding has significantly contributed to the success of HPC, there is a clear “HPC fatigue” that is hampering funding, and this may lead to missed opportunities. In this presentation, I will briefly discuss the need for HPC/Big data in our community and propose a model for sustained funding to meet this need.

International Inversion Initiative in Brazil — an example of an enabling public-private partnership

Speaker: Hamish Macintyre

During this talk, we will discuss the challenges & opportunities of a public-private HPC partnership in Brazil designed to enable the development of 3D full-waveform inversion technology.

Effective data management for cross-disciplinary scientific computing projects

Speaker: David M. Butler, Limit Point Systems, Inc.

Effective “one source of truth” data management for cross-disciplinary scientific and technical computing projects requires a data model and associated software specifically designed to support the entire scope of mathematically structured data found in scientific computing. The sheaf data model is the only data model that satisfies this requirement. An open source implementation of the sheaf data model is available at www.sheafsystem.org.

Accelerating data centric workloads

Speaker: Thomas Armstrong, Fusion-io Solution Architect

Fusion-io provides a high performance, low latency persistent memory platform optimized for highly random access to large data sets. This talk will give a brief overview of how this memory tier can be leveraged to efficiently explore large data sets with minimal overhead.

SGI produces 3D images of oil & gas below Earth’s surface to enhance research using energy efficient hardware with lower energy costs

Speaker: Paul Beswetherick, SGI

The era of ‘easy oil’ is over. One of the challenges of the oil and gas industry is to identify more complex prospects and optimize development to build more profitable projects. There is an effort to constantly increase the understanding of how oil and gas reservoirs are formed and seismic imaging technology is enabled. Geophysicists locate new targets deeper within the earth’s crust with SGI technology. SGI enables customers to generate 3D images of oil & gas below Earth’s surface. This enhances research by delivering critical information created by the world’s fastest, energy-efficient HPC hardware.

Maxeler and seismic applications

Speaker: Richard Veitch

Use cases for Fused HPC and BigData in seismic processing

Speaker: Geert Wenes


Footnotes

  1. The Canada Foundation for Innovation (more or less the only source of HPC funding in Canada) insists that hardware purchased with its grants be hosted by Compute Canada. While this model works for particular types of HPC, it is not conducive to the computational research of SLIM, which involves concurrent development of parallel codes with lots of runs and parameter tweaking. Please refer to our latest progress report and to our most recent HPC proposal, both of which can be found on our website, for more details.

  2. Canadian funding agencies (NSERC and CFI) accept in-kind contributions in the form of significant discounts and tangible (time or other means) support and match the contributions with cash. This matching leads to increased leverage of the financial support SINBAD is receiving from industry.