Google Summer of Code 2018 Projects List

The following projects are available for GSoC 2018 students.

  • Project 1. Wildbook for Whale Sharks: Listen for YouTube Responses and Extract Wildlife Data (When/Where)
  • Project 2. Flukebook: Multi-species Configuration for Cetaceans with YAML
  • Project 3. Wildbook: Quickstart Wizard for Common Configuration Parameters
  • Project 4. Wildbook: Computer Vision Visualization (Help Us See What the Computer Sees!)
  • Project 5. Wildbook: Visualize Animal Co-Occurrence with D3.js
  • Project 6. Flukebook: Build an Intelligent Agent for a Flickr Community of Whale Enthusiasts
  • Project 7. Wildbook: i18n for a Multilingual Userbase
  • Project 8. Wildbook: Improved Data Navigation (“Breadcrumbs”, Object Hierarchy)

Related repo:

Mentor bios: All mentors are full-time Wild Me staff.

Project 1. Wildbook for Whale Sharks: Listen for YouTube Responses and Extract Wildlife Data (When/Where)

Summary: Use Java programming to listen for reply comments on whale shark-related YouTube videos, and run those comments through Google Cloud Vision, Google Machine Translation, and Natural Language Processing to help qualify whale shark sightings with relevant information about when and where the animal was sighted. This is a great opportunity to get exposure to multiple forms of A.I. (computer vision, machine translation, OCR, NLP, etc.) while developing unique software that aids wildlife researchers in the field.

Difficulty: Moderate (High concept, intricate work but low volume)

Description: In June 2017, we deployed a novel use of our existing computer vision and basic artificial intelligence tools (see to pro-actively data mine YouTube for whale shark sightings and replace human labor with automation and an A.I. that interacts with YouTube posters. The software performs the following tasks:

  1. Download each 24 hours of YouTube videos tagged/titled “whale shark” (English or Spanish).
  2. Use computer vision to extract keyframes and detect and cluster those in which a whale shark has been identified. A trained neural network is used to make this decision.
  3. Send video keyframes to Google’s Cloud Vision API to extract text from the frame (e.g., embedded dates) through optical character recognition (OCR).
  4. Take OCR text output, video title, and video description and send it as a single string to Google Translate for language detection (neural machine translation). If non-English language is detected, translate to English.
  5. Use Natural Language Processing (NLP) - another form of A.I. - to detect the date of the whale shark sighting (e.g., “yesterday”, “last week”, “11/13/2014”, etc.)
  6. Use string matching to try to determine where the sighting occurred based on our existing location categorizations in
  7. Organize the relevant keyframes and metadata (where and when) and submit them all together to
  8. Automatically post questions (where? when?) to posters if metadata is missing.
  9. Post scientific decisions (“This is whale shark A-100 and here’s a link to everything we know about it!”) back to YouTube posters on YouTube.
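Step 5 above turns phrases such as “yesterday” or “11/13/2014” into a sighting date. The following is a minimal rule-based sketch of that idea, not the NLP used in the actual pipeline. The class name SightingDateSketch is hypothetical, explicit dates are assumed to be month/day/year, and relative phrases are assumed to be resolved against a reference date such as the day the video was posted.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Wildbook: a rule-based stand-in for the NLP step.
class SightingDateSketch {

    // Matches explicit dates such as 11/13/2014 (assumed to be month/day/year).
    private static final Pattern EXPLICIT = Pattern.compile("\\b(\\d{1,2}/\\d{1,2}/\\d{4})\\b");
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("M/d/uuuu");

    // Resolves the first recognizable date expression relative to a reference date.
    static Optional<LocalDate> resolve(String text, LocalDate reference) {
        Matcher m = EXPLICIT.matcher(text);
        if (m.find()) {
            try {
                return Optional.of(LocalDate.parse(m.group(1), US_DATE));
            } catch (DateTimeParseException e) {
                // Not a valid calendar date; fall through to the relative phrases.
            }
        }
        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.contains("yesterday")) return Optional.of(reference.minusDays(1));
        if (lower.contains("last week")) return Optional.of(reference.minusWeeks(1));
        if (lower.contains("today")) return Optional.of(reference);
        return Optional.empty(); // nothing found: this is when a question would be posted
    }

    public static void main(String[] args) {
        LocalDate posted = LocalDate.of(2017, 6, 15);
        System.out.println(resolve("Saw a whale shark yesterday!", posted));
        System.out.println(resolve("Filmed on 11/13/2014", posted));
    }
}
```

An empty result corresponds to the case in step 8, where the poster is asked when the sighting happened.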

Expected outcomes: We need your help with this step above: “Automatically post questions (where? when?) to posters if metadata is missing.” Currently, we have no automated way of listening for replies to those questions and must manually review them for date and location information. We need you to create a listening service that monitors identified videos for comment replies and feeds them into our natural language processing pipeline automatically, so that the needed information flows into the whale shark research community, and that then thanks the original posters (OPs) for following up on our questions!
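One possible shape for such a listening service is a poller that remembers which replies it has already seen and forwards only new ones. The sketch below is an illustration under stated assumptions, not existing Wildbook code: ReplyListenerSketch, Reply, and the fetchReplies function are hypothetical names, and the real fetch would presumably wrap the YouTube Data API rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch, not Wildbook code: forwards each reply to the pipeline exactly once.
class ReplyListenerSketch {

    // Minimal stand-in for a reply comment returned by a video platform API.
    static final class Reply {
        final String id;
        final String videoId;
        final String text;

        Reply(String id, String videoId, String text) {
            this.id = id;
            this.videoId = videoId;
            this.text = text;
        }
    }

    private final Function<String, List<Reply>> fetchReplies; // assumed wrapper around the API
    private final Consumer<Reply> pipeline;                   // e.g. the NLP date/location step
    private final Set<String> seenIds = new HashSet<>();      // in-memory only in this sketch

    ReplyListenerSketch(Function<String, List<Reply>> fetchReplies, Consumer<Reply> pipeline) {
        this.fetchReplies = fetchReplies;
        this.pipeline = pipeline;
    }

    // Polls one video and returns how many new replies were forwarded.
    int poll(String videoId) {
        int forwarded = 0;
        for (Reply reply : fetchReplies.apply(videoId)) {
            if (seenIds.add(reply.id)) { // add() returns false for ids seen on an earlier poll
                pipeline.accept(reply);
                forwarded++;
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Reply> fake = new ArrayList<>();
        fake.add(new Reply("c1", "v1", "Filmed yesterday near the reef"));
        ReplyListenerSketch listener =
            new ReplyListenerSketch(id -> fake, r -> System.out.println("to NLP: " + r.text));
        listener.poll("v1"); // forwards c1
        listener.poll("v1"); // forwards nothing, c1 was already seen
    }
}
```

A production service would need to persist the seen ids and respect API quotas; both are left out here to keep the sketch short.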

Figure 1. Individual whale sharks can be identified from tourist videos

Skills required/preferred: Java programming, Google Machine Translation and YouTube API experience.

Possible mentors: Jason Holmberg (founder) and/or Mark Fisher, PhD. We will orient you to the existing APIs and help you build your listening service.

Project 2. Flukebook: Multi-species Configuration for Cetaceans with YAML

Summary: Some of our Wildbooks are used to study multiple species as they co-occur in nature. For each species, researchers may record different types of measurements and different genetic values. Help us better support researchers by converting Wildbook from key-value pair configuration to a hierarchical format with YAML that allows for configuration by species.

Difficulty: Easy

Description: The Wildbook project grew out of a single species research effort to apply computer vision to whale sharks. As other researchers asked us to use the software, we open sourced the code and allowed its behavior to change according to simple key-value pairs in .properties files. However, the project is now leading the way in studying multiple species simultaneously, and not every chosen configuration applies to all species. Patterning codes, types of body measurements, microsatellite DNA markers, types of photo keywords, and other values differ from species to species. We need your help to re-write our properties loader and related classes to support hierarchical, species-specific configuration in YAML, allowing Wildbook to flexibly define its displayed data and functions based on the species under study. HTML, CSS and JavaScript work is also needed to allow web pages to adapt to the loaded configuration dynamically.

Expected outcomes: Use Java to rewrite our configuration loader to support YAML. Collect requirements for a multi-species cetacean project ( and create the required configurations in YAML to reflect the real-world research data collection of wildlife biologists studying humpback whales and sperm whales in the field. Use JavaScript and CSS to make pages responsive to species-specific configurations.
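To make the idea of species-specific configuration concrete, here is a minimal sketch of a lookup that prefers a species override and falls back to shared defaults. It assumes the YAML has already been parsed into nested maps (for example by a library such as SnakeYAML). The class name SpeciesConfigSketch, the key "measurements", and the sample values are all invented for illustration and do not reflect Wildbook's real configuration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: species-specific lookup with fallback to shared defaults.
// The nested maps mirror an assumed YAML layout such as:
//   defaults:
//     measurements: [length]
//   species:
//     humpback:
//       measurements: [length, flukeWidth]
class SpeciesConfigSketch {

    private final Map<String, List<String>> defaults;
    private final Map<String, Map<String, List<String>>> bySpecies;

    SpeciesConfigSketch(Map<String, List<String>> defaults,
                        Map<String, Map<String, List<String>>> bySpecies) {
        this.defaults = defaults;
        this.bySpecies = bySpecies;
    }

    // Returns the species override if present, else the default, else an empty list.
    List<String> values(String species, String key) {
        Map<String, List<String>> overrides =
            bySpecies.getOrDefault(species, Collections.emptyMap());
        if (overrides.containsKey(key)) {
            return overrides.get(key);
        }
        return defaults.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> defaults = new HashMap<>();
        defaults.put("measurements", Arrays.asList("length"));
        Map<String, List<String>> humpback = new HashMap<>();
        humpback.put("measurements", Arrays.asList("length", "flukeWidth"));
        Map<String, Map<String, List<String>>> bySpecies = new HashMap<>();
        bySpecies.put("humpback", humpback);

        SpeciesConfigSketch config = new SpeciesConfigSketch(defaults, bySpecies);
        System.out.println(config.values("humpback", "measurements")); // species override
        System.out.println(config.values("sperm", "measurements"));    // falls back to defaults
    }
}
```

The fallback keeps single-species installations working with a flat set of defaults while letting multi-species projects override only what differs.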

Possible mentors: Jason Holmberg and/or Drew Blount ( lead developer).

Project 3. Wildbook: Quickstart Wizard for Common Configuration Parameters

Summary: Use your JavaScript, CSS, HTML, and Java skills to build a quick start wizard for Wildbook, helping biologists to quickly configure it for their species upon first startup.

Difficulty: Easy. Best done with Project 2 above (YAML configuration).

Description: The behavior and configuration options for Wildbook are currently defined in properties files, which biologists seeking to use Wildbook generally do not understand and which are not well documented for them. Use your programming skills to develop a slick, question-based interface that describes the needed configuration choices to biologists first starting Wildbook and guides them through making good choices before saving their results to the configuration resources for persistence.

Expected outcomes: When Wildbook first starts, your configuration wizard quickly and easily helps biologists configure Wildbook for their wildlife species and research techniques of interest.

Possible mentors: Jason Holmberg, Colin Kingen

Project 4. Wildbook: Computer Vision Visualization (Help Us See What the Computer Sees!)

Summary: The Wildbook Image Analysis pipeline can detect instances of animal species in images and even identify the individual animal in each detection. Use your RESTful JavaScript skills to render detected objects and their weights in the browser for user review.

Difficulty: Moderate (High concept, moderate coding complexity)

Description: Wildbook uses the open source library PhotoSwipe to display images to users reviewing sighting records, search results, individual life histories, etc. Many of these images have been run through computer vision, and objects, object types, and weights have been determined from the image content. Each detection inside an image is called an “annotation”. However, these annotations exist in the Image Analysis database but are not directly displayed to users, who could evaluate them and potentially provide feedback to the computer vision algorithms (e.g., “No, this is not a giraffe in this picture.”). We want you to use your JavaScript skills to modify PhotoSwipe to make RESTful/JSON calls to our Image Analysis server and render annotation details to users in the browser.

Expected outcomes: Your awesome JavaScript consults Image Analysis for displayed images in Wildbook, determines if they have been run through computer vision (or if they are currently running), and displays detected “annotations” with bounding boxes, types, and weights.
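Drawing a bounding box over a resized image mostly comes down to scaling the annotation's coordinates. The sketch below shows that arithmetic in Java for consistency with the other examples on this page; the same calculation would run in JavaScript in the browser. It assumes an annotation is given as [x, y, width, height] in pixels of the original image and that the image is displayed with its aspect ratio preserved. AnnotationBoxSketch is a hypothetical name, and the real Image Analysis response format may differ.

```java
// Hypothetical sketch: scales an annotation box from original-image pixels to display pixels.
class AnnotationBoxSketch {

    // box is [x, y, width, height] in original-image pixels (an assumed format).
    static int[] scale(int[] box, int imageWidth, int displayedWidth) {
        // Aspect ratio is assumed to be preserved, so one factor applies to both axes.
        double factor = (double) displayedWidth / imageWidth;
        return new int[] {
            (int) Math.round(box[0] * factor),
            (int) Math.round(box[1] * factor),
            (int) Math.round(box[2] * factor),
            (int) Math.round(box[3] * factor)
        };
    }

    public static void main(String[] args) {
        int[] scaled = scale(new int[] {100, 50, 200, 80}, 1000, 500);
        System.out.println(java.util.Arrays.toString(scaled));
    }
}
```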

Possible mentors: Jon Van Oast

Project 5. Wildbook: Visualize Animal Co-Occurrence with D3.js

Summary: Wrestle D3.js to the ground and make it render animal co-occurrence diagrams for search results in Wildbook.

Difficulty: Hard (but awesome!)

Description: D3.js has a bit of a learning curve, but once you know it, it is an amazing visualization tool for complex relationships. We want you to help wildlife biologists visualize the co-occurrence and social relationships of their study animals. A “picture speaks a thousand words” and might lead to new insights into how animals migrate and interact. Create a force-directed D3.js layout (or choose a better layout!), and use your strong JavaScript and RESTful/JSON skills to render relationships between individual animals in the browser. Expect biologists' minds to be blown!

Expected outcomes: When an Individual Search is executed in Wildbook, a D3.js force-directed graph of co-occurrence relationships is visible as a results option in Wildbook.
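A force-directed graph needs nodes (individual animals) and weighted links (how often two animals were sighted together). As a rough sketch of the server-side half, the code below counts co-occurring pairs from a list of group sightings. CoOccurrenceSketch and the "A|B" pair key are hypothetical choices for illustration, and the input is assumed to be one list of individual IDs per sighting; Wildbook's actual data model and JSON output are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: derives weighted links for a co-occurrence graph.
class CoOccurrenceSketch {

    // For each unordered pair of individuals, counts the sightings that include both.
    static Map<String, Integer> pairCounts(List<List<String>> sightings) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> sighting : sightings) {
            // Sort and de-duplicate so that (A, B) and (B, A) share one key.
            List<String> ids = new ArrayList<>(new TreeSet<>(sighting));
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    counts.merge(ids.get(i) + "|" + ids.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> sightings = Arrays.asList(
            Arrays.asList("A-101", "A-100"),
            Arrays.asList("A-100", "A-101", "A-102"));
        pairCounts(sightings).forEach((pair, weight) ->
            System.out.println(pair + " seen together " + weight + " time(s)"));
    }
}
```

Each entry maps naturally to one link in the graph, with the count available as the link weight.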

Possible mentors: Jason Holmberg

Project 6. Flukebook: Build an Intelligent Agent for a Flickr Community of Whale Enthusiasts

Summary: Apply your Java skills, learn the Flickr API, and blend multiple forms of A.I. to create an intelligent agent that automatically helps a community of whale watching naturalists in Flickr rapidly identify the humpback whales they photograph and share online.

Difficulty: Hard (but seriously…A.I. experience is a great thing for a resume), integrating computer vision calls (existing APIs), NLP, neural machine translation, OCR, etc. Google Cloud Vision and Machine Translation APIs are used for processing visual and textual data. We have built similar agents for YouTube and Twitter, so we know we can help you succeed.

Description: The Flukematcher Flickr community is a group of humpback whale watching enthusiasts manually comparing photos of humpback flukes, which are individually identifiable by their white and black contrast and by the unique, soundwave-like signature of their trailing edges. We already have the computer vision technology to match these flukes in images. We need you to apply your programming skills to build an intelligent agent that listens for new photo posts to the Flukematcher community and runs those posts automatically through our computer vision and NLP pipelines, quickly responding back to this community with the answer to “Which whale does this fluke belong to?” Social media APIs can be squirrely, so you'll need to build original code that connects Flickr to Wildbook, but the result will be a truly amazing and interactive blend of humans and A.I. studying whales!

Expected outcomes: As a result of your amazing coding skills, the Flukematcher Flickr community has an intelligent agent suggesting the identities of the humpback whales sighted by a community of whale watching naturalists and guides.

Possible mentors: Jon Van Oast and Jason Holmberg, who have succeeded in deploying a similar agent for YouTube videos.

Project 7. Wildbook: i18n for a Multilingual Userbase

Summary: Review i18n static code analysis results and help us find and fix embedded strings, locale-sensitive methods, and other i18n issues to improve the usability of Wildbook in multiple languages.

Difficulty: Easy (or difficult if you want bonus points for helping us support right-to-left languages, such as Arabic)

Description: Wildbook is used by wildlife biologists across the globe in a variety of languages. While we have externalized strings and worked hard to ensure UTF-8 text is used universally, we haven't caught all of the potential bugs that can appear in non-English usage of Wildbook. Use your Java and JavaScript programming to help us improve our Spanish, French, and Finnish support for Wildbook and expand it to other languages by reviewing i18n static code analysis results and removing embedded strings, changing to UTF-8 compliant methods, and reviewing localized user interfaces for a good experience in non-English left-to-right languages.
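As an illustration of what a “locale-sensitive method” issue can look like, Java's String.toUpperCase changes its result depending on the locale it is given. The sketch below uses Turkish only because it is a well-known case; it is not one of the languages listed above, and LocaleCaseSketch is a hypothetical name rather than Wildbook code.

```java
import java.util.Locale;

// Hypothetical sketch of a common locale-sensitive pitfall flagged by i18n static analysis.
class LocaleCaseSketch {

    // Locale-sensitive: appropriate for text shown to a user in that locale.
    static String upperFor(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    // Locale-independent: appropriate for keys, identifiers, and comparisons.
    static String upperKey(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, lowercase "i" uppercases to dotted capital "İ" (U+0130), not "I".
        System.out.println(upperFor("title", turkish));
        System.out.println(upperKey("title"));
    }
}
```

Calling toUpperCase() with no argument uses the JVM's default locale, so a lookup key built that way can silently stop matching on a server configured for such a locale.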

Expected outcomes: Your awesome efforts have fixed i18n bugs in Wildbook, giving biologist users in non-English languages a great, localized experience with the software.

Possible mentors: Jason Holmberg

Project 8. Wildbook: Improved Data Navigation and Visualization (“Breadcrumbs”, Object Hierarchy)

Summary: Use Java programming to retrieve objects, determine their relationships, and build an intuitive visual navigation system at the top of primary pages using JavaScript, CSS, and HTML.

Difficulty: Easy

Description: The hierarchy of objects in Wildbook is long and can be challenging to navigate and visualize in complex projects. We will implement a “breadcrumbs” style UI at the top of main data pages to improve this. At its greatest depth, a user should be able to see the chain of relationships from an individual animal to an “Encounter” with an animal at a certain time and place, to an “Occurrence” or group sighting of animals, all the way back to the survey where the data was recorded. At the top level, a user should be able to see from the Survey all the associated data points (e.g., “this survey has 14 Occurrences, 39 Encounters, and 11 identified Individuals”) along with navigation options. This simple UI will reside at the top of high-usage pages and improve workflow for developers and researchers across multiple Wildbook implementations.

Expected outcomes:

  1. Use Java to retrieve objects and relationships from the database in an efficient way that does not significantly hurt load times.
  2. Create a responsive visualization bar for these relationships using CSS and HTML and Bootstrap.
  3. Create lightweight navigation between these levels, handling presence and lack thereof using JavaScript and jQuery.
  4. Possible new SQL/DataNucleus queries to improve load times.
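The “presence and lack thereof” handling can be sketched as a trail builder that simply skips any level that is missing. This is a minimal illustration, not existing Wildbook code: BreadcrumbSketch and its parameters are hypothetical, and a real implementation would emit links rather than plain text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds a breadcrumb trail, skipping levels that are absent.
class BreadcrumbSketch {

    // Orders the levels from Survey down to Individual; null or empty ids are left out.
    static String trail(String surveyId, String occurrenceId,
                        String encounterId, String individualId) {
        List<String> crumbs = new ArrayList<>();
        addIfPresent(crumbs, "Survey", surveyId);
        addIfPresent(crumbs, "Occurrence", occurrenceId);
        addIfPresent(crumbs, "Encounter", encounterId);
        addIfPresent(crumbs, "Individual", individualId);
        return String.join(" > ", crumbs);
    }

    private static void addIfPresent(List<String> crumbs, String label, String id) {
        if (id != null && !id.isEmpty()) {
            crumbs.add(label + " " + id);
        }
    }

    public static void main(String[] args) {
        // An Encounter that was never grouped into an Occurrence.
        System.out.println(trail("S1", null, "E42", "A-100"));
    }
}
```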

Skills required/preferred: Java, CSS, HTML. Nice to have: SQL (Postgres), DataNucleus.

Possible mentors: Colin Kingen, Wildbook Engineer.