Syngenta AI Challenge - Harness data to help feed our rising population

Organized by mahimaA

First phase

Registration
Jan. 28, 2017, midnight UTC

End

Competition Ends
June 2, 2017, 4 a.m. UTC

Soybean

As our world population increases and arable land for growing food decreases, AI technology such as data and mathematical modeling tools can help scientists grow more food using fewer resources. Our world needs innovative thinkers that can use AI tools to harness innovations in plant breeding. That’s why we are inviting you to help humanity face one of its toughest challenges: feeding a rising population, sustainably.

The 2017 Syngenta AI Challenge is the first of two collaborations between Syngenta and the AI for Good Foundation that focus on bringing data and analytical modeling to the agriculture industry.

Background

Our world is running out of cropland. We’ll add 2 billion more people by the year 2050 [1], but we’re currently using our arable land and water 50 percent faster than the planet can sustain. At the same time, the crops farmers plant face an unprecedented set of obstacles due to increasingly limited growing conditions and climate change.

How will we be able to grow enough food to meet world demand?

Today, the agriculture industry works to optimize the amount of food we gain from each plant by breeding varieties with the strongest, highest-yielding genetics. Scientists at research and development organizations like Syngenta create stronger plants by crossing two plant varieties as parents, and then selecting the best offspring over time to provide to farmers.

The current breeding process, however, is highly technical and cumbersome. One cycle takes about nine years, requires vast testing resources and results in only moderate yield increases (called genetic gain) in crops. It includes many failures along the way.

We believe data-driven strategies can help our industry breed better seeds, faster. Developing models that identify robust patterns in seed genetic data may help us more accurately choose seeds that increase the genetic gain of the crops we plant – and will help us address the growing global food demand.

Research Problem

Each seed variety of any plant has a unique genetic composition and must pass through a series of “stage gates” in order to be selected by scientists to breed (Figure 1). Each year, after the data from yield tests are analyzed, breeders decide whether to continue testing the variety or discard it. At the final stage gate is the decision to offer the seed variety to growers.

Figure 1: Testing and selection scheme for the class of 2014 seeds.

Several hundred experimental soybean varieties were evaluated at up to 10 locations in 2012. After the experiments were harvested and the yield data collected, 15% of the varieties were selected to advance to the next year of testing, while the rest were discarded. In 2013, the selected varieties were evaluated at up to 30 locations with the top performing 5% selected for the final year of evaluation. Following testing in 2014, the top performing 5% of varieties were selected to become commercially available for farmers to buy. Though this is one way to select varieties, this method doesn’t show a variety’s true fitness once it is planted. Many varieties are not successful (non-elite) after they become commercial. We consider this a Type I error.
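As a rough illustration of the stage-gate scheme described above, the sketch below ranks varieties by mean trial yield and keeps the top fraction. The function name `advance_top_fraction`, the variety IDs, and the yield values are hypothetical. The 15% rate is the 2012 advancement rate quoted above, but ranking by plain mean yield is only an assumption for illustration, not a documented Syngenta procedure.

```python
import math
from statistics import mean


def advance_top_fraction(yields_by_variety, fraction):
    """Return the IDs of the top `fraction` of varieties, ranked by mean yield.

    yields_by_variety maps a variety ID to a list of yields (bushels per acre),
    one per test location. Ranking by the plain mean is an assumption made for
    illustration only.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(
        yields_by_variety,
        key=lambda vid: mean(yields_by_variety[vid]),
        reverse=True,
    )
    # Round the cutoff up and always advance at least one variety.
    n_keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:n_keep]


# Hypothetical data: 20 varieties, each tested at 3 locations.
trial_yields = {f"V{i:02d}": [40 + i, 42 + i, 41 + i] for i in range(20)}
stage_2012 = advance_top_fraction(trial_yields, 0.15)
print(stage_2012)
```

A selection rule like this only looks at observed trial yield, which is why it can advance varieties that later underperform in farmers' fields.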

References

  1. United Nations Report, 2013
  2. Global Footprint Network

The AI Challenge

It is November 2014, and you are a breeder at the end stage of seed selection. You are responsible for selecting varieties for commercial release (Table 1). You have data from the current testing year (2014) and variety performance data from previous years.

Table 1: Data structure: commercialization year, class, and year of testing. The 2011 to 2013 classes can be used to train a model to make predictions for the class of 2014.

Goal of the Challenge

To develop a model that could be used to help scientists analyze large amounts of seed data more efficiently and effectively, leading to improvements in the world’s ability to grow more without using more resources.

Research Question

Which soybean varieties will perform better in farmers’ fields in 2015 and 2016?

Submission:

  1. Design a model that predicts the 2015-2016 yield of the seed varieties from the class of 2014 (a proxy of true fitness). Your yield predictions should be provided as a full codalab.org software/model pipeline that operates on the data set provided, and outputs a file with one prediction per line: VARIETY_ID, PREDICTION

You can refer to the process of building a submission here, or download a sample submission bundle.
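The output format described above (one `VARIETY_ID, PREDICTION` pair per line) could be produced along the following lines. This is a minimal sketch: the function name `write_predictions`, the variety IDs, and the predicted yields are hypothetical, and the exact delimiter spacing, number formatting, and absence of a header row are assumptions that should be checked against the sample submission bundle.

```python
import csv
import io


def write_predictions(predictions, out_file):
    """Write one `VARIETY_ID,PREDICTION` pair per line to an open text file.

    predictions maps a variety ID to a predicted yield. Two decimal places and
    no header row are assumptions, not confirmed requirements.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    for variety_id, predicted_yield in sorted(predictions.items()):
        writer.writerow([variety_id, f"{predicted_yield:.2f}"])


# Hypothetical predictions, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_predictions({"V002": 48.75, "V001": 52.3}, buffer)
print(buffer.getvalue())
```

In a real pipeline, `out_file` would be a file opened with `open(path, "w", newline="")` rather than a `StringIO` buffer.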

In a 5-20 page scientific write-up, explain the following:

  2. Predict and explain which seed varieties tested for commercialization in 2014 were truly ‘elite’
  3. Use information from previous years to provide estimates of Type I errors, and provide recommendations to reduce them
  4. Identify patterns in the genetic information that predict whether a variety is ‘elite’ and support how you arrived at your conclusion

Submission components 2-4 may refer to additional codalab.org pipelines and external models where appropriate. Submit final write-up through the AI Challenge participant portal at IdeaConnection.com, and additional models through this codalab.org competition page.
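One simple way to approach the Type I error estimates requested in the write-up is to compute, for each earlier class, the share of commercialized varieties that did not turn out to be elite. The sketch below does this under stated assumptions: the function name `type_i_error_rate`, the class data, and the variety IDs are all hypothetical, and defining the rate as non-elite commercialized varieties divided by all commercialized varieties is one possible choice, not a definition given by the organizers.

```python
def type_i_error_rate(commercialized, elite):
    """Share of commercialized varieties that were not elite (false positives)."""
    commercialized = set(commercialized)
    if not commercialized:
        raise ValueError("no commercialized varieties")
    false_positives = commercialized - set(elite)
    return len(false_positives) / len(commercialized)


# Hypothetical history: class year -> (commercialized IDs, elite IDs).
history = {
    2011: ({"A", "B", "C", "D"}, {"A", "B"}),
    2012: ({"E", "F", "G", "H", "I"}, {"E"}),
    2013: ({"J", "K"}, {"J", "K"}),
}

rates = {
    year: type_i_error_rate(commercialized, elite)
    for year, (commercialized, elite) in history.items()
}
for year, rate in sorted(rates.items()):
    print(year, f"{rate:.0%}")
```

Per-class rates like these give a baseline against which a model's proposed selections can be compared.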

In order to help promote generalizable models, we plan to release training data for the Syngenta AI challenge in three stages.

Evaluation Criteria

The challenge is to identify patterns in the data that identify elite experimental varieties and expose the non-elite varieties prior to commercialization. Entries will be judged by the clarity of the solution, the technical strength of the methodology, the uniqueness of the approach, and the degree to which the evaluation data support your conclusions.

  • (40%) Scientific rigor of the solution, as shown and explained in an accompanying paper (up to 20 pages in length)
  • (40%) Effectiveness of the approach in identifying the best varieties for the test years (submitted in codalab.org)
  • (20%) Transparency, interpretation, and self-documentation of models

Syngenta AI Challenge 2016-2017

Data Use Agreement

  1. Agreement. The following Data Use Agreement (“the Agreement”) is between Syngenta Crop Protection LLC and its affiliates (together “Syngenta”) and the undersigned individual (“the Participant Researcher”). The agreement specifies the rights and responsibilities of participant researchers in the “Syngenta AI Challenge”.
  2. Research Data. The Participant Researcher acknowledges that by entering into this agreement they will receive individual non-transferrable access to certain confidential and highly proprietary information (“the Research Data”). This may include
    1. technical information, including data, patent, copyright, trade secret, and other proprietary information, techniques, sketches, drawings, models, inventions, know-how, processes, apparatus, equipment, algorithms, software programs, software source documents, and formulae related to the current, future and proposed products and services of Syngenta, or
    2. non-technical information relating to Syngenta's products, including without limitation pricing, margins, merchandising plans and strategies, finances, financial and accounting data and information, suppliers, customers, customer lists, purchasing data, sales and marketing plans, future business plans and any other information which is proprietary and confidential to Syngenta.
  3. Derivative Works. A derivative work is defined as the output of any transformation applied to the Research Data that would make reconstruction of the original impossible without knowledge of and access to the original Research Data. This includes the creation of analytical models through analysis of certain patterns in the data, the “vectorization” and “binarization” of the Research Data into a reduced-dimensional form, and other common methodologies in the Statistics, Artificial Intelligence, and Machine Learning research communities.
  4. Authorized Use. The Participant Researcher has the right to use the Research Data provided by Syngenta to perform scientific research, including:
    1. Creating and disseminating Derivative Works based on the Research Data, strictly for non-commercial, non-competitive, scientific uses;
    2. Merging the Research Data with other sources of information, whether open or proprietary, and publishing/disseminating the procedure by which this could be achieved by other Participant Researchers;
    3. Creating, publishing, and disseminating scholarly works based on scientific findings from the Research Data;
    4. Publishing and disseminating descriptive statistics of the Research Data;
    5. Disseminating the information and methodologies necessary to allow those with access to the Research Data to reproduce any and all scientific findings;
  5. Restricted Use. The Participant Researcher agrees to:
    1. Not sell, lease, distribute, or otherwise make available the Research Data outside of the Participant Researcher community;
    2. Not use or distribute the Research Data for commercial gain;
    3. Not use or distribute the Research Data for uses that are competitive with the business of Syngenta, or that may be damaging to Syngenta’s business or its clients’ businesses;
  6. Ownership. All interest in the Research Data remains the sole property of Syngenta. Intellectual Property derived from the Research Data will be the property of the Participant Researcher for non-commercial, non-competitive research-related uses as specified in Paragraph 4 (“Authorized Use”). The Participant Researcher agrees that all submitted original models, source code, and documentation, as well as any other original submitted artifacts of the competition, shall be released under the BSD-3-Clause Open Source license (referenced in https://opensource.org/licenses/BSD-3-Clause).
  7. Survival. The Participant Researcher understands that the obligations under Paragraph 5 ("Restricted Use") shall survive the termination of any other relationship between the parties.
  8. Term. This agreement will remain in force until the termination date of the Syngenta AI Challenge. Participant Researchers wishing to continue using the Research Data must obtain written confirmation and approval from Syngenta per the contact information in Paragraph 9 (“Contact”).
  9. Contact. All inquiries regarding this agreement and Participant Researcher use of the Research Data must be made in writing to:
    • joseph.byrum@syngenta.com
  10. Early Termination. Syngenta may, at its sole discretion, disqualify and terminate the participation of any Participant Researcher that it deems to have violated any of the terms of this agreement or the rules of the Syngenta AI Challenge. In such circumstance this agreement will be automatically revoked and the Participant Researcher agrees to remove all research data from their possession, or otherwise render it inaccessible.
  11. Termination Date. The Syngenta AI Challenge will end on December 31st, 2017.
  12. Governing Law. This Agreement shall be governed in all respects by the laws of British Columbia, Canada.

Awards

Prizes awarded will be $7,500 for first place, $5,000 for second place and $1,000 for third place. Finalists will be announced in June 2017 and invited to present at a global conference in fall 2017, where winners will be announced.

Rules

The competition is open to all participants 18 years of age or older, where allowable by law, except people or organizations who are employed by, or connected to, seed biotechnology companies or their affiliates. No purchase is necessary to enter or win.

Submissions are due before midnight Eastern Time, June 1, 2017. Submissions must be made electronically via the submission form.

Submissions must be in Microsoft Word or LaTeX format using the appropriate submission template.

AI for Good Foundation is the sponsor of this contest and will be responsible for administering and judging the contest and choosing the winners. Syngenta will only have access to information necessary to assist scientific evaluation. Syngenta is not a sponsor of this contest.

Selected finalists will be invited to present their submission at a global conference in the early fall of 2017.

Prizes awarded will be $7,500 for first place, $5,000 for second place and $1,000 for third place. In the event that submissions of sufficient quality are not submitted to justify the awarding of all three prizes, the award committee reserves the right to eliminate any or all of the prize levels.

All decisions by the award committee are final. The award will be made in U.S. Dollars. All currency exchange costs are the responsibility of the winner.

The award may be subject to taxes depending on the winner’s country. Winners are responsible for complying with all tax requirements. The sponsor and Syngenta disclaim all liability for any prize recipient’s compliance with tax obligations.

Participants agree that personal data collected by AI for Good Foundation and IdeaConnection may be stored and displayed to the public within the context of the contest.

Participants represent and warrant that they have the right to submit their idea to the contest, that they are the sole and exclusive owner of the idea, and that the idea or submission of the idea does not violate any applicable law in any country.

Additionally, participants agree and warrant that they will not submit any idea that infringes on any third-party intellectual property, trade secret or confidentiality obligation.

To the maximum extent permitted by law, each participant indemnifies and agrees to keep indemnified and hold harmless Syngenta, AI for Good Foundation, and IdeaConnection, and their affiliates, agents, directors, officers, employees, representatives and assigns, from and against any liability, claims, demands, losses, damages and costs resulting from any act, default or omission of the participant and/or a breach of any warranty set forth herein.

AI for Good Foundation and IdeaConnection shall have the right to use, modify and make available to the public the submitted idea for the purposes of judging, advertising, promotion, administration, testing and demonstration.

To the extent permitted by law, the rules, terms and conditions for the contest are governed by the laws of Delaware without regard to conflict of law principles.

This contest is being administered by AI for Good Foundation.

Glossary of Terms

  • Advancement: The decision to advance a variety to the next stage of testing or commercialization.
  • Commercial variety: Soybean varieties that are being sold in the marketplace.
  • Commercialize: The decision to begin selling a variety.
  • Elite: Commercial varieties that have relatively high performance in farmers’ fields.
  • Experimental variety: A non-commercial soybean variety that is being evaluated in yield trials.
  • Grower: Farmers or farm managers that select, purchase, and plant commercial soybean varieties.
  • Performance: The amount of grain per unit of land that a soybean variety produces. Grain yield in soybeans in the United States is measured in bushels per acre.
  • Selection: The act of choosing a variety for advancement or commercialization.
  • Stage: The current testing level of a variety. It is analogous to grade levels in school.
  • Stage gate: The requirements a variety must meet to qualify for a higher stage of testing or for commercialization.
  • Type I error: A false positive. In this case, it refers to advancing or commercializing a variety that does not actually deserve to advance.
  • Yield Test: The experiment in which an experimental soybean variety is grown where grain production per variety is the primary characteristic of performance.
  • Parent line: Variety that is crossed with another to generate offspring.
  • Offspring: Varieties created from the cross of two parent lines.
  • SNP genetic marker: Genetic information of parents and offspring can be characterized by single nucleotide polymorphisms (SNPs). An SNP is a single base-pair difference in the DNA sequence of individual varieties.
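To make the SNP definition above concrete, the sketch below lists the positions at which two aligned DNA sequences differ by a single base. The function name `snp_positions` and the two short sequences are hypothetical. Real marker data would typically arrive as pre-called genotypes rather than raw sequences, so this is only an illustration of the concept.

```python
def snp_positions(seq_a, seq_b):
    """Return the zero-based positions where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical aligned sequences from two varieties.
variety_a = "ACGTTAGC"
variety_b = "ACGATAGT"
differences = snp_positions(variety_a, variety_b)
print(differences)
```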

FAQ

Submit your questions about the challenge here. Please check back frequently for new answers.

  1. How do I submit my solution?

    Written submissions must be in MS-Word or LaTeX format using the appropriate submission template. You can download the submission template here. Once your solution is completed, you can submit it on the submission page. Technical submissions should be made through competitions.codalab.org/, following the guidelines described here.

  2. Can several people enter as a team?

    Yes, a team may participate. The only requirement is that each person on the team must register and click the "Download data" link in the participant's dashboard to sign the non-disclosure agreement (NDA). Please make note of all the team members on your submission.

  3. I am based outside the United States. Since the event will be held in the USA, will I be able to participate?

    Yes, you are eligible to compete in the contest.

  4. Am I allowed to publish my entry in a journal?

    The work may be published per the Data Use Agreement you sign when downloading data for the competition.

  5. May I use the Syngenta AI Challenge data for educational purposes?

    Please refer to the Data Use Agreement for the Syngenta AI Challenge.

  6. I work for a large agricultural company. Can I compete?

    Employees or people who are associated with large agricultural companies are not eligible to compete. Please contact us if you have any questions about whether or not you are allowed to participate.

Registration

Start: Jan. 28, 2017, midnight

Description: Register for the competition with your CodaLab worksheets account for access to competition data.

Challenge - Stage I

Start: Jan. 30, 2017, midnight

Description: Participants will receive data describing all varieties that were tested in the 2014 class for the year 2012, along with the experimental yield data and the geographic, soil, and genetic characteristics for those varieties.

Challenge - Stage II

Start: March 1, 2017, midnight

Description: Participants will receive data describing varieties of the 2014 class that cover all experimental trials for year 2013, indicating which of the Stage 1 varieties advanced to the following stage.

Challenge - Stage III

Start: May 1, 2017, midnight

Description: Participants will receive data describing varieties of the 2014 class that cover all experimental trials for the year 2014, indicating which of the Stage 1 and Stage 2 varieties advanced to the final stage.

Competition Ends

June 2, 2017, 4 a.m.
