Science Data Challenge 2

SDC2 ran from 1 February 2021 to 31 July 2021 and is now closed!

And the winner is...

The leaderboard was frozen when the challenge closed, and the winner is team MINERVA: many congratulations! Congratulations also to our runners-up, team FORSKA-Sweden, who finished in a very close second place, and to all who took part.

Reproducibility awards

We are delighted to announce the results of the SDC2 reproducibility awards. Six teams took part in this aspect of SDC2. 

EPFL: Bronze
FORSKA-Sweden: Silver
HI-FRIENDS: Gold
NAOC-Tianlai: Bronze
SHAO: Bronze
Team SoFiA: Silver

We warmly congratulate all six teams, with a special mention for HI-FRIENDS, who provided an excellent Gold-standard solution containing many examples of Open Science best practice. We would also like to thank our expert panel and our partners at the Software Sustainability Institute (SSI).

Have a taste of what SKA data will be!

You are welcome to attempt the challenge yourself using the available data and scoring code.

You are welcome to use the data for your own research, and to perform analyses and tests beyond the set challenge. Please acknowledge the use of these data as “SKAO data challenges, Science Data Challenge 2”. 


Welcome to the second SKA Science Data Challenge. Our latest challenge will see participants analyse a simulated datacube 1 TB in size, in order to find and characterise the neutral hydrogen content of galaxies across a sky area of 20 square degrees.

Neutral hydrogen – or HI – exists in large quantities beyond the visible edges of most star-forming galaxies. Emitting light at a fixed radio wavelength during occasional electron ‘spin-flips’, HI traces the rotation of galaxies, allowing astronomers to infer the amount of mass – both visible and dark – contained within. The unprecedented sensitivity of the SKA will be used to map HI out to the formation of the first galaxies, a few hundred million years after the Big Bang. This period, known as “Cosmic Dawn”, began some 13.5 billion years ago. The challenge dataset will be a simulation of an SKA HI observation up to a distance of 4 billion light years.

To provide such a large dataset for analysis, we have teamed up with high-performance computing facilities around the world. Participants are invited to compete in teams, with each team creating accounts at a single facility, where the data will be accessed and processed directly. We thank all resource facilities very much for their generous support and collaboration.