OpenTalks #28


Building reproducible research workflows: Why, what, and how?

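Researchers increasingly emphasize the reproducibility of experimental results. Yet making results computationally reproducible takes methods and workflows far more complex than one might expect. In recent years, high-quality journals have raised their reproducibility requirements, which poses new challenges for researchers. Publicly sharing data does not by itself make results computationally reproducible. In practice, several factors can all make results differ, weakening the soundness of inferences and the validity of conclusions: the analysis environment, the programming language, the toolboxes used and their versions, the analysis pipeline, and the pipeline's parameter settings. In this sense, reproducible research involves not only data sharing (open data) but also how the shared data are used.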

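In current research, and in neuroimaging research in particular, data sharing has greatly promoted reproducibility and helps maximize the value of public science funding. (Of course, a study should not be questioned simply because its data are not public, nor does it lose its originality or academic value because it uses shared data.) Large research projects built on data sharing are too numerous to list. Examples include the Alzheimer’s Disease Neuroimaging Initiative (ADNI), the Human Connectome Project (HCP), and the Adolescent Brain Cognitive Development (ABCD) study.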

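Note that open data is not free data, especially when the shared data involve biological information. Shared data must be used in accordance with the rules of the data-sharing platform. For example, some platforms require a data use agreement to be signed between research institutions, or an ethics review report to be submitted. Retractions caused by improper use of shared data still occur from time to time, which suggests that researchers are not always fully aware of the rules governing shared data.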

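For this OpenTalk, we have invited Stephan Heunis, lead instructor of this year's OHBM Traintrack session on “reproducible workflows”. He will talk about using open data, building reproducible workflows, and the tools that support them.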


Title: Reproducible scientific workflows: an overview of helpful tools and practices


Speaker: Stephan Heunis

Stephan has an M.Sc. in Biomedical Engineering and Robotics from Stellenbosch University in South Africa. He worked as a commercial and software engineer for four years before moving to the Netherlands with the goal of conducting research in neuroscience. His doctoral research at the Eindhoven University of Technology, in collaboration with Philips Research, focused on developing new acquisition and signal processing methods for functional magnetic resonance imaging (fMRI) that allow improved tracking and visualisation of brain activity in real time. Stephan is passionate about brains, accessible education, and making scientific practice more transparent and inclusive. Throughout his doctoral research, he was active in the Dutch network of Open Science Communities, and he founded OpenMR Benelux, a community working on wider adoption of open science practices in MRI research through talks, discussions, workshops and hackathons. Stephan has since continued this passion as a Research Data and Software Engineer at the Forschungszentrum Jülich in Germany, where he works on software solutions for neuroinformatics and decentralised research data management. He also holds post-doctoral positions in the SYNC developmental neuroscience lab at Erasmus University Rotterdam and Leiden University in the Netherlands.

Abstract: Ideally, our scientific outputs, i.e. the data and results from which we draw our inferences and influence decisions, should be reproducible. Reproducibility allows replication and verification of our work, fosters public trust in science, promotes collaboration, and underlies scientific progress. Practically, however, reproducibility is much easier to talk about than to achieve. While challenges stem from various origins, a key challenge relates to the complexity of our data, how we choose to process it (and there are many possibilities!), and the high variance in the infrastructure used by researchers globally. This talk covers some of these practical challenges and provides an overview of common tools available to researchers for addressing them. Highlights include git and GitHub, requirements files for software scripts, containers, myBinder, and DataLad.


Beijing time [GMT+8]: Wednesday, November 24, 21:00
Central European Time [CET]: Wednesday, November 24, 14:00
US Eastern Time [EST]: Wednesday, November 24, 08:00
US Pacific Time [PST]: Wednesday, November 24, 05:00

Zoom: Meeting ID: 9139 4010 836


Host: Han Zhang (A*STAR, Singapore)



张磊 (Ph.D.), University of Vienna, Austria

张晗 (Ph.D.), A*STAR, Singapore

楊毓芳 (Ph.D.), Freie Universität Berlin, Germany

杨金骉, MPI for Psycholinguistics, the Netherlands

王鑫迪 (Ph.D.), 北京慧脑云

王庆 (Ph.D.), Montreal Neurological Institute, Canada

金海洋 (Ph.D.), New York University Abu Dhabi, UAE

胡传鹏 (Ph.D.), Nanjing Normal University

耿海洋 (Ph.D.), The University of Hong Kong

金淑娴, Vrije Universiteit Amsterdam, the Netherlands

葛鉴桥 (Ph.D.), Peking University

高梦宇 (Ph.D.), University of Utah, USA


张文昊(University of Chicago, USA)


张磊(University of Vienna, AUT)


徐婷(Child Mind Institute, USA)


滕相斌(MPI for Human Development, DEU)

邸新(New Jersey Institute of Technology, USA)