2021-04-15, 18:56
Hello! I am an academic researcher leading a project to teach AI agents how to assemble and disassemble Lego models. So far we have built a learning environment that exposes certain basic operations (adding/removing/moving bricks), and now we are looking for data! We have already taken a look at the OMR (amazing work!) but would like to include as many model files as we can get our hands on. I see that many files have been posted to various forums here, but we want to be good citizens, ask permission, and not just scrape everything in sight. With that in mind, I have a few questions:
A. Is it OK to write a script to crawl these forums just to find what's out there?
B. Once we have a list of what's out there, can we reach out to individual authors seeking permission to use the files in a way that isn't spammy or obnoxious?
C. If you are excited about the prospect of Lego-building AI agents and have any reproductions of official sets or MOCs (or really any .ldr or .mpd files) that you would like to share with us, please get in touch. We would love to have this data!
D. If anyone has suggestions for other forums to check out or, better yet, bulk repositories of public model files, we would love to hear about them. We have done some poking around, but I'm sure there are things we haven't found yet.
E. Are we correct in assuming that we can use the OMR model files as long as we abide by the attribution terms of the relevant Creative Commons license?
Details on how we will use the data:
A. We would like to be able to publicly share a bulk collection of these files with other researchers so that they can reproduce and improve upon our work. This is important, and one of the main reasons we want to be careful and get permission from the authors of files that don't already come with explicit licensing information.
B. We will abide by all existing licensing terms that come with any content we make available.
C. When we release data, we will prominently feature attribution to the original authors of all included content (unless of course an author explicitly asks us not to attribute them).
D. This is an academic science project, and we will not use the data for commercial purposes.
We are very excited about the potential of Lego + machine learning, and any other thoughts and suggestions for data collection are of course welcome. Again, I want to stress that we want to be ethical in how we gather data and will do our best to do right by the community. We know this material represents a lot of hard work, and we want to respect that effort. Also, while our team loves Lego, we are all new to this community, so if there's a better place to post this, or if there is any way to improve how we go about this, please let us know.
Thanks!
-Aaron