LDraw.org Discussion Forums

Full Version: Lego AI
Hello! I am an academic researcher leading a project to teach AI agents how to assemble and disassemble Lego models.  So far we have built a learning environment that exposes certain basic operations (adding/removing/moving bricks), and now we are looking for data!  We have already taken a look at the OMR (amazing work!) but would like to include as many model files as we can get our hands on.  I see that there are many files posted to various forums here, but we want to be good citizens, ask permission and not just scrape everything in sight.  With that in mind, I have a few questions:

A. Is it OK to write a script to crawl these forums just to find what's out there?
B. Once we have a list of what's out there, can we reach out to individual authors seeking permission to use the files in a way that isn't spammy or obnoxious?
C. Anyone reading this who is excited about the prospect of Lego building AI agents and has any reproductions of official sets or MOCs (or really any .ldr or .mpd files) that they would like to share with us, please get in touch, we would love to have this data!
D. If anyone has any suggestions of other forums to check out or better yet, bulk repositories of public model files, we would love to hear about them.  We have done some poking around, but I'm sure there are things we haven't found yet.
E. Are we correct in assuming that we can use the OMR model files as long as we abide by the attribution terms of the relevant Creative Commons license?

Details on how we will use the data:

A. We would like to be able to publicly share a bulk collection of these files with other researchers so that they can reproduce and improve upon our work.  This is important, and one of the main reasons we want to be careful and get permission from the authors of files that don't already come with explicit licensing information.
B. We will abide by all existing licensing terms that come with any content we make available.
C. When we release data, we will prominently feature attribution to the original authors of all included content (unless of course an author explicitly asks us not to attribute them).
D. This is an academic science project, and we will not use the data for commercial purposes.
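As an illustration of how the attribution and licensing points above could be tracked per file, here is a minimal sketch that reads the author and license from an LDraw file's header meta lines (`0 Author:` and `0 !LICENSE`). The function name `read_attribution` and the sample file contents (including the author "Jane Builder") are hypothetical, and real files may omit either line, in which case the sketch returns `None` for that field.

```python
import re


def read_attribution(ldraw_text):
    """Return (author, license) from LDraw header meta lines.

    Either value is None when the corresponding line is absent,
    which would flag the file as needing explicit permission.
    """
    author = None
    license_text = None
    for line in ldraw_text.splitlines():
        stripped = line.strip()
        match = re.match(r"0\s+Author:\s*(.+)", stripped, re.IGNORECASE)
        if match and author is None:
            author = match.group(1).strip()
            continue
        match = re.match(r"0\s+!LICENSE\s+(.+)", stripped, re.IGNORECASE)
        if match and license_text is None:
            license_text = match.group(1).strip()
    return author, license_text


# Hypothetical sample file, for illustration only.
sample = """0 Example Model
0 Name: example.ldr
0 Author: Jane Builder [jbuilder]
0 !LICENSE Redistributable under CCAL version 2.0 : see CAreadme.txt
1 4 0 0 0 1 0 0 0 1 0 0 0 1 3001.dat
"""

author, license_text = read_attribution(sample)
print(author)
print(license_text)
```

A file for which `license_text` comes back as `None` would go on the list of files whose authors need to be contacted before redistribution.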

We are very excited about the potential of Lego + Machine Learning, and any other thoughts and suggestions for data collection are of course welcome.  Again, I want to stress that we want to be ethical in how we gather data, and will do our best to do right by the community.  We know this material represents a lot of hard work, and we want to respect that effort.  Also, while our team loves Lego, we are all new to this community, so if there's a better place to post this, or if there is any way to improve how we go about this, please let us know.

Thanks!
-Aaron
Hi Aaron,

personally speaking I'm fine with the above but obviously it has to be discussed within the Steering Committee. However all files in the OMR are "Redistributable under CCAL version 2.0" as pointed out here. I'm sure you've already checked out Rebrickable, Eurobricks, Brickshelf or Brickhub.

Willy Tschager
LDraw Content Manager
(2021-04-16, 10:40)Willy Tschager Wrote: [ -> ]Hi Aaron,

personally speaking I'm fine with the above but obviously it has to be discussed within the Steering Committee. However all files in the OMR are "Redistributable under CCAL version 2.0" as pointed out here. I'm sure you've already checked out Rebrickable, Eurobricks, Brickshelf or Brickhub.

Willy Tschager
LDraw Content Manager

Hi, thanks so much for your reply!  Is there a good way to bring this up with the steering committee, or is the original post here enough to get their attention?  Just want to make sure I'm using proper channels.  Thanks!
(2021-04-16, 15:47)Aaron Walsman Wrote: [ -> ]Hi, thanks so much for your reply!  Is there a good way to bring this up with the steering committee, or is the original post here enough to get their attention?  Just want to make sure I'm using proper channels.  Thanks!

Hi Aaron,

as I'm a member of the Steering Committee I already posted it in our subsection.

w.
(2021-04-16, 15:59)Willy Tschager Wrote: [ -> ]Hi Aaron,

as I'm a member of the Steering Committee I already posted it in our subsection.

w.

Oh great, thanks so much!
(2021-04-16, 10:40)Willy Tschager Wrote: [ -> ]Hi Aaron,

personally speaking I'm fine with the above but obviously it has to be discussed within the Steering Committee. However all files in the OMR are "Redistributable under CCAL version 2.0" as pointed out here. I'm sure you've already checked out Rebrickable, Eurobricks, Brickshelf or Brickhub.

Willy Tschager
LDraw Content Manager

I have not built in any bot detection in BrickHub, so they are free to sweep it for the MPD files. They all have building instructions steps if that is what they want to do some research on. It was even my plan to do so myself in order to see if I can train an AI to create steps.
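For readers wondering what the building-instruction steps look like in the files themselves, here is a rough sketch that counts explicit `0 STEP` meta commands per submodel in an MPD file, where each submodel begins with a `0 FILE <name>` line. The function name `count_steps`, the `"(main)"` label for lines before any `0 FILE`, and the sample model are all hypothetical. It counts only explicit step markers, so a final step without a trailing `0 STEP` is not included.

```python
def count_steps(mpd_text):
    """Count explicit '0 STEP' lines per submodel in an MPD file."""
    counts = {}
    current = "(main)"  # hypothetical label for lines before any "0 FILE"
    for line in mpd_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "0" and tokens[1].upper() == "FILE":
            # A new submodel starts here; the rest of the line is its name.
            current = " ".join(tokens[2:])
            counts.setdefault(current, 0)
        elif len(tokens) == 2 and tokens[0] == "0" and tokens[1].upper() == "STEP":
            counts[current] = counts.get(current, 0) + 1
    return counts


# Hypothetical two-submodel sample, for illustration only.
sample_mpd = """0 FILE main.ldr
1 4 0 0 0 1 0 0 0 1 0 0 0 1 wall.ldr
0 STEP
1 1 0 -24 0 1 0 0 0 1 0 0 0 1 3001.dat
0 STEP
0 FILE wall.ldr
1 15 0 0 0 1 0 0 0 1 0 0 0 1 3010.dat
0 STEP
"""

print(count_steps(sample_mpd))
```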
(2021-04-16, 19:57)Lasse Deleuran Wrote: [ -> ]I have not built in any bot detection in BrickHub, so they are free to sweep it for the MPD files. They all have building instructions steps if that is what they want to do some research on. It was even my plan to do so myself in order to see if I can train an AI to create steps.

Thanks for that Lasse!  BrickHub is on our list of places to start looking next.  Also, auto-generating steps sounds like a cool problem.  At the moment, we are working on vision and interaction tasks, but if you are interested in exploring that further, I'd be happy to share whatever resources I can.
If you need a single large model versus many little ones you could try this:

https://github.com/mjhorvath/Datsville
(2021-04-18, 5:32)Michael Horvath Wrote: [ -> ]If you need a single large model versus many little ones you could try this:

https://github.com/mjhorvath/Datsville

Wow, that's a great resource, thanks a lot!
Hi Aaron,

as long as the licence terms are respected and kept untouched and proper attribution is given, the SteerCo has no objections to the above.

However, we would love to learn more about you and we wonder if you're going to post the results of your research?

Willy Tschager
on behalf of the LDraw Steering Committee
(2021-04-20, 8:57)Willy Tschager Wrote: [ -> ]Hi Aaron,

as long as the licence terms are respected and kept untouched and proper attribution is given, the SteerCo has no objections to the above.

However, we would love to learn more about you and we wonder if you're going to post the results of your research?

Willy Tschager
on behalf of the LDraw Steering Committee

That's great, thanks so much!  We will start getting a list of files together and figure out next steps.

We have not made our results, code, or data public yet, but we are planning to soon.  The first version of our paper is currently in submission to a conference (this version uses files from the OMR), and we will be releasing a preprint version along with our code in the next few weeks once we finish cleaning a few things up.  I suspect that the first code release will only contain files from the OMR, but that we will follow up with more batches of data as we get access to them.  In general we hope that this will not simply be a one-time project, but a dataset/environment that will be useful for us and other researchers to explore various building and vision problems.

I will post back here when things go online though.  In general we like to get as many eyes on our research as possible!