[Tool/Web] LDraw to Studio Exporter


RE: [Tool/Web] Get specific part by id out or LDraw with all needed files included
#7
So I did a quick Python crash course and wrote a small crawler for the unofficial parts library.

It looks pretty good so far and does what I intended. Given my Python experience of zero, I consider this a major achievement :-)

You pass it the URL of a part in the unofficial lib and it compiles a list of the subparts that have to be downloaded so that the part "works". In future, once I have the download and zipping working, you can then pass the package to e.g. Studio.

It looks at the "Required (unofficial) subfiles" section of the part's page and recursively walks through the entries listed there.

Code:
import requests
import time
from bs4 import BeautifulSoup
from urllib.parse import urljoin


class CrawledPart:
    """One required subfile: its href, its detail page URL and its .dat URL."""
    def __init__(self, Part, PartLink, DATLink):
        self.Part = Part
        self.PartLink = PartLink
        self.DATLink = DATLink


class PartFetcher:
    # Base URLs used to resolve the hrefs found on a part's detail page
    url = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/"
    liburl = "https://www.ldraw.org/library/unofficial/"

    def fetch(self, parturl):
        # Test parts: u9247 (lots of data), u9576 (no subfiles)
        # Results are appended to the module-level list "crawledparts",
        # which the caller has to create before the first call.
        time.sleep(0.5)  # do not hammer the server
        print("Part to fetch: " + parturl)
        r = requests.get(parturl)
        r.raise_for_status()  # stop on HTTP errors instead of parsing an error page

        # Cut the page off at the "Related" marker; the related and parent
        # files listed after it are not needed
        doc = BeautifulSoup(r.text.split("Related")[0], "html.parser")

        # The first element with class .list holds the required subfiles
        link = doc.select(".list")

        if len(link) > 0:
            for subpart in link[0].select(".header"):
                Part = subpart.attrs["href"]
                PartLink = urljoin(self.url, Part)
                DATLink = urljoin(self.liburl, Part.split("=")[1])

                # Skip subfiles already collected through another parent
                if any(c.DATLink == DATLink for c in crawledparts):
                    continue

                crawledparts.append(CrawledPart(Part, PartLink, DATLink))
                print("Subpart: " + PartLink)

                # Recurse into the subfile's own required subfiles
                self.fetch(PartLink)

        return crawledparts


And the call:
Code:
subparts = PartFetcher()
crawledparts = []  # module-level result list that fetch() appends to
parturl = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/71603.dat"
for item in subparts.fetch(parturl):
    print(item.DATLink)
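
In case it is not obvious how the two links per subpart come about, here is a small offline sketch of the urljoin step on its own. The helper name links_from_href and the example href (including the subfile name 71603s01.dat) are made up for illustration; I am assuming the hrefs on the page look roughly like this, the real ones may differ.

```python
from urllib.parse import urljoin

URL = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/"
LIBURL = "https://www.ldraw.org/library/unofficial/"


def links_from_href(href):
    """Return (detail page URL, .dat URL) for one subfile href."""
    # The href replaces the path and query of the base URL
    part_link = urljoin(URL, href)
    # Everything after the first "=" is the library-relative file path
    dat_link = urljoin(LIBURL, href.split("=")[1])
    return part_link, dat_link


# Hypothetical href; the real pages may use a different form
page, dat = links_from_href("/cgi-bin/ptdetail.cgi?f=parts/s/71603s01.dat")
print(page)
print(dat)
```

The important detail is that the library base URL ends with a slash, so the relative path is appended to it rather than replacing its last segment.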

What remains:
- Include the root part's link as well
- Download the individual files
- Pack them into a ZIP with the correct folder structure
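
The remaining steps could look roughly like the sketch below. It has not been run against the live site and rests on a few assumptions: the names archive_path, http_get and build_zip are made up here, every DAT link is assumed to start with /library/unofficial/, and the folder structure inside the ZIP is assumed to mirror the path after that prefix (e.g. parts/s/...), which may not be exactly what Studio expects. It uses urllib.request from the standard library instead of requests, simply to keep the example free of dependencies.

```python
import io
import zipfile
from urllib.parse import urlparse
from urllib.request import urlopen

LIB_PREFIX = "/library/unofficial/"  # assumption: path prefix of every DAT link


def archive_path(dat_url):
    """Return the path inside the ZIP, e.g. 'parts/s/foo.dat' (assumed layout)."""
    path = urlparse(dat_url).path
    if not path.startswith(LIB_PREFIX):
        raise ValueError("unexpected DAT link: " + dat_url)
    return path[len(LIB_PREFIX):]


def http_get(url):
    """Download one file and return its bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def build_zip(dat_urls, target, download=http_get):
    """Download every DAT link once and store it in a ZIP at `target`.

    `target` may be a file name or a writable binary file object.
    """
    written = []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for dat_url in dict.fromkeys(dat_urls):  # drops duplicates, keeps order
            name = archive_path(dat_url)
            archive.writestr(name, download(dat_url))
            written.append(name)
    return written
```

The root part could be included by building its DAT link the same way as for the subparts, i.e. urljoin(liburl, parturl.split("=")[1]), and putting it in front of the list. A call would then be something like build_zip([root_dat] + [p.DATLink for p in crawledparts], "71603.zip"), where root_dat is that root link.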



Messages In This Thread
RE: [Tool/Web] Get specific part by id out or LDraw with all needed files included - by Gerald Lasser - 2022-07-13, 16:37
