RE: [Tool/Web] Get specific part by id out of LDraw with all needed files included
2022-07-13, 16:37
So I did a kind of Python crash course and tried to write a small crawler for the unofficial parts.
It looks pretty good so far and does what I intended it to do. Given my prior Python experience of zero, I consider this a major milestone :-)
You pass it the URL of a part in the unofficial library and it compiles a list of the subparts that have to be downloaded so that the part "works". You can then (in the future, once I have the download and zipping working) pass the package to e.g. Studio.
It looks at the "Required (unofficial) subfiles" section of the part's page and recursively walks through the entries listed there.
Code:
import requests
import time
from bs4 import BeautifulSoup
from urllib.parse import urljoin

class CrawledPart():
    def __init__(self, Part, PartLink, DATLink):
        self.Part = Part
        self.PartLink = PartLink
        self.DATLink = DATLink

class PartFetcher():
    def fetch(self, partno):
        # partno is the full URL of the part's detail page
        # test parts: u9247 (lots of data), u9576 (no subfiles)
        url = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/"
        liburl = "https://www.ldraw.org/library/unofficial/"
        time.sleep(0.5)  # be gentle with the server
        print("Part to fetch: " + partno)
        r = requests.get(partno)
        # cut the page off at the "Related" marker -- we only need the
        # "Required (unofficial) subfiles" section above it
        doc = BeautifulSoup(r.text.split("Related")[0], "html.parser")
        # class ".list" contains the list of the required sub-parts
        link = doc.select(".list")
        if len(link) > 0:
            for subpart in link[0].select(".header"):
                Part = subpart.attrs["href"]
                PartLink = urljoin(url, subpart.attrs["href"])
                DATLink = urljoin(liburl, subpart.attrs["href"].split("=")[1])
                crawled = CrawledPart(Part, PartLink, DATLink)
                crawledparts.append(crawled)  # module-level list, set up in the call below
                print("Subpart: " + PartLink)
                subparts = PartFetcher()
                subparts.fetch(PartLink)  # recurse into the subpart's own page
        return crawledparts
And the call:
Code:
subparts = PartFetcher()
crawledparts = []
parturl = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/71603.dat"
for item in subparts.fetch(parturl):
    print(item.DATLink)
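One caveat with the recursive walk: if the same subfile is required by several parent parts, it gets fetched and added to the list once per parent. A small guard could skip pages that were already crawled; a sketch only (the visited set is my addition, the rest of fetch() stays exactly as in the listing above):
Code:
visited = set()  # module-level, next to crawledparts

class PartFetcher():
    def fetch(self, partno):
        if partno in visited:  # this page was crawled already
            return crawledparts
        visited.add(partno)
        # ... rest of fetch() unchanged from the listing above ...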
What remains:
- Include the root part's link in the list as well
- Actually download the single files
- Pack them into a ZIP with the correct folder structure (a rough sketch for these last two steps follows below)
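For the last two points, something along these lines should do. This is a rough sketch, assuming the crawledparts list from the call above and that the path after /library/unofficial/ in each DATLink (e.g. parts/... or p/...) is exactly the folder structure the ZIP needs; download_and_pack is just a name I made up:
Code:
import zipfile
import requests

def download_and_pack(crawledparts, zipname="part_package.zip"):
    # fetch every crawled DAT file and store it in the ZIP under its
    # library-relative path (parts/..., parts/s/..., p/..., ...)
    with zipfile.ZipFile(zipname, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in crawledparts:
            r = requests.get(item.DATLink)
            r.raise_for_status()
            # e.g. .../library/unofficial/parts/s/123s01.dat -> parts/s/123s01.dat
            arcname = item.DATLink.split("/library/unofficial/")[1]
            zf.writestr(arcname, r.text)
Writing straight from the response into the ZIP avoids temp files on disk, and r.text keeps the DAT files as plain text.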