SOT: Programmers and Database Builders, Last bump.... Bummer
Porsche, and the Porsche crest are registered trademarks of Dr. Ing. h.c. F. Porsche AG.
This site is not affiliated with Porsche in any way. Its only purpose is to provide an online forum for car enthusiasts. All other trademarks are property of their respective owners.
McMark
May 7 2015, 10:45 AM
#1
914 Freak! Group: Retired Admin Posts: 20,179 Joined: 13-March 03 From: Grand Rapids, MI Member No.: 419 Region Association: None
I'd like to transcribe the PET file into a usable database. I've tried a few times to undertake this project myself, but it's daunting. So I realized that we could set up a site where our members could add a little bit of the data at a time. With everyone's help, it'll be done in no time. Having this data available will enable future developments, such as adding real pictures of the parts, better how-to threads, linking to part numbers in posts, etc. Andy and I have both planned on setting this up, but neither of us has actually found the time to get started. Since this doesn't really need to be tied into the 914World forum in any way, we don't need to build it on the forum servers. We can set this up independently and then import the completed database file when we're done...
Anyone interested in helping with this project? Here's a bit of an overview of what I had planned:
***Split the PET file into JPG/GIF files***
I planned on splitting the file up into usable image files, which could also be stored in the database (in their own table?). The tricky part is that besides splitting the PDF by page numbers, we also have to split SOME of the pages in half.
***Phase 1 Data Entry***
This one is simpler: just build a page that displays one of the PET images (not the exploded diagrams, just the parts list) next to an HTML form that matches its formatting, so a user could log in and transcribe, line by line, as much of the image as they felt like. The form should save the data automatically (AJAX) so the user doesn't have to complete a page, remember to click save, etc. This means that when a user 'starts work', they could be presented with a partially complete image to add data to. In that case, it would also be useful to add a checkbox at the end of each line to indicate that the previous work has been double-checked and is correct. Once all of the data is entered, new requests for 'work' would be presented with completed images for double-checking. Once a line has been triple-checked, it could be locked as accurate. Eventually we would have all the data transferred and triple-checked.
***Phase 2 Real World Descriptions***
Since a lot of the listings in the PET were translated from German incorrectly, it would be worthwhile to go through all the listings again to translate them. This would be a slightly different process from the one above. We would display an exploded diagram and the details for that image from the database, not from the PET images. The only form field would be an additional field for a new description. I think it would be useful to maintain an original listing of the description from the PET, as well as our own description.
It would also be useful to collect multiple descriptions, which may not be shown publicly but would be useful for searching for parts. For something like the 'Taco plate', it's listed in the PET as 'cover for oil sump', but everyone knows it as a taco plate. It could also be called an oil temp sender plate. All of these descriptions would be useful for searching.
***Phase 3 Further Expansion***
This phase is probably where the project would end, with the data integrated into the forum software and future development handled by Andy or myself. But in order to describe the full process, I've included it here. This phase would be where members could add pictures of the parts (alone or on the car), as well as things like original finishes (paint, plating, etc.), manufacturing materials, and possible replacements (e.g., using 911 Sport Mounts instead of Transmission Mounts).
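The Phase 1 and Phase 2 bookkeeping described above (per-line checks, locking a line once it has been triple-checked, and extra search-only descriptions kept alongside the original PET wording) could be sketched as a small SQLite schema. This is a minimal sketch under assumptions, not a finished design: the table and column names, the `record_check` and `search` helpers, the choice of SQLite, and the rule that "triple-checked" means three different users confirmed the line are all illustrative choices, and `EXAMPLE-001` is a placeholder, not a real part number.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE page (
    id INTEGER PRIMARY KEY,
    image_path TEXT NOT NULL        -- one JPG/GIF per PET page or half page
);
CREATE TABLE part_line (
    id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES page(id),
    line_no INTEGER NOT NULL,
    part_number TEXT,
    pet_description TEXT,           -- original PET wording, never overwritten
    locked INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE line_check (
    line_id INTEGER NOT NULL REFERENCES part_line(id),
    user_id INTEGER NOT NULL,
    PRIMARY KEY (line_id, user_id)  -- a user can only check a given line once
);
CREATE TABLE part_alias (
    line_id INTEGER NOT NULL REFERENCES part_line(id),
    description TEXT NOT NULL,      -- member-supplied name, e.g. 'taco plate'
    is_public INTEGER NOT NULL DEFAULT 0
);
"""

CHECKS_TO_LOCK = 3  # assumed meaning of "triple-checked": three distinct users


def record_check(db, line_id, user_id):
    """Record a check of line_id by user_id and lock the line once enough users agree.

    Returns the number of distinct users who have checked the line so far.
    """
    db.execute(
        "INSERT OR IGNORE INTO line_check (line_id, user_id) VALUES (?, ?)",
        (line_id, user_id),
    )
    (count,) = db.execute(
        "SELECT COUNT(*) FROM line_check WHERE line_id = ?", (line_id,)
    ).fetchone()
    if count >= CHECKS_TO_LOCK:
        db.execute("UPDATE part_line SET locked = 1 WHERE id = ?", (line_id,))
    return count


def search(db, term):
    """Return part numbers whose PET description or any alias contains term.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    pattern = f"%{term}%"
    rows = db.execute(
        """
        SELECT DISTINCT p.part_number
        FROM part_line AS p
        LEFT JOIN part_alias AS a ON a.line_id = p.id
        WHERE p.pet_description LIKE ? OR a.description LIKE ?
        ORDER BY p.part_number
        """,
        (pattern, pattern),
    ).fetchall()
    return [part_number for (part_number,) in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO page (id, image_path) VALUES (1, 'page_001a.gif')")
    db.execute(
        "INSERT INTO part_line (id, page_id, line_no, part_number, pet_description) "
        "VALUES (1, 1, 1, 'EXAMPLE-001', 'cover for oil sump')"
    )
    db.executemany(
        "INSERT INTO part_alias (line_id, description) VALUES (?, ?)",
        [(1, "taco plate"), (1, "oil temp sender plate")],
    )
    for user in (1, 2, 3):
        record_check(db, 1, user)
    print(search(db, "taco"))  # ['EXAMPLE-001']
```

Keeping `pet_description` untouched and storing member names in a separate table means the original PET wording survives, while searches for "taco plate" or "oil temp sender plate" still land on the same line.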
SirAndy
May 7 2015, 10:55 AM
#2
Resident German Group: Admin Posts: 41,626 Joined: 21-January 03 From: Oakland, Kalifornia Member No.: 179 Region Association: Northern California
- Is this in a PDF?
- If so, is the text on the right accessible?
- If so, it can be scraped, parsed and then put into a spreadsheet/database.
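If the text does come out one row per line, the "parse and put into a spreadsheet" step could look roughly like the Python sketch below. The row layout (position, part number, description, quantity) and the 3-3-3-2 digit part-number pattern are assumptions, not the verified PET format, and the sample part numbers are made up.

```python
import csv
import io
import re

# Assumed row layout (hypothetical): "<pos> <part number> <description> <qty>".
# Part numbers are assumed to be 3-3-3-2 digit groups separated by dots or spaces.
ROW = re.compile(
    r"^\s*(?P<pos>\d+)\s+"
    r"(?P<part>\d{3}[. ]\d{3}[. ]\d{3}[. ]\d{2})\s+"
    r"(?P<desc>.+?)\s+"
    r"(?P<qty>\d+)\s*$"
)


def parse_rows(text):
    """Yield one dict per line that matches ROW; headings and other lines are skipped."""
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["part"] = row["part"].replace(" ", ".")  # normalise to dotted form
            yield row


def to_csv(rows):
    """Serialise parsed rows to CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["pos", "part", "desc", "qty"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample text; the header line does not match ROW and is skipped.
sample = """Pos  Part No.         Description          Qty
1    914.123.456.00   cover for oil sump   1
2    914 123 457 01   gasket               2
"""
print(to_csv(parse_rows(sample)))
```

Lines that don't fit the assumed pattern are silently dropped, so any real run would need a check that every row on a page was actually picked up.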
type47
May 7 2015, 11:02 AM
#3
Viermeister Group: Members Posts: 4,254 Joined: 7-August 03 From: Vienna, VA Member No.: 994 Region Association: MidAtlantic Region
Have you seen the Parts Vault on this site (a subcategory of Originality and History)? There may be something there related to your project...
7TPorsh
May 7 2015, 11:17 AM
#4
7T Porsh Group: Members Posts: 2,691 Joined: 27-March 06 From: Glendale Ca Member No.: 5,782 Region Association: Southern California
Maybe set it up like a Wikipedia site. Dump all the part numbers in and everyone has a shot at updating it.
gms
May 7 2015, 11:35 AM
#5
Advanced Member Group: Members Posts: 2,695 Joined: 12-March 04 From: Chicagoland Member No.: 1,785 Region Association: Upper MidWest
I put all the part numbers and descriptions into a database about 20 years ago. I'll see if I can find the floppy disk it's on.
BeatNavy
May 7 2015, 11:42 AM
#6
Certified Professional Scapegoat Group: Members Posts: 2,924 Joined: 26-February 14 From: Easton, MD Member No.: 17,042 Region Association: MidAtlantic Region
This is a very cool idea.
Quoting SirAndy: "Is this in a PDF? If so, is the text on the right accessible? If so, it can be scraped, parsed and then put into a spreadsheet/database."
It is a PDF, and it's the kind where the text is accessible. I have Acrobat (full) and tried exporting it, but it seems to be locked down in terms of what you're allowed to do. It would not let me save as a Rich Text Format file or perform any sort of export operation. I'm sure one could eventually figure out the password to lift the security settings, but I can't do anything with it (at least not with the version I have).
McMark
May 7 2015, 12:46 PM
#7
914 Freak! Group: Retired Admin Posts: 20,179 Joined: 13-March 03 From: Grand Rapids, MI Member No.: 419 Region Association: None
Here's the extracted text. The problem is that it's dumped one column at a time, with no reference to which row each value belongs to. So the model designations all come out in one chunk, but not every row has a model designation, so they can't simply be matched back up line for line. On top of that, I selected a page and tried to make sense of the order of the output, but couldn't.
Attached file: Extract_Text_Output.txt (777.66k)
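The column-at-a-time dump happens because a plain text export throws away where each word sat on the page. One way around it, sketched here under assumptions, is to use an extractor that reports word positions (for example, pdfplumber's `Page.extract_words()` returns each word with `x0` and `top` coordinates), then rebuild rows by grouping words that share nearly the same vertical position and assign each word to a column by its horizontal position. The `(x, y, text)` input format, the `column_edges` values and the 2-point tolerance are hypothetical and would need to be measured against the real PET pages; the sample words are made up.

```python
import bisect


def group_into_rows(words, column_edges, y_tolerance=2.0):
    """Rebuild table rows from positioned words.

    words: iterable of (x, y, text) tuples, one per word (hypothetical format;
        a position-aware PDF extractor would supply the coordinates).
    column_edges: ascending left x-coordinate of each column, measured from the layout.
    y_tolerance: words whose y differs from a row's first word by at most this much
        are treated as the same row.
    Returns a list of rows, each a list of cell strings ('' for an empty cell).
    """
    # 1. Cluster words into rows by vertical position.
    clusters = []
    for word in sorted(words, key=lambda w: w[1]):
        if clusters and abs(word[1] - clusters[-1][0][1]) <= y_tolerance:
            clusters[-1].append(word)
        else:
            clusters.append([word])

    # 2. Within each row, read left to right and drop each word into its column.
    rows = []
    for cluster in clusters:
        cells = [""] * len(column_edges)
        for x, _, text in sorted(cluster, key=lambda w: w[0]):
            # Index of the last column edge at or left of x (clamped to column 0).
            col = max(bisect.bisect_right(column_edges, x) - 1, 0)
            cells[col] = f"{cells[col]} {text}".strip()
        rows.append(cells)
    return rows


# Made-up words from two rows, deliberately out of order; the second row has no model designation.
words = [
    (120, 112.0, "gasket"), (10, 100.0, "1"), (300, 100.0, "914/6"),
    (40, 112.0, "914.123.457.01"), (150, 100.5, "for"), (40, 100.0, "914.123.456.00"),
    (10, 112.0, "2"), (120, 100.0, "cover"), (195, 100.0, "sump"), (175, 100.0, "oil"),
]
for row in group_into_rows(words, column_edges=[10, 40, 120, 300]):
    print(row)
```

Because each row is rebuilt from coordinates, a row with no model designation simply gets an empty cell instead of throwing the whole column out of alignment.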
stevegm
May 7 2015, 01:14 PM
#8
Advanced Member Group: Members Posts: 2,111 Joined: 14-July 14 From: North Carolina Member No.: 17,633 Region Association: South East States
Quoting SirAndy: "Is this in a PDF? If so, is the text on the right accessible? If so, it can be scraped, parsed and then put into a spreadsheet/database."
I agree. I can have one of the programmers who works for me do this if you like.
Andyrew
May 7 2015, 01:29 PM
#9
Spooling.... Please wait Group: Members Posts: 13,376 Joined: 20-January 03 From: Riverbank, Ca Member No.: 172 Region Association: Northern California
JPEGs can be converted to PDF pretty easily...
I do this all the time with PDF/TIF plan pages and door schedules: extract the data into a workable Excel sheet. How many pages is this PET file?
McMark
May 7 2015, 01:54 PM
#10
914 Freak! Group: Retired Admin Posts: 20,179 Joined: 13-March 03 From: Grand Rapids, MI Member No.: 419 Region Association: None
bandjoey
May 7 2015, 02:00 PM
#11
bandjoey Group: Members Posts: 4,925 Joined: 26-September 07 From: Bedford Tx Member No.: 8,156 Region Association: Southwest Region
I think it's a great idea, but as usual with P-----, proceed with caution before spending a lot of time on it. Remember, Pelican used pop-up exploded PET pages on their site and P----- made them take them down. Come up with a secret web name so it won't attract Google attention. Etc.
SixerJ
May 7 2015, 02:33 PM
#12
Member Group: Members Posts: 448 Joined: 24-June 13 From: UK Member No.: 16,042 Region Association: England
Really cool idea. A while ago I transcribed the 914-6 GT Parts manual into Excel. More than happy to share it, or tack it onto the back end of the PET project?
McMark
May 8 2015, 10:27 AM
#13
914 Freak! Group: Retired Admin Posts: 20,179 Joined: 13-March 03 From: Grand Rapids, MI Member No.: 419 Region Association: None
Bump. Does anyone want to take the lead on this? I was hoping for some actual action/progress on this project.
McMark
May 8 2015, 09:49 PM
#14
914 Freak! Group: Retired Admin Posts: 20,179 Joined: 13-March 03 From: Grand Rapids, MI Member No.: 419 Region Association: None
Okay, last bump.
I thought we would get some help here.
Mike Bellis
May 8 2015, 10:51 PM
#15
Resident Electrician Group: Members Posts: 8,345 Joined: 22-June 09 From: Midlothian TX Member No.: 10,496 Region Association: None
I'm unlocking it and running text recognition as we speak...
I mean, no, that's not what I'm doing... My dumputer is running sloooww right now...
Mike Bellis
May 8 2015, 11:24 PM
#16
Resident Electrician Group: Members Posts: 8,345 Joined: 22-June 09 From: Midlothian TX Member No.: 10,496 Region Association: None
Sure is taking a long time...
I'll attach it here when complete.
Mike Bellis
May 9 2015, 09:05 AM
#17
Resident Electrician Group: Members Posts: 8,345 Joined: 22-June 09 From: Midlothian TX Member No.: 10,496 Region Association: None
Here it is, in all its glory.
Unlocked and word-searchable. The original file was 8.4 MB; it expanded to 130 MB after I finished. Here is a link to download it, since it's too big for this site: https://app.box.com/s/vupfsixyln4bfn3sya1wl1vaaf1noo8j Now what?
Mike Bellis
May 9 2015, 09:31 AM
#18
Resident Electrician Group: Members Posts: 8,345 Joined: 22-June 09 From: Midlothian TX Member No.: 10,496 Region Association: None
Now I'm converting it to a Word doc to see what it looks like.
I'm using a program called Bluebeam Revu. It is way more powerful than Acrobat and has a better Word doc generator. I will post it to the same link when it's ready.
altitude411
May 9 2015, 09:38 AM
#19
I drove my 6 into a tree Group: Members Posts: 1,306 Joined: 21-September 14 From: montana Member No.: 17,932 Region Association: Rocky Mountains
Operation "black ops" is on! Way to go, Mike.
ConeDodger
May 9 2015, 09:59 AM
#20
Apex killer! Group: Members Posts: 23,581 Joined: 31-December 04 From: Tahoe Area Member No.: 3,380 Region Association: Northern California
Mark,
I have an original Porsche hard copy if you want to start from scratch and re-digitize it...
All rights reserved 914World.com © since 2002