Full Version: SOT: Programmers and Database Builders
McMark
I'd like to transcribe the PET file into a usable database. I've tried a few times to undertake this project myself, but it's daunting. So I realized that we could set up a site where our members could add a little bit of the data at a time. With everyone's help, it'll be done in no time. Having this data available will enable future developments, such as adding real pictures of the parts, better how-to threads, linking to part numbers in posts, etc. Andy and I have both planned on setting this up, but neither of us has actually found the time to get started. Since this doesn't really need to be tied into the 914World forum in any way, we don't need to build it on the forum servers. We can set this up independently and then import the completed database file when we're done...

Anyone interested in helping with this project? Here's a bit of overview on what I had planned:

***Split the PET file into JPG/GIF files***
I planned on splitting the file up into usable image files, which could also be stored in the database (its own table?). The tricky part is that besides splitting the PDF by page numbers, we also have to split SOME of the pages in half.
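
A minimal sketch of the half-page split, assuming each page has already been rendered to an image by some PDF tool (not shown here). The `SPLIT_PAGES` set, the page dimensions, the file naming, and the choice between a left/right or top/bottom cut are all assumptions for illustration; the real list of pages to split would have to be compiled by looking at the PET. The boxes use the common (left, upper, right, lower) pixel convention.

```python
# Hypothetical set of page numbers that must be cut in two.
SPLIT_PAGES = {12, 47}


def crop_boxes(page_number, width, height, split_pages=SPLIT_PAGES, vertical_cut=True):
    """Return the crop boxes (left, upper, right, lower) for one page image.

    Pages not listed in split_pages come back as a single full-page box.
    Listed pages come back as two boxes: left/right halves when
    vertical_cut is True, top/bottom halves otherwise.
    """
    if page_number not in split_pages:
        return [(0, 0, width, height)]
    if vertical_cut:
        mid = width // 2
        return [(0, 0, mid, height), (mid, 0, width, height)]
    mid = height // 2
    return [(0, 0, width, mid), (0, mid, width, height)]


def image_names(page_number, boxes):
    """Name each output image: page_012.jpg, or page_012_a.jpg and page_012_b.jpg."""
    if len(boxes) == 1:
        return [f"page_{page_number:03d}.jpg"]
    return [f"page_{page_number:03d}_{suffix}.jpg" for suffix in "ab"[: len(boxes)]]
```

Keeping the page number in every file name means each image row in the database can still be traced back to its PET page after the split.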

***Phase 1 Data Entry***
This one is simpler: just build a page that displays one of the PET images (not the exploded diagrams, just the parts list) alongside an HTML form that matches its formatting, so a user could log in and transcribe, line by line, as much of the image as they felt like. The form should save the data automatically (AJAX) so the user doesn't have to complete a page or remember to click save. This means that when a user 'starts work' they could be presented with a partially complete image to add data to. In that case, it would also be useful to add a checkbox at the end of each line to indicate that the previous work has been double-checked and is correct. Once all of the data is entered, new requests for 'work' would be presented with completed images for double-checking. Once a line has been triple-checked, it could be locked as accurate. Eventually we would have all the data transferred and triple-checked.
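
The check-and-lock workflow above could be modeled roughly like this. It is only a sketch: the class and field names are made up, "triple-checked" is read as three confirmations by different members, and two rules are my own assumptions (a member cannot confirm their own entry, and correcting a line resets its earlier confirmations).

```python
from dataclasses import dataclass, field

CHECKS_TO_LOCK = 3  # assumed meaning of "triple-checked"


@dataclass
class TranscribedLine:
    """One row of a PET parts-list image as typed in by a member."""
    image_id: int
    row: int
    text: str = ""
    entered_by: str = ""
    checked_by: set = field(default_factory=set)

    @property
    def locked(self):
        return len(self.checked_by) >= CHECKS_TO_LOCK

    def save(self, user, text):
        """Autosave an entry or correction; a changed text resets earlier checks."""
        if self.locked:
            raise ValueError("line is locked as accurate")
        if text != self.text:
            self.checked_by.clear()
        self.text = text
        self.entered_by = user

    def confirm(self, user):
        """Record that `user` ticked the checkbox; ignore self-checks and repeats."""
        if self.locked or not self.text or user == self.entered_by:
            return
        self.checked_by.add(user)


def next_work(lines):
    """Hand out lines that still need data first, then lines that need checking."""
    empty = [line for line in lines if not line.text]
    unchecked = [line for line in lines if line.text and not line.locked]
    return empty or unchecked
```

Storing who confirmed each line, rather than just a counter, keeps one member from locking a line by ticking the box three times.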

***Phase 2 Real World Descriptions***
Since a lot of the listings in the PET are translated from German incorrectly, it would be worthwhile to go through all the listings again to translate them. This would be a slightly different process from above. We would display an exploded diagram and the details for that image from the database, not from the PET images. The only form field would be an additional field for a new description. I think it would be useful to maintain an original listing of the description from the PET, as well as our own description. It would also be useful to collect multiple descriptions, which may not be shown publicly, but would be useful for searching for parts. Something like the 'Taco plate' is listed in the PET as 'cover for oil sump', but everyone knows it as a taco plate. It could also be called an oil temp sender plate. All of these descriptions would be useful for searching.
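
One way to store the original PET wording, our own description, and any number of extra search names is a parts table plus an alias table. The sketch below uses an in-memory SQLite database; the table and column names are made up for illustration, and the part number is a placeholder, not the real number for this part.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE part (
    id              INTEGER PRIMARY KEY,
    part_number     TEXT NOT NULL,
    pet_description TEXT NOT NULL,   -- original PET wording, never overwritten
    description     TEXT             -- our own corrected description
);
CREATE TABLE alias (
    part_id INTEGER NOT NULL REFERENCES part(id),
    name    TEXT NOT NULL            -- extra search terms, not shown publicly
);
""")

# '000 000 000 00' is a placeholder part number.
db.execute(
    "INSERT INTO part VALUES (1, '000 000 000 00', 'cover for oil sump', 'taco plate')"
)
db.executemany(
    "INSERT INTO alias VALUES (1, ?)",
    [("taco plate",), ("oil temp sender plate",)],
)


def search(term):
    """Return part numbers whose PET text, description, or any alias contains term."""
    pattern = f"%{term}%"
    rows = db.execute(
        """SELECT DISTINCT p.part_number FROM part p
           LEFT JOIN alias a ON a.part_id = p.id
           WHERE p.pet_description LIKE ? OR p.description LIKE ? OR a.name LIKE ?""",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]
```

With this layout, searching for "sump", "taco", or "sender" all lead to the same part, while only the chosen description needs to be displayed.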

***Phase 3 Further Expansion***
This phase is probably where the project would end, with the data integrated into the forum software and future development handled by Andy or myself. But in order to describe the full process, I've included it here. This phase would be where members could add pictures of the parts (alone or on the car), as well as things like original finishes (paint, plating, etc.), manufacturing materials, and possible replacements (e.g. using 911 Sport Mounts instead of Transmission Mounts).
SirAndy
- Is this in a PDF?
- If so, is the text on the right accessible?
- If so, it can be scraped, parsed and then put into a spreadsheet/database.

type47
Have you seen the Parts Vault on this site (sub category of Originality and History)? Maybe something there related to your project...
7TPorsh
Maybe set it up like a Wikipedia site. Dump all the part numbers in and everyone has a shot at updating it.
gms
I put all the part numbers and descriptions in a database about 20 years ago. I will see if I can find the floppy disk that it is on.
BeatNavy
This is a very cool idea.
QUOTE(SirAndy @ May 7 2015, 12:55 PM) *

- Is this in a PDF?
- If so, is the text on the right accessible?
- If so, it can be scraped, parsed and then put into a spreadsheet/database.


It is PDF. It is the kind where the text is accessible. I have Acrobat (full) and tried exporting it, but it seems to be locked down in terms of what you are allowed to do. It would not let me save as a Rich Text File or perform any sort of export operation. I'm sure one could eventually figure out the password to lift the security settings, but I can't do anything with it (at least the version I have).
McMark
Here's the extracted text. The problem is that it's dumped one column at a time, without reference to which row it applies to. So the model designations all come out in a chunk, but not every row has a model designation. Not only that but I selected a page and tried to make sense of the order of the output, but couldn't.
stevegm
QUOTE(SirAndy @ May 7 2015, 12:55 PM) *

- Is this in a PDF?
- If so, is the text on the right accessible?
- If so, it can be scraped, parsed and then put into a spreadsheet/database.




I agree. I can have one of the programmers that works for me do this if you like.
Andyrew
Jpegs can be converted to PDF pretty easily...

I do this all the time with PDF/TIF plan pages and door schedules... Extract the data into a workable excel sheet...

How many pages is this PET file?
McMark
QUOTE(Andyrew @ May 7 2015, 12:29 PM) *

How many pages is this PET file?

330, but accuracy is very important
bandjoey
I think it's a great idea but as usual with P----- proceed with caution before spending a lot of time. Remember Pelican used exploded pop up PET pages on their site and P----- made them take it down. Come up with a secret web name so it won't attract Google attention. Etc.
SixerJ
Really cool idea. A while ago I transcribed the 914-6 GT Parts manual to Excel. More than happy to share it, or tack it onto the back end of the PET project.
McMark
Bump. Anyone want to take the lead on this? I was hoping for some actual action/progress on this project.
McMark
Okay, last bump.

I thought we would get some help here.
Mike Bellis
I'm unlocking it and running text recognition as we speak...

I mean, no, that's not what I'm doing...

My dumputer is running sloooww right now...
Mike Bellis
Sure is taking a long time...

I'll attach it here when complete.
Mike Bellis
Here it is, in all its glory.

Unlocked and word searchable. The original file was 8.4MB; it expanded to 130MB after I finished. Here is a link to download it, as it's too big for this site.

https://app.box.com/s/vupfsixyln4bfn3sya1wl1vaaf1noo8j

Now what?
Mike Bellis
Now I'm converting it to a word doc to see what it looks like.

I'm using a program called Bluebeam Revu. It is way more powerful than Acrobat and has a better word doc generator. I will post it to the same link when ready.
altitude411
Operation "black ops" is on! Way to go, Mike.
ConeDodger
Mark,
I have an original Porsche hard copy if you want to start from scratch and re-digitize it...
Kansas 914
Mike - well done! The search works like a charm in Acrobat X Standard.

Thanks for doing this. Cheers!
Mike Bellis
A word doc version is on the link. I am now trying to extract an excel version. We will see how this one turns out. I will add it when done.

Here's the link again for lazy people.
https://app.box.com/s/vupfsixyln4bfn3sya1wl1vaaf1noo8j
Mike Bellis
The Excel version is taking longer. I will update later. Going to the range now to throw lead.
McMark
QUOTE(Mike Bellis @ May 9 2015, 08:05 AM) *

Here it is, in all its glory.

Unlocked and word searchable. The original file was 8.4MB; it expanded to 130MB after I finished. Here is a link to download it, as it's too big for this site.

https://app.box.com/s/vupfsixyln4bfn3sya1wl1vaaf1noo8j

Now what?

The PET already was searchable... Plus it's still not in a database that we can expand to include extra info. Maybe the Excel version will be something we can use. Hopefully the order of cells makes some sense.
Mike Bellis
QUOTE(McMark @ May 9 2015, 10:17 AM) *

The PET already was searchable... Plus it's still not in a database that we can expand to include extra info. Maybe the Excel version will be something we can use. Hopefully the order of cells makes some sense.

But it was a secure pdf and not modifiable. Now it is an open file that can be modified.

The first attempt to convert it to a .xlsx file failed. I'm now trying a .xls version.
Mike Bellis
I added a text version and a very basic excel version.

Someone needs to go through the 25,000 lines of text and insert a comma between items to separate the columns. Then it can be imported into Excel as a smart spreadsheet and not just a list.
I'm not sure how motivated I am to go through that many lines of text
McMark
That looks a lot like what I posted a few days ago.
Mike Bellis
Well, here's how it works. Take the text file, align the data on the same line and add a comma between items of separate columns. Like this.
QUOTE

,-,tool jack,-,-,-
,914 721 001 10, tool bag comprising:, ,1, 914-6
,-,tool bag,-,1,-,-
,477 012 203 A, spark plug wrench, ,1, ,
,477 012 227, angular screw driver, ,1, ,
,999 571 045 02, rim wrench, ,1, ,
,999 195 011 02, double-end spanner 8 X 9 / 110 MM, ,
,999 195 012 02, double-end spanner 10 X 11 / 126 MM, ,1,
,999 195 007 02, double-end spanner 12 X 13 / 140 MM, ,1,
,999 195 015 02, double-end spanner 17 X 19 / 187 MM, ,1,
,999 195 014 02, double-end spanner 14 X 15 / 157 MM, ,1,
,999 571 017 02, ring wrench, ,1, , ,

Each comma makes a new block for the data. When imported to excel it will look like this. If a comma is missing it will shift the data left or right depending on the missing comma.
Click to view attachment
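
Since one missing or extra comma shifts everything after it, a quick check before importing can flag the rows that would land in the wrong columns. This is only a sketch: the six-column layout and the column names are guesses based on the sample rows, and the part-number pattern is inferred from numbers like "999 571 045 02", "477 012 203 A" and "477 012 227".

```python
import csv
import io
import re

# Assumed layout: six columns per row. The column names are guesses.
COLUMNS = ["position", "part_number", "description", "remark", "quantity", "model"]
# Three groups of three digits, optionally followed by a short suffix.
PART_NUMBER = re.compile(r"^\d{3} \d{3} \d{3}( [0-9A-Z]{1,2})?$")


def check_rows(text, expected=len(COLUMNS)):
    """Yield (line_number, problem) for every row that would import misaligned."""
    for number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        if len(cells) != expected:
            yield number, f"{len(cells)} columns instead of {expected}"
        elif cells[1] not in ("", "-") and not PART_NUMBER.match(cells[1]):
            yield number, f"column 2 is not a part number: {cells[1]!r}"


sample = """\
,914 721 001 10, tool bag comprising:, ,1, 914-6
,477 012 203 A, spark plug wrench, ,1, ,
,999 571 045 02 rim wrench, ,1,
"""
for line_number, problem in check_rows(sample):
    print(line_number, problem)
```

Running a check like this over the whole text file would give a list of line numbers to fix by hand, instead of finding the shifted rows after the spreadsheet is built.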

I might be able to do all this but it will take some time as I run out of normal things in life to do. Like working, eating and sleeping.

What can be done with this info when complete? Integrating this into a web page is above my skill level.
nsyr
I have a spider program that can put PDFs into a database. I am out of town until next week but will try it when I get back.