Extraction of information from born-digital PDF documents for reproducible research

Bogdan Wiszniewski; Jacek Siciarek

doi:10.12720/joams.4.3.238-244

Abstract

Born-digital PDF documents might reasonably be expected to preserve useful data units of their source originals, sufficient to produce executable papers for reproducible research. Unfortunately, developers of authoring tools may adopt arbitrary PDF generation strategies, producing a plethora of internal data representations. Common information units such as text paragraphs, tables, function graphs and flow diagrams may require numerous heuristics to properly handle each vendor-specific PDF file content. We propose a generic Reverse MVC interpretation pattern that makes it possible to cope with that arbitrariness in a systematic way. It constitutes a component of a larger framework we have been developing for making executable papers out of PDF documents without injecting any extra data or code into the PDF file.

Citations

CrossRef: 0
Web of Science: 0
Scopus: 0

Publication version: Accepted or Published Version

Details

Category: Journal publication
Type: publication in another foreign scientific journal (foreign language only)
Published in: Journal of Advanced Management, no. 4, pages 238-244
ISSN: 2168-0787
Issue title: ICIME 2014: 2014 6th International Conference on Information Management and Engineering, pages 238-244
Language: English
Year of publication: 2016
Bibliographic description: Wiszniewski B., Siciarek J. Extraction of information from born-digital PDF documents for reproducible research. Journal of Advanced Management, 2016, Vol. 4, iss. 3, pp. 238-244
DOI: 10.12720/joams.4.3.238-244
Verification: Politechnika Gdańska