Textpattern CMS support forum

#1 2023-09-04 12:00:59

Gallex
Member
Registered: 2006-10-08
Posts: 1,289

export/convert/save all texts in a webpage into plain text

Hi!

Is there an easy solution or plugin to export/convert/save all the text on a website as plain text, all at once? I need to send it off for translation and the translator asked for it in that form. I hope I haven't explained this too confusingly.

#2 2023-09-04 12:33:23

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,599

Re: export/convert/save all texts in a webpage into plain text

There are tools for crawling websites and saving the text of pages, but…

… an easy way is to make your own page template to do that. Make it loop over all the sections you need and output the fields as regular semantic HTML but without any layout … or perhaps just an easily identifiable separator between each article. Include any meta text or page titles in your output too (otherwise easily forgotten).

Then assign that template to a temporary section name, visit that section in your browser and you should see all your articles as a long text stream. Copy that and paste it into Word and Bob’s your uncle.


TXP Builders – finely-crafted code, design and txp

#3 2023-09-04 13:07:50

Gallex
Member
Registered: 2006-10-08
Posts: 1,289

Re: export/convert/save all texts in a webpage into plain text

jakob wrote #335706:

… an easy way is to make your own page template to do that. Make it loop over all the sections you need and output the fields as regular semantic HTML but without any layout … or perhaps just an easily identifiable separator between each article. Include any meta text or page titles in your output too (otherwise easily forgotten).

Could you provide some code, jakob? Please.

Last edited by Gallex (2023-09-04 13:08:13)

#4 2023-09-04 17:38:15

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,273

Re: export/convert/save all texts in a webpage into plain text

Untested but this is what jakob was alluding to:

1. Make a new page called everything.
2. Make a new section called export and assign the everything page to it. No CSS needed.
3. In everything:

<article::custom section="your, section, names, to, export, here" limit="99999">
<txp:title wraptag="h2" />
<txp:body />
</article::custom>

Then view example.org/export and save the page output. Or simply give the link to your translator so they can work through it online. That might get you close, although if you save the page, it'll have some HTML in it too.

Sprinkle your own extra markup in the article custom tag if you want to output anything else like metadata or custom fields in each article.
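
If the saved page needs to end up as genuinely plain text, the leftover HTML can be stripped afterwards. Here is a minimal sketch in Python using only the standard library. It is not part of Textpattern, and the names `TextExtractor` and `html_to_text` are made up for this illustration. It assumes the export page uses simple block markup (headings, paragraphs, lists).

```python
from html.parser import HTMLParser

# Tags that should start a new line in the plain-text output (an assumption;
# extend this set to match the markup your export page actually produces).
BLOCK_TAGS = {"p", "div", "br", "li", "blockquote", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are dropped entirely.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content and ignores all tags, including <img>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

To use it, read the saved file into a string (for example with `open("export.html", encoding="utf-8").read()`, where the filename is just a placeholder), pass it to `html_to_text()` and write the result to a `.txt` file.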


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

#5 2023-09-06 12:28:42

Gallex
Member
Registered: 2006-10-08
Posts: 1,289

Re: export/convert/save all texts in a webpage into plain text

That easy? Perfect!
One question about images: how do I stop them from showing up?

#6 2023-09-06 14:52:46

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 4,599

Re: export/convert/save all texts in a webpage into plain text

Untested but try adding escape="img" to your txp:body or txp:excerpt tag. In the docs on tag escaping there’s the option “some-tag”.

Otherwise, I guess you could hide all img tags with CSS, or run a little JavaScript at the end of the page load to hide or delete all img tags (e.g. as described here but using remove() in place of …style.display = none).


#7 2023-09-06 15:21:03

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 11,273

Re: export/convert/save all texts in a webpage into plain text

Gallex wrote #335713:

That easy?

Yes! Try doing that in WordPress, haha. Situations like this are where the power of Textpattern really shines.

jakob wrote #335714:

Try adding escape="img" to your txp:body or txp:excerpt tag.

That should work a treat. It’s what it is designed for.

And Gallex, if there are any other HTML tags you want to strip from the output, just comma-separate them in the escape attribute.

Last edited by Bloke (2023-09-06 15:23:50)


#8 2023-09-06 18:35:57

mistersugar
Member
From: North Carolina
Registered: 2004-04-13
Posts: 141

Re: export/convert/save all texts in a webpage into plain text

Just adding that I set up something similar years ago so I could copy my blog posts and paste them into the Apple Pages document where I keep an archive of my writing. I print this archive so I have a physical record of it.

Thanks, as always, to the devs and Txp community.

#9 2023-09-08 09:26:17

Gallex
Member
Registered: 2006-10-08
Posts: 1,289

Re: export/convert/save all texts in a webpage into plain text

Bloke wrote #335715:

That should work a treat. It’s what it is designed for.
And Gallex, if there are any other HTML tags you want to strip from the output, just comma-separate them in the escape attribute.

Thank you, jakob and Bloke!

#10 2023-09-12 20:17:24

ax
Plugin Author
From: Germany
Registered: 2009-08-19
Posts: 165

Re: export/convert/save all texts in a webpage into plain text

For a pure, non-textiled output, and without images, I used the following form, to be called from txp:article_custom:

TITLE: <txp:title /><br />
AUTHOR: <txp:author /><br />
DATE: <txp:posted format="%m/%d/%Y %I:%M:%S %p" /><br />
PRIMARY CATEGORY: <txp:category1 /><br />
SECONDARY CATEGORY: <txp:category2 /><br /><br />

BODY:
<txp:msv_show_article_field name="Body" />
<br /><br />
<txp:if_excerpt>
EXCERPT:<txp:msv_show_article_field name="Excerpt" /></txp:if_excerpt>
<br /><br />
--------<br /><br />

With some post-processing and a textile2latex script (RedCloth), the output is then converted into a LaTeX file to generate a nicely formatted PDF.
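
The post-processing itself isn't shown in this post. Purely as an illustration (this is not the script used here), a Python sketch that splits output in the TITLE:/BODY:/-------- layout into one record per article might look like this. The function name `split_records` is hypothetical, the field list is taken from the labels in the form, and it assumes each separator ends up on its own line once the `<br />` tags are turned into newlines.

```python
import re

# Labels as they appear in the form; anything else counts as field content.
FIELDS = ("TITLE", "AUTHOR", "DATE", "PRIMARY CATEGORY",
          "SECONDARY CATEGORY", "BODY", "EXCERPT")
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(FIELDS))
SEPARATOR = "--------"


def split_records(raw):
    """Split exported text into a list of {label: value} dicts, one per article."""
    # Treat every <br /> as a line break so labels and separators start a line.
    text = re.sub(r"<br\s*/?>", "\n", raw)
    records, record, current = [], {}, None

    def flush():
        # Store the finished record with surrounding blank lines trimmed.
        if record:
            records.append({key: value.strip() for key, value in record.items()})

    for line in text.splitlines():
        line = line.strip()
        if line == SEPARATOR:
            flush()
            record, current = {}, None
            continue
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            record[current] = match.group(2)
        elif current is not None:
            # Continuation line (possibly blank) of a multi-line field such as BODY.
            record[current] += "\n" + line
    flush()
    return records
```

Each record can then be handed to whatever converter comes next. A body line that happens to start with one of the labels followed by a colon would be misread as a new field, so check the output on real articles before relying on it.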
