Textpattern CMS support forum
#1 2023-09-04 12:00:59
- Gallex
- Member
- Registered: 2006-10-08
- Posts: 1,308
export/convert/save all texts in a webpage into plain text
hi!
Is there an easy solution or plugin to export/convert/save all the text in a webpage as plain text in one go? I need to send it for translation, and the translator asked for it in that form. I hope my explanation isn't too confusing.
Re: export/convert/save all texts in a webpage into plain text
There are tools for crawling websites and saving the text of pages, but…
… an easy way is to make your own page template to do that. Make it loop over all the sections you need and output the fields as regular semantic HTML but without any layout … or perhaps just an easily identifiable separator between each article. Include any meta text or page titles in your output too (otherwise easily forgotten).
Then assign that template to a temporary section name, visit that section in your browser and you should see all your articles as a long text stream. Copy that and paste it into Word and Bob’s your uncle.
TXP Builders – finely-crafted code, design and txp
#3 2023-09-04 13:07:50
- Gallex
Re: export/convert/save all texts in a webpage into plain text
jakob wrote #335706:
… an easy way is to make your own page template to do that. Make it loop over all the sections you need and output the fields as regular semantic HTML but without any layout … or perhaps just an easily identifiable separator between each article. Include any meta text or page titles in your output too (otherwise easily forgotten).
Could you provide some code, jakob? Please.
Last edited by Gallex (2023-09-04 13:08:13)
Re: export/convert/save all texts in a webpage into plain text
Untested but this is what jakob was alluding to:
1. Make a new page called everything.
2. Make a new section called export and assign the everything page to it. No CSS needed.
3. In everything:
<article::custom section="your, section, names, to, export, here" limit="99999">
<txp:title wraptag="h2" />
<txp:body />
</article::custom>
Then view example.org/export and save the page output. Or simply give the link to your translator for them to work through online. That might get you close, albeit if you save the page, it’ll have some html in it too.
Sprinkle your own extra markup in the article custom tag if you want to output anything else like metadata or custom fields in each article.
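As a rough illustration of that advice, the loop might be extended along these lines. This is an untested sketch: the section names and the custom field name `subtitle` are placeholders rather than anything from the original site, and it uses the long-form `txp:article_custom` tag name that is mentioned later in the thread.

```html
<txp:article_custom section="articles, about" limit="99999">
    <txp:title wraptag="h2" />

    <!-- Only output the excerpt when the article has one -->
    <txp:if_excerpt>
        <txp:excerpt />
    </txp:if_excerpt>

    <txp:body />

    <!-- "subtitle" is a hypothetical custom field name -->
    <txp:if_custom_field name="subtitle">
        <p>SUBTITLE: <txp:custom_field name="subtitle" /></p>
    </txp:if_custom_field>

    <!-- Meta keywords are easy to forget when sending text for translation -->
    <txp:if_keywords>
        <p>KEYWORDS: <txp:keywords /></p>
    </txp:if_keywords>

    <!-- An easily identifiable separator between articles -->
    <hr />
</txp:article_custom>
```

The `<hr />` is only one possible separator; any marker the translator can easily spot would do.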
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
#5 2023-09-06 12:28:42
- Gallex
Re: export/convert/save all texts in a webpage into plain text
Bloke wrote #335710:
Untested but this is what jakob was alluding to:
1. Make a new page called everything.
2. Make a new section called export and assign the everything page to it. No CSS needed.
3. In everything:
<article::custom section="your, section, names, to, export, here" limit="99999">...
Then view example.org/export and save the page output. Or simply give the link to your translator for them to work through online. That might get you close, albeit if you save the page, it’ll have some html in it too.
Sprinkle your own extra markup in the article custom tag if you want to output anything else like metadata or custom fields in each article.
That easy? Perfect!
One question: images. How do I stop them from showing up?
Re: export/convert/save all texts in a webpage into plain text
Untested but try adding escape="img" to your txp:body or txp:excerpt tag. In the docs on tag escaping there’s the option “some-tag”.
Otherwise, I guess you could hide all img tags with CSS, or run a little JS script at the end of the page load to hide or delete all img tags (e.g. as described here but using remove() in place of …style.display = none).
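If the escape attribute doesn't do the job, the two fallbacks could look roughly like this when placed in the export page template. This is only a sketch: pick one of the two options, and note that the bare `img` selector assumes you want every image on the page gone.

```html
<!-- Option A: hide every image with CSS (the img elements stay in the markup) -->
<style>
    img { display: none; }
</style>

<!-- Option B: delete every image from the DOM once the page has been parsed -->
<script>
    document.addEventListener("DOMContentLoaded", function () {
        document.querySelectorAll("img").forEach(function (img) {
            img.remove();
        });
    });
</script>
```

Hiding with CSS leaves the img elements in the markup, whereas remove() takes them out of the document the browser is showing.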
Re: export/convert/save all texts in a webpage into plain text
Gallex wrote #335713:
That easy?
Yes! Try doing that in WordPress, haha. Situations like this are where the power of Textpattern really shines.
jakob wrote #335714:
Try adding escape="img" to your txp:body or txp:excerpt tag.
That should work a treat. It’s what it is designed for.
And Gallex, if there are any other HTML tags you want to strip from the output, just comma-separate them in the escape attribute.
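For example, applied to the export loop from earlier in the thread, that might look like the following. It is untested: `figure` is just an assumed second tag to strip, the section names are placeholders, and it is worth checking the output to see how the contents of a stripped element are handled.

```html
<txp:article_custom section="articles, about" limit="99999">
    <txp:title wraptag="h2" />
    <!-- escape lists the HTML tags to strip from each field's output -->
    <txp:excerpt escape="img, figure" />
    <txp:body escape="img, figure" />
</txp:article_custom>
```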
Last edited by Bloke (2023-09-06 15:23:50)
Re: export/convert/save all texts in a webpage into plain text
Just adding that I set up something similar years ago to give me the ability to copy my blog posts and then paste them into the Apple Pages document in which I keep an archive of my writing. I print this archive so I have a physical record of my writing.
Thanks, as always, to the devs and Txp community.
#9 2023-09-08 09:26:17
- Gallex
Re: export/convert/save all texts in a webpage into plain text
Bloke wrote #335715:
That should work a treat. It’s what it is designed for.
And Gallex, if there are any other HTML tags you want to strip from the output, just comma-separate them in the escape attribute.
Thank you, jakob and Bloke!
#10 2023-09-12 20:17:24
- ax
- Plugin Author
- From: Germany
- Registered: 2009-08-19
- Posts: 165
Re: export/convert/save all texts in a webpage into plain text
For a pure, non-textiled output, and without images, I used the following form, to be called from txp:article_custom:
TITLE: <txp:title /><br />
AUTHOR: <txp:author /><br />
DATE: <txp:posted format="%m/%d/%Y %I:%M:%S %p" /><br />
PRIMARY CATEGORY: <txp:category1 /><br />
SECONDARY CATEGORY: <txp:category2 /><br /><br />
BODY:
<txp:msv_show_article_field name="Body" />
<br /><br />
<txp:if_excerpt>
EXCERPT:<txp:msv_show_article_field name="Excerpt" /></txp:if_excerpt>
<br /><br />
--------<br /><br />
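To use a form like this one, it would be called from the export page roughly as follows. This is a minimal sketch: it assumes the form has been saved under the hypothetical name `export_plain`, that the msv_show_article_field plugin is installed, and that `articles` stands in for your own section names.

```html
<!-- Render every article in the listed section(s) through the export form -->
<txp:article_custom section="articles" limit="99999" form="export_plain" />
```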
With some post-processing and a textile2latex script (RedCloth), the output is then converted into a LaTeX file to generate a nicely formatted PDF.