Textpattern CMS support forum
You are not logged in. Register | Login | Help
- Topics: Active | Unanswered
Re: Situation report for Textpattern 4.6 and beyond
Maybe Oleg and Stef could jointly come up with a model for how to best do custom fields. Seems like you guys have some firm opinions on how it should work.
If you need me to step in later in the development cycle and do some UI stuff, then let me know.
Offline
Re: Situation report for Textpattern 4.6 and beyond
Hi Phil, amazingly, if we stored all custom stuff in one db field as
xml
<input name="custom_1" size="32" id="custom-1" type="text" value="something" />
<input name="custom_2" size="32" id="custom-2" type="text" value="something else" />
<input name="custom_3" id="custom-3" type="checkbox" checked="checked" />
...
the UI work would be minimal. But you have time to surf a half of the Alps (I’m jealous!) before my opinion becomes firm (and it’s Stef who decides anyway).
Offline
Re: Situation report for Textpattern 4.6 and beyond
etc wrote #279522:
The question is the MySQL XML functions performance on large datasets.
And how widely supported this is on other database platforms through PDO, given that we’re looking at going that way fairly soon now that the mysql_* functions are deprecated in PHP.
philwareham wrote #279523:
Seems like you guys have some firm opinions on how it should work.
I dunno about firm! Random ideas, that seem to work, yes. But it’s early days and I’m very open to suggestion on how best to do all this.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Offline
Re: Situation report for Textpattern 4.6 and beyond
Relational, relational, relational. I can say that we are not going to use MySQL XML functions or any other non-standard SQL functions, or store the data as blobs (blobs are useless). If the new custom field system is not relational, I’m going to cry.
To bring everyone up-to-date and bring my ideas worth, there is quotation from our old “About the new custom field system” topic posted on our private mailing list:
Jukka on December 6, 2013 02:20:37 PM GMT+02:00 wrote:
Sup,
Pete asked on the forum about the database column types http://forum.textpattern.com/viewtopic.php?id=40313 which reminded me; when/if you are doing the new custom field system, remember to take the data types in to consideration.
The relation system must be able to store different data-types (ints, varchar and blob) and it should default to correct type. For instance, it must not be using TEXT where the data doesn’t require that. If it does use TEXT columns across board, we will see huge drop in performance. The options are either limiting the content to VARCHAR or having more complex date-type system.
Not that our queries are currently all that proper (there is that now() in there – it really, really, should be persistent timestamp, e.g. the last-modificaiton timestamp we already have — that would be super easy to patch in too fortunately).
Robert on December 6, 2013 02:34:42 PM GMT+02:00 wrote:
I’m currently leaning towards a general key/value store, roughly like so:
CREATE TABLE txp_keyvalue ( id int(12) NOT NULL AUTO_INCREMENT,
[…TBD: foreign key as appropriate…]
name varchar(50) NOT NULL, value text NOT NULL, created timestamp DEFAULT CURRENT_TIMESTAMP, updated timestamp NULL DEFAULT NULL, expires timestamp NULL DEFAULT NULL, PRIMARY KEY (id), INDEX expires (expires), INDEX name (name), FULLTEXT INDEX value (value)
)
.
I don’t see how this schema would support data types for the ‘value’ column.
Jukka on December 6, 2013 03:31:12 PM GMT+02:00 wrote:
I don’t think you want to use TEXT columns. If you do, all joins (meaning all article tag calls) will write a temporary table to the disk and you can not do any article operation involving custom fields in memory. This is stupidly slow compared to operations done in memory.
Joins and sorting cause temporary tables, and TEXT columns can not be mapped to memory. Instead they are stored separated, buffered and all query actions are done on disk. Buffering will also lead to memory issues. E.g. currently we can easily select all articles by custom field. Later on, well, TEXT columns are buffered.
To optimize performance, you will want to default to VARCHAR and INT types where ever possible. Only if content needs to be super-long should it use TEXT or any other BLOB type.
You may want to look into structure such as:
- txp_key_definition
- txp_key_item (associations)
- txp_key_value_int
- txp_key_value_varchar
- txp_key_value_blob
Tables etc.
You might store associations in the values itself, and pool the definition to find the storage type. If you use separate relational table, you can even use BTREE too, depending if you can eliminate race conditions and write IDs in order. txp_key_item.from … txp_key_item.to is the values you will want. It’s super fast, but slightly more complex.
You can also store the different data types in the same table, but then you will be wasting memory if a single value out of 250 selected uses BLOB. Unless of course, you do multiple joins based on the type which is an option too.
I would write the system myself (because it’s beginner stuff), but I’m too busy. The quoted thread outlines why you can’t use blobs; the biggest thing you can map to memory is a varchar, resulting in slow IO.
Offline
Re: Situation report for Textpattern 4.6 and beyond
Bloke wrote #279525:
And how widely supported this is on other database platforms through PDO, given that we’re looking at going that way fairly soon now that the mysql_* functions are deprecated in PHP.
Gocom wrote #279526:
I can say that we are not going to use MySQL XML functions or any other non-standard SQL functions, or store the data as blobs (blobs are useless).
Thanks for the explanation, one option less to explore, we’re progressing. I agree XML storage can not be in the core, but a plugin could test it for categories. No blob is needed, varchar(~65535)
should be enough to call them “unlimited”.
I don’t think you want to use TEXT columns. If you do, all joins (meaning all article tag calls) will write a temporary table to the disk and you can not do any article operation involving custom fields in memory.
Aren’t Body
and Excerpt
concerned?
Offline
Re: Situation report for Textpattern 4.6 and beyond
Sorry to disappoint everyone, but this is the current database structure I have:
# Dump of table txp_meta
# -> name: sanitized name, no special chars. Title provided via i18n
# -> content_type: number representing which type this CF belongs. 1=article, 2=image etc.
# Can be extended by plugins. Not considered 1 CF -> many content_types yet. Ideas?
# -> data_type: value matches the name of the table to which this field's values are stored
# (e.g. 'int', 'varchar', 'datetime', etc. Easy to extend by plugins)
# -> render: callback function to render the field
# -> group: arbitrary group name to assign the field to
# -> position: order of the field. Might be better named 'ordinal'
# -> created, modified, expires: self explanatory
# ------------------------------------------------------------
CREATE TABLE `txp_meta` (
`id` int(12) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`content_type` smallint(6) DEFAULT NULL,
`data_type` varchar(32) NOT NULL,
`render` varchar(255) DEFAULT 'text_input',
`group` varchar(255) DEFAULT NULL,
`position` smallint(5) unsigned DEFAULT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` timestamp NULL DEFAULT NULL,
`expires` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_content` (`name`,`content_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table txp_meta_options
# Option names for select lists, radio sets, etc. Titles from 18n
# One row per item, foreign keyed from txp_meta.id
# ------------------------------------------------------------
CREATE TABLE `txp_meta_options` (
`meta_id` int(12) NOT NULL,
`option` varchar(255) DEFAULT '',
`position` smallint(5) NOT NULL,
PRIMARY KEY (`meta_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table txp_meta_value_datetime
# ------------------------------------------------------------
CREATE TABLE `txp_meta_value_datetime` (
`meta_id` int(12) NOT NULL,
`content_id` int(12) NOT NULL,
`value` datetime DEFAULT NULL,
UNIQUE KEY `meta_content` (`meta_id`,`content_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table txp_meta_value_int
# ------------------------------------------------------------
CREATE TABLE `txp_meta_value_int` (
`meta_id` int(12) NOT NULL,
`content_id` int(12) NOT NULL,
`value` int(12) DEFAULT NULL,
UNIQUE KEY `meta_content` (`meta_id`,`content_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# Dump of table txp_meta_value_varchar
# ------------------------------------------------------------
CREATE TABLE `txp_meta_value_varchar` (
`meta_id` int(12) NOT NULL,
`content_id` int(12) NOT NULL,
`value` varchar(255) DEFAULT NULL,
UNIQUE KEY `meta_content` (`meta_id`,`content_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Repeat txp_meta_value_* for each major table type in use...
Notes:
- The value tables only get created when you actually use them to store something so, out of the box, there’s only one: txp_meta_value_varchar.
- The schema’s not fully normalized because the extra complexity and joins involved didn’t seem to deliver any real world benefits that some coding conventions couldn’t do better. Willing to be proved wrong.
- The time fields are synchronised with the article timestamps by eliminating all that SQL
NOW()
rubbish. - I was under the impression that the biggest varchar was 512 chars, since about v5.0.3 or something, but maybe that’s outdated now and we really can ditch Text types. Would be brilliant if we could, because they really are a pain for performance.
That structure’s pretty easy to query — though does require a tiny bit of post processing in PHP to juggle the multiple rows returned into one row per article, for pagination purposes. Perhaps someone with better SQL skillz than me (so, like, a geriatric badger) can get the data out directly, without the extra PHP step, in the regular article format:
ID, Title, Body, Body_html, Excerpt, Excerpt_html, custom_2, custom_3, custom_5, … custom_N
with one row per field and whichever custom fields are in use for the selected articles (and NULLs where data not applicable for that column). Same goes for the other content types.
No doubt that’s all wrong because I’m just a beginner at SQL and relational databases. The notion of storing things in flat tables is utterly retarded, imo. I get so frustrated having to “downgrade” my thinking from true direct ERD-mapped databases to this rigid rows and columns abomination that’s dominated the market for over thirty years. I started out working with E-R databases for ten years (had never heard of relational at the time). Then someone showed me tables, rows, columns, 4NF and SQL, and I died a little inside. *sigh*
Anyway, with my rudimentary table wrangling, that’s what I have so far. Shoot it down if you want, but I’d rather someone improved it instead.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Offline
Re: Situation report for Textpattern 4.6 and beyond
etc wrote #279528:
No blob is needed, varchar(~65535) should be enough to call them “unlimited”.
When mapped to memory, fields become fixed length (the space is fixed). Each row with just that field will reserve 0.5 MB of memory even if the field doesn’t have anything in it. A simple query such as:
select 1 from textpattern order by custom_1
Should take 100 MB of memory if there are 200 rows with none of them having anything in custom_1 field. Needless to say, it will go over the tmp_table_size size limit and will result in both memory sorting (it tries it first, but aborts when it runs into the limit, but it uses pre-check as far as I’m aware) and default to on-disk temporary tables, where the varchar field will act like a text field.
Aren’t
Body
andExcerpt
concerned?
Not the slightest. Some things are necessary, and by design, others aren’t. You normally don’t sort by or compare those fields either.
Offline
Re: Situation report for Textpattern 4.6 and beyond
Bloke wrote #279529:
`group` varchar(255) DEFAULT NULL
That is a reserved keyword. Needs to be renamed from “group” to something else, like for instance to “event”.
ENGINE=InnoDB
We are still using MyISAM, but we can change all our tables to that if necessary. Locking will then become row based instead of being table based etc.
I was under the impression that the biggest varchar was 512 chars, since about v5.0.3 or something, but maybe that’s outdated now and we really can ditch Text types. Would be brilliant if we could, because they really are a pain for performance.
See my above post. Short: no and yes.
UNIQUE KEY `meta_content` (`meta_id`,`content_id`)
Will prevent multiples, but then it does help with selecting as it has an index. Maybe add a value key to it? E.g.
UNIQUE KEY `meta_content` (`meta_id`,`content_id`,`value_id`)
Where value_id is a TINYINT that defaults to 0.
Offline
Offline
Re: Situation report for Textpattern 4.6 and beyond
Gocom wrote #279531:
Needs to be renamed from “group” to something else, like for instance to “event”.
My bad. event
is way better.
We are still using MyISAM
Oh that was just because my local MySQL chose InnoDB by default for all tables.
[UNIQUE KEY] Will prevent multiples, but then it does help with selecting as it has an index. Maybe add a value key to it?
Good idea, thanks. I hadn’t tested the same CF more than once in the same article yet, so hadn’t noticed that problem.
The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.
Txp Builders – finely-crafted code, design and Txp
Offline
Re: Situation report for Textpattern 4.6 and beyond
Bloke wrote #279529:
Sorry to disappoint everyone, but this is the current database structure I have:
Thanks for clarification, no disappointment at all, group event field should make this structure flexible enough. I would only note that representing content_type
by a number yields a high degree of coordination between plugin authors to avoid collisions. Maybe article,image,smd_user
and so on is more practical.
Offline
Re: Situation report for Textpattern 4.6 and beyond
Oleg, have you ever thought of applying to the Textpattern development team?
Offline