Discussion:
Custom fields schema
(too old to reply)
Sean McAfee
2004-12-15 05:35:36 UTC
Permalink
To kick things off (I hope), here are some tables from my custom fields
schema, shorn of some of the niftier additions that have accrued over time.

CREATE TABLE custom_fields (
field_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
field_name VARCHAR(100) NOT NULL,
display_name TINYTEXT NOT NULL,
field_type CHAR BINARY NOT NULL,
selection_id INTEGER,
UNIQUE INDEX (field_name)
);

This is obviously the table which enumerates all custom fields defined at a
particular Bugzilla installation.

field_id is your basic primary key.

field_name is a unique name for the key, used in such contexts as CGI
parameters:

http://bugzilla.mycompany.com/buglist.cgi?my_custom_field=foo

...and names used in code:

print "field value is ", $bug->{my_custom_field}, "\n";

On ongoing but minor concern of mine has been how to guarantee that a custom
field's name does not conflict with any CGI parameter, now or in the future.
Adopting a naming convention, such as a short prefix followed by "_", has
seemed to suffice.

display_name is the field's name as it appears on Web pages.
my_custom_field's display name might be "My Custom Field", for example.

There has been disagreement on this mailing list in the past, since this
approach is not internationalization-friendly. Recently it occurred to me
that a compromise might be to use templates to look up field display names,
as others have proposed (or insisted upon), but to fall back on the contents
of this column if no template-based name is found.

field_type obviously describes the type of the field. Valid values are:

'i' - An integer.

'd' - A date with no associated time.

't' - A date/time.

's' - A short string, 255 or fewer characters in length. No vertical
whitespace allowed. Represented on CGI forms by a single-line text
element.

'l' - A long string, 65535 or fewer characters in length. No restriction on
contents. Represented on CGI forms by a textarea element.

'e' - A "selection" field, which takes its values from a set of valid
strings, its "domain". Such a field's value may be either exactly one
or zero of the elements of its domain.

'm' - A "multiselection" field. Like a selection field, but its value may
consist of any number of elements of its domain.

More recently, I've had a need to address custom fields and built-in
Bugzilla fields using the same range of integers, so I added an eighth field
type: 'b', for Bugzilla.

selection_id is non-null only for selection, multiselection, and Bugzilla
fields. In the first two cases, it is an index into the table
CF_SELECTIONS (see below). In the last case, it is an index into the
FIELDDEFS table.


CREATE TABLE cf_membership (
product_id INTEGER NOT NULL REFERENCES products (id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
sortkey INTEGER NOT NULL,
INDEX (product_id)
);

This table describes which fields belong to which products, in what should
be a straightforward manner. (The REFERENCES clauses have no effect for
most MySQL tables, but are useful for documentation purposes.) The sort
order described here may be overridden or ignored outright in templates; it
is a default only.


CREATE TABLE cf_selections (
selection_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
selection_name VARCHAR(100) NOT NULL,
unset_label TINYTEXT NOT NULL,
UNIQUE INDEX (selection_name)
);

This table describes the possible domains of all selection fields in the
installation. selection_id is the primary key. selection_name is a name
which is presented on administrative interfaces, and may be subject to the
same concerns as custom fields' display names. unset_label is a textual
label which is printed when a selection field's value is null. I created
this feature simply because it existed in my company's previous
incident-tracking system (TeamTrack), but it has proven to be of remarkably
little use, and I wouldn't be averse to ditching it.

Note that distinct custom fields may share the same selection domain.


CREATE TABLE cf_selection_labels (
label_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
selection_id INTEGER NOT NULL REFERENCES cf_selections (selection_id),
label VARCHAR(100) NOT NULL,
inactive BOOL NOT NULL,
sortkey INTEGER NOT NULL,
UNIQUE INDEX (selection_id, label)
);

This table describes the elements (or "labels") of all selection domains.
label_id is the primary key. selection_id describes to which domain each
label belongs, sorted by default in the order described by sortkey. label
is the text of the element. inactive is a boolean flag; if true, the
corresponding label may not be entered as a new field value. This feature
is necessary because otherwise there would be no way to remove a selection
element without annihilating all existing data that refers to that element.


CREATE TABLE cf_integer (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value INTEGER,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);

This table stores integer custom field data, in what again should be a
straightforward manner. The primary key is useful for looking up field
values for known bugs; the index on field_id is useful for querying. Note
that an integer field may be null, indicating that it has no value.

Similar comments apply to the next three tables:

CREATE TABLE cf_shortstring (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value TINYTEXT,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);

CREATE TABLE cf_longstring (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value TEXT,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);

CREATE TABLE cf_date (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value DATETIME,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);

Both date and date/time field values are stored in the CF_DATE table. Time
components of date field values are eliminated by the interface code when it
goes into or comes out of the database. Interface code will accept, as user
input for date fields, any input that Date::Parse can handle.


CREATE TABLE cf_selection (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
label_id INTEGER NOT NULL REFERENCES cf_selection_labels (label_id),
INDEX (bug_id, field_id),
INDEX (field_id)
);

This table is similar to the others, but since selection fields are
multivalued, no primary key is possible. Both selection and multiselection
fields are stored in the same table; the interface code must ensure that
single-selection fields do not refer to more than one element of their
domains.

So...comments?
--
Sean McAfee -- etzwane-QkQGYOOlIAQdnm+***@public.gmane.org
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Joel Peshkin
2004-12-15 14:32:53 UTC
Permalink
Post by Sean McAfee
CREATE TABLE custom_fields (
field_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
field_name VARCHAR(100) NOT NULL,
display_name TINYTEXT NOT NULL,
field_type CHAR BINARY NOT NULL,
selection_id INTEGER,
UNIQUE INDEX (field_name)
);
This is obviously the table which enumerates all custom fields defined at a
particular Bugzilla installation.
Let's merge this with fielddefs. It would be a shame to make every bit
of code have to query both fielddefs and custom_fields. At first glance
that may seem like more work, but it will not be by the time we are done
with buglist.cgi, Search.pm, and BugMail.pm.

Also, do we need accesskeys for customfields?
Post by Sean McAfee
print "field value is ", $bug->{my_custom_field}, "\n";
On ongoing but minor concern of mine has been how to guarantee that a custom
field's name does not conflict with any CGI parameter, now or in the future.
Adopting a naming convention, such as a short prefix followed by "_", has
seemed to suffice.
Sounds good.
Post by Sean McAfee
There has been disagreement on this mailing list in the past, since this
approach is not internationalization-friendly. Recently it occurred to me
that a compromise might be to use templates to look up field display names,
as others have proposed (or insisted upon), but to fall back on the contents
of this column if no template-based name is found.
Sounds good.
Post by Sean McAfee
'i' - An integer.
'd' - A date with no associated time.
't' - A date/time.
's' - A short string, 255 or fewer characters in length. No vertical
whitespace allowed. Represented on CGI forms by a single-line text
element.
'l' - A long string, 65535 or fewer characters in length. No restriction on
contents. Represented on CGI forms by a textarea element.
'e' - A "selection" field, which takes its values from a set of valid
strings, its "domain". Such a field's value may be either exactly one
or zero of the elements of its domain.
'm' - A "multiselection" field. Like a selection field, but its value may
consist of any number of elements of its domain.
I think a single character may not be sufficient in the long run. This
looks like a reasonable initial set, but I can see subsequent patches
adding 'u' for a userid. Ideally, it would be nice to be able to
specify i(userid) or some more inspired syntax to indicate that the
field is an integer holding a userid, etc...
Post by Sean McAfee
More recently, I've had a need to address custom fields and built-in
Bugzilla fields using the same range of integers, so I added an eighth field
type: 'b', for Bugzilla.
Please explain

I think that we need to determine what should happen with bugmail and
customfields. My suggestion is that, as we merge fielddefs and
custom_fields, we replace mailhead with something slightly more useful.

We should also determine what expectations we have from the UI. I still
believe that the right thing to do is to make things work if the
templates know nothing about the new fields, but give template authors
the ability to "take over" some fields while leaving others to the
default processing.



-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2004-12-15 22:48:41 UTC
Permalink
Post by Joel Peshkin
Post by Sean McAfee
CREATE TABLE custom_fields (
field_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
field_name VARCHAR(100) NOT NULL,
display_name TINYTEXT NOT NULL,
field_type CHAR BINARY NOT NULL,
selection_id INTEGER,
UNIQUE INDEX (field_name)
);
Let's merge this with fielddefs. It would be a shame to make every bit
of code have to query both fielddefs and custom_fields. At first glance
that may seem like more work, but it will not be by the time we are done
with buglist.cgi, Search.pm, and BugMail.pm.
Except for logging, I've never had much trouble with the two distinct
tables. Besides, they describe two different kinds of data; FIELDDEFS
describes columns of the BUGS table, but CUSTOM_FIELDS describes data that
lives in the various CF_* tables. Some FIELDDEFS columns are even computed,
such as "Days since bug changed", a concept which can't apply to custom
fields as currently defined.

A unification of the tables would be appropriate when what are currently
built-in fields are implemented as custom fields, or "fields" as I suppose
they would then be known.
Post by Joel Peshkin
Also, do we need accesskeys for customfields?
Er...what are they?
Post by Joel Peshkin
Post by Sean McAfee
'i' - An integer.
[snip]
Post by Joel Peshkin
I think a single character may not be sufficient in the long run. This
looks like a reasonable initial set, but I can see subsequent patches
adding 'u' for a userid. Ideally, it would be nice to be able to
specify i(userid) or some more inspired syntax to indicate that the
field is an integer holding a userid, etc...
That would be nice, but I think 26 field types would be excessive, let alone
the 255 types (or even more, given the i18n-ness of MySQL) the CHAR BINARY
column supports.
Post by Joel Peshkin
Post by Sean McAfee
More recently, I've had a need to address custom fields and built-in
Bugzilla fields using the same range of integers, so I added an eighth field
type: 'b', for Bugzilla.
Please explain
My current implementation logs changes to custom fields, attaching the
date/time and user IDs of the user making the change and the user of the new
assignee. (I haven't described this part of the schema yet, since I was
starting with just the basics.) I needed to be able to log fields
identified by numeric ID, using the same range of numbers for both built-in
and custom fields. The 'b' custom field type was my solution. Whenever I
needed to refer to a built-in field using the same range of integers as
custom fields, I would do something like this:

INSERT INTO custom_fields
SELECT NULL, name, description, 'b', fieldid
FROM fielddefs
WHERE field_name IN ('assigned_to', 'summary');

I suppose the built-in/custom distinction could also have been described in
the log table.
Post by Joel Peshkin
We should also determine what expectations we have from the UI. I still
believe that the right thing to do is to make things work if the
templates know nothing about the new fields, but give template authors
the ability to "take over" some fields while leaving others to the
default processing.
My current implementation is simple--too simple, probably. All custom
fields appear in a two-column HTML table on the show_bug page, above the
"Attachments" section, names on the left, values on the right. In previous
discussions, folks have expressed a desire to format fields arbitrarily,
anywhere on the page. The only difficulty with this is that we need a
default format and location for custom fields which are not already
explicitly placed elsewhere; the cleanest-looking such format would line up
the names and values, probably by placing them as columns of an HTML table,
so that this format needs to assume some knowledge of the surrounding HTML
context.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2004-12-15 23:15:26 UTC
Permalink
Post by Sean McAfee
That would be nice, but I think 26 field types would be excessive, let alone
the 255 types (or even more, given the i18n-ness of MySQL) the CHAR BINARY
column supports.
Just a thought, but couldn't we have something like validation_regex?
That is, a regex that specified what a valid value was for that field.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Gervase Markham
2004-12-16 00:10:59 UTC
Permalink
Post by Sean McAfee
Except for logging, I've never had much trouble with the two distinct
tables.
As this was a sticking point last time around, I should just mention -
the solution should have one single table for tracking bug changes, not
two. If that requires enhancing the existing one, that's absolutely fine.
Post by Sean McAfee
Besides, they describe two different kinds of data; FIELDDEFS
describes columns of the BUGS table, but CUSTOM_FIELDS describes data that
lives in the various CF_* tables. Some FIELDDEFS columns are even computed,
such as "Days since bug changed", a concept which can't apply to custom
fields as currently defined.
I think you may be confusing the fielddefs table with another one.
fielddefs does not have a "Days since bug changed" column, it has
fieldid, name, description, mailhead (which needs to die) and sortkey.
Post by Sean McAfee
Post by Joel Peshkin
Also, do we need accesskeys for customfields?
Er...what are they?
HTML accesskeys. And the answer is no; we're moving away from them,
because most people don't use them, and they annoy many keyboard users.
https://bugzilla.mozilla.org/show_bug.cgi?id=163007

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2004-12-16 00:36:30 UTC
Permalink
Post by Joel Peshkin
We should also determine what expectations we have from the UI. I
still believe that the right thing to do is to make things work if the
templates know nothing about the new fields, but give template authors
the ability to "take over" some fields while leaving others to the
default processing.
Right. There should be an "easy mode" for adding custom fields which
puts them into some area of the bug form along with a "manual mode" or
the like the administrator can use when she wants to add them to the
"edit bug" template by hand for control over their placement in the UI.
This should be selectable on a per-field basis.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Kevin Benton
2004-12-15 15:34:35 UTC
Permalink
It seems to me that any discussion of custom fields ought to include
existing fields and handle them all in a unified approach. In my view, the
best way to do this would be to define every field in a fields table, and
specify formatting / location, etc. in the table. I would hope that there
would even be a place where that table would have a definition for certain
types of handlers (subs) associated with each field. Maybe this is an
overly complex a view, but I think that this might offer us more flexibility
than we have with TT and yet, give us the ability to handle fields we
haven't dreamed of through plugins.

---
Kevin Benton
Perl/Bugzilla Developer
Advanced Micro Devices

The opinions stated in this communication do not necessarily reflect the
view of Advanced Micro Devices and have not been reviewed by management.
This communication may contain sensitive and/or confidential and/or
proprietary information. Distribution of such information is strictly
prohibited without prior consent of Advanced Micro Devices. This
communication is for the intended recipient(s) only. If you have received
this communication in error, please notify the sender, then destroy any
remaining copies of this communication.
-----Original Message-----
On Behalf Of Sean McAfee
Sent: Tuesday, December 14, 2004 10:36 PM
Subject: Custom fields schema
To kick things off (I hope), here are some tables from my custom fields
schema, shorn of some of the niftier additions that have accrued over time.
CREATE TABLE custom_fields (
field_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
field_name VARCHAR(100) NOT NULL,
display_name TINYTEXT NOT NULL,
field_type CHAR BINARY NOT NULL,
selection_id INTEGER,
UNIQUE INDEX (field_name)
);
This is obviously the table which enumerates all custom fields defined at a
particular Bugzilla installation.
field_id is your basic primary key.
field_name is a unique name for the key, used in such contexts as CGI
http://bugzilla.mycompany.com/buglist.cgi?my_custom_field=foo
print "field value is ", $bug->{my_custom_field}, "\n";
On ongoing but minor concern of mine has been how to guarantee that a custom
field's name does not conflict with any CGI parameter, now or in the future.
Adopting a naming convention, such as a short prefix followed by "_", has
seemed to suffice.
display_name is the field's name as it appears on Web pages.
my_custom_field's display name might be "My Custom Field", for example.
There has been disagreement on this mailing list in the past, since this
approach is not internationalization-friendly. Recently it occurred to me
that a compromise might be to use templates to look up field display names,
as others have proposed (or insisted upon), but to fall back on the contents
of this column if no template-based name is found.
'i' - An integer.
'd' - A date with no associated time.
't' - A date/time.
's' - A short string, 255 or fewer characters in length. No vertical
whitespace allowed. Represented on CGI forms by a single-line text
element.
'l' - A long string, 65535 or fewer characters in length. No restriction on
contents. Represented on CGI forms by a textarea element.
'e' - A "selection" field, which takes its values from a set of valid
strings, its "domain". Such a field's value may be either exactly one
or zero of the elements of its domain.
'm' - A "multiselection" field. Like a selection field, but its value may
consist of any number of elements of its domain.
More recently, I've had a need to address custom fields and built-in
Bugzilla fields using the same range of integers, so I added an eighth field
type: 'b', for Bugzilla.
selection_id is non-null only for selection, multiselection, and Bugzilla
fields. In the first two cases, it is an index into the table
CF_SELECTIONS (see below). In the last case, it is an index into the
FIELDDEFS table.
CREATE TABLE cf_membership (
product_id INTEGER NOT NULL REFERENCES products (id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
sortkey INTEGER NOT NULL,
INDEX (product_id)
);
This table describes which fields belong to which products, in what should
be a straightforward manner. (The REFERENCES clauses have no effect for
most MySQL tables, but are useful for documentation purposes.) The sort
order described here may be overridden or ignored outright in templates; it
is a default only.
CREATE TABLE cf_selections (
selection_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
selection_name VARCHAR(100) NOT NULL,
unset_label TINYTEXT NOT NULL,
UNIQUE INDEX (selection_name)
);
This table describes the possible domains of all selection fields in the
installation. selection_id is the primary key. selection_name is a name
which is presented on administrative interfaces, and may be subject to the
same concerns as custom fields' display names. unset_label is a textual
label which is printed when a selection field's value is null. I created
this feature simply because it existed in my company's previous
incident-tracking system (TeamTrack), but it has proven to be of remarkably
little use, and I wouldn't be averse to ditching it.
Note that distinct custom fields may share the same selection domain.
CREATE TABLE cf_selection_labels (
label_id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
selection_id INTEGER NOT NULL REFERENCES cf_selections (selection_id),
label VARCHAR(100) NOT NULL,
inactive BOOL NOT NULL,
sortkey INTEGER NOT NULL,
UNIQUE INDEX (selection_id, label)
);
This table describes the elements (or "labels") of all selection domains.
label_id is the primary key. selection_id describes to which domain each
label belongs, sorted by default in the order described by sortkey. label
is the text of the element. inactive is a boolean flag; if true, the
corresponding label may not be entered as a new field value. This feature
is necessary because otherwise there would be no way to remove a selection
element without annihilating all existing data that refers to that element.
CREATE TABLE cf_integer (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value INTEGER,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);
This table stores integer custom field data, in what again should be a
straightforward manner. The primary key is useful for looking up field
values for known bugs; the index on field_id is useful for querying. Note
that an integer field may be null, indicating that it has no value.
CREATE TABLE cf_shortstring (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value TINYTEXT,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);
CREATE TABLE cf_longstring (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value TEXT,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);
CREATE TABLE cf_date (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
value DATETIME,
PRIMARY KEY (bug_id, field_id),
INDEX (field_id)
);
Both date and date/time field values are stored in the CF_DATE table.
Time
components of date field values are eliminated by the interface code when it
goes into or comes out of the database. Interface code will accept, as user
input for date fields, any input that Date::Parse can handle.
CREATE TABLE cf_selection (
bug_id INTEGER NOT NULL REFERENCES bugs (bug_id),
field_id INTEGER NOT NULL REFERENCES custom_fields (field_id),
label_id INTEGER NOT NULL REFERENCES cf_selection_labels (label_id),
INDEX (bug_id, field_id),
INDEX (field_id)
);
This table is similar to the others, but since selection fields are
multivalued, no primary key is possible. Both selection and
multiselection
fields are stored in the same table; the interface code must ensure that
single-selection fields do not refer to more than one element of their
domains.
So...comments?
--
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2004-12-15 23:11:25 UTC
Permalink
Post by Kevin Benton
It seems to me that any discussion of custom fields ought to include
existing fields and handle them all in a unified approach. In my view, the
best way to do this would be to define every field in a fields table, and
specify formatting / location, etc. in the table.
I'd like to get there eventually, but for an initial custom fields release,
I'd prefer to think a little smaller. After all, converting built-in fields
to custom fields later should be a snap:

INSERT INTO custom_fields VALUES (
NULL, 'creation_ts', 'Creation Time', 't', NULL
);

SELECT LAST_INSERT_ID(); -- say it's "42"

INSERT INTO CF_DATE
SELECT bug_id, 42, creation_ts
FROM bugs;

ALTER TABLE bugs DROP COLUMN creation_ts;

The data conversion would be a snap, I mean; lots of code would obviously
have to be rewritten.
Post by Kevin Benton
I would hope that there
would even be a place where that table would have a definition for certain
types of handlers (subs) associated with each field. Maybe this is an
overly complex a view, but I think that this might offer us more flexibility
than we have with TT and yet, give us the ability to handle fields we
haven't dreamed of through plugins.
I've also implemented something like this. I have a module Bugzilla::Local,
where site-specific behavior can be defined. For example, there's an
exported subroutine in there called "auto_fields", which describes how to
automatically fill in certain fields when a bug is created. We wanted to
maintain some continuity in the bug-numbering scheme of our old tracking
system, so I created a custom field to hold this number, and when a bug is
created in the relevant product, auto_fields() sets that field's value in
the new bug to one higher than the current highest value.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Gervase Markham
2004-12-16 00:07:17 UTC
Permalink
Post by Kevin Benton
It seems to me that any discussion of custom fields ought to include
existing fields and handle them all in a unified approach.
I would be extremely wary of this approach, because it unnecessarily
increases the work involved in the initial patch. Why not implement
custom fields, and then port existing fields over as and when?

? In my view, the
Post by Kevin Benton
best way to do this would be to define every field in a fields table, and
specify formatting / location, etc. in the table.
Absolutely not :-). Formatting and location live in templates. As Joel
suggests, we need a system which deals with unknown fields but also
allows specific placing of known fields.

For example, unknown text fields would just be appended to the list of
four we have now, and unknown dropdown fields would probably go down the
right somewhere. It doesn't have to look pretty, it just has to work -
if an admin wants pretty, they can do explicit positioning. Hopefully
Kiko's UI rewrite will take these issues into account.
Post by Kevin Benton
I would hope that there
would even be a place where that table would have a definition for certain
types of handlers (subs) associated with each field.
I think such an association would work better using a code plugin system
- but again, that's way down the line. Version 1 should not include
the ability for admins to define new field _types_.

I can see the use of a validation regex for text fields, though.

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2004-12-16 00:30:48 UTC
Permalink
Post by Sean McAfee
To kick things off (I hope), here are some tables from my custom fields
schema, shorn of some of the niftier additions that have accrued over time.
The basic issue with this approach is that it replicates infrastructure
that MySQL and other databases already provide for us and that we use
for non-custom fields. In particular, the integer, shortstring,
longstring, and date tables provide:

* links between columns (i.e. attributes of bugs) and fields (the
values of columns for particular bugs): the "field_id" column;
* links between fields and their bugs: the "bug_id" column;
* field values: the "value" column.

But MySQL already provides:

* links between columns and fields: columns;
* links between fields and their bugs: fields;
* field values: field values.

MySQL also provides us tools for querying and modifying the structure of
tables in the database so that we can build tools to manage the columns,
and its search and other algorithms are designed with the assumption
that databases are built using its architecture for structuring and
relating data.

So I'm loathe to replicate this architecture. I think we're better off
making custom fields be real columns in the database, just like standard
fields, and creating independent value list and mapping tables for
select and multi-select custom fields, just as we do for standard fields
of those types.

Besides making custom fields work like standard fields, simplifying
their integration with existing code and the migration of fields from
standard to custom (and vice versa), this approach minimizes the
side-effect risk associated with building independent field management
on top of MySQL's existing infrastructure and the development cost of
maintaining two ways of accessing and manipulating fields.

Of course, there are some things MySQL and other DBMSes don't provide,
like column -> title mappings and product-specificity. Those things
should rightly be tackled with tables like the ones in this proposal,
custom_fields and cf_membership, respectively, just as we currently
tackle them for standard fields with tables like fielddefs and
flag(in|ex)clusions.

-myk
Sean McAfee
2004-12-16 02:50:41 UTC
Permalink
Post by Myk Melez
Post by Sean McAfee
To kick things off (I hope), here are some tables from my custom fields
schema, shorn of some of the niftier additions that have accrued over time.
The basic issue with this approach is that it replicates infrastructure
that MySQL and other databases already provide for us and that we use
for non-custom fields.
MySQL also provides us tools for querying and modifying the structure of
tables in the database so that we can build tools to manage the columns,
and its search and other algorithms are designed with the assumption
that databases are built using its architecture for structuring and
relating data.
What are these tools?
Post by Myk Melez
So I'm loathe to replicate this architecture. I think we're better off
making custom fields be real columns in the database, just like standard
fields, and creating independent value list and mapping tables for
select and multi-select custom fields, just as we do for standard fields
of those types.
The standard fields of those types are of the MySQL-specific ENUM type,
aren't they? I had thought that greater platform-independence was an
ongoing goal. Or has this changed in recent releases? My information may
be out of date.
Post by Myk Melez
Besides making custom fields work like standard fields, simplifying
their integration with existing code and the migration of fields from
standard to custom (and vice versa), this approach minimizes the
side-effect risk associated with building independent field management
on top of MySQL's existing infrastructure and the development cost of
maintaining two ways of accessing and manipulating fields.
There are significant efficiency concerns with a fields-as-columns
approach.

Suppose you have a largeish Bugzilla installation with, oh, say 100,000
bugs and 200 custom fields across twenty products. Suppose further that you
want to run a simple query against one particular field: find all bugs where
the short string field "foo_field" contains the substring "meta". In my
scheme, you run these two queries:

SELECT field_id
FROM custom_fields
WHERE field_name = 'foo_field';

Then, using the returned field_id (say, 99):

SELECT bug_id
FROM cf_shortstring
WHERE field_id = 99
AND INSTR(value, 'meta');

The first query is very fast, thanks to the index on FIELD_NAME, and would
be fast anyway with only 200 rows to check. The second one quickly targets
only those bugs in products that have foo_field as a member, thanks to the
index on FIELD_ID. (I suppose the queries could be combined into one.)

If custom fields were "real" columns, you'd do something like this:

SELECT bug_id
FROM bugs
WHERE INSTR(cf_foo_field, 'meta');

This query has to scan *every bug in the installation*; there's no way to
restrict it only to bugs that actually have a foo_field. BUGS is a huge,
sparse table, with 90% of its columns NULL in any given row, on average.
Say goodbye to casual "SELECT * FROM BUGS WHERE..." queries at the mysql
client command line, too.

Speaking of NULL, you'd have to give up the capability to have NULL custom
field values, since you couldn't do this:

SELECT bug_id
FROM bugs
WHERE cf_foo_field IS NULL;

My company's prior bug tracking system, TeamTrack, stored field data in huge
sparse tables, and its speed was/is atrocious. I didn't have to think very
hard in the early days to know I didn't want to go the same route.

As an example of another, less important efficiency issue, suppose you want
get all of a single bug's custom field data. You'd either do this:

SELECT *
FROM bugs
WHERE bug_id = 99;

This would fetch all column data, most of which would be NULL and irrelevant
in the hypothetical situation described above. The alternative would be to
construct a list of known-to-be-relevant column names, and do this:

SELECT field_1, field_2, field_3, field_4, field_5, field_6,
field_7, field_8, field_9, field_10, field_11, field_12,
field_13, field_14, field_15, field_16, field_17, field_18,
field_19, field_20
FROM bugs
WHERE bug_id = 99;

Yuck! Both solutions are inelegant, which experience has taught me is
usually an indication of bad design.

Fields-as-columns may make life easier for the programmer, but it's the
needs of the users that should be coming first.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2004-12-16 02:58:24 UTC
Permalink
Post by Sean McAfee
The standard fields of those types are of the MySQL-specific ENUM type,
aren't they?
Enum types are going away. You can see my recent work on bug 17453.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2004-12-19 17:02:32 UTC
Permalink
Post by Sean McAfee
Post by Myk Melez
MySQL also provides us tools for querying and modifying the structure of
tables in the database so that we can build tools to manage the columns,
and its search and other algorithms are designed with the assumption
that databases are built using its architecture for structuring and
relating data.
What are these tools?
They're the MySQL data definition and utility statements.
Post by Sean McAfee
Post by Myk Melez
So I'm loathe to replicate this architecture. I think we're better off
making custom fields be real columns in the database, just like standard
fields, and creating independent value list and mapping tables for
select and multi-select custom fields, just as we do for standard fields
of those types.
The standard fields of those types are of the MySQL-specific ENUM type,
aren't they? I had thought that greater platform-independence was an
ongoing goal. Or has this changed in recent releases? My information may
be out of date.
Nope, you're not out of date, we are indeed replacing enums with tables
of values. I'm referring to fields like product, component, and
target_milestone as well as the fields currently using enums which are
being converted to work like the aforementioned fields.
Post by Sean McAfee
Post by Myk Melez
Besides making custom fields work like standard fields, simplifying
their integration with existing code and the migration of fields from
standard to custom (and vice versa), this approach minimizes the
side-effect risk associated with building independent field management
on top of MySQL's existing infrastructure and the development cost of
maintaining two ways of accessing and manipulating fields.
There are significant efficiency concerns with a fields-as-columns
approach.
Suppose you have a largeish Bugzilla installation with, oh, say 100,000
bugs and 200 custom fields across twenty products. Suppose further that you
want to run a simple query against one particular field: find all bugs where
the short string field "foo_field" contains the substring "meta". In my
SELECT field_id
FROM custom_fields
WHERE field_name = 'foo_field';
SELECT bug_id
FROM cf_shortstring
WHERE field_id = 99
AND INSTR(value, 'meta');
The first query is very fast, thanks to the index on FIELD_NAME, and would
be fast anyway with only 200 rows to check. The second one quickly targets
only those bugs in products that have foo_field as a member, thanks to the
index on FIELD_ID. (I suppose the queries could be combined into one.)
SELECT bug_id
FROM bugs
WHERE INSTR(cf_foo_field, 'meta');
This query has to scan *every bug in the installation*; there's no way to
restrict it only to bugs that actually have a foo_field. BUGS is a huge,
sparse table, with 90% of its columns NULL in any given row, on average.
Say goodbye to casual "SELECT * FROM BUGS WHERE..." queries at the mysql
client command line, too.
Notwithstanding that we should use fulltext indexes for these kinds of
searches, this works pretty well for an existing standard field on the
production b.m.o server:
------------------------------------------------------------------------
mysql> select count(*) from bugs;
+----------+
| count(*) |
+----------+
| 274656 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from bugs where status_whiteboard != '';
+----------+
| count(*) |
+----------+
| 32931 |
+----------+
1 row in set (0.75 sec)

mysql> select bug_id from bugs where INSTR(status_whiteboard,'meta');
+--------+
| bug_id |
+--------+
| 4553 |
| 9410 |
| 12309 |
| 16029 |
| 23583 |
| 35154 |
| 38761 |
| 45349 |
| 51279 |
| 52577 |
| 56812 |
| 63871 |
| 63872 |
| 69533 |
| 80613 |
| 89424 |
| 90669 |
| 92763 |
| 93969 |
| 127034 |
| 127147 |
| 139820 |
| 150783 |
| 161891 |
| 190406 |
| 190545 |
+--------+
27 rows in set (0.79 sec)
------------------------------------------------------------------------
(one bug ID removed from the result set above because it's a
confidential bug)
Post by Sean McAfee
Speaking of NULL, you'd have to give up the capability to have NULL custom
SELECT bug_id
FROM bugs
WHERE cf_foo_field IS NULL;
Why not?

Of course, we don't have to put all of these columns into the bugs
table. We can do that, but we can also put each one into its own table
(so that bugs->custom field is a 1->0|1 relationship, and the database
uses no more storage for the custom field than necessary), all of them
into one other table, or some combination of these three, depending on
what makes the most sense.
Post by Sean McAfee
My company's prior bug tracking system, TeamTrack, stored field data in huge
sparse tables, and its speed was/is atrocious. I didn't have to think very
hard in the early days to know I didn't want to go the same route.
Bugzilla is also somewhat sparse, but it's not a major performance
problem, as far as I know. In any case, relationship DB models are
designed to deal with sparseness (in fact, that's part of the reason
they were developed) by moving sparse data to a separate table, as we do
already for some standard fields.

We can do that for sparse custom fields (and even sparse standard ones
still in the bugs table) with the fields-as-columns approach while
keeping non-sparse fields in the bugs table, improving performance for
those fields (because we don't have to do joins or multiple queries),
unlike with the fields-as-data approach, which always requires joins or
multiple queries.
Post by Sean McAfee
As an example of another, less important efficiency issue, suppose you want
SELECT *
FROM bugs
WHERE bug_id = 99;
This would fetch all column data, most of which would be NULL and irrelevant
in the hypothetical situation described above. The alternative would be to
SELECT field_1, field_2, field_3, field_4, field_5, field_6,
field_7, field_8, field_9, field_10, field_11, field_12,
field_13, field_14, field_15, field_16, field_17, field_18,
field_19, field_20
FROM bugs
WHERE bug_id = 99;
Yuck! Both solutions are inelegant, which experience has taught me is
usually an indication of bad design.
But we'd have a list of custom fields (probably generated from a simple
query or a sub-query), so the latter solution would be more like:

SELECT $list_of_fields
FROM bugs
WHERE bug_id = 99;

... which is both simple and elegant.
Post by Sean McAfee
Fields-as-columns may make life easier for the programmer, but it's the
needs of the users that should be coming first.
Agreed, and that's why I'm suggesting it, not to make our lives easier,
especially considering I think fields-as-columns is harder to develop.
Users would be better served by an app living on top of a database which
uses the optimum architecture for custom fields. For my money, that's
the relational model developed over the last several decades for the
purpose, and which we already use for standard fields.

-myk
Sean McAfee
2004-12-19 23:38:43 UTC
Permalink
Post by Myk Melez
Post by Sean McAfee
Fields-as-columns may make life easier for the programmer, but it's the
needs of the users that should be coming first.
Agreed, and that's why I'm suggesting it, not to make our lives easier,
especially considering I think fields-as-columns is harder to develop.
Users would be better served by an app living on top of a database which
uses the optimum architecture for custom fields. For my money, that's
the relational model developed over the last several decades for the
purpose, and which we already use for standard fields.
Well, here's some hard data.

Attached is a Perl program, construct-tables.pl. It creates two tables,
REALCOL and DISTCOL. REALCOL has two columns, BUG_ID and CF_FOO, and is
meant to be a simplification of a BUGS table with a single
custom-field-as-real-column called FOO. DISTCOL has three columns, BUG_ID,
FIELD_ID, and VALUE, with an index on FIELD_ID, and is essentially identical
to my proposed CF_SHORTSTRING table. The program then populates both tables
with data for 300,000 bugs. For each bug, it constructs a string of ten
joined words randomly chosen from /usr/share/dict/words, inserting the
string "meta" at a random position in the middle .1% of the time. The bugs
are inserted into DISTCOL with a field_id that starts at 0 and increases by
one each time, going back to zero when it reaches 100. The string is
inserted into REALCOL only when the field ID is 0; otherwise NULL is
inserted for the column CF_FOO.

The tables constructed, here's what I see:

mysql> select count(*) from realcol where instr(cf_foo, 'meta');
+----------+
| count(*) |
+----------+
| 20 |
+----------+
1 row in set (0.40 sec)

mysql> select count(*) from distcol where field_id = 0 and instr(value, 'meta');+----------+
| count(*) |
+----------+
| 20 |
+----------+
1 row in set (0.03 sec)

So the distributed-tables solution is clearly much faster.


--Sean
Bradley Baetz
2004-12-20 07:18:18 UTC
Permalink
Post by Sean McAfee
Well, here's some hard data.
1 row in set (0.40 sec)
1 row in set (0.03 sec)
0.4 seconds is not hard data - you're going to be hitting cache only.
You need something that takes at least 10 seconds, and hits disk.
Post by Sean McAfee
So the distributed-tables solution is clearly much faster.
It can usually never be faster, since there are additional lookups to
do. Where it can be faster is where indexes can't be used, and you can
scan all of a smaller table plus some of a large table, rather than all
of a larger table - thats what you're doing here with |instr|. Noone
expends instr to be fast, though.

Try what you had, but using a small set of strings (eg products, before
I moved it over to product_id), and some selects on some values that
appear often, and some that are rare.

However, the biggest issue you run into is MySQL's use of at most one
index per table per query. Try it again with product and component, and
requirements on both, (joining the main table twice to the secondary
table) and you'll see the difference. Then try it with a 'text table' and an
'integer table' (so that you really have multiple tables).

For added fun, do it on postgres (>= 7.4, remembering to VACUUM after
the initial data population) and compare the times then...

Separate tables are definately cleaner, though, and my prefered solution.

Bradley
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2004-12-20 19:34:50 UTC
Permalink
Post by Bradley Baetz
However, the biggest issue you run into is MySQL's use of at most one
index per table per query. Try it again with product and component, and
requirements on both, (joining the main table twice to the secondary
table) and you'll see the difference. Then try it with a 'text table' and an
'integer table' (so that you really have multiple tables).
OK, more hard data ahead...

Attached is a revised construct-table.pl program, which can create and
populates arbitrary numbers of string and integer fields (ten and two,
respectively, as currently written). String-field columns in REALCOL are
called cf_aa, cf_ab, etc, and integer-field columns are cf_int1, cf_int2,
etc. There are now two distributed-column tables, DISTCOL_STR and
DISTCOL_INT. String fields are assigned field IDs going incrementally from
zero, and integer fields get field IDs going incrementally after the last
string field ID. String fields are given random values taken from
/usr/share/dict/words, as before, with "meta" inserted randomly 1% of the
time. Integer fields are given random values from 0 to 999.

All string-field conditions are INSTR searches for "meta", as before. All
integer-field conditions are for numbers greater than 500. The SQL
statements get pretty lengthy, so I've put those in another attachment,
sql-statements.txt. I ran each query three times.

+--------------+---------------------+---------------------+------------+
| Query | "Real" times | "Distributed" times | Bugs found |
+--------------+---------------------+---------------------+------------+
| 1 str | 8.96, 7.72, 7.74 | 19.71, 4.70, 4.73 | 4786 |
| 2 str | 13.23, 13.39, 13.56 | 11.71, 6.68, 5.87 | 77 |
| 3 str | 14.27, 14.49, 14.53 | 6.70, 5.79, 5.54 | 2 |
| 10 str | 14.22, 14.56, 14.69 | 5.99, 4.84, 4.90 | 0 |
| 1 str, 1 int | 14.52, 14.42, 14.48 | 9.65, 8.52, 7.32 | 2371 |
| 2 str, 2 int | 13.81, 13.75, 14.57 | 12.44, 9.17, 7.77 | 16 |
+--------------+---------------------+---------------------+------------+

Both methods agreed on the number of bugs found in all cases.

Except for that first anomalous one-string distributed-column search, the
distributed-column search times are consistently significantly less than the
real-column search times, even with uncommonly large numbers of search
terms. The real-column performance remains fairly flat, while the
distributed-column performance seems to benefit from repeated searches; I
assume this is due to cacheing.

Huff, puff...

Is this enough data to put this issue to rest? Pretty please?


--Sean
Myk Melez
2004-12-24 02:22:50 UTC
Permalink
Post by Sean McAfee
Is this enough data to put this issue to rest? Pretty please?
Not really. First, it doesn't measure the impact of indexes. Second,
it doesn't include the cost of looking up field IDs. Third, it only
measures search performance, but most queries retrieve bug data for
specific bugs. Fourth, it doesn't take what bbaetz says into account.
And fifth, I never claimed bugs table columns were more performant, only
that they were performant enough, which they are, considering the
"custom fields" already in use as standard colums.

But the primary reason it doesn't put the issue to rest is that sparse
fields go into separate tables under the fields-as-columns proposal as
well, so that proposal gets any benefit to separate tables.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2004-12-24 03:52:05 UTC
Permalink
Post by Myk Melez
Post by Sean McAfee
Is this enough data to put this issue to rest? Pretty please?
Not really. First, it doesn't measure the impact of indexes.
Which indexes would those be?
Post by Myk Melez
Second,
it doesn't include the cost of looking up field IDs.
Insignificant, even in the ludicrous case of hundreds of thousands of
fields, thanks to indices.
Post by Myk Melez
Third, it only
measures search performance, but most queries retrieve bug data for
specific bugs.
The search yields bug IDs, which are looked up by primary key in all data
tables. Again, insignificant.
Post by Myk Melez
Fourth, it doesn't take what bbaetz says into account.
My revised tests took most of what he said into account. What did I miss
that's important?

(By the way, I was unaware of the "one index per table per query" MySQL
limitation bbaetz mentioned. I was prepared to see my solution's
performance dive into the ground with larger numbers of fields of different
types, but it didn't happen. What's the deal? If it's true, I guess "one
index" is counted even if a table is joined against multiple times. All of
my query joins use only the index on field_id.)
Post by Myk Melez
And fifth, I never claimed bugs table columns were more performant, only
that they were performant enough, which they are, considering the
"custom fields" already in use as standard colums.
13-14 seconds versus 4-5 seconds is "performant enough"?

It's not as if there's even a greater cost associated with developing the
faster solution. It exists today. The slower one does not.
Post by Myk Melez
But the primary reason it doesn't put the issue to rest is that sparse
fields go into separate tables under the fields-as-columns proposal as
well, so that proposal gets any benefit to separate tables.
Do you have a specific proposal that can be tested? And, er, I don't want
to sound rude, but would you mind testing it yourself? I've already done
quite a lot in that area.

As an aside, my own intuition is against the notion of modifying the
database schema as part of routine system administration.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2004-12-24 16:17:04 UTC
Permalink
----- Original Message -----
Post by Sean McAfee
As an aside, my own intuition is against the notion of modifying the
database schema as part of routine system administration.
Indeed. I also recall justdave pointing out (somewhere in the userprefs-table bug, that I don't recall the number of) that for the "Bugzilla plugins" idea, we should stick to "columns-as-table-rows," because that makes a drop-in plugin for Bugzilla much easier to write.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Shane H. W. Travis
2004-12-24 17:21:53 UTC
Permalink
Post by Maxwell Kanat-Alexander
----- Original Message -----
Post by Sean McAfee
As an aside, my own intuition is against the notion of modifying the
database schema as part of routine system administration.
Indeed. I also recall justdave pointing out (somewhere in the
userprefs-table bug, that I don't recall the number of) that for the
"Bugzilla plugins" idea, we should stick to "columns-as-table-rows,"
because that makes a drop-in plugin for Bugzilla much easier to write.
I was just about to say the exact same thing, Max.

Reference: https://bugzilla.mozilla.org/show_bug.cgi?id=98123#c23

The comment was made in the User Preferences bug, but the implications are
far-reaching. I cannot help but think that if it matters here, then it would
matter even more for something as large and far-reaching as customized
fields.

(For the record, I started off designing new tables for bug 98123 with
each-pref-is-a-column category firmly fixed in my mind as 'the only/best way
to go'. After Dave's comment I started considering other ideas; the latest
schema abandons that line of thought, and IMHO is better for having done
so.)

Shane H.W. Travis | The greatest of all mistakes is to do nothing
travis-leLQYRwCyNZIo2TaICnI/***@public.gmane.org | because you can only do a little.
Saskatoon, Saskatchewan | Do what you can. -- Sydney Smith
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-21 19:56:50 UTC
Permalink
Post by Shane H. W. Travis
Post by Maxwell Kanat-Alexander
Indeed. I also recall justdave pointing out (somewhere in the
userprefs-table bug, that I don't recall the number of) that for the
"Bugzilla plugins" idea, we should stick to "columns-as-table-rows,"
because that makes a drop-in plugin for Bugzilla much easier to write.
I was just about to say the exact same thing, Max.
Reference: https://bugzilla.mozilla.org/show_bug.cgi?id=98123#c23
The comment was made in the User Preferences bug, but the implications are
far-reaching. I cannot help but think that if it matters here, then it would
matter even more for something as large and far-reaching as customized
fields.
But preferences and custom fields are very different beasts.
Preferences are data, like the other real-world objects (users, bugs) we
model, while custom fields are structure, like the data structures we
use when modeling those objects.

It makes sense to model preferences using the most appropriate
combination of table, column, and row structures (f.e. with set theory,
or using an object-oriented model where table = class, column =
property, and row = instance); but doing so for custom fields just adds
an unnecessary layer of abstraction and complexity.

Custom fields are exactly what relational database columns were designed
for; they fit perfectly into the column metaphor, just as the standard
Bugzilla fields do. They're modifiable via SQL statements just as
easily as the data within them is. And while Bugzilla doesn't modify
its schema very much today, there's nothing inherently more dangerous
about it doing so. You can wipe out your data just as easily with
"DELETE FROM <columnname>" as you can with "DROP COLUMN <columnname>".

-myk
Christopher Hicks
2005-01-22 15:59:12 UTC
Permalink
It makes sense to model preferences using the most appropriate combination of
table, column, and row structures (f.e. with set theory, or using an
object-oriented model where table = class, column = property, and row =
instance); but doing so for custom fields just adds an unnecessary layer of
abstraction and complexity.
Bah. Your conclusion of what's necessary here seems to be based on a very
selective view of the universe. Several "necessities" stick out when
considering a custom fields implemention based on abstraction at the
database level:

(1) People can add as many custom fields as they want without worrying
about reaching the maximum record size of their database

(2) Code for dealing with custom fields is going to need to have a goodly
portion of the abstraction tables for keeping track of stuff anyways.

(3) Since queries that involve custom fields will now have to be written
on the fly they are less able to be optimized by using a database that can
deal with prepare() usefully.

(4) Since queries involving custom fields are going to take a few database
hits to figure out what the field names so the query could be written you
end up with cases where 1 query turns into 4 queries. If the database is
across a WAN from the bugzilla instance the effect of multiple queries
where there were one will be more noticable.

(5) Custom fields should be able to be implemented without the bugzilla
user having database privs to alter tables.

Myk - it sounds like you're basing the decision on what way to go here
totally based on performance and I think there's a lot more that should go
into this decision.
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-25 00:23:59 UTC
Permalink
Post by Christopher Hicks
Bah. Your conclusion of what's necessary here seems to be based on a
very selective view of the universe. Several "necessities" stick out
when considering a custom fields implemention based on abstraction at
(1) People can add as many custom fields as they want without worrying
about reaching the maximum record size of their database
Under my fields-as-columns (FAC) proposal, fields can live in their own
tables, and there is no limit to the number of tables per database in
MySQL, so this isn't an issue between my proposal and Sean's
fields-as-data (FAD) proposal.
Post by Christopher Hicks
(2) Code for dealing with custom fields is going to need to have a
goodly portion of the abstraction tables for keeping track of stuff
anyways.
I'm not sure what you mean by abstraction tables, but both proposals
will require some meta-data about fields to be stored in fielddefs, just
as we already do for standard fields, and will store lists of possible
values for some fields in separate tables, just as we already do for
standard fields like component and should be doing for fields like op_sys.

That doesn't mean we should store all meta-data as data. We should use
the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
Post by Christopher Hicks
(3) Since queries that involve custom fields will now have to be
written on the fly they are less able to be optimized by using a
database that can deal with prepare() usefully.
How does FAC require queries to be written "on the fly" in a less
optimizable way, reducing performance on prepare()-happy databases? Can
you demonstrate this?
Post by Christopher Hicks
(4) Since queries involving custom fields are going to take a few
database hits to figure out what the field names so the query could be
written you end up with cases where 1 query turns into 4 queries. If
the database is across a WAN from the bugzilla instance the effect of
multiple queries where there were one will be more noticable.
Sure, more queries is slower. But FAC would use less queries overall,
and simpler ones at that.

Consider a simple query on a single custom field "foo" where the user
wants bugs where foo=bar. With FAC, we search for bugs where the "foo"
column contains "bar". With FAD, we look up the field ID for "foo" and
then search for bugs where the "field_id" column contains that ID and
the "value" column contains "bar" (or do it in one query with a join).

Even in the worst case, when you couldn't infer the column name from the
form field name, FAC lookups would only be equal to, not worse than, FAD
lookups. And if these lookups mattered (which Sean claims they don't),
the list of custom fields and associated identifiers would get cached
under either proposal the same way we cache components and versions.
Post by Christopher Hicks
(5) Custom fields should be able to be implemented without the
bugzilla user having database privs to alter tables.
FAD is even more insecure in this regard, since a compromised Bugzilla
user account that wasn't allowed to alter tables would still be able to
alter custom fields under that proposal (unless we implemented
table/column-specific privileges for that account, which would be more
work and complexity--and thus risk).

Nevertheless, note that the Bugzilla user account already has such
privileges today (checksetup.pl uses them to set up and update the
database schema), and even if we took them away, the Bugzilla
administrators, to whom we will entrust the creation of custom fields,
will certainly retain those privileges via a separate account.
Post by Christopher Hicks
Myk - it sounds like you're basing the decision on what way to go here
totally based on performance and I think there's a lot more that
should go into this decision.
To the contrary, my proposal is based on much more than performance. My
previous email was about performance only because that was Sean's
primary argument against it (he thought my proposal would be slower and
offered data to support his conclusion--I ran his tests myself and found
the opposite was true).

I think we should use real columns for custom fields because:

1. that's what they're there for;

Custom fields are no different from standard fields in how they're
used (queried, displayed, updated, etc.), and columns were
designed for this express purpose when database systems were
developed. Given that they've been used to represent "fields" of
all kinds for decades, and that we've used them in Bugzilla to
represent the standard fields for over five years, they're a
mature and proven technology for doing what we want and likely to
be better than any new mechanism we come up with which represents
fields as data.

2. then they work the same as standard fields;

Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.

3. it makes them significantly faster;

Per my tests and standard database design theory, real columns are
much faster than data columns.

-myk
Vlad Dascalu
2005-01-25 05:28:39 UTC
Permalink
I agree with Myk. The FAC proposal makes more sense for me, from all
points of view.

Vlad.
Post by Myk Melez
Post by Christopher Hicks
Bah. Your conclusion of what's necessary here seems to be based on a
very selective view of the universe. Several "necessities" stick out
when considering a custom fields implemention based on abstraction at
(1) People can add as many custom fields as they want without
worrying about reaching the maximum record size of their database
Under my fields-as-columns (FAC) proposal, fields can live in their
own tables, and there is no limit to the number of tables per database
in MySQL, so this isn't an issue between my proposal and Sean's
fields-as-data (FAD) proposal.
Post by Christopher Hicks
(2) Code for dealing with custom fields is going to need to have a
goodly portion of the abstraction tables for keeping track of stuff
anyways.
I'm not sure what you mean by abstraction tables, but both proposals
will require some meta-data about fields to be stored in fielddefs,
just as we already do for standard fields, and will store lists of
possible values for some fields in separate tables, just as we already
do for standard fields like component and should be doing for fields
like op_sys.
That doesn't mean we should store all meta-data as data. We should
use the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
Post by Christopher Hicks
(3) Since queries that involve custom fields will now have to be
written on the fly they are less able to be optimized by using a
database that can deal with prepare() usefully.
How does FAC require queries to be written "on the fly" in a less
optimizable way, reducing performance on prepare()-happy databases?
Can you demonstrate this?
Post by Christopher Hicks
(4) Since queries involving custom fields are going to take a few
database hits to figure out what the field names so the query could
be written you end up with cases where 1 query turns into 4 queries.
If the database is across a WAN from the bugzilla instance the effect
of multiple queries where there were one will be more noticable.
Sure, more queries is slower. But FAC would use less queries overall,
and simpler ones at that.
Consider a simple query on a single custom field "foo" where the user
wants bugs where foo=bar. With FAC, we search for bugs where the
"foo" column contains "bar". With FAD, we look up the field ID for
"foo" and then search for bugs where the "field_id" column contains
that ID and the "value" column contains "bar" (or do it in one query
with a join).
Even in the worst case, when you couldn't infer the column name from
the form field name, FAC lookups would only be equal to, not worse
than, FAD lookups. And if these lookups mattered (which Sean claims
they don't), the list of custom fields and associated identifiers
would get cached under either proposal the same way we cache
components and versions.
Post by Christopher Hicks
(5) Custom fields should be able to be implemented without the
bugzilla user having database privs to alter tables.
FAD is even more insecure in this regard, since a compromised Bugzilla
user account that wasn't allowed to alter tables would still be able
to alter custom fields under that proposal (unless we implemented
table/column-specific privileges for that account, which would be more
work and complexity--and thus risk).
Nevertheless, note that the Bugzilla user account already has such
privileges today (checksetup.pl uses them to set up and update the
database schema), and even if we took them away, the Bugzilla
administrators, to whom we will entrust the creation of custom fields,
will certainly retain those privileges via a separate account.
Post by Christopher Hicks
Myk - it sounds like you're basing the decision on what way to go
here totally based on performance and I think there's a lot more that
should go into this decision.
To the contrary, my proposal is based on much more than performance.
My previous email was about performance only because that was Sean's
primary argument against it (he thought my proposal would be slower
and offered data to support his conclusion--I ran his tests myself and
found the opposite was true).
1. that's what they're there for;
Custom fields are no different from standard fields in how
they're used (queried, displayed, updated, etc.), and columns
were designed for this express purpose when database systems
were developed. Given that they've been used to represent
"fields" of all kinds for decades, and that we've used them in
Bugzilla to represent the standard fields for over five years,
they're a mature and proven technology for doing what we want
and likely to be better than any new mechanism we come up with
which represents fields as data.
2. then they work the same as standard fields;
Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.
3. it makes them significantly faster;
Per my tests and standard database design theory, real columns
are much faster than data columns.
-myk
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2005-01-25 11:33:29 UTC
Permalink
Post by Vlad Dascalu
I agree with Myk. The FAC proposal makes more sense for me, from all
points of view.
I also used to think so.

But if we want people to be able to write "plugins" for Bugzilla,
Fields-As-Rows makes much more sense.

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
Kerio Technologies, Inc.
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-2500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-25 22:12:43 UTC
Permalink
Post by Maxwell Kanat-Alexander
Post by Vlad Dascalu
I agree with Myk. The FAC proposal makes more sense for me, from all
points of view.
I also used to think so.
But if we want people to be able to write "plugins" for Bugzilla,
Fields-As-Rows makes much more sense.
Plugins can be roughly divided into three categories:

* add no new fields (NNF), just manipulate existing ones (f.e. a
plugin that added a "checked in" comment and resolved bugs "FIXED"
when a patch on them was checked into a source control system);
* add new generic fields (NGF) that don't need to be manipulated
specially and can be implemented using custom fields;
* add new special fields (NSF) that need to be manipulated specially
and cannot be implemented using custom fields (f.e. Test Runner
<http://sourceforge.net/projects/testrunner/>);

Of these, only NGF plugins can use custom fields, and if a plugin just
creates a custom field and then lets the generic custom field code
handle interaction with it, then why take the trouble to make it a
plugin? It'd be much easier for installations that want the
functionality to just define the custom field themselves using whatever
interface we develop for managing custom fields.

I think very few, if any, plugins will create custom fields. But even
if some did (f.e. a plugin that created a bunch of related custom fields
which would otherwise be burdensome to create by hand via the interface,
or a plugin which reused most generic custom fields code but with a few
special tweaks), why does FAD make more sense for them? Real columns
are as easy to create programmatically, keep plugin data more separate
from standard and installation-defined custom field data, and can be
made more secure if necessary (by giving plugins a MySQL account that
only permitted write access to its own tables).

-myk
Max Kanat-Alexander
2005-01-26 02:05:26 UTC
Permalink
Post by Sean McAfee
[snip]
OK. Fair enough.

My experience is that implementations which try to predict the future,
and limit themselves in design to only what they predict, are doomed to
failure. That's why I'm in favor of FAR, because it's more extensible. I
can't know what types of plugins people will create, so I'd rather give
them as many options as possible.

As a single example, what if I created a single plugin that was
"Project Management for Bugzilla?"

Fields-As-Data makes it easier to write generic SQL, also, to grab data
out of custom fields, without knowing the type of data in those fields
in advance, or how we'd have to JOIN in order to get the data, or even
having to modify the SELECT statement in a fashion that can't use
placeholders.

To be honest, I haven't tried to implement it either way, personally. I
know that Sean has. I think he used Fields-As-Data, and I think his
implementation is working nicely at Transmeta.

Of course, Fields-As-Columns is the way that Bugzilla works now. Except
for things like longdescs, which is a sort of Fields-As-Data thing.

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-4500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-27 14:03:08 UTC
Permalink
Post by Max Kanat-Alexander
Post by Sean McAfee
[snip]
OK. Fair enough.
My experience is that implementations which try to predict the future,
and limit themselves in design to only what they predict, are doomed to
failure. That's why I'm in favor of FAR, because it's more extensible. I
can't know what types of plugins people will create, so I'd rather give
them as many options as possible.
In my experience it's just as dangerous to overgenericize because
someone might need some capability you don't know about yet. But FAC
doesn't limit the use of custom fields by plugins, and it allows
installations and plugins to create just as many and as varied custom
fields as FAD. How is FAD more extensible?
Post by Max Kanat-Alexander
As a single example, what if I created a single plugin that was
"Project Management for Bugzilla?"
Ok, let's say you did. How does that inform this debate?
Post by Max Kanat-Alexander
Fields-As-Data makes it easier to write generic SQL
What's generic SQL, why is it important, and how is it easier to write
in FAD?
Post by Max Kanat-Alexander
, also, to grab data
out of custom fields, without knowing the type of data in those fields
in advance
Not true. FAD uses field types just as surely as FAC and standard
fields. Both proposals similarly enable installations to stuff data of
all types into text fields if desired. But manipulating typeless data
is of limited utility, anyway, and suboptimal (which is why we're
storing data in a database in the first place).
Post by Max Kanat-Alexander
, or how we'd have to JOIN in order to get the data,
Both proposals store meta-data that tells code how to access custom
fields and when a JOIN will be necessary in a meta-data table (either by
extending fielddefs or creating an equivalent). With FAC, it may
sometimes be possible to derive this information from a form field name,
which is easier than looking it up in the meta-data table. Otherwise,
FAC will do the lookup, but FAD will too. How is it easier with FAD,
and why does that matter?
Post by Max Kanat-Alexander
or even
having to modify the SELECT statement in a fashion that can't use
placeholders.
How does FAC make it harder to use placeholders?
Post by Max Kanat-Alexander
To be honest, I haven't tried to implement it either way, personally. I
know that Sean has. I think he used Fields-As-Data, and I think his
implementation is working nicely at Transmeta.
It's possible to implement custom fields via either approach, and it's
likely that either approach will work on a given installation, but we're
trying to figure out the best approach for all Bugzilla installations
and developers, and while the success of an approach on one site can
inform the debate, it can't decide it.

While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations for
a number of years, both for standard fields and for custom ones, not to
mention the general success of FAC in database design.
Post by Max Kanat-Alexander
Of course, Fields-As-Columns is the way that Bugzilla works now. Except
for things like longdescs, which is a sort of Fields-As-Data thing.
longdescs isn't FAD at all, it stores comment entities which have
attributes represented by columns and whose rows each represent a unique
instance of a comment. It's a classic example of an object modeled by a
table, just like the bugs, attachments, and users tables.

-myk
Christopher Hicks
2005-01-27 14:41:16 UTC
Permalink
While weighing Sean's success with FAD at his installation, we should also
weigh the success of FAC on hundreds of Bugzilla installations for a number
of years, both for standard fields and for custom ones, not to mention the
general success of FAC in database design.
There's enough text being sent around about this without straw man
arguments such as this. If none of those folks are doing custom fields as
extensively as Sean is then they're not relevant to this discussion.
That leaves one working example that applies to the question of how to
implement custom fields and that is Sean's. Given the choice between
betting on the horse going around the track and the horse in your
imagination, my money is on Sean. If I could only find a bookie.

Beyond that, I don't think nearly as much credit has been given to Sean
for (A) making something that works and being willing to share it or (B)
his persistence in battling with bugzilla developers who seem to ignore A.

In this whole FAC vs. FAR debate I think I've missed something. In Sean's
discussion of Myk's proposal he indicated that Myk was proposing that each
new custom field would be in its own table. Is that really so? Am I the
only person who finds that bizarre? I can understand FAR being a tough
pill for a relational die hard to swallow, but coming up with something
even more outlandish as an alternative is bizarre.
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-02-02 16:56:46 UTC
Permalink
Post by Christopher Hicks
There's enough text being sent around about this without straw man
arguments such as this. If none of those folks are doing custom
fields as extensively as Sean is then they're not relevant to this
discussion.
I suspect that the majority of installations that use custom fields
don't use them nearly as extensively as Sean, but that doesn't make them
irrelevant, especially if they are the large majority.

Nevertheless, the hundreds of Bugzilla installations I was referring to
include all those making extensive use of standard fields, all of which
are built on FAC, and which show FAC working well in extensive use.
Post by Christopher Hicks
That leaves one working example that applies to the question of how to
implement custom fields and that is Sean's. Given the choice between
betting on the horse going around the track and the horse in your
imagination, my money is on Sean.
Sean's implementation of custom fields is certainly not the only one,
although it may be the most extensively developed. But if "the
alternative is all in my imagination" is the anti-FAC argument-du-jour,
fine, I've attached an alternative implementation to the bug.
Post by Christopher Hicks
Beyond that, I don't think nearly as much credit has been given to
Sean for (A) making something that works and being willing to share it
or (B) his persistence in battling with bugzilla developers who seem
to ignore A.
Sean deserves much credit for making something that works and being
willing to share it, and he's demonstrated valuable persistence in
arguing his case, but it's a gross misstatement to say that Bugzilla
developers have been ignoring it. The problem is what works for one
installation at the moment is not necessarily the same as what works for
all Bugzilla installations in the long term, and it's important to give
those issues due consideration, especially with a feature that has so
many far-reaching and long-term implications for the codebase.
Post by Christopher Hicks
In this whole FAC vs. FAR debate I think I've missed something. In
Sean's discussion of Myk's proposal he indicated that Myk was
proposing that each new custom field would be in its own table. Is
that really so? Am I the only person who finds that bizarre? I can
understand FAR being a tough pill for a relational die hard to
swallow, but coming up with something even more outlandish as an
alternative is bizarre.
No, I wasn't proposing that, but I did propose putting sparse fields
into their own tables, which is part of the reason multi-tabular
database systems were designed in the first place. There's nothing
outlandish about it at all.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Shane H. W. Travis
2005-01-27 16:07:49 UTC
Permalink
Post by Myk Melez
While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations for
a number of years, both for standard fields and for custom ones, not to
mention the general success of FAC in database design.
Something else to weigh: Are any of those 'hundreds of installations' (where
do you get this figure from anyway?) making any effort to improve Bugzilla
by re-committing their code? Do they have a developer who is willing to
commit to the multiple re-writes and constant criticism that's going to be
necessary to get a patch of this size and magnitude landed on the trunk?

AIUI, Sean has already had to develop this locally -- because his bosses
told him to. He is now trying to make the results of his efforts available
to Bugzilla, because it's something that people have been griping about
wanting done for the last five years... and he's basically being told to go
piss up a rope because his design isn't good enough. Way to foster major
contributions! Funny thing is that it's good enough to hold 187 custom
fields at his site... but that's not good enough for us (or, more
specifically, for Myk).

If one of FAC/FAD were a complete abomination, and implementing it that way
would be universally looked back on as a horrendous mistake... then
absolutely I agree that the code shouldn't be taken just for the sake of
having it... but that doesn't seem to be the case here. There are benefits
and trade-offs to each method. Each one has its proponents and its
detractors, and this discussion is rapidly taking on some characteristics of
a Religious Flame War.

Working code (and dedicated developers) trumps beautiful theories nine times
out of ten, in my books.

Shane H.W. Travis | The greatest of all mistakes is to do nothing
travis-leLQYRwCyNZIo2TaICnI/***@public.gmane.org | because you can only do a little.
Saskatoon, Saskatchewan | Do what you can. -- Sydney Smith
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Steven Suson
2005-01-27 17:21:59 UTC
Permalink
I have to agree whole heartedly with this. It seems every time that
a discussion of custom fields begins, that it deteriorates into a flame
war... And each and every time, the proposed contributor is nit picked
to death, until the custom fields discussion dies away. Come on guys!
MANY bugzilla users have wanted custom fields for several years now
(including my company and I); remember that best is the enemy of good
enough.

Granted I'm not a relational purist, but having a seperate table for
each custom field seems quite excessive. Perhaps it gives us better
normalization, but at what cost? I back Sean on this one!

Steven Suson
Post by Shane H. W. Travis
Post by Myk Melez
While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations for
a number of years, both for standard fields and for custom ones, not to
mention the general success of FAC in database design.
Something else to weigh: Are any of those 'hundreds of installations' (where
do you get this figure from anyway?) making any effort to improve Bugzilla
by re-committing their code? Do they have a developer who is willing to
commit to the multiple re-writes and constant criticism that's going to be
necessary to get a patch of this size and magnitude landed on the trunk?
AIUI, Sean has already had to develop this locally -- because his bosses
told him to. He is now trying to make the results of his efforts available
to Bugzilla, because it's something that people have been griping about
wanting done for the last five years... and he's basically being told to go
piss up a rope because his design isn't good enough. Way to foster major
contributions! Funny thing is that it's good enough to hold 187 custom
fields at his site... but that's not good enough for us (or, more
specifically, for Myk).
If one of FAC/FAD were a complete abomination, and implementing it that way
would be universally looked back on as a horrendous mistake... then
absolutely I agree that the code shouldn't be taken just for the sake of
having it... but that doesn't seem to be the case here. There are benefits
and trade-offs to each method. Each one has its proponents and its
detractors, and this discussion is rapidly taking on some characteristics of
a Religious Flame War.
Working code (and dedicated developers) trumps beautiful theories nine times
out of ten, in my books.
Shane H.W. Travis | The greatest of all mistakes is to do nothing
Saskatoon, Saskatchewan | Do what you can. -- Sydney Smith
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Daniel Berlin
2005-01-28 03:45:15 UTC
Permalink
I have to agree whole heartedly with this. It seems every time that a
discussion of custom fields begins, that it deteriorates into a flame war...
And each and every time, the proposed contributor is nit picked to death,
until the custom fields discussion dies away. Come on guys! MANY bugzilla
users have wanted custom fields for several years now (including my company
and I); remember that best is the enemy of good enough.
I can't stress enough how true this is.

Myk, i hate to say it, because you are a valued bugzilla contributor, but
unless you garner more support for your position (which nobody seems to
have thrown their weight behind, no offense), i believe it is time to say
"objection noted, let's move on".
This is not a design that is so flawed it is unusable, because there is an
existence proof that it is usable.

If you truly believe your design is better, i believe it is time for
you to implement a custom fields patch yourself, so we have two working
patches to compare, instead of comparing "database theory" and "working
code".
Steven Suson
Post by Shane H. W. Travis
Post by Myk Melez
While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations for
a number of years, both for standard fields and for custom ones, not to
mention the general success of FAC in database design.
Something else to weigh: Are any of those 'hundreds of installations' (where
do you get this figure from anyway?) making any effort to improve Bugzilla
by re-committing their code? Do they have a developer who is willing to
commit to the multiple re-writes and constant criticism that's going to be
necessary to get a patch of this size and magnitude landed on the trunk?
AIUI, Sean has already had to develop this locally -- because his bosses
told him to. He is now trying to make the results of his efforts available
to Bugzilla, because it's something that people have been griping about
wanting done for the last five years... and he's basically being told to go
piss up a rope because his design isn't good enough. Way to foster major
contributions! Funny thing is that it's good enough to hold 187 custom
fields at his site... but that's not good enough for us (or, more
specifically, for Myk).
If one of FAC/FAD were a complete abomination, and implementing it that way
would be universally looked back on as a horrendous mistake... then
absolutely I agree that the code shouldn't be taken just for the sake of
having it... but that doesn't seem to be the case here. There are benefits
and trade-offs to each method. Each one has its proponents and its
detractors, and this discussion is rapidly taking on some characteristics of
a Religious Flame War.
Working code (and dedicated developers) trumps beautiful theories nine times
out of ten, in my books.
Shane H.W. Travis | The greatest of all mistakes is to do nothing
Saskatoon, Saskatchewan | Do what you can. -- Sydney Smith
-
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Christopher Hicks
2005-01-30 14:25:31 UTC
Permalink
Post by Daniel Berlin
I have to agree whole heartedly with this. It seems every time that a
discussion of custom fields begins, that it deteriorates into a flame
war... And each and every time, the proposed contributor is nit picked to
death, until the custom fields discussion dies away. Come on guys! MANY
bugzilla users have wanted custom fields for several years now (including
my company and I); remember that best is the enemy of good enough.
I can't stress enough how true this is.
Myk, i hate to say it, because you are a valued bugzilla contributor, but
unless you garner more support for your position (which nobody seems to have
thrown their weight behind, no offense), i believe it is time to say
"objection noted, let's move on".
This is not a design that is so flawed it is unusable, because there is an
existence proof that it is usable.
If you truly believe your design is better, i believe it is time for you to
implement a custom fields patch yourself, so we have two working patches to
compare, instead of comparing "database theory" and "working code".
Abso-bleeping-lutely. Props to Steven and Daniel for jumping into the
melee that is custom fields.

There seems to be somewhat broad agreement on a few points:

(1) Anybody who tried to get a design approved by the bugzilla community
before implementing custom fields would never start implementing.

(2) A workable implementation exists and people seem to be falling back in
the quagmire of (1).

(3) Working code should trump theory, but its not. As a long time
pedantic a*hole myself I truly sympathise with the purists... but this
situation lost its "virginity" so long ago that purity and idealism
haven't fit for a long time.

One other red herring that I'm tired of seeing is that we're stuck with a
bad implementation and that another lump of checksetup.pl tweaking can't
move us to another way of doing things. If FAC and FAD are implementing
the same fields in different ways then there is a one-to-one mapping
between how something would be represented in each and there's an
algorithm for converting them. It would seem easier for the FAC purists
to patch something that works so that it works they way they want than to
start over. (But since the purists are busy arguing against somebody who
made a major accomplishment no progress on either has been made.)

Yet another red herring is that big changes are bad. Something like
custom fields is something that by its very nature is a big change.
Splitting it up into a dozen chunks doesn't change this fact and is just
a pointless exercise in fitting a 10 pound bad of salt through a one pound
slot.
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Gervase Markham
2005-01-30 23:58:32 UTC
Permalink
Post by Christopher Hicks
One other red herring that I'm tired of seeing is that we're stuck with
a bad implementation and that another lump of checksetup.pl tweaking
can't move us to another way of doing things.
Without commenting on any of the other issues, I don't see this as a red
herring. While in theory it's probably true that one could convert from
FAC to FAD using checksetup.pl, in practice such a thing is more complex
than any other dynamic schema change we've ever tried, and would be hard
to write and liable to fail in unexpected ways due to the "meta" nature
of custom fields and the fact that we can't predict what people will use
them for. It would never happen.

I certainly agree that endless debate is bad - but the correct way to
deal with endless debate is for the person charged with ending it to
bring it to an end. Dave should step up and make a decision - FAC or
FAD. The fact that one has an implementation available is certainly a
factor in that decision, but should not by itself be conclusive.

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Christopher Hicks
2005-01-31 17:09:27 UTC
Permalink
Post by Gervase Markham
Post by Christopher Hicks
One other red herring that I'm tired of seeing is that we're stuck with a
bad implementation and that another lump of checksetup.pl tweaking can't
move us to another way of doing things.
Without commenting on any of the other issues, I don't see this as a red
herring. While in theory it's probably true that one could convert from
FAC to FAD using checksetup.pl, in practice such a thing is more complex
than any other dynamic schema change we've ever tried, and would be hard
to write and liable to fail in unexpected ways due to the "meta" nature
of custom fields and the fact that we can't predict what people will use
them for. It would never happen.
Bah. I don't think the meta nautre of custom fields changes the fact that
FAC and FAR are two different ways to represent the same thing. There's a
one-to-one correspondance between them and calling it "hard" to convert
doesn't make it so. I'm not saying it would be a trivial task, but it
strikes me as being pretty straight forward and something that anybody who
has experience dealing with nontrivial databases could do without agony.
Post by Gervase Markham
I certainly agree that endless debate is bad - but the correct way to
deal with endless debate is for the person charged with ending it to
bring it to an end. Dave should step up and make a decision - FAC or
FAD. The fact that one has an implementation available is certainly a
factor in that decision, but should not by itself be conclusive.
It seems that Dave is working on that. Yay!
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Vlad Dascalu
2005-01-27 19:35:31 UTC
Permalink
Shane, wrong asnwer.
Post by Shane H. W. Travis
AIUI, Sean has already had to develop this locally -- because his bosses
told him to. He is now trying to make the results of his efforts available
to Bugzilla, because it's something that people have been griping about
wanting done for the last five years... and he's basically being told to go
piss up a rope because his design isn't good enough.
This should come as no surprise to you. As you probably know, there are
a lot of software cycles out there. However, most of them include some
basics:
-> see the requirements
-> build a design
-> analyse the design (get feedback, analyse complexity, benchmark it)
-> go back to step 2 if you can improve it
-> perform low level design
-> code it

(the LLD stage might be placed in another spot for some cycles).

It should come as no surprise to either Sean or you that when you skip
those steps and jump directly to the "code it" stage, you risk having
the whole code invalidated and the need to redo from scratch. It's like
heading from Madrid to Paris by airplane when the core team was still
debating whether if airplane is a good idea or not compared to other
means of transportation. Suggesting that we should take the code as it
just because it's there really is againest all the review and design
practices that improved Bugzilla code over the years.
Post by Shane H. W. Travis
Way to foster major
contributions!
Way to foster quality! Nobody worked together with the core team in
order to perform the analysis stage of the development. I think it's
unfair to skip this stage and to try to make the core team accept a
suboptimal code without having either review or basic design stages.
Post by Shane H. W. Travis
Funny thing is that it's good enough to hold 187 custom
fields at his site... but that's not good enough for us (or, more
specifically, for Myk).
It's not good enough for the entire core team, AFAIK, well, maybe except
you. Trying to isolate Myk and to forge pal-alliances based on your
ideas and friends helps no one. Certainly it doesn't help Bugzilla
development. Please leave the list of supporters (and the numbers on
those lists, which apparently ignore me and the rest of the core team)
aside. This is not how Bugzilla works. We don't have public polls
regarding what code to checkin or not. We have a review procedure in
place, and it's paramount regarding the quality of the code that's going
to be checked in.
Post by Shane H. W. Travis
If one of FAC/FAD were a complete abomination, and implementing it that way
would be universally looked back on as a horrendous mistake... then
absolutely I agree that the code shouldn't be taken just for the sake of
having it... but that doesn't seem to be the case here.
It certainly seems that way to me. Mind you, that's what we've been
debating until now.
Post by Shane H. W. Travis
There are benefits
and trade-offs to each method. Each one has its proponents and its
detractors, and this discussion is rapidly taking on some characteristics of
a Religious Flame War.
I saw it as a perfect rationale discussion until you and others tried to
go down the "religious flame war". Please don't side-rail a productive
discussion. Myk's, Mkanat's, even Sean's arguments all seemed rationale
and very insightful. I don't see why you feel the need to stop it.
Post by Shane H. W. Travis
Working code (and dedicated developers) trumps beautiful theories nine times
out of ten, in my books.
That's why probably in your book you need to rewrite the code in nine
out of ten cases.

Vlad.
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Daniel Berlin
2005-01-28 04:05:31 UTC
Permalink
Post by Shane H. W. Travis
Way to foster major
contributions!
Way to foster quality! Nobody worked together with the core team in order to
perform the analysis stage of the development.I think it's unfair to skip
this stage and to try to make the core team accept a suboptimal code without
having either review or basic design stages.
I'm a usually silent observer here, but i've watched what is happening now
happen time and time again, and it's always the same arguments on both
sides.
I'll also note that one of the reasons i don't work with the core team
before i implement gcc features is because i have the distinct impression
(from watching the bugzilla bug reports and mailing lists for years
now) that some members of the core team believe it is their way, or the
highway, and i don't have time to theoretically masturbate over bug system
design. Unless some other design for one of my gcc bugzilla patches is
going to be wildly easier to maintain, it's just not worth the time. This
is not the same as saying i believe i should be able to contribute a bunch
of complete hacks. However, too often people here seem to delve into
design issues that are way too picky for a system of the complexity of
bugzilla.

Effectively claiming that we can't get to some good place starting
with sean's design and code, and incrementally improving it (IE
making the rest of it acceptable, committing it, and improving it as we go
along) is just not a good argument to me.
I've seen it done in every open source project i've ever worked
on[1].
You know what?
Unless the design just plain can't work, you can do this, and you know
what, it makes the users and developers a *lot* happier, because they have
*something*, instead of *nothing*.

Sean's design meets the requirements of being a sane custom fields
implementation. He is willing to conform his *code* in order to meet the
other design requirements of a custom fields implementation for bugzilla.

His existing design may not be the most performant, or whatever, but it
works, and it is a starting point.

These are just fields and tables in a database.
We can always migrate.

Or do you want to wait a few years for the next guy to try to make a
custom fields patch, and then we can beat him down too for it not being
theoretically perfect, when we could have spent the time turning something
we've got *now*, into what we wanted, by then.


--Dan
[1] Hell,we just, over the course of 2 years, added a new SSA based middle
end to GCC (which had no middle end, and for that matter, no well-defined
interface between the frontends and the backend).
The starting design was incrementally improved into something wildly
different. Yes it required some coding changes on the part of people
working on new optimizers. I was one of these people.
You know what?
It was better to have to change large amounts of my code because diego
decided some abstraction wasn't giving us the information or performance
we needed, than it would have been to sit there and do nothing while he
sat around for 3 years trying to design the perfect interfaces and
abstraction level. It never would have been completed had we done it that
way.
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Luis Villa
2005-01-29 17:37:07 UTC
Permalink
Amen, on /all/ counts.
Luis


On Thu, 27 Jan 2005 23:05:31 -0500 (EST), Daniel Berlin
Post by Daniel Berlin
Post by Shane H. W. Travis
Way to foster major
contributions!
Way to foster quality! Nobody worked together with the core team in order to
perform the analysis stage of the development.I think it's unfair to skip
this stage and to try to make the core team accept a suboptimal code without
having either review or basic design stages.
I'm a usually silent observer here, but i've watched what is happening now
happen time and time again, and it's always the same arguments on both
sides.
I'll also note that one of the reasons i don't work with the core team
before i implement gcc features is because i have the distinct impression
(from watching the bugzilla bug reports and mailing lists for years
now) that some members of the core team believe it is their way, or the
highway, and i don't have time to theoretically masturbate over bug system
design. Unless some other design for one of my gcc bugzilla patches is
going to be wildly easier to maintain, it's just not worth the time. This
is not the same as saying i believe i should be able to contribute a bunch
of complete hacks. However, too often people here seem to delve into
design issues that are way too picky for a system of the complexity of
bugzilla.
Effectively claiming that we can't get to some good place starting
with sean's design and code, and incrementally improving it (IE
making the rest of it acceptable, committing it, and improving it as we go
along) is just not a good argument to me.
I've seen it done in every open source project i've ever worked
on[1].
You know what?
Unless the design just plain can't work, you can do this, and you know
what, it makes the users and developers a *lot* happier, because they have
*something*, instead of *nothing*.
Sean's design meets the requirements of being a sane custom fields
implementation. He is willing to conform his *code* in order to meet the
other design requirements of a custom fields implementation for bugzilla.
His existing design may not be the most performant, or whatever, but it
works, and it is a starting point.
These are just fields and tables in a database.
We can always migrate.
Or do you want to wait a few years for the next guy to try to make a
custom fields patch, and then we can beat him down too for it not being
theoretically perfect, when we could have spent the time turning something
we've got *now*, into what we wanted, by then.
--Dan
[1] Hell,we just, over the course of 2 years, added a new SSA based middle
end to GCC (which had no middle end, and for that matter, no well-defined
interface between the frontends and the backend).
The starting design was incrementally improved into something wildly
different. Yes it required some coding changes on the part of people
working on new optimizers. I was one of these people.
You know what?
It was better to have to change large amounts of my code because diego
decided some abstraction wasn't giving us the information or performance
we needed, than it would have been to sit there and do nothing while he
sat around for 3 years trying to design the perfect interfaces and
abstraction level. It never would have been completed had we done it that
way.
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Joel Peshkin
2005-01-29 17:22:28 UTC
Permalink
OK, time for me to join the chous.... we need this.

As long as it causes no MAJOR regression on sites not making heavy use
of custom fields and does not make a MAJOR mess of the code, we should
get this in. Myk, Justdave -- can you live with this?

I stated my wish list for this a long time ago. I won't restate it here.

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Paulo Casanova
2005-01-31 15:21:36 UTC
Permalink
Hi all!

I was away on (lovely :)) holidays and when I came I was (positively)
suprised to see so much discussion. We have been working (internally) on
CF for quite some time but with a very different approach.

I fell some degree of unrealism on these discussions on CF
implementations. Firstly, I see too much emphasis on FAC and FAD and
RDBMS theory. I work with SGBDs for several years and can state with
that, with MySQL, they're both irrelevant. Either will work. MySQL is
(1) too limited and (2) too good within its limits.

[Some unimportant notes to support what I'm saying: if you'd be working
with Oracle things would be much different because other factors such as
histogram computation, query optimization, index analysis, locking
policy and other "features" might make both designs work with very
different levels of performance on different usage patterns. With MySQL,
however, I think we're discussing peanuts.]

Unfortunately I see too much discussion on *how it does* instead of
*what it does*.

My opinion, anyway, if it is at all relevant, is that other problems
should be taken care of before discussing FAC or FAD.

Actually, because I have read hundreds or emails today on BZ and are,
therefore, somewhat lost, I would like to pose a simple question: do we
know already *what* are we going to do? I mean, what functionalities
will CF provide?

1. Can we have typed CFs? Dates, numbers, floats, URLs or it is just text?
2. Are there validation rules for CFs? Plug-ins, regexs?
3. Can we have "derived" or "computed" CFs?
4. Are complex datatypes (such as timetracking) implementable as CFs?
5. Can we make our own plug-ins for data validation?
6. How will we decide which CFs are available for each bug? Are all CFs
available for all bugs? Can CFs be enabled/ disabled or shown/ hidden
based on other field values?

BTW, we do have some effort available to work on this but we need a plan
:( I see too much flame and too little fuel :). I would much prefer
using available effort to support BZ's standard implementation and,
honestly, I don't give a damn on whether we FAC or FAD... and I think
nobody will... ever.

Best,
Paulo


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Joel Peshkin
2005-01-27 18:29:46 UTC
Permalink
Post by Shane H. W. Travis
Post by Myk Melez
While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations for
a number of years, both for standard fields and for custom ones, not to
mention the general success of FAC in database design.
Something else to weigh: Are any of those 'hundreds of installations' (where
do you get this figure from anyway?) making any effort to improve Bugzilla
by re-committing their code? Do they have a developer who is willing to
commit to the multiple re-writes and constant criticism that's going to be
necessary to get a patch of this size and magnitude landed on the trunk?
I believe that 'hundreds of installations' have hacked additional fields
into the system in a manner that serves the purpose but is not landable
in any way. The people who have done this include people (such as me)
who have made huge contributions to the product including OTHER features
of equal or greater magnitude. Features of this magnitude usually
require someone who has it as their number one priority. Any user who
wants this badly enough can either invest that level of effort or hire
one of the consultants to implement the feature and land it.

-Joel



-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-02-02 17:42:52 UTC
Permalink
Post by Shane H. W. Travis
Something else to weigh: Are any of those 'hundreds of installations' (where
do you get this figure from anyway?) making any effort to improve Bugzilla
by re-committing their code? Do they have a developer who is willing to
commit to the multiple re-writes and constant criticism that's going to be
necessary to get a patch of this size and magnitude landed on the trunk?
"Hundreds of installations" refers to both custom fields and standard
ones. And I don't know about other developers, but I'm willing to
commit to those rewrites and criticism of my code.
Post by Shane H. W. Travis
AIUI, Sean has already had to develop this locally -- because his bosses
told him to. He is now trying to make the results of his efforts available
to Bugzilla, because it's something that people have been griping about
wanting done for the last five years... and he's basically being told to go
piss up a rope because his design isn't good enough.
No one has told Sean to "piss up a rope" or anything like it, nor has
anyone complained about the vast majority of the work he's done. That's
a gross mischaracterization of the tenor of the debate.

As far as I know I'm the only one who has criticized Sean's work to any
significant degree, and I have merely stated that the database design is
significantly suboptimal and that we should be employing a different
design for this feature. That's a reasonable opinion to state and discuss.
Post by Shane H. W. Travis
If one of FAC/FAD were a complete abomination, and implementing it that way
would be universally looked back on as a horrendous mistake... then
absolutely I agree that the code shouldn't be taken just for the sake of
having it... but that doesn't seem to be the case here.
I think if we implement FAD we will regret it.
Post by Shane H. W. Travis
Working code (and dedicated developers) trumps beautiful theories nine times
out of ten, in my books.
This is the tenth time, when the ramifications of a feature with as
significant an impact on the codebase warrant getting it as right as
possible the first time. Nonetheless, I've posted the first version of
a FAC-based custom fields implementation to the bug, so there's working
code for developers to compare on both sides of the debate now.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
mexico mouse
2005-02-03 18:04:49 UTC
Permalink
i´m trying to use 'fake sendmail", as shown on this article:
http://www.bugzilla.org/docs/win32install.html

Does anyone could help me with this bug ?
https://bugzilla.mozilla.org/show_bug.cgi?id=280911

I´ve tried the hints of Marc, but I didn´t get sucess.

Greetings,
mexicomouse

_________________________________________________________________
MSN Messenger: converse online com seus amigos .
http://messenger.msn.com.br

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2005-01-27 22:49:45 UTC
Permalink
Post by Myk Melez
In my experience it's just as dangerous to overgenericize because
someone might need some capability you don't know about yet.
Yeah, that's true.
Post by Myk Melez
But FAC doesn't limit the use of custom fields by plugins, and it
allows installations and plugins to create just as many and as varied
custom fields as FAD. How is FAD more extensible?
Because it's easier for a developer who knows nothing about Bugzilla
internals (a plugin developer) to add and maintain rows than it is to
add and maintain columns.
Post by Myk Melez
Post by Max Kanat-Alexander
As a single example, what if I created a single plugin that was
"Project Management for Bugzilla?"
Ok, let's say you did. How does that inform this debate?
Because it spans across the three groups of fields that you mentioned.
And because it can require functionality that I can't possibly predict
at this point, without actually doing the design.
Post by Myk Melez
[snip]
Post by Max Kanat-Alexander
or even
having to modify the SELECT statement in a fashion that can't use
placeholders.
How does FAC make it harder to use placeholders?
FAC looks like this:

SELECT $column1, $column2, $column3, etc.
FROM $table1 $possibleJoin1 $table2 $possibleJoin2
WHERE <criteria>

FAD looks like this:

SELECT field_name, field_id, field_data, bug_id
FROM field_table <some_standard_joins>
WHERE field_name = $fieldname

Only $fieldname can be a placeholder. You can't use placeholders for
column or table names.

Generally, this is also what I meant by generic SQL.

I think it's easier to deal with the FAD query that I wrote above, for
fields that I don't know about in advance, than it is to deal with the
FAC query.

The FAC query is great for fields that I know about in advance, because
I know what the field names will be, when I'm working with it in perl,
and for various other reasons.

I'll admit, FAD does create a bit of a flat namespace, and that can be
a problem. Without proper indexes, it can create a large table that's
hard to parse. However, thankfully, the table structure is definite and
the indexes are easy to write. With FAC, there are also a larger number
of indexes, and it would be somewhat difficult to change them if we had
to, on an upgrade or something like that (that is, assuming that certain
columns had their own tables).
Post by Myk Melez
It's possible to implement custom fields via either approach, and it's
likely that either approach will work on a given installation, but
we're trying to figure out the best approach for all Bugzilla
installations and developers, and while the success of an approach on
one site can inform the debate, it can't decide it.
OK, I definitely agree with that.
Post by Myk Melez
While weighing Sean's success with FAD at his installation, we should
also weigh the success of FAC on hundreds of Bugzilla installations
for a number of years, both for standard fields and for custom ones,
not to mention the general success of FAC in database design.
True. Of course, in most of those implementations, each developer knows
exactly what specific fields they are adding, and exactly how they're
going to be used. So they're not so much "generic" custom fields as they
are "specific" custom fields.
Post by Myk Melez
longdescs isn't FAD at all, it stores comment entities which have
attributes represented by columns and whose rows each represent a
unique instance of a comment. It's a classic example of an object
modeled by a table, just like the bugs, attachments, and users tables.
OK, you're right about that. :-)

FWIW, before a few weeks ago, I was also a very strong proponent of
FAC, because it's better database design, for the backend.

I just think that FAC will allow us to accomplish our goals for
Bugzilla more easily.

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-4500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2005-01-27 23:32:12 UTC
Permalink
Post by Max Kanat-Alexander
I just think that FAC will allow us to accomplish our goals for
Bugzilla more easily.
Opps. I meant "FAD", there. :-) Darn acronyms. :-)

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-4500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Andreas Höfler
2005-01-28 07:33:24 UTC
Permalink
***************************************************************************************************
The information in this email is confidential and may be legally privileged. Access to this email by anyone other than the intended addressee is unauthorized. If you are not the intended recipient of this message, any review, disclosure, copying, distribution, retention, or any action taken or omitted to be taken in reliance on it is prohibited and may be unlawful. If you are not the intended recipient, please reply to or forward a copy of this message to the sender and delete the message, any attachments, and any copies thereof from your system.
***************************************************************************************************
Gervase Markham
2005-01-25 08:52:44 UTC
Permalink
Post by Myk Melez
2. then they work the same as standard fields;
Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.
Let's not underestimate the advantage of this point. Currently,
Bugzilla's default field set includes some fields which most people
don't use (e.g. URL). It would be great if the custom fields schema were
such that we could easily transition fields from "hard-coded default" to
"shipped-enabled custom", so people could remove them easily.

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2005-01-23 06:43:23 UTC
Permalink
Post by Myk Melez
But preferences and custom fields are very different beasts.
True.

The reason for implementing preferences as rows is so that people can
write custom plugins for Bugzilla that can add new preferences.

The same logic would apply to custom fields.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2004-12-30 05:11:36 UTC
Permalink
Post by Sean McAfee
Do you have a specific proposal that can be tested? And, er, I don't want
to sound rude, but would you mind testing it yourself? I've already done
quite a lot in that area.
*sigh*, ok, I'll do some testing next week when I get back from
vacation, although I really think the burden of proof should be on you,
given that you're the one suggesting we overturn thirty years of RDBMS
and relational database theory, design, and practical usage, not to
mention six years of Bugzilla development techniques.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-21 19:09:05 UTC
Permalink
Post by Sean McAfee
Do you have a specific proposal that can be tested? And, er, I don't want
to sound rude, but would you mind testing it yourself? I've already done
quite a lot in that area.
Ok, I modified construct-tables.pl to add fulltext indexes to the text
columns in the tests, create "real-distributed" columns (real columns,
but in a separate table rather than the bugs table, which is how a "real
columns" custom fields implementation would handle sparse columns), and
make distributed and real-distributed versions of the status whiteboard
field.

I then wrote a script to automate testing of the queries in your earlier
email. It runs each query six times, discards the first result (which
is often inaccurate because it includes costs unrelated to the query
itself), averages the rest, and reports the results in a table similar
to the one you included in your earlier email along with each test's
details (the query, its query optimizer "explanation", and individual
run times).

I didn't modify your test queries except to fix some typos that caused
several of them not to run. I added additional tests, including
versions of yours that use fulltext indexes, a couple that join tables
the way Bugzilla does in its queries (labeled "bzlike" in the tests),
and one that runs an actual Bugzilla query (a search for "meta" in the
status whiteboard of all bugs) along with distributed and
real-distributed versions of it against a recent copy of the b.m.o database.

I ran the tests on two machines, one called holepunch with two 733Mhz
x86 processors and one called megalon with two hyperthreaded Intel Xeon
2.xGhz processors. holepunch did nothing but run the tests, but megalon
did other things at the same time, so its results may be subject to more
variance.

I've attached the modified construct-tables.pl script, the
test-performance.pl script that runs the tests, and both machines'
results, which show the real and real-distributed queries outperforming
the distributed queries by significant margins in every case, regardless
of indexing, but up to 10x faster when using fulltext indexes.

I think these results are pretty accurate. They validate database
theory (as bbaetz said of the distributed model, "it can usually never
be faster") and the consequencies of MySQL's "one index per table"
restriction. I'm sure there are still optimizations that could be done,
and the queries themselves could be made much more indicative of real
Bugzilla queries along the lines bbaetz suggested in his email.

It would also be useful to run these against a PostGreSQL database and
on additional machines. But I think the result would be the same: real
columns (either within the bugs table or in their own table if sparse)
would outperform distributed ones.

-myk


Notes:

* Make sure you're running at least MySQL 4.0 if you try these tests
yourself. Version 3.23 creates fulltext indexes too slowly on
datasets of the size generated by construct-tables.pl.
* Found records will vary between non-indexed and indexed queries
because the fulltext searches look for words starting with "meta"
instead of the "meta" substring. This limitation of fulltext
searching is well worth the advantages in query speed and
relevance ranking--most people most of the time will want to
fulltext search text.
* I made a single change to the construct-tables.pl script since
running it on the machines, removing generation of fulltext
indexes for the real distributed tables other than cf_aa (since
that one is actually used in a test, while the others aren't).
This shouldn't have any effect on test results.
Nick Barnes
2005-01-22 22:14:22 UTC
Permalink
Post by Myk Melez
Post by Sean McAfee
Do you have a specific proposal that can be tested? And, er, I don't want
to sound rude, but would you mind testing it yourself? I've already done
quite a lot in that area.
Ok, I modified construct-tables.pl to add fulltext indexes to the text
columns in the tests, create "real-distributed" columns (real columns,
but in a separate table rather than the bugs table, which is how a "real
columns" custom fields implementation would handle sparse columns), and
make distributed and real-distributed versions of the status whiteboard
field.
And I think you are both wrong! As I said before, now is surely not
the time to be making performance measurements of prototype solutions
of the custom fields defect, given that any of the proposed solutions
are plausibly capable of acceptable performance.

First implement a solution.

Then fix the glaring bugs in it.

Then get it into the trunk.

Then deploy it on test sites such as b.m.o. and let people bang on it
with real data.

Then fix the other bugs in it.

*Then* fix any performance problems with it, if necessary by replacing
the database schema part of the overall custom fields solution.

Please.

"Premature optimization is the root of all evil in programming." -
C.A.R. Hoare.

Nick B
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Christopher Hicks
2005-01-22 23:33:57 UTC
Permalink
Post by Nick Barnes
And I think you are both wrong! As I said before, now is surely not
the time to be making performance measurements of prototype solutions
of the custom fields defect, given that any of the proposed solutions
are plausibly capable of acceptable performance.
I agree wholeheartedly.
Post by Nick Barnes
"Premature optimization is the root of all evil in programming." -
C.A.R. Hoare.
I was trying to be nice and not beat them with that well-worn stick, but
this seems exactly like the case where he'd apply it.
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Gervase Markham
2005-01-23 12:18:36 UTC
Permalink
Post by Nick Barnes
First implement a solution.
Then fix the glaring bugs in it.
Then get it into the trunk.
Then deploy it on test sites such as b.m.o. and let people bang on it
with real data.
Then fix the other bugs in it.
*Then* fix any performance problems with it, if necessary by replacing
the database schema part of the overall custom fields solution.
But what happens at this stage, in the real world, is that no-one has
time to rip out the schema and start again, and lots of people have made
local modifications which would break anyway, and so we all grumble
about how we should have got it right the first time, and live with the
bad implementation.

While deciding on the correct approach should not delay the
implementation of custom fields indefinitely, I think that doing some
up-front design is a perfectly reasonable step on the way to implementation.

In the end, the choice of implementation method rests with Dave, as
project leader. Dave - do you have enough data to make a choice yet?

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Christopher Hicks
2005-01-23 14:13:13 UTC
Permalink
But what happens at this stage, in the real world, is that no-one has time to
rip out the schema and start again, and lots of people have made local
modifications which would break anyway, and so we all grumble about how we
should have got it right the first time, and live with the bad
implementation.
Look at all of the database tweaks in checksetup.pl and tell me we're
prone to just live with it and not tweak it into shape over time.
While deciding on the correct approach should not delay the
implementation of custom fields indefinitely, I think that doing some
up-front design is a perfectly reasonable step on the way to
implementation.
The design has been out there for ages and now people are just ripping
apart one section - the database design's affect on performance - which
was actually done correctly already. If there's some other "up-front
design" that's needed, what is it?
--
</chris>

"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Gervase Markham
2005-01-24 22:53:31 UTC
Permalink
Post by Christopher Hicks
Look at all of the database tweaks in checksetup.pl and tell me we're
prone to just live with it and not tweak it into shape over time.
But the schema for custom fields is going to be a whole different order
of complexity. Switching between the two main proposals here isn't a
case of "drop this column, tweak the type of this table", it's a
wholesale internal rearrangement.
Post by Christopher Hicks
The design has been out there for ages and now people are just ripping
apart one section - the database design's affect on performance - which
was actually done correctly already. If there's some other "up-front
design" that's needed, what is it?
Maybe I haven't been paying attention; I've seen several designs flow
past, but I didn't realise one of them had been accepted as "the design".

Gerv
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Nick Barnes
2004-12-20 11:59:58 UTC
Permalink
On custom fields, I think that choosing a schema design for
performance reasons at this early stage is a classic example of
premature optimization.

It seems unlikely that any of the proposed schema designs [*] would
present insurmountable performance obstacles on current hardware with
the data volumes we're talking about (~2^20 bugs). Furthermore, most
of the work for custom fields is not in the schema design, and in fact
is totally independent of the schema design. As far as the central
functionality of custom fields is concerned, the data could be stored
in a shoe-box under my bed, or in the songs of a troupe of trained
Mesopotamian aardvarks [*2].

Plainly we should choose the simplest reasonable overall design and go
ahead and build a custom fields system and get it into the trunk. If
that then turns out to be too slow, or too big, or insufficiently
lemon-scented, we can *then* optimize (for instance, by retro-fitting
one of the other schema designs).

I could certainly imagine a release (2.22, say) with a somewhat slow
and clunky custom fields system. People who want custom fields could
then either (a) do without or (b) have slow custom fields. If they
choose (b) and are unhappy, *then* they will raise bugs, and in the
comments on those bugs we can revisit the schema design. Then 2.24 or
2.26 can have better faster whizzier custom fields.

For exactly analogous reasons, I think we should consider a first-cut
custom fields system which only provides a small number of custom
field types (maybe only one?) and layout options (a table below the
Keywords box?). Users will then raise bugs saying "no timestamp
custom fields", or "custom fields not where I want them" or "custom
fields not in 16-point DingBats", and we can fix those bugs and move
on.

Ditto a first cut in which adding or modifying a custom field requires
the administrator to *edit a file* (or even to *type some MySQL*). I
know that this puts me pretty far out on left field, but all my
industry experience has taught me the prime importance of evolutionary
delivery.

Let's do *something* [*4], then raise bugs describing all the myriad
ways in which it can be improved, then fix those bugs. "I don't like
the schema design, I think it should be like this" is not a bug. "It
is too slow" is a bug, but not one which can or should be raised until
someone actually has a Bugzilla which actually is too slow [*3].

Nick B

[*] I count three, at least:
fields-as-columns-in-bugs-table;
fields-as-columns-in-other-tables;
fields-as-data.

[*2] very rare.

[*3] As I understand it, Sean actually has a Bugzilla with his custom
fields extension in it, and presumably that isn't too slow.

[*4] No, this is not the fallacy: "Something must be done; this is
something; therefore this must be done". I'm not advocating doing
just anything; I'm advocating doing some small thing which works.
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
john fisher
2004-12-16 17:24:22 UTC
Permalink
"Scuse me for popping up from the lurker's section.
I have just done my second customization of BZ, and I think you folks
have got the wrong tree here.
;>)
Defining fields is really trivial, and, it only has to be done once, or
occasionally; processing fields is hard, and the processing code will be
tweaked many times.
Paulo made some good engineering suggestions, starting with how-to-think
about the problem.
Maybe it would be a good idea to take a step back and look at the
proverbial forest?
just my two pennies.

John
--
John Fisher at Znyx Networks
Santa Barbara office

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2005-01-26 18:37:22 UTC
Permalink
Post by Myk Melez
*sigh*, ok, I'll do some testing next week when I get back from
vacation, although I really think the burden of proof should be on you,
given that you're the one suggesting we overturn thirty years of RDBMS
and relational database theory, design, and practical usage,
OK, I take exception to this. Am I really such a maverick? Most of the
people who have stated an opinion on the subject have expressed approval
of my design. Can we all be as naive as you say?
Post by Myk Melez
not to
mention six years of Bugzilla development techniques.
Custom fields are a completely new kind of beast. There is no precedent for
them in Bugzilla's development history. (Not in the core distribution,
anyway; some partial solutions are attached to bug 91037. I don't think
that's what you're talking about, though.) There's no a priori reason to
apply past Bugzilla development techniques to them.
Post by Myk Melez
Custom fields are exactly what relational database columns were designed
for; they fit perfectly into the column metaphor, just as the standard
Bugzilla fields do.
What is this "column metaphor"? Your design treats custom fields very
differently than standard fields, applying more of a "table metaphor".
Post by Myk Melez
They're modifiable via SQL statements just as
easily as the data within them is. And while Bugzilla doesn't modify
its schema very much today, there's nothing inherently more dangerous
about it doing so.
But much less elegant. To paraphrase Einstein, I think the schema ought to
be as simple as possible, but no simpler. Transmeta's Bugzilla installation
has 187 custom fields. A schema with in excess of 250 tables is not simple.
Imagine trying to manage a schema of that size with a visual tool!
Post by Myk Melez
That doesn't mean we should store all meta-data as data. We should use
the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
Yes, that is right, because all bugs share the same standard fields. That
condition is violated by custom fields. And, again, your proposal
implements custom fields very differently from the way standard fields are
implemented, anyway.
Post by Myk Melez
Post by Christopher Hicks
(4) Since queries involving custom fields are going to take a few
database hits to figure out what the field names so the query could be
written you end up with cases where 1 query turns into 4 queries. If
the database is across a WAN from the bugzilla instance the effect of
multiple queries where there were one will be more noticable.
Sure, more queries is slower. But FAC would use less queries overall,
and simpler ones at that.
Consider a simple query on a single custom field "foo" where the user
wants bugs where foo=bar. With FAC, we search for bugs where the "foo"
column contains "bar". With FAD, we look up the field ID for "foo" and
then search for bugs where the "field_id" column contains that ID and
the "value" column contains "bar" (or do it in one query with a join).
Querying against N custom fields results in joins against N tables in your
scheme. In mine, it results in joins against T(N), the number of distinct
datatypes among those N fields. The total number of joins among those
T(N) tables is N including repeated joins, but I suspect that it is still
cheaper to access fewer tables.

Consider also simply retrieving custom field data. For a bug with twenty
custom short string fields, your design would require SELECTs against
twenty different tables; mine requires only one.
Post by Myk Melez
Post by Christopher Hicks
Myk - it sounds like you're basing the decision on what way to go here
totally based on performance and I think there's a lot more that
should go into this decision.
To the contrary, my proposal is based on much more than performance. My
previous email was about performance only because that was Sean's
primary argument against it (he thought my proposal would be slower and
offered data to support his conclusion--I ran his tests myself and found
the opposite was true).
I haven't analyzed your tests in detail yet--it's been a hassle getting
MySQL 4 to peacefully coexist with my previous MySQL 3 system. (By the way,
if my design ignores thirty years of database theory, as you assert, why
does yours require a recent version of MySQL to best it?)

Can you describe exactly what was wrong with my test that it went from being
3-4 times better to being nearly an order of magnitude worse? I frankly
find that hard to believe.
Post by Myk Melez
1. that's what they're there for;
Custom fields are no different from standard fields in how they're
used (queried, displayed, updated, etc.), and columns were
designed for this express purpose when database systems were
developed. Given that they've been used to represent "fields" of
all kinds for decades, and that we've used them in Bugzilla to
represent the standard fields for over five years, they're a
mature and proven technology for doing what we want and likely to
be better than any new mechanism we come up with which represents
fields as data.
I posted to comp.databases yesterday seeking advice regarding the merits of
our two designs. The subject of the thread is "Best database schema for
object fields". To date, the only poster to offer substantial criticism has
stated "They both stink", but he did provide the useful information that the
model I implemented has a name, EAV, or "Entity-Attribute-Value". Armed
with that knowledge, I was able to Google several articles on the subject.
A few were highly critical of EAV, but most were more balanced, listing
advantages and disadvantages, and describing in what situations it's
appropriate. In all cases, though, the data model EAV was compared against
was the classic all-columns-in-one-table approach; I could find no example
of your one-table-per-field design. (The crotchety poster in comp.databases
described both of our designs as variants of EAV, but I can't really see how
that's true.) So, it's hard to accept your assertion that
one-field-per-table is "what columns are for". Can you refer me to any
systems that use your design?
Post by Myk Melez
2. then they work the same as standard fields;
Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.
If I'm not mistaken, the long-term plan is to migrate standard fields to
custom fields, so short-term discrepancies are not really relevant.
Besides--yet again!--your design treats custom fields very differently than
standard fields.
Post by Myk Melez
3. it makes them significantly faster;
Per my tests and standard database design theory, real columns are
much faster than data columns.
Again, I find this hard to believe. I suspect either some flaw in your test
program, or some unfair advantage in the limited nature of the tests.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Kevin Benton
2005-01-26 19:05:11 UTC
Permalink
I have pretty much kept my mouth shut regarding this debate lately until
now. I think it's getting a little out of hand. I think what really needs
to happen is for us to look at what we want out of the design, and more
importantly, to ask ourselves how will our proposal be used realistically?

Right now, I can tell you that I am really beating up MySQL 3.23.x doing
standard queries against the tables given with my statuscount.cgi. I was
able to flood MySQL to the point where it took over a minute to get results
back from a query that basically asked - show me a list of bugs that have
been opened but never had a status change in it. Until I added some new
indexes, that report was taking far too long to make it usable.

Now, we're talking about having the ability to add custom fields. This is a
good thing, but let's be sure we do it wisely. No matter which method we
choose, it needs to be easy to use, understand, and perform without a
serious hit to the DB. To do that - we must consider some kinds of custom
fields that might be created and how that creation would happen. If it's
something that can be handled without adding new code, then we should expect
ours to handle optimization reasonably. If not, we should guide admins. on
how to properly optimize after adding new fields.

Conceptually, for me, it's a lot easier to understand fields as columns than
fields as rows. Why? A table contains a record. That record contains
fields. That's traditional design. Creating a table for each field simply
doesn't make sense to me because the fields are no longer directly related
to one another. Understanding how field A relates to field B relates to
field C can then become a real nightmare.

If we have Bugzilla administer the fields itself, there are optimization
considerations. Because it's nearly impossible for us to figure out ahead
of time what fields will be added and how they will relate to each other, we
must also consider how to implement some kind of optimization aid on top of
it as well. So, if new field 1 relates to existing field A, and existing
field B, the admin should be able to tell Bugzilla that they're related and
have it add the indexes.

Please note that I am not saying that this is how it should be done. As an
administrator, developer, and report writer, I want to make sure that
anything we do does not kill performance while obtaining some form of
elegance or simplicity in coding. We have to find the best middle ground
where performance is excellent, but code maintenance is reasonably easy as
well.
-----Original Message-----
On Behalf Of Sean McAfee
Sent: Wednesday, January 26, 2005 11:37 AM
Subject: Re: Custom fields schema
Post by Myk Melez
not to
mention six years of Bugzilla development techniques.
Custom fields are a completely new kind of beast. There is no precedent for
them in Bugzilla's development history. (Not in the core distribution,
anyway; some partial solutions are attached to bug 91037. I don't think
that's what you're talking about, though.) There's no a priori reason to
apply past Bugzilla development techniques to them.
Post by Myk Melez
Custom fields are exactly what relational database columns were designed
for; they fit perfectly into the column metaphor, just as the standard
Bugzilla fields do.
What is this "column metaphor"? Your design treats custom fields very
differently than standard fields, applying more of a "table metaphor".
Post by Myk Melez
They're modifiable via SQL statements just as
easily as the data within them is. And while Bugzilla doesn't modify
its schema very much today, there's nothing inherently more dangerous
about it doing so.
But much less elegant. To paraphrase Einstein, I think the schema ought to
be as simple as possible, but no simpler. Transmeta's Bugzilla installation
has 187 custom fields. A schema with in excess of 250 tables is not simple.
Imagine trying to manage a schema of that size with a visual tool!
Post by Myk Melez
That doesn't mean we should store all meta-data as data. We should use
the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
Yes, that is right, because all bugs share the same standard fields. That
condition is violated by custom fields. And, again, your proposal
implements custom fields very differently from the way standard fields are
implemented, anyway.
Post by Myk Melez
Post by Christopher Hicks
(4) Since queries involving custom fields are going to take a few
database hits to figure out what the field names so the query could be
written you end up with cases where 1 query turns into 4 queries. If
the database is across a WAN from the bugzilla instance the effect of
multiple queries where there were one will be more noticable.
Sure, more queries is slower. But FAC would use less queries overall,
and simpler ones at that.
Consider a simple query on a single custom field "foo" where the user
wants bugs where foo=bar. With FAC, we search for bugs where the "foo"
column contains "bar". With FAD, we look up the field ID for "foo" and
then search for bugs where the "field_id" column contains that ID and
the "value" column contains "bar" (or do it in one query with a join).
Querying against N custom fields results in joins against N tables in your
scheme. In mine, it results in joins against T(N), the number of distinct
datatypes among those N fields. The total number of joins among those
T(N) tables is N including repeated joins, but I suspect that it is still
cheaper to access fewer tables.
Consider also simply retrieving custom field data. For a bug with twenty
custom short string fields, your design would require SELECTs against
twenty different tables; mine requires only one.
Post by Myk Melez
Post by Christopher Hicks
Myk - it sounds like you're basing the decision on what way to go here
totally based on performance and I think there's a lot more that
should go into this decision.
To the contrary, my proposal is based on much more than performance. My
previous email was about performance only because that was Sean's
primary argument against it (he thought my proposal would be slower and
offered data to support his conclusion--I ran his tests myself and found
the opposite was true).
I haven't analyzed your tests in detail yet--it's been a hassle getting
MySQL 4 to peacefully coexist with my previous MySQL 3 system. (By the way,
if my design ignores thirty years of database theory, as you assert, why
does yours require a recent version of MySQL to best it?)
Can you describe exactly what was wrong with my test that it went from being
3-4 times better to being nearly an order of magnitude worse? I frankly
find that hard to believe.
Post by Myk Melez
1. that's what they're there for;
Custom fields are no different from standard fields in how they're
used (queried, displayed, updated, etc.), and columns were
designed for this express purpose when database systems were
developed. Given that they've been used to represent "fields" of
all kinds for decades, and that we've used them in Bugzilla to
represent the standard fields for over five years, they're a
mature and proven technology for doing what we want and likely to
be better than any new mechanism we come up with which represents
fields as data.
I posted to comp.databases yesterday seeking advice regarding the merits of
our two designs. The subject of the thread is "Best database schema for
object fields". To date, the only poster to offer substantial criticism has
stated "They both stink", but he did provide the useful information that the
model I implemented has a name, EAV, or "Entity-Attribute-Value". Armed
with that knowledge, I was able to Google several articles on the subject.
A few were highly critical of EAV, but most were more balanced, listing
advantages and disadvantages, and describing in what situations it's
appropriate. In all cases, though, the data model EAV was compared against
was the classic all-columns-in-one-table approach; I could find no example
of your one-table-per-field design. (The crotchety poster in
comp.databases
described both of our designs as variants of EAV, but I can't really see how
that's true.) So, it's hard to accept your assertion that
one-field-per-table is "what columns are for". Can you refer me to any
systems that use your design?
Post by Myk Melez
2. then they work the same as standard fields;
Custom fields and standard fields are both used (queried,
displayed, updated) in much the same way, and using the same
technology to store them means we can use the same code in many
cases (and the same kind of code in others) to access and
manipulate them, making the source simpler, more robust, and
easier to develop.
If I'm not mistaken, the long-term plan is to migrate standard fields to
custom fields, so short-term discrepancies are not really relevant.
Besides--yet again!--your design treats custom fields very differently than
standard fields.
Post by Myk Melez
3. it makes them significantly faster;
Per my tests and standard database design theory, real columns are
much faster than data columns.
Again, I find this hard to believe. I suspect either some flaw in your test
program, or some unfair advantage in the limited nature of the tests.
--Sean
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Joel Peshkin
2005-01-26 19:21:54 UTC
Permalink
<snip>
Right now, I can tell you that I am really beating up MySQL 3.23.x doing
standard queries against the tables given with my statuscount.cgi. I was
able to flood MySQL to the point where it took over a minute to get results
back from a query that basically asked - show me a list of bugs that have
been opened but never had a status change in it. Until I added some new
indexes, that report was taking far too long to make it usable.
Now, we're talking about having the ability to add custom fields. This is a
good thing, but let's be sure we do it wisely. No matter which method we
choose, it needs to be easy to use, understand, and perform without a
serious hit to the DB. To do that - we must consider some kinds of custom
fields that might be created and how that creation would happen. If it's
something that can be handled without adding new code, then we should expect
ours to handle optimization reasonably. If not, we should guide admins. on
how to properly optimize after adding new fields.
For the record, Bugzilla will continue to be compatible with MySQL
3.23.x for a few more major releases but 3.23 is NOT the benchmark for
performance. Anyone using MySQL 3.23 will be presumed to be a small
site that doesn't care about optimum performance. Large
high-performance sites will be expected to be on MySQL 4.something.recent

When judging the resulting performance of a new feature, let's focus on
MySQL 4. Just don't break 3.23 altogether.

That said, the intention is to have BMO use custom fields as well, so it
cannot be a total dog.

-Joel

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2005-01-27 04:02:17 UTC
Permalink
Post by Kevin Benton
If we have Bugzilla administer the fields itself, there are optimization
considerations.
[snip]
Actually, with "fields-as-columns," we have an unknown number of
indexes required to make the one-index-per-query MySQL perform properly.

If we do "fields-as-rows," we create the indexes once with checksetup,
and never worry about them again.

-Max

P.S. It's polite to trim quotes down, so that you don't send hunks of
stuff that the list has already seen. :-)
--
Max Kanat-Alexander
Technical Support Manager, USA
Kerio Technologies, Inc.
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-2500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-01-27 13:54:20 UTC
Permalink
Post by Sean McAfee
Post by Myk Melez
*sigh*, ok, I'll do some testing next week when I get back from
vacation, although I really think the burden of proof should be on you,
given that you're the one suggesting we overturn thirty years of RDBMS
and relational database theory, design, and practical usage,
OK, I take exception to this. Am I really such a maverick? Most of the
people who have stated an opinion on the subject have expressed approval
of my design. Can we all be as naive as you say?
I didn't say you or your supporters were naive, nor do I think you are.
I only said that your suggestion is contrary to proven general and
Bugzilla-specific database design principles, and thus the burden is on
you to prove its superiority.

And I disagree that most people have approved of your design. I count
only Joel Peshkin, Maxwell Kanat-Alexander, Shane H. W. Travis,
Christopher Hicks as having expressed a clear opinion in support of your
solution in this thread, while Gervase Markham and Vlad Dascalu both
seem to oppose it, and John Fisher, Kevin Benton, Bradley Baetz, and
Nick Barnes have not expressed a clear position either way.

But even if all of those people supported your position, the sample size
is too small for it to prove that your solution is preferable in the
face of credible contrary evidence from the field. We're a small group,
and we could all be incorrect.
Post by Sean McAfee
Custom fields are a completely new kind of beast. There is no precedent for
them in Bugzilla's development history. (Not in the core distribution,
anyway; some partial solutions are attached to bug 91037. I don't think
that's what you're talking about, though.) There's no a priori reason to
apply past Bugzilla development techniques to them.
Custom fields aren't very different from standard fields, and a number
of our standard fields will become custom fields or reuse custom fields
code once it's done. But even if they weren't similar, there is plenty
of precedent for them, as installations have been adding custom fields
for years, often with real columns.
Post by Sean McAfee
What is this "column metaphor"? Your design treats custom fields very
differently than standard fields, applying more of a "table metaphor".
My design uses real columns to represent fields, accounting for sparsity
variance by putting dense columns into the bugs table (like the standard
"severity" field, which lives in the bugs.bug_severity column) and
sparse columns into their own tables (like the standard "duplicate of
bug #" field, which lives in the duplicates.dupe_of column). In both
cases, however, each custom field is represented by its own unique
database column, so the term "column metaphor" is apropo.
Post by Sean McAfee
Post by Myk Melez
They're modifiable via SQL statements just as
easily as the data within them is. And while Bugzilla doesn't modify
its schema very much today, there's nothing inherently more dangerous
about it doing so.
But much less elegant. To paraphrase Einstein, I think the schema ought to
be as simple as possible, but no simpler. Transmeta's Bugzilla installation
has 187 custom fields. A schema with in excess of 250 tables is not simple.
Imagine trying to manage a schema of that size with a visual tool!
Elegance, in this case, truly seems to be in the eye of the beholder.
To my mind, FAC is more elegant because it uses the database system as
it was designed to be used, and that makes it simpler (although not too
simple to be useful). And while it might be difficult to manage a
schema with 250 tables using certain visual tools (f.e. an ER
diagrammer), it wouldn't be with others (f.e. a list representation of
tables). Plus, at least it's possible to visualize FAC with standard
visual tools, while it's not possible for FAD. And visual overload in
ER diagrammers can be overcome by limiting the scope of tables visualized.
Post by Sean McAfee
Post by Myk Melez
That doesn't mean we should store all meta-data as data. We should use
the right tool for the job, as we have already done with standard
fields, for which we rightly use columns.
Yes, that is right, because all bugs share the same standard fields. That
condition is violated by custom fields.
Actually some standard fields are used only by certain products on some
installations, and some installations never use certain standard
fields. That hasn't prevented those fields from serving Bugzilla well,
so it is no reason to throw that storage model out (although it's worth
tweaking it to store sparse fields in separate tables and converting
frequently unused fields to custom fields).
Post by Sean McAfee
And, again, your proposal
implements custom fields very differently from the way standard fields are
implemented, anyway.
Actually my proposal implements custom fields as columns within the bugs
table or within their own table, just as Bugzilla does today with
standard fields.
Post by Sean McAfee
Querying against N custom fields results in joins against N tables in your
scheme. In mine, it results in joins against T(N), the number of distinct
datatypes among those N fields. The total number of joins among those
T(N) tables is N including repeated joins, but I suspect that it is still
cheaper to access fewer tables.
My tests, which include the queries you designed to showcase the
performance of your approach, indicate otherwise.
Post by Sean McAfee
Consider also simply retrieving custom field data. For a bug with twenty
custom short string fields, your design would require SELECTs against
twenty different tables; mine requires only one.
This isn't strictly accurate, since under my proposal only sparse fields
will live in separate tables. But even in a worst-case scenario,
retrieving data for a single bug is insignificantly expensive under both
designs.
Post by Sean McAfee
I haven't analyzed your tests in detail yet--it's been a hassle getting
MySQL 4 to peacefully coexist with my previous MySQL 3 system. (By the way,
if my design ignores thirty years of database theory, as you assert, why
does yours require a recent version of MySQL to best it?)
First of all, I said your design contradicts thirty years of database
design theory, not performance optimization principles. As I said from
the beginning, performance isn't my only consideration when choosing a
design. Nevertheless, I expect a design conformant with standard
database design principles to perform better, too, and my tests show
that it does, even on MySQL 3.x (see below).

Second of all, my design does not require a recent version of MySQL. I
suggested MySQL 4.x+ because 3.x contains an inefficient fulltext
indexer, so the fulltext indexes take too long to create on tables of
the size in the test. But you can run my tests on MySQL 3.x without
creating fulltext indexes, and the results are the same: FAC wins in
most cases, and with bigger margins.

(I've attached new versions of construct-tables.pl and
test-performance.pl which don't create/use fulltext indexes by default
(specify "--fulltext" on the command line to turn them back on) and work
with MySQL 3.x, along with some test results from the machine "myk".)

Third of all, MySQL 3.x search results are relatively unimportant,
because MySQL AB no longer recommends 3.x except in special cases (they
now recommend 4.1, two generations newer than 3.x), Bugzilla will
require MySQL 4.x in the near future (probably before a custom fields
implementation lands), and 4.x is already preferable for Bugzilla today
due to stability and fulltext performance.
Post by Sean McAfee
Can you describe exactly what was wrong with my test that it went from being
3-4 times better to being nearly an order of magnitude worse? I frankly
find that hard to believe.
I cannot. I took the exact queries you ran, fixed a number of syntax
errors in them that prevented them from running at all on my machines,
plugged them into an automated script that runs each query six times
(discarding the first result and averaging the rest), and flagged them
SQL_NO_CACHE to prevent the cache from skewing the results (but this
would not have affected comparisons between your tests and mine, since
MySQL version 3.x doesn't have a query cache).

I ran the script on two different machines (holepunch and megalon) and
reported their results. I subsequently ran the script on a third
machine (myk) and got similar results. I have since run the tests
against MySQL version 3.x on myk (skipping the fulltext index tests) and
got similar results (attached).

So I cannot explain the variance, but I'm confident in the reliability
of my testing code and results, and they're more apparently automated
and have been tested on more machines, so I think it's your results
which are errant. In any case, I'm happy to provide all comers with the
scripts for duplicating my tests, and I look forward to any additional
results (and tests, and refinements of tests) anyone else can provide.
Post by Sean McAfee
I posted to comp.databases yesterday seeking advice regarding the merits of
our two designs. The subject of the thread is "Best database schema for
object fields". To date, the only poster to offer substantial criticism has
stated "They both stink", but he did provide the useful information that the
model I implemented has a name, EAV, or "Entity-Attribute-Value".
http://groups-beta.google.com/group/comp.databases/browse_frm/thread/aa5eca674b5a2073/a38a196ace5ef6b5#a38a196ace5ef6b5

Actually Celko said that both models are EAV, but he bases that claim on
an error in your explanation of FAC which I've corrected in a followup.
In reality, FAC is a standard ER modeling approach.
Post by Sean McAfee
Armed
with that knowledge, I was able to Google several articles on the subject.
A few were highly critical of EAV, but most were more balanced, listing
advantages and disadvantages, and describing in what situations it's
appropriate.
I did a similar search and found that EAV modeling has applicability in
some specialized scientific and medical problem domains which experience
extremely numerous, highly variable, and very sparse fields. It is,
however, rarely used otherwise, while FAC, which models dense attributes
as bug table columns and sparse attributes as columns in separate
tables, employs the standard ER modeling approach which is widely used
by the majority of relational databases, especially those like Bugzilla.
Post by Sean McAfee
In all cases, though, the data model EAV was compared against
was the classic all-columns-in-one-table approach; I could find no example
of your one-table-per-field design. (The crotchety poster in comp.databases
described both of our designs as variants of EAV, but I can't really see how
that's true.) So, it's hard to accept your assertion that
one-field-per-table is "what columns are for". Can you refer me to any
systems that use your design?
Bugzilla, Bonsai, and Despot, to name only some Mozilla apps. But of
course now I'm talking about the ER modeling approach outlined above,
not some one-table-per-field approach which I have never advocated.
Post by Sean McAfee
If I'm not mistaken, the long-term plan is to migrate standard fields to
custom fields, so short-term discrepancies are not really relevant.
Actually the long-term plan is to migrate some but not all standard
fields to custom fields (some fields cannot be custom because their
processing is too specialized, while others are common to [virtually]
all bug tracking and so should be standard), and which fields are
standard/custom can and will change over time (in both directions), so
discrepancies between the two will remain relevant in all terms.
Post by Sean McAfee
Post by Myk Melez
Per my tests and standard database design theory, real columns are
much faster than data columns.
Again, I find this hard to believe. I suspect either some flaw in your test
program, or some unfair advantage in the limited nature of the tests.
Perhaps. I doubt the flaw is in my test program, since it uses the same
function to run and time all queries, whether FAC or FAD. But the tests
it runs are indeed pretty limited (consisting only of your tests and a
few of my own), so it's quite possible for them to be unfairly
advantageous--in either direction. I welcome any improvement in them,
but I still think the onus should be on you to prove EAV superiority,
not on me to disprove it, given its nonstandard approach, and especially
given documentation on the web about its niche value for certain extreme
applications.

-myk
Kevin Benton
2005-01-28 18:08:21 UTC
Permalink
Myk – I’m concerned about some of the numbers I’m seeing from the result
times you posted. I saw a number of queries at 50+ and assuming that’s in
seconds, to me, that just doesn’t cut it. Please correct me if I'm wrong.

I need queries to run in under 1 second the first time they're run
consistently. For example, in Real, on 1 str, you show the first query time
was 2.4 seconds, and the distributed query ran 56.2 seconds! 1 str bzlike
even in Real ran 17.8 seconds and from looking at the ref, it looks like
there was no index available.
From what I can see, the fields as columns performance was *much* better
than the fields as data performance (myk-result-3.html). This suggests to
me that this particular implementation is not going to work here. I'm not
saying that a FAR method can not be developed that will perform within
expectations, but from the data presented, this particular method doesn't
look like it will suit my needs.

I am writing statuscount.cgi - a completely new reporting package for
Bugzilla to show my management the information they need out of Bugzilla to
help them effectively direct development. I expect this report will be run
about one to five times a week. This report currently executes eight
distinct queries of which six are complex. So, if the queries are running
an average of 50+ seconds each first-time, it's going to take on the order
of five to six minutes for the manager to get the information they're
looking for. That will prevent my managers from using my reports. On the
other hand, if my report takes less than ten seconds, I will be doing okay.
When I'm reasonably close to being done with this reporting software, I will
release it to the Bugzilla community. I've already got at least two non-AMD
testers evaluating it for me now.

Granted, I'm assuming that we're migrating to handling all the fields in a
unified way regardless of whether or not they're legacy or new. Again, as I
said, please correct me if I'm wrong. I just don't see this particular
implementation as viable. Again, I really don't care which way we go - FAR
or FAC as long as it is easy to understand, performs within expectations,
and is scalable. It seems to me that FAC is easier to grasp in my mind, but
if you or anyone else can make a solid case for FAR, great!

---
Kevin Benton
Perl/Bugzilla Developer
Advanced Micro Devices
 
The opinions stated in this communication do not necessarily reflect the
view of Advanced Micro Devices and have not been reviewed by management. 
This communication may contain sensitive and/or confidential and/or
proprietary information.  Distribution of such information is strictly
prohibited without prior consent of Advanced Micro Devices.  This
communication is for the intended recipient(s) only.  If you have received
this communication in error, please notify the sender, then destroy any
remaining copies of this communication.
 




-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Luis Villa
2005-01-28 18:21:24 UTC
Permalink
Post by Kevin Benton
I am writing statuscount.cgi - a completely new reporting package for
Bugzilla to show my management the information they need out of Bugzilla to
help them effectively direct development. I expect this report will be run
about one to five times a week. This report currently executes eight
distinct queries of which six are complex. So, if the queries are running
an average of 50+ seconds each first-time, it's going to take on the order
of five to six minutes for the manager to get the information they're
looking for. That will prevent my managers from using my reports. On the
other hand, if my report takes less than ten seconds, I will be doing okay.
When I'm reasonably close to being done with this reporting software, I will
release it to the Bugzilla community. I've already got at least two non-AMD
testers evaluating it for me now.
(1) We've found that for excessively complex queries, generating them
in cron and dumping them as HTML is useful. That is just the
unfortunate reality at times.

(2) GNOME would love to help you evaluate it as well :)

Luis
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Daniel Berlin
2005-01-28 18:26:25 UTC
Permalink
Post by Luis Villa
(1) We've found that for excessively complex queries, generating them
in cron and dumping them as HTML is useful. That is just the
unfortunate reality at times.
However, if you remember correctly, one of gnome's expensive reports, the
weekly status summary, was slow because of deficiencies in mysql's
query optimizer.
I changed the query ever so slightly to convince mysql to do it the right
way, and it became roughly 10 times faster.

--Dan
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Kevin Benton
2005-01-28 21:28:21 UTC
Permalink
Post by Luis Villa
(1) We've found that for excessively complex queries, generating them
in cron and dumping them as HTML is useful. That is just the
unfortunate reality at times.
For the queries I'm doing, it isn't acceptable to Cron the queries. And if
I'm taking 60 seconds to run a query at ~3K records, how will that scale
when it's 100K+ records? 1M+ records?

---
Kevin Benton
Perl/Bugzilla Developer
Advanced Micro Devices

The opinions stated in this communication do not necessarily reflect the
view of Advanced Micro Devices and have not been reviewed by management.
This communication may contain sensitive and/or confidential and/or
proprietary information. Distribution of such information is strictly
prohibited without prior consent of Advanced Micro Devices. This
communication is for the intended recipient(s) only. If you have received
this communication in error, please notify the sender, then destroy any
remaining copies of this communication.
Post by Luis Villa
-----Original Message-----
On Behalf Of Luis Villa
Sent: Friday, January 28, 2005 11:21 AM
Subject: Re: Custom fields schema
Post by Kevin Benton
I am writing statuscount.cgi - a completely new reporting package for
Bugzilla to show my management the information they need out of Bugzilla
to
Post by Kevin Benton
help them effectively direct development. I expect this report will be
run
Post by Kevin Benton
about one to five times a week. This report currently executes eight
distinct queries of which six are complex. So, if the queries are
running
Post by Kevin Benton
an average of 50+ seconds each first-time, it's going to take on the
order
Post by Kevin Benton
of five to six minutes for the manager to get the information they're
looking for. That will prevent my managers from using my reports. On
the
Post by Kevin Benton
other hand, if my report takes less than ten seconds, I will be doing
okay.
Post by Kevin Benton
When I'm reasonably close to being done with this reporting software, I
will
Post by Kevin Benton
release it to the Bugzilla community. I've already got at least two
non-AMD
Post by Kevin Benton
testers evaluating it for me now.
(1) We've found that for excessively complex queries, generating them
in cron and dumping them as HTML is useful. That is just the
unfortunate reality at times.
(2) GNOME would love to help you evaluate it as well :)
Luis
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2005-01-28 23:56:22 UTC
Permalink
And if I'm taking 60 seconds to run a query at ~3K records, how will
that scale when it's 100K+ records? 1M+ records?
Well, thankfully, with indexes, it will scale very well. The only
problem with statuscount.cgi performance was indexes, and once we
resolved that, it ran pretty fast, as I recall. :-)

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-4500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Kevin Benton
2005-01-31 15:54:35 UTC
Permalink
Yes, however, MySQL (in 3.x) apparently won't use more than one index per
table. Since there's an awful lot of data to go through to get to the point
where I can extract the necessary information, I just don't see how FAR will
work at this point without being able to sub-index. Again, I'm glad to be
proven wrong.

---
Kevin Benton
Perl/Bugzilla Developer
Advanced Micro Devices

The opinions stated in this communication do not necessarily reflect the
view of Advanced Micro Devices and have not been reviewed by management.
This communication may contain sensitive and/or confidential and/or
proprietary information. Distribution of such information is strictly
prohibited without prior consent of Advanced Micro Devices. This
communication is for the intended recipient(s) only. If you have received
this communication in error, please notify the sender, then destroy any
remaining copies of this communication.
-----Original Message-----
On Behalf Of Max Kanat-Alexander
Sent: Friday, January 28, 2005 4:56 PM
Subject: statuscount.cgi performance (WAS Re: Custom fields schema)
And if I'm taking 60 seconds to run a query at ~3K records, how will
that scale when it's 100K+ records? 1M+ records?
Well, thankfully, with indexes, it will scale very well. The only
problem with statuscount.cgi performance was indexes, and once we
resolved that, it ran pretty fast, as I recall. :-)
-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-4500
Fax: (408) 496-6902
http://www.kerio.com/support.html
-
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Joel Peshkin
2005-01-31 16:02:19 UTC
Permalink
Post by Kevin Benton
Yes, however, MySQL (in 3.x) apparently won't use more than one index per
table. Since there's an awful lot of data to go through to get to the point
where I can extract the necessary information, I just don't see how FAR will
work at this point without being able to sub-index. Again, I'm glad to be
proven wrong.
We have a general consensus that sites with an interest in DB
Performance should be on MySQL 4.something. Does the same concern apply
there?

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Max Kanat-Alexander
2005-02-01 01:04:35 UTC
Permalink
Post by Kevin Benton
Yes, however, MySQL (in 3.x) apparently won't use more than one index per
table. Since there's an awful lot of data to go through to get to the point
where I can extract the necessary information, I just don't see how FAR will
work at this point without being able to sub-index. Again, I'm glad to be
proven wrong.
It'll work pretty easily. You just have to create the right
multi-column indexes. Thankfully, with just one table, that's not a
problem.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2005-01-28 20:50:06 UTC
Permalink
From what I can see, the fields as columns performance was *much* better
than the fields as data performance (myk-result-3.html). This suggests to
me that this particular implementation is not going to work here. I'm not
saying that a FAR method can not be developed that will perform within
expectations, but from the data presented, this particular method doesn't
look like it will suit my needs.
No query should ever take 50 seconds in reality unless the indexes are
off, or unless it actually has to return an immense amount of data that
must be read from the disk.

I'm pretty sure that we will have to require a Bugzilla with
transactions instead of table locking, though, to get ideal performance
from custom fields in either method (though it could be particularly
applicable to FAD).

-Max
--
Max Kanat-Alexander
Technical Support Manager, USA
Kerio Technologies, Inc.
2350 Mission College Blvd., Suite 400
Santa Clara, CA 95054
Phone: (408) 496-2500
Fax: (408) 496-6902
http://www.kerio.com/support.html


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-02-02 17:40:17 UTC
Permalink
Myk – I’m concerned about some of the numbers I’m seeing from the result
times you posted. I saw a number of queries at 50+ and assuming that’s in
seconds, to me, that just doesn’t cut it. Please correct me if I'm wrong.
The first run is often inaccurate for reasons that have little to do
with the queries themselves, so it's not a good indicator of how fast
queries can run. I think the average of second and subsequent runs are a
better indicator, even for queries you want to run right "the first time".
Granted, I'm assuming that we're migrating to handling all the fields in a
unified way regardless of whether or not they're legacy or new. Again, as I
said, please correct me if I'm wrong.
I don't think that'll happen in the near future under either proposal,
since many standard fields require special processing logic which
initial custom fields implementations are unlikely to support (and
probably shouldn't for rapid integration and iterative goodness). But in
the long-term there are some standard fields that don't need much
special processing, and custom fields are probably going to want to do
more special processing, so the code to handle the two kinds of fields
should converge over time.

-myk

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Sean McAfee
2005-01-28 19:56:37 UTC
Permalink
Post by Myk Melez
And I disagree that most people have approved of your design. I count
only Joel Peshkin, Maxwell Kanat-Alexander, Shane H. W. Travis,
Christopher Hicks as having expressed a clear opinion in support of your
solution in this thread, while Gervase Markham and Vlad Dascalu both
seem to oppose it, and John Fisher, Kevin Benton, Bradley Baetz, and
Nick Barnes have not expressed a clear position either way.
As I said, "most of the people who have stated an opinion".
Post by Myk Melez
Custom fields aren't very different from standard fields, and a number
of our standard fields will become custom fields or reuse custom fields
code once it's done. But even if they weren't similar, there is plenty
of precedent for them, as installations have been adding custom fields
for years, often with real columns.
Sure, but that's just because it's the quickest way to hack on additional
fields. No one who only wanted a few more fields would seriously consider
going to the lengths I have to design and implement a new field
architecture. This precedent is in no way an argument for FAC over FAD.
Post by Myk Melez
My design uses real columns to represent fields, accounting for sparsity
variance by putting dense columns into the bugs table (like the standard
"severity" field, which lives in the bugs.bug_severity column) and
sparse columns into their own tables (like the standard "duplicate of
bug #" field, which lives in the duplicates.dupe_of column).
First of all, for installations at all similar to Transmeta's, sparse fields
are the norm, dense fields the exception:

mysql> create temporary table bugcount
-> select product_id, count(*) as num_bugs
-> from bugs
-> group by product_id;
Query OK, 9 rows affected (0.01 sec)
Records: 9 Duplicates: 0 Warnings: 0

mysql> create temporary table fieldcount
-> select product_id, count(*) as num_fields
-> from cf_membership
-> group by product_id;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Duplicates: 0 Warnings: 0

mysql> select a.product_id, a.num_bugs, b.num_fields
-> from bugcount as a, fieldcount as b
-> where a.product_id = b.product_id;
+------------+----------+------------+
| product_id | num_bugs | num_fields |
+------------+----------+------------+
| 2 | 1892 | 60 |
| 4 | 129 | 2 |
| 5 | 3 | 3 |
| 12 | 198 | 42 |
| 13 | 578 | 28 |
| 14 | 220 | 14 |
| 15 | 52 | 28 |
| 16 | 34 | 6 |
| 17 | 6 | 2 |
+------------+----------+------------+
9 rows in set (0.00 sec)

There is minimal sharing of fields between products:

mysql> select field_id, count(*) as num_products
-> from cf_membership
-> group by field_id
-> having count(*) >= 2;
+----------+--------------+
| field_id | num_products |
+----------+--------------+
| 64 | 2 |
| 310 | 2 |
+----------+--------------+
2 rows in set (0.00 sec)

A few more fields could conceivably have been shared, but it was more
convenient to import all products independently from our old Teamshare
database. We use *none* of the standard fields version, rep_platform,
priority, resolution, severity, file_loc, or op_sys.

So, our most heavily-populated product has only 61% of the bugs and 32% of
the fields. How should an administrator categorize this field or that as
"sparse" or "dense"? What if he didn't know in advance the relative
populations of the products?

Furthermore, the classification of a field as sparse or dense can change
over time. Consider the case of an installation that starts off with three
products, all of which share all custom fields. Per your design, the
administrator makes them "dense" fields, stored as columns in BUGS. Over
time seven more products are added, none of which require any of the
original fields. If all of the products have comparable numbers of bugs and
fields, the original fields' density shrinks from 100% to 30%. What then?

My design handles all such scenarios equally well, and requires neither
prescience nor hundreds of judgment calls on the part of the administrator.
Post by Myk Melez
Post by Sean McAfee
And, again, your proposal
implements custom fields very differently from the way standard fields are
implemented, anyway.
Actually my proposal implements custom fields as columns within the bugs
table or within their own table, just as Bugzilla does today with
standard fields.
What current standard fields live in their own tables? I just wrote a quick
script to list all two-column tables for which the name of one column is
"bug_id", and found only BUG_GROUP_MAP, CC, and KEYWORDS. All of these are
multi-valued "fields" that couldn't be easily made to live within BUGS
anyway.
Post by Myk Melez
Post by Sean McAfee
Can you refer me to any
systems that use your design?
Bugzilla, Bonsai, and Despot, to name only some Mozilla apps. But of
course now I'm talking about the ER modeling approach outlined above,
not some one-table-per-field approach which I have never advocated.
You have, as a possibility:

]Of course, we don't have to put all of these columns into the bugs
]table. We can do that, but we can also put each one into its own table
](so that bugs->custom field is a 1->0|1 relationship, and the database
]uses no more storage for the custom field than necessary), all of them
]into one other table, or some combination of these three, depending on
]what makes the most sense.

Also, your test program employed this type of schema, so I hope you can
understand my mistake.


--Sean
-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Maxwell Kanat-Alexander
2005-01-28 20:52:09 UTC
Permalink
Sean, by the way -- I'm not sure you noticed my comments on the custom
fields bug from looking over your implementation, or if you got my
direct email, but I am fully willing to review your patches, provided
that we break it down into MANY, MANY bugs that incrementally get us
from where we are *now* in Bugzilla to where Custom Fields wants to be.

In short, the custom fields bug should be a meta-bug.

-Max


-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Myk Melez
2005-02-02 17:40:05 UTC
Permalink
Post by Sean McAfee
Sure, but that's just because it's the quickest way to hack on additional
fields. No one who only wanted a few more fields would seriously consider
going to the lengths I have to design and implement a new field
architecture. This precedent is in no way an argument for FAC over FAD.
I would, because I expect it to save me time and energy in the long run,
even if it costs me more up-front. Bugzilla has long needed the rigor
and reusability of custom fields code, even for its standard fields. Of
course, I wouldn't be designing a new field architecture, I'd be writing
code that handles custom fields within the current one.
Post by Sean McAfee
So, our most heavily-populated product has only 61% of the bugs and 32% of
the fields. How should an administrator categorize this field or that as
"sparse" or "dense"? What if he didn't know in advance the relative
populations of the products?
It's not a tragedy if he guesses wrong, since the difference in
performance between separate-table FAC and one-table FAC isn't orders of
magnitude, and switching between the models is trivial.
Post by Sean McAfee
Furthermore, the classification of a field as sparse or dense can change
over time. Consider the case of an installation that starts off with three
products, all of which share all custom fields. Per your design, the
administrator makes them "dense" fields, stored as columns in BUGS. Over
time seven more products are added, none of which require any of the
original fields. If all of the products have comparable numbers of bugs and
fields, the original fields' density shrinks from 100% to 30%. What then?
If it matters at that point, then we move the field to a separate table,
which is a trivial operation. For the status whiteboard field, for
example, it would look something like this:

CREATE TABLE status_whiteboard(bug_id MEDIUMINT PRIMARY KEY,
status_whiteboard MEDIUMTEXT NOT NULL);
INSERT INTO status_whiteboard(bug_id, status_whiteboard) SELECT bug_id,
status_whiteboard FROM bugs where status_whiteboard IS NOT NULL;
UPDATE fielddefs SET [appropriate columns] WHERE name = 'status_whiteboard';
ALTER TABLE bugs DROP column status_whiteboard;
Post by Sean McAfee
My design handles all such scenarios equally well, and requires neither
prescience nor hundreds of judgment calls on the part of the administrator.
Mine requires no prescience and a configurable (by us) number of
judgement calls on the part of the administrator. We could decide to
remove all calls from her purvey by picking the specific model
ourselves. But I think we're better off involving the administrator,
who knows better than we do what her data looks like, and who is better
off able to customize its storage.
Post by Sean McAfee
What current standard fields live in their own tables?
duplicates.dupe_of, for one.
Post by Sean McAfee
Post by Myk Melez
Bugzilla, Bonsai, and Despot, to name only some Mozilla apps. But of
course now I'm talking about the ER modeling approach outlined above,
not some one-table-per-field approach which I have never advocated.
]Of course, we don't have to put all of these columns into the bugs
]table. We can do that, but we can also put each one into its own table
](so that bugs->custom field is a 1->0|1 relationship, and the database
]uses no more storage for the custom field than necessary), all of them
]into one other table, or some combination of these three, depending on
]what makes the most sense.
Also, your test program employed this type of schema, so I hope you can
understand my mistake.
Sure. In fact, I do suggest it as an option, just as I suggest putting
all fields into the bugs table, putting them all into one other table,
or grouping them into sets, with one table per set, as possibilities. I
consider these options strengths of the FAC approach, and prefer an
approach in which can be made available when useful.

-myk
Nick Barnes
2005-01-31 14:25:59 UTC
Permalink
Post by Myk Melez
And I disagree that most people have approved of your design. I count
only Joel Peshkin, Maxwell Kanat-Alexander, Shane H. W. Travis,
Christopher Hicks as having expressed a clear opinion in support of your
solution in this thread, while Gervase Markham and Vlad Dascalu both
seem to oppose it, and John Fisher, Kevin Benton, Bradley Baetz, and
Nick Barnes have not expressed a clear position either way.
FWIW, I like Sean's design. I think it's acceptable from a database
design standpoint, but mainly it's infinitely better than any
alternative because it represents actual working code right now, and
is therefore a realistic candidate for actually getting into the trunk
before we all die of old age.

I've said before that getting working code into the trunk, given a
half-way reasonable [*] design and some thought about extensibility
[*2], should trump everything, including a degree of performance.

Nick B

[*] Yes, I know that this whole discussion revolves around a
disagreement over what constitutes "reasonable" here.

[*2] to address the huge wishlist of custom-field features,
e.g. multiple datatypes, per-product fields, access controls,
multi-selects, interactive form design, makes the tea.

-
To view or change your list settings, click here:
<http://bugzilla.org/cgi-bin/mj_wwwusr?user=gcbd-developers-Uylq5CNFT+***@public.gmane.org>
Continue reading on narkive:
Loading...