Elgg metadata performance issue

Elgg is a open source social networking framework.

In my Elgg instance (1.7.2) I have some entities with a metadata called tags. One entity can have zero, one or many tags. Tags are strings.

I’m trying to retrieve the last updated files with the tags tagA, tagB, … tagI in a profile widget using this query like this:

$options = Array (
    "types" => "object",
    "subtypes" => "file",
    "limit" => 10,
    "full_view" => FALSE,
    "order_by" => "time_updated DESC",
    "pagination" => FALSE,
    "metadata_name_value_pairs" => Array (
                Array("name" => tags, "value" => "tagA", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagB", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagC", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagD", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagE", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagF", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagG", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagH", "operands"=> "contains"),
                Array("name" => tags, "value" => "tagI", "operands"=> "contains")
    )
);
echo elgg_list_entities_from_metadata($options);

This is enough to put MySQL at 100% of processor use and make Elgg to stop.

I’m using phpMyAdmin to monitor the database and Wireshark to monitor the communication between Elgg and MySQL. I came to this SQL query:

SELECT COUNT(DISTINCT e.guid) AS total
FROM elgg_entities e
JOIN elgg_metadata n_table ON e.guid = n_table.entity_guid
JOIN elgg_metadata n_table1 ON e.guid = n_table1.entity_guid
JOIN elgg_metastrings msn1 ON n_table1.name_id = msn1.id
JOIN elgg_metastrings msv1 ON n_table1.value_id = msv1.id
JOIN elgg_metadata n_table2 ON e.guid = n_table2.entity_guid
JOIN elgg_metastrings msn2 ON n_table2.name_id = msn2.id
JOIN elgg_metastrings msv2 ON n_table2.value_id = msv2.id
JOIN elgg_metadata n_table3 ON e.guid = n_table3.entity_guid
JOIN elgg_metastrings msn3 ON n_table3.name_id = msn3.id
JOIN elgg_metastrings msv3 ON n_table3.value_id = msv3.id
JOIN elgg_metadata n_table4 ON e.guid = n_table4.entity_guid
JOIN elgg_metastrings msn4 ON n_table4.name_id = msn4.id
JOIN elgg_metastrings msv4 ON n_table4.value_id = msv4.id
JOIN elgg_metadata n_table5 ON e.guid = n_table5.entity_guid
JOIN elgg_metastrings msn5 ON n_table5.name_id = msn5.id
JOIN elgg_metastrings msv5 ON n_table5.value_id = msv5.id
JOIN elgg_metadata n_table6 ON e.guid = n_table6.entity_guid
JOIN elgg_metastrings msn6 ON n_table6.name_id = msn6.id
JOIN elgg_metastrings msv6 ON n_table6.value_id = msv6.id
JOIN elgg_metadata n_table7 ON e.guid = n_table7.entity_guid
JOIN elgg_metastrings msn7 ON n_table7.name_id = msn7.id
JOIN elgg_metastrings msv7 ON n_table7.value_id = msv7.id
JOIN elgg_metadata n_table8 ON e.guid = n_table8.entity_guid
JOIN elgg_metastrings msn8 ON n_table8.name_id = msn8.id
JOIN elgg_metastrings msv8 ON n_table8.value_id = msv8.id
JOIN elgg_metadata n_table9 ON e.guid = n_table9.entity_guid
JOIN elgg_metastrings msn9 ON n_table9.name_id = msn9.id
JOIN elgg_metastrings msv9 ON n_table9.value_id = msv9.id
WHERE (((msn1.string = 'tags'
         AND BINARY msv1.string = 'tagA'
         AND ((1 = 1)
              AND n_table1.enabled='yes'))
        AND (msn2.string = 'tags'
             AND BINARY msv2.string = 'tagB'
             AND ((1 = 1)
                  AND n_table2.enabled='yes'))
        AND (msn3.string = 'tags'
             AND BINARY msv3.string = 'tagC'
             AND ((1 = 1)
                  AND n_table3.enabled='yes'))
        AND (msn4.string = 'tags'
             AND BINARY msv4.string = 'tagD'
             AND ((1 = 1)
                  AND n_table4.enabled='yes'))
        AND (msn5.string = 'tags'
             AND BINARY msv5.string = 'tagE'
             AND ((1 = 1)
                  AND n_table5.enabled='yes'))
        AND (msn6.string = 'tags'
             AND BINARY msv6.string = 'tagF'
             AND ((1 = 1)
                  AND n_table6.enabled='yes'))
        AND (msn7.string = 'tags'
             AND BINARY msv7.string = 'tagG'
             AND ((1 = 1)
                  AND n_table7.enabled='yes'))
        AND (msn8.string = 'tags'
             AND BINARY msv8.string = 'tagH'
             AND ((1 = 1)
                  AND n_table8.enabled='yes'))
        AND (msn9.string = 'tags'
             AND BINARY msv9.string = 'tagI'
             AND ((1 = 1)
                  AND n_table9.enabled='yes'))))
  AND ((e.TYPE = 'object'
        AND e.subtype IN (1)))
  AND (e.site_guid IN (1))
  AND ((1 = 1)
       AND e.enabled='yes')

A special thanks to the SQL formatting service SQLformat.appspot.com.

When I try to retrieve up to 4 tags I can get a response fairly quickly but more than that the response time seems to exponentially grow.

Any clues?

The workaround I’ll try now is to retrieve entities by each tag and them sort by myself.

3 thoughts on “Elgg metadata performance issue

  1. Wow, this amount os relationships is VERY insane, that’s what kills MySQL! :( I know my answer could not apply but I suggest trying to use some NoSQL engine to solve your issue… Is it possible without much effort?

  2. Julio Viegas, the Elgg model allows me to create the applications without using SQL but it is build over a relational database (MySQL). I dont think change the engine is viable now. What I’m doing is use a hosting server that suppose to be optimized to the Elgg model.

  3. The simple answer is to not use that many metadata name/value pairs. If you’re using more than 2 or 3 you should probably refactor your code. The db interface layer doesn’t handle all the joins well because of the metastring normalization. You can see an example of how to denormalize in code at http://trac.elgg.org/changeset/8763

    FYI we do have plans both to improve the db interface and to denormalize the metastrings table in the future.

Leave a Reply

Your email address will not be published. Required fields are marked *