Planet MySQL

  1. It used to be the case that, in order to actually process data retrieved from the database using Connector/Node.js, you had to resort to an API that required the use of both JavaScript callbacks and Promises. This was meant to give the application developer more flexibility and control, while decreasing the chance of buffering data unnecessarily. However, this wasn't useful for 99% of the use cases, and it made simple tasks a little cumbersome. The fact that it required two different asynchronous constructs also made it harder to grasp.

To make matters worse, in order to consume operational metadata about the columns in the result set, you had to provide an additional callback, making the whole thing spiral a bit out of control, particularly when multiple result sets were involved. In that case, you needed to create a shared context between the two functions in order to map data and metadata for each column in each result set. As Gabriela Ferrara put it: "Keep in mind that .execute() doesn't return a promise, but rather receives a callback function to do your data processing of each individual row. This could be a bit annoying for you."

Additionally, given the increasing pervasiveness of the MySQL Shell and the sheer number of examples and tutorials in the wild, some users found themselves shoehorning its synchronous JavaScript API into Node.js code, which didn't really work due to the asynchronous nature of the latter, leading to confusion and a lot of annoyance.

This has changed with the release of Connector/Node.js 8.0.18 (in September 2019): the Result instance, resolved by the Promise returned by the execute() method, now includes a whole new set of utilities that allow you to consume the result set data in a way similar to a pull-based cursor.
That cursor is implemented by the fetchOne() and fetchAll() methods, which are pretty much self-explanatory: they allow you to consume, from memory, either a single item or all the items of the result set. Calling them also frees the corresponding memory allocated by the client. This is a good middle ground between a non-buffered approach, like the callback-based one, and a fully-buffered approach where the result set is kept in memory unless the application explicitly clears it via an additional API.

There are now three different kinds of Results, with a contextual interface that varies depending on whether you are fetching documents from a collection (DocResult), rows from a table (RowResult) or raw data via SQL (SqlResult). This is exactly how the X DevAPI Result Set standard specifies it. The only difference is that, in Connector/Node.js, the execute() method is asynchronous, so you access the Result instance by handling the resulting Promise, instead of it being returned directly when calling the method (as in other existing implementations). Likewise, as described by the standard, besides fetchOne() and fetchAll(), these interfaces now provide additional methods, such as getColumns(), to process column metadata, and nextResult(), to iterate over multiple result sets.

So, looking back at the use cases analyzed before, this is how you can now work even more effectively with result sets using Connector/Node.js.
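To make the pull-cursor semantics concrete without a running server, here is a plain JavaScript stand-in. The StubResult class below is purely illustrative (the real Result instance comes from the Promise resolved by execute()), but its fetchOne()/fetchAll() contract mirrors the one described above:

```javascript
// Minimal stand-in for a buffered result set exposing a pull-based cursor.
class StubResult {
  constructor(items) {
    this.items = items; // the buffered result set data
  }

  // Pulls the next item, releasing its slot in the buffer.
  fetchOne() {
    return this.items.shift();
  }

  // Pulls (and releases) all the remaining items at once.
  fetchAll() {
    const remaining = this.items;
    this.items = [];
    return remaining;
  }
}

const result = new StubResult([{ _id: '1', name: 'foo' }, { _id: '2', name: 'bar' }]);

console.log(result.fetchOne()); // { _id: '1', name: 'foo' }
console.log(result.fetchAll()); // [ { _id: '2', name: 'bar' } ]
console.log(result.fetchAll()); // [] - the buffer has been drained
```

Note how consuming an item releases it on the client side: a second fetchAll() returns an empty array, which is the memory-freeing behavior mentioned above.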
Document Store

With the same myCollection collection under the mySchema schema, which contains the following:

```json
[{ "_id": "1", "name": "foo" }, { "_id": "2", "name": "bar" }]
```

You can now retrieve all documents in the collection with:

```javascript
const collection = session.getSchema('mySchema').getCollection('myCollection');

let result = await collection.find().execute();
console.log(result.fetchAll()); // [{ _id: '1', name: 'foo' }, { _id: '2', name: 'bar' }]

// alternatively, fetching one document at a time
result = await collection.find().execute();

const docs = [];
let doc;

while ((doc = result.fetchOne())) {
  docs.push(doc);
}

console.log(docs); // [{ _id: '1', name: 'foo' }, { _id: '2', name: 'bar' }]
```

Regular Tables

Working with tables means that, besides column values, you can also access specific details about each column, such as its name, type, size, encoding, etc. Processing column metadata becomes a lot less confusing using the getColumns() method. So, with a myTable table under the same schema, which contains the following:

```
+-----+-------+
| _id | name  |
+-----+-------+
| "1" | "foo" |
| "2" | "bar" |
+-----+-------+
```

You can, similarly to the document mode counterpart, retrieve all the rows from the table with:

```javascript
const table = session.getSchema('mySchema').getTable('myTable');

let result = await table.select().execute();
console.log(result.fetchAll()); // [['1', 'foo'], ['2', 'bar']]

// alternatively, fetching one row at a time
result = await table.select().execute();

const rows = [];
let row;

while ((row = result.fetchOne())) {
  rows.push(row);
}

console.log(rows); // [['1', 'foo'], ['2', 'bar']]
```

And you can retrieve details about each column in the table with:

```javascript
const columns = result.getColumns();

const names = columns.map(c => c.getColumnName());
console.log(names); // ['_id', 'name']

const charsets = columns.map(c => c.getCharacterSetName());
console.log(charsets); // ['utf8mb4', 'utf8mb4']

const collations = columns.map(c => c.getCollationName());
console.log(collations); // ['utf8mb4_0900_ai_ci', 'utf8mb4_0900_ai_ci']
```

Creating an object mapping each column name to its
value (similar to result set items in document mode) is now as easy as:

```javascript
// the column "label" accounts for aliases
const mapping = result.fetchAll()
  .map(row => {
    return row.reduce((doc, value, i) => {
      return Object.assign({}, doc, { [columns[i].getColumnLabel()]: value });
    }, {});
  });

console.log(mapping); // [{ _id: '2', name: 'bar' }, { _id: '1', name: 'foo' }]
```

SQL

As already mentioned, one of the biggest advantages of this new API is that it also condenses the process of working with multiple result sets. So, with a table like the one used before, and a PROCEDURE such as:

```sql
DELIMITER //

CREATE PROCEDURE proc() BEGIN
  SELECT _id AS s1_c1, name AS s1_c2 FROM myTable;
  SELECT '3' AS s2_c1, 'baz' AS s2_c2;
END//
```

You can easily iterate over all the result sets, without keeping any kind of shared state, like the following:

```javascript
const rows = [];
const columns = [];

const res = await session.sql('CALL proc').execute();

do {
  columns.push(res.getColumns().map(c => c.getColumnLabel()));
  rows.push(res.fetchAll());
} while (res.nextResult() && res.hasData());

console.log(rows); // [[['1', 'foo'], ['2', 'bar']], [['3', 'baz']]]
console.log(columns); // [['s1_c1', 's1_c2'], ['s2_c1', 's2_c2']]
```

In Retrospect

This new result set API closes a huge gap with regard to X DevAPI platform compatibility, and brings all the implementations closer to the point of becoming almost drop-in replacements for each other. In particular, it capitalizes on the success of the MySQL Shell, introducing syntax much closer to the one used by its JavaScript implementation, and provides a better framework for developers switching between the two environments. We believe it also leads to more readable and maintainable code, while striking a good compromise in terms of resource requirements, particularly memory usage. Make sure you give it a try and let us know what you think about it.
Report any issues you have via our bug tracker, using the "Connector for Node.js" category, or go one step further and submit a pull request. If you want to learn more about Connector/Node.js and the X DevAPI, check the official documentation. Make sure you also join our community Slack and come hang around in the #connectors channel: MySQL Community on Slack
  2. In this post, I'll expand on the subject of my MySQL pre-FOSDEM talk: what database administrators need to know and do when upgrading from MySQL 5.7 to 8.0. I've already published two posts on two specific issues; in this article, I'll give the complete picture. As usual, I'll use this post to introduce tooling concepts that may be useful in generic system administration. The presentation code is hosted on a GitHub repository (including the source files and the output slides in PDF format), and on Slideshare.

Contents:

- Summary of issues, and scope
- Requirements
- The new default character set/collation: utf8mb4/utf8mb4_0900_ai_ci
  - Summary
  - Tooling: MySQL RLIKE
  - How the charset parameters work
  - String, and comparison, properties
  - Collation coercion, and issues general <> 0900_ai
    - Comparisons utf8_general_ci column <> literals
    - Comparisons utf8_general_ci column <> columns
  - Summary of the migration path
- The new collation doesn't pad anymore
  - Triggers
  - Sort-of-related suggestion
- Behavior with indexes
- Consequences of the increase in (potential) size of char columns
- Information schema statistics caching
- GROUP BY not sorted anymore by default (+tooling)
- Schema migration tools incompatibility
- Obsolete Mac Homebrew default collation
  - Modify the formula, and recompile the binaries
  - Ignore the client encoding on handshake
- Good practice for (major/minor) upgrades: comparing the system variables
- Conclusion

Summary of issues, and scope

The following are the basic issues to handle when migrating:

- the new charset/collation utf8mb4/utf8mb4_0900_ai_ci;
- trailing whitespace is handled differently;
- GROUP BY is not sorted anymore by default;
- the information schema is now cached (by default);
- incompatibility with schema migration tools.

Of course, the larger the scale, the more aspects will need to be considered; for example, large-scale write-bound systems may need to handle changes in dirty page cleaning parameters and design, (new) data dictionary contention, and so on.
In this article, I'll only deal with what can reasonably be considered the lowest common denominator of all migrations.

Requirements

All the SQL examples are executed on MySQL 8.0.

The new default character set/collation: utf8mb4/utf8mb4_0900_ai_ci

Summary

References:

- An in depth DBA's guide to migrating a MySQL database from the utf8 to the utf8mb4 charset
- MySQL 8.0 Collations: The devil is in the details.
- New collations in MySQL 8.0.0

MySQL introduces a new collation - utf8mb4_0900_ai_ci. Why? Basically, it's an improved version of the general_ci one: it supports Unicode 9.0, it irons out a few issues, and it's faster. The collation utf8(mb4)_general_ci wasn't entirely correct; a typical example is Å:

```sql
-- Å = U+212B
SELECT "sÅverio" = "saverio" COLLATE utf8mb4_general_ci AS result;
-- +--------+
-- | result |
-- +--------+
-- |      0 |
-- +--------+

SELECT "sÅverio" = "saverio" AS result; -- default (COLLATE utf8mb4_0900_ai_ci)
-- +--------+
-- | result |
-- +--------+
-- |      1 |
-- +--------+
```

From this, you can also guess what ai_ci means: accent insensitive/case insensitive.

So, what's the problem? Legacy. Technically, utf8mb4 has been available in MySQL for a long time. At least a part of the industry started the migration long before, and publicly documented the process. However, by that time, only utf8mb4_general_ci was available; therefore, a vast amount of documentation around suggests moving to that collation. While this is not an issue per se, it is a big issue considering that the two collations are incompatible.

Tooling: MySQL RLIKE

For people who like (and frequently use) them, regular expressions are a fundamental tool. In particular, when performing administration tasks (using them in an application for data matching is a different topic), they can streamline some queries, avoiding lengthy concatenations of conditions. I find RLIKE particularly practical as a sophisticated SHOW <object> supplement.
SHOW <object>, in MySQL, supports LIKE; however, it's fairly limited in functionality. For example:

```sql
SHOW GLOBAL VARIABLES LIKE 'character_set%';
-- +--------------------------+-------------------------------------------------------------------------+
-- | Variable_name            | Value                                                                   |
-- +--------------------------+-------------------------------------------------------------------------+
-- | character_set_client     | utf8mb4                                                                 |
-- | character_set_connection | utf8mb4                                                                 |
-- | character_set_database   | utf8mb4                                                                 |
-- | character_set_filesystem | binary                                                                  |
-- | character_set_results    | utf8mb4                                                                 |
-- | character_set_server     | utf8mb4                                                                 |
-- | character_set_system     | utf8                                                                    |
-- | character_sets_dir       | /home/saverio/local/mysql-8.0.19-linux-glibc2.12-x86_64/share/charsets/ |
-- +--------------------------+-------------------------------------------------------------------------+
```

Let's turbocharge it! Let's get all the meaningful charset-related variables, but not one more, in a single swoop:

```sql
SHOW GLOBAL VARIABLES
WHERE Variable_name RLIKE '^(character_set|collation)_'
AND Variable_name NOT RLIKE 'system|data';
-- +--------------------------+--------------------+
-- | Variable_name            | Value              |
-- +--------------------------+--------------------+
-- | character_set_client     | utf8mb4            |
-- | character_set_connection | utf8mb4            |
-- | character_set_results    | utf8mb4            |
-- | character_set_server     | utf8mb4            |
-- | collation_connection     | utf8mb4_general_ci |
-- | collation_server         | utf8mb4_general_ci |
-- +--------------------------+--------------------+
```

Nice. The first regex reads: "a string starting with (^) either character_set or collation, followed by _". Note that if we didn't group character_set and collation (via (…)), the ^ metacharacter would apply only to the first alternative.
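To see why the grouping matters, compare the two patterns against a sample string (a quick sketch; any string with collation_ in the middle will do):

```sql
SELECT 'my_collation_x' RLIKE '^(character_set|collation)_' AS grouped,
       'my_collation_x' RLIKE '^character_set|collation_'   AS ungrouped;
-- grouped:   0 (the string starts with neither prefix)
-- ungrouped: 1 (here the alternation means "starts with character_set" OR
--               "contains collation_ anywhere" - not what we intended)
```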
How the charset parameters work

Character set and collation are a very big deal, because changing them in this case requires literally (in a literal sense 😉) rebuilding the entire database: all the records (and related indexes) that include strings will need to be rebuilt.

In order to understand the concepts, let's have a look at the MySQL server settings again; I'll reorder and explain them.

Literals sent by the client are assumed to be in the following charset:

- character_set_client (default: utf8mb4)

afterwards, they're converted and processed by the server according to:

- character_set_connection (default: utf8mb4)
- collation_connection (default: utf8mb4_0900_ai_ci)

The above settings are crucial, as literals are a foundation for exchanging data with the server. For example, when an ORM inserts data into a database, it creates an INSERT with a set of literals.

When the database system sends the results back, it sends them in the following charset:

- character_set_results (default: utf8mb4)

Literals are not the only foundation; database objects are the other side of the coin. Base defaults for database objects (e.g. the databases) use:

- character_set_server (default: utf8mb4)
- collation_server (default: utf8mb4_0900_ai_ci)

String, and comparison, properties

Some developers would define a string as a stream of bytes; this is not entirely correct. To be exact, a string is a stream of bytes associated with a character set. Now, this concept applies to strings in isolation. What about operations on sets of strings, e.g. comparisons? In a similar way, we need another concept: the collation. A collation is a set of rules that defines how strings are sorted, which is required to perform comparisons. In a database system, a collation is associated with objects and literals, both through system and specific defaults: a column, for example, will have its own collation, while a literal will use the default, if one is not specified.
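As a practical aside on the connection settings discussed above: the client-related trio (character_set_client, character_set_connection and character_set_results), plus collation_connection, is usually adjusted in one go at connection time, via SET NAMES:

```sql
-- SET NAMES sets character_set_client, character_set_connection and
-- character_set_results, plus collation_connection, for the current session:
SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```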
But when comparing two strings with different collations, how is it decided which collation to use? Enter collation "coercibility".

Collation coercion, and issues general <> 0900_ai

Reference: Collation Coercibility in Expressions

Coercibility is a property of collations which defines their priority in the context of a comparison. MySQL has seven coercibility values:

- 0: an explicit COLLATE clause (not coercible at all)
- 1: the concatenation of two strings with different collations
- 2: the collation of a column or a stored routine parameter or local variable
- 3: a "system constant" (the string returned by functions such as USER() or VERSION())
- 4: the collation of a literal
- 5: the collation of a numeric or temporal value
- 6: NULL or an expression that is derived from NULL

It's not necessary to know them by heart, since their ordering makes sense, but it's important to know how the main ones work in the context of a migration:

- how columns will compare against literals;
- how columns will compare against each other.

What we want to know is what happens in the workflow of a migration; in particular, if we:

- start by migrating the charset/collation defaults;
- then slowly migrate the columns.

Comparisons utf8_general_ci column <> literals

Let's create a table with all the related collations:

```sql
CREATE TABLE chartest (
  c3_gen CHAR(1) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci,
  c4_gen CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
  c4_900 CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci
);

INSERT INTO chartest VALUES('ä', 'ä', 'ä');
```

Note how we insert characters belonging to the Basic Multilingual Plane (BMP; essentially, the one supported by utf8mb3) - we're simulating a database where we only changed the defaults, not the data. Let's compare with a BMP utf8mb4 literal:

```sql
SELECT c3_gen = 'ä' `result` FROM chartest;
-- +--------+
-- | result |
-- +--------+
-- |      1 |
-- +--------+
```

Nice; it works.
The coercion values at play:

```
column:           2 # => wins
literal implicit: 4
```

More critical: we compare against a character in the Supplementary Multilingual Plane (SMP; essentially, one of those added by utf8mb4), with an explicit collation:

```sql
SELECT c3_gen = '🍕' COLLATE utf8mb4_0900_ai_ci `result` FROM chartest;
-- +--------+
-- | result |
-- +--------+
-- |      0 |
-- +--------+
```

Coercion values:

```
column:           2
literal explicit: 0 # => wins
```

MySQL converts the first value and uses the explicit collation.

Most critical: comparing against a character in the SMP, without an explicit collation:

```sql
SELECT c3_gen = '🍕' `result` FROM chartest;
-- ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='
```

WAT!! Weird? Well, this is because:

```
column:           2 # => wins
literal implicit: 4
```

MySQL tries to coerce the literal's charset/collation to the column's one, and fails! This gives a clear indication for the migration: do not allow SMP characters into the system until the entire dataset has been migrated.

Comparisons utf8_general_ci column <> columns

Now, let's see what happens between columns!

```sql
SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c3_gen = b.c4_gen;
-- +----------+
-- | COUNT(*) |
-- +----------+
-- |        1 |
-- +----------+

SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c3_gen = b.c4_900;
-- +----------+
-- | COUNT(*) |
-- +----------+
-- |        1 |
-- +----------+

SELECT COUNT(*) FROM chartest a JOIN chartest b ON a.c4_gen = b.c4_900;
-- ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
```

Ouch. BIG OUCH! Why? This is what happens to people who, following obsolete documentation, migrated to utf8mb4_general_ci: they can't easily migrate to the new collation.

Summary of the migration path

To recap, the migration path outlined is:

1. update the defaults to the new charset/collation;
2. don't allow SMP characters in the application;
3. gradually convert the tables/columns;
4. finally, allow everything you want 😄.

This path is viable for production systems.

The new collation doesn't pad anymore

There's another unexpected property of the new collation. Let's simulate MySQL 5.7:

```sql
-- Not exact, but close enough:
SELECT '' = _utf8' ' COLLATE utf8_general_ci;
-- +---------------------------------------+
-- | '' = _utf8' ' COLLATE utf8_general_ci |
-- +---------------------------------------+
-- |                                     1 |
-- +---------------------------------------+
```

How does this work on MySQL 8.0?

```sql
-- Current (8.0):
SELECT '' = ' ';
-- +----------+
-- | '' = ' ' |
-- +----------+
-- |        0 |
-- +----------+
```

Ouch! Where does this behavior come from? Let's get some more info about the collations (with a regular expression, of course 😉):

```sql
SHOW COLLATION WHERE Collation RLIKE 'utf8mb4_general_ci|utf8mb4_0900_ai_ci';
-- +--------------------+---------+-----+---------+----------+---------+---------------+
-- | Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
-- +--------------------+---------+-----+---------+----------+---------+---------------+
-- | utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
-- | utf8mb4_general_ci | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
-- +--------------------+---------+-----+---------+----------+---------+---------------+
```

Hmmmm 🤔. Let's have a look at the formal rules from the SQL (2003) standard (section 8.2):

3) The comparison of two character strings is determined as follows:
a) Let CS be the collation […]
b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS.
If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a space.

In other words: the new collation does not pad. This is not a big deal; just trim the data before migrating, and make 100% sure that new instances of trailing whitespace are not introduced by the application before the migration is completed.

Triggers

Triggers are fairly easy to handle, as they can be dropped/rebuilt with the new settings - just make sure to consider comparisons inside the trigger body. A sample trigger (edited):

```sql
SHOW CREATE TRIGGER enqueue_comments_update_instance_event\G
-- SQL Original Statement:
CREATE TRIGGER `enqueue_comments_update_instance_event`
AFTER UPDATE ON `comments`
FOR EACH ROW
BEGIN
  SET @changed_fields := NULL;

  IF NOT (OLD.description <=> NEW.description COLLATE utf8_bin AND CHAR_LENGTH(OLD.description) <=> CHAR_LENGTH(NEW.description)) THEN
    SET @changed_fields := CONCAT_WS(',', @changed_fields, 'description');
  END IF;

  IF @changed_fields IS NOT NULL THEN
    SET @old_values := NULL;
    SET @new_values := NULL;

    INSERT INTO instance_events(created_at, instance_type, instance_id, operation, changed_fields, old_values, new_values)
    VALUES(NOW(), 'Comment',, 'UPDATE', @changed_fields, @old_values, @new_values);
  END IF;
END
-- character_set_client: utf8mb4
-- collation_connection: utf8mb4_0900_ai_ci
--   Database Collation: utf8mb4_0900_ai_ci
```

As you can see, a trigger has associated charset/collation settings. This is because, differently from a statement, it's not sent by a client, so it needs to carry its own settings. For the trigger above, dropping/recreating it in the context of a system with the new defaults works; however, it's not enough - there's a comparison (with an explicit COLLATE) in the body! Conclusion: don't forget to look inside the triggers. Or better, make sure you have a solid test suite 😉.
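For reference, the "trim the data" and "gradually convert the tables/columns" steps mentioned above each boil down to a single statement per table; a sketch (table and column names are illustrative):

```sql
-- Locate rows carrying trailing whitespace; comparing lengths sidesteps
-- the collation's own padding behavior:
SELECT _id FROM myTable WHERE CHAR_LENGTH(name) <> CHAR_LENGTH(TRIM(TRAILING ' ' FROM name));

-- Convert a table (this rebuilds the rows and the related indexes):
ALTER TABLE myTable CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```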
Sort-of-related suggestion

We've been long-time users of MySQL triggers. They make a wonderful callback system: when a system grows, it gets increasingly hard (tipping into the unmaintainable) to maintain application-level callbacks. Triggers will never miss any database update, and with logic like the above, a queue processor can process the database changes.

Behavior with indexes

Now that we've examined compatibility, let's examine the performance aspect. Indexes are still usable cross-charset, thanks to the automatic conversion performed by MySQL. The point to be aware of is that the values are converted after being read from the index.

Let's create the test tables:

```sql
CREATE TABLE indextest3 (
  c3 CHAR(1) CHARACTER SET utf8,
  KEY (c3)
);

INSERT INTO indextest3 VALUES ('a'), ('b'), ('c'), ('d'), ('e'), ('f'), ('g'), ('h'), ('i'), ('j'), ('k'), ('l'), ('m');

CREATE TABLE indextest4 (
  c4 CHAR(1) CHARACTER SET utf8mb4,
  KEY (c4)
);

INSERT INTO indextest4 SELECT * FROM indextest3;
```

Querying against a constant yields interesting results:

```sql
EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM indextest4 WHERE c4 = _utf8'n'\G
-- -> Aggregate: count(0)
--     -> Filter: (indextest4.c4 = 'n')  (cost=0.35 rows=1)
--         -> Index lookup on indextest4 using c4 (c4='n')  (cost=0.35 rows=1)
```

MySQL recognizes that n is a valid utf8mb4 character, and matches it directly.
Against a column with an index:

```sql
EXPLAIN SELECT COUNT(*) FROM indextest3 JOIN indextest4 ON c3 = c4;
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
-- | id | select_type | table      | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
-- |  1 | SIMPLE      | indextest3 | NULL       | index | NULL          | c3   | 4       | NULL |   13 |   100.00 | Using index              |
-- |  1 | SIMPLE      | indextest4 | NULL       | ref   | c4            | c4   | 5       | func |    1 |   100.00 | Using where; Using index |
-- +----+-------------+------------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+

EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM indextest3 JOIN indextest4 ON c3 = c4\G
-- -> Aggregate: count(0)
--     -> Nested loop inner join  (cost=6.10 rows=13)
--         -> Index scan on indextest3 using c3  (cost=1.55 rows=13)
--         -> Filter: (convert(indextest3.c3 using utf8mb4) = indextest4.c4)  (cost=0.26 rows=1)
--             -> Index lookup on indextest4 using c4 (c4=convert(indextest3.c3 using utf8mb4))  (cost=0.26 rows=1)
```

MySQL is using the index, so all good. But what is the func in the ref column? It simply tells us that the value matched against the index is the result of a function; in this case, MySQL is converting the charset for us (convert(indextest3.c3 using utf8mb4)). This is another crucial consideration for a migration: indexes will still be effective. Of course, (very) complex queries will need to be carefully examined, but the grounds for a smooth transition are there.

Consequences of the increase in (potential) size of char columns

Reference: The CHAR and VARCHAR Types

One concept to be aware of, although unlikely to affect real-world applications, is that utf8mb4 characters can take up to 33% more space.
In storage terms, databases need to know the maximum size of the data they handle. This means that even if a given string takes the same space in utf8mb3 and in utf8mb4, MySQL needs to know the maximum space it can take. The InnoDB index limit is 3072 bytes in MySQL 8.0; generally speaking, this is large enough not to care. Remember: [VAR]CHAR(n) refers to the number of characters - therefore, the maximum requirement is 4 * n bytes - while TEXT types refer to the number of bytes.

Information schema statistics caching

Reference: The INFORMATION_SCHEMA STATISTICS Table

Up to MySQL 5.7, information_schema statistics were updated in real time. In MySQL 8.0, statistics are cached, and updated only every 24 hours (by default). In web applications, this affects only very specific use cases, but it's important to know whether one's application is subject to this new behavior (ours was). Let's see the effects:

```sql
CREATE TABLE ainc (id INT AUTO_INCREMENT PRIMARY KEY);

-- On the first query, the statistics are generated.
SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |           NULL |
-- +------------+----------------+

INSERT INTO ainc VALUES ();

SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |           NULL |
-- +------------+----------------+
```

Ouch! The cached values are returned. How about SHOW CREATE TABLE?

```sql
SHOW CREATE TABLE ainc\G
-- CREATE TABLE `ainc` (
--   `id` int NOT NULL AUTO_INCREMENT,
--   PRIMARY KEY (`id`)
-- ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```

This command is always up to date. So how do we update the statistics?
By using ANALYZE TABLE:

```sql
ANALYZE TABLE ainc;

SELECT TABLE_NAME, AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'ainc';
-- +------------+----------------+
-- | TABLE_NAME | AUTO_INCREMENT |
-- +------------+----------------+
-- | ainc       |              2 |
-- +------------+----------------+
```

There you go. Let's find the related setting:

```sql
SHOW GLOBAL VARIABLES LIKE '%stat%exp%';
-- +---------------------------------+-------+
-- | Variable_name                   | Value |
-- +---------------------------------+-------+
-- | information_schema_stats_expiry | 86400 |
-- +---------------------------------+-------+
```

Developers who absolutely need to revert to the pre-8.0 behavior can set this value to 0.

GROUP BY not sorted anymore by default (+tooling)

Up to MySQL 5.7, GROUP BY's result was implicitly sorted. This was unnecessary: optimization-seeking developers used ORDER BY NULL to spare the sort; however, accidentally or not, some relied on the ordering. Those who relied on it are unfortunately required to scan the codebase.
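Where the ordering does matter, the fix itself is mechanical: spell out the ORDER BY that 5.7 implicitly provided (query and names are illustrative):

```sql
-- Relied-upon 5.7 behavior: rows came back ordered by the grouping column.
-- 8.0 equivalent: request the order explicitly.
SELECT name, COUNT(*) AS total
FROM myTable
GROUP BY name
ORDER BY name;
```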
There isn’t a one-size-fits-all solution, and in this case, writing an automated solution may not be worth the time of manually inspecting the occurrences, however, this doesn’t prevent the Unix tools to help 😄 Let’s simulate a coding standard where ORDER BY is always on the line after GROUP BY, if present: cat > /tmp/test_groupby_1 << SQL GROUP BY col1 -- ends here GROUP BY col2 ORDER BY col2 GROUP BY col3 -- ends here GROUP BY col4 SQL cat > /tmp/test_groupby_2 << SQL GROUP BY col5 ORDER BY col5 SQL A basic version would be a simple grep scan with 1 line After each GROUP BY match: $ grep -A 1 'GROUP BY' /tmp/test_groupby_* /tmp/test_groupby_1: GROUP BY col1 /tmp/test_groupby_1- -- ends here -- /tmp/test_groupby_1: GROUP BY col2 /tmp/test_groupby_1- ORDER BY col2 -- /tmp/test_groupby_1: GROUP BY col3 /tmp/test_groupby_1- -- ends here -- /tmp/test_groupby_1: GROUP BY col4 -- /tmp/test_groupby_2: GROUP BY col5 /tmp/test_groupby_2- ORDER BY col5 However, with some basic scripting, we can display only the GROUP BYs matching the criteria: # First, we make Perl speak english: `-MEnglish`, which enables `$ARG` (among the other things). # # The logic is simple: we print the current line if the previous line matched /GROUP BY/, and the # current doesn't match /ORDER BY/; after, we store the current line as `$previous`. # perl -MEnglish -ne 'print "$ARGV: $previous $ARG" if $previous =~ /GROUP BY/ && !/ORDER BY/; $previous = $ARG' /tmp/test_groupby_* # As next step, we automatically open all the files matching the criteria, in an editor: # # - `-l`: adds the newline automatically; # - `$ARGV`: is the filename (which we print instead of the match); # - `unique`: if a file has more matches, the filename will be printed more than once - with # `unique`, we remove duplicates; this is optional though, as editors open each file(name) only # once; # - `xargs`: send the filenames as parameters to the command (in this case, `code`, from Visual Studio # Code). 
# perl -MEnglish -lne 'print $ARGV if $previous =~ /GROUP BY/ && !/ORDER BY/; $previous = $ARG' /tmp/test_groupby_* | uniq | xargs code There is another approach: an inverted regular expression match: # Match lines with `GROUP BY`, followed by a line _not_ matching `ORDER BY`. # Reference: # grep -zP 'GROUP BY .+\n((?!ORDER BY ).)*\n' /tmp/test_groupby_* This is, however, freaky, and as regular expressions in general, has a high risk of hairpulling (of course, this is up to the developer’s judgement). It will be the subject of a future article, though, because I find it is a very interesting case. Schema migration tools incompatibility This is an easily missed problem! Some tools may not support MySQL 8.0. There’s a known showstopper bug on the latest Gh-ost release, which prevents operations from succeeding on MySQL 8.0. As a workaround, one case use trigger-based tools, like pt-online-schema-change v3.1.1 or v3.0.x (but v3.1.0 is broken!) or Facebook’s OnlineSchemaChange. Obsolete Mac Homebrew default collation When MySQL is installed via Homebrew (as of January 2020), the default collation is utf8mb4_general_ci. There are a couple of solution to this problem. Modify the formula, and recompile the binaries A simple thing to do is to correct the Homebrew formula, and recompile the binaries. For illustrative purposes, as part of this solution, I use the so-called “flip-flop” operator, which is something frowned upon… by people not using it 😉. As one can observe in fact, for the target use cases, it’s very convenient. # Find out the formula location # $ mysql_formula_filename=$(brew formula mysql) # Out of curiosity, let's print the relevant section. # # Flip-flop operator (`<condition> .. <condition>`): it matches *everything* between lines matching two conditions, in this case: # # - start: a line matching `/args = /`; # - end: a line matching `/\]/` (a closing square bracket, which needs to be escaped, since it's a regex metacharacter). 
#
$ perl -ne 'print if /args = / .. /\]/' "$mysql_formula_filename"
args = %W[
  -DFORCE_INSOURCE_BUILD=1
  -DCOMPILATION_COMMENT=Homebrew
  -DDEFAULT_CHARSET=utf8mb4
  -DDEFAULT_COLLATION=utf8mb4_general_ci
  -DINSTALL_DOCDIR=share/doc/#{name}
  -DINSTALL_INCLUDEDIR=include/mysql
  -DINSTALL_INFODIR=share/info
  -DINSTALL_MANDIR=share/man
  -DINSTALL_MYSQLSHAREDIR=share/mysql
  -DINSTALL_PLUGINDIR=lib/plugin
  -DMYSQL_DATADIR=#{datadir}
  -DSYSCONFDIR=#{etc}
  -DWITH_BOOST=boost
  -DWITH_EDITLINE=system
  -DWITH_SSL=yes
  -DWITH_PROTOBUF=system
  -DWITH_UNIT_TESTS=OFF
  -DENABLED_LOCAL_INFILE=1
  -DWITH_INNODB_MEMCACHED=ON
]

# Fix it!
#
$ perl -i.bak -ne 'print unless /CHARSET|COLLATION/' "$mysql_formula_filename"

# Now recompile and install the formula
#
$ brew install --build-from-source mysql

Ignore the client encoding on handshake

An alternative solution is for the server to ignore the client encoding on handshake. When configured this way, the server imposes the default character set/collation on the clients. In order to apply this solution, add character-set-client-handshake = OFF to the server configuration.

Good practice for (major/minor) upgrades: comparing the system variables

A very good practice when performing (major/minor) upgrades is to compare the system variables, in order to spot differences that may have an impact. The MySQL Parameters website gives a visual overview of the differences between versions. For example, comparing versions 5.7.29 and 8.0.19 with the diff option enabled (the `,8.0.19&diff=true` part of the URL) shows the differences between the system variables of v5.7.29 and v8.0.19.

Conclusion

The migration to MySQL 8.0 at Ticketsolve has been one of the smoothest, historically speaking. This is a bit of a paradox, because we never had to rewrite our entire database for an upgrade; however, with sufficient knowledge of what to expect, we didn’t hit any significant bump (in particular, nothing unexpected in the optimizer department, which is usually critical).
Considering the main issues and their migration requirements:

- the new charset/collation defaults are not mandatory, and the migration can be performed ahead of time and in stages;
- the trailing whitespace just requires the data to be checked and cleaned;
- the GROUP BY clauses can be inspected and updated ahead of time;
- the information schema caching is regulated by a setting;
- gh-ost may be missed, but in the worst case, there are valid comparable tools.

The conclusion is that the preparation work can be done entirely before the upgrade, which can subsequently be performed with reasonable expectations of low risk. Happy migration 😄
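As a small appendix to the checklist above, the trailing-whitespace check can be simulated with the same Unix-tools approach used earlier. This is a sketch: the sample file is made up, and in practice the values would first be extracted from the relevant CHAR/VARCHAR columns.

```shell
# Simulate a dump of a VARCHAR column, one value per line ("bob " has a
# trailing space).
cat > /tmp/test_trailing << 'EOF'
alice
bob 
carol
EOF

# Flag the values ending with one or more spaces: these are the rows to
# clean before the charset/collation migration.
grep -nE ' +$' /tmp/test_trailing
```

The `-n` flag prints the (line) position of each offender, which helps when mapping the finding back to the source rows.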
  3. I may get a chance to speak about proper bug processing for open source projects later this year, so I have to keep reviewing recent MySQL bugs to be ready for that. In my previous post in this series I listed some interesting MySQL bug reports created in December, 2019. Time to move on to January, 2020! Belated Happy New Year of cool MySQL Bugs!

As usual, I mostly care about InnoDB, replication and optimizer bugs, and I explicitly mention each bug reporter by name and link to their other active reports (if any). I also pick up examples of proper (or improper) attitudes of reporters and Oracle engineers. Here is the list:

Bug #98103 - "unexpected behavior while logging an aborted query in the slow query log". A query that was killed while waiting for a table metadata lock not only gets logged, but the lock wait time is also saved as the query execution time. I'd like to highlight how the bug reporter, Pranay Motupalli, used gdb to study what really happens in the code in this case. Perfect bug report!

Bug #98113 - "Crash possible when load & unload a connection handler". The (quite obvious) bug was verified based on a code review, but only after an Oracle engineer spent some effort on denying the problem and its importance. This bug was reported by Fangxin Flou.

Bug #98132 - "Analyze table leads to empty statistics during online rebuild DDL". A nice addition to my collections! This bug, with a nice and clear test case, was reported by Albert Hu, who also suggested a fix.

Bug #98139 - "Committing a XA transaction causes a wrong sequence of events in binlog". This bug, reported by Dehao Wang, was verified as a "documentation" one, but I doubt documenting the current behavior properly is an acceptable fix. The bug reporter suggested, for example, to commit in the binary log first. The current implementation, which allows users to commit/roll back a XA transaction using another connection if the original connection is closed or killed, is risky.
A lot of arguing happened in the comments in the process, and my comment asking for a clear quote from the manual:

Would you be so kind to share some text from this page you mentioned, or any other fine MySQL 8 manual page, stating that XA COMMIT is NOT supported when executed from a session/connection/thread other than the one that prepared the XA transaction? I am probably doing something wrong, but I cannot find such text anywhere.

was hidden. Let's see what happens to this bug report next.

Bug #98211 - "Auto increment value didn't reset correctly.". Not sure what this bug reported by Zhao Jianwei has to do with "Data Types"; IMHO it's more about DDL or the data dictionary. Again, some sarcastic comments from Community users were needed to put the work on this bug back on track...

Bug #98220 - "with log_slow_extra=on Errno: info not getting updated correctly for error". This bug was reported by Lalit Choudhary from Percona.

Bug #98227 - "innodb_stats_method='nulls_ignored' and persistent stats get wrong cardinalities". I think the category is wrong for this bug. It's a bug in InnoDB's persistent statistics implementation, one of many. The bug was reported by Agustín G from Percona.

Bug #98231 - "show index from a partition table gets a wrong cardinality value". Yet another report by Albert Hu that ended up as a "documentation" bug for now, even though older MySQL versions provided better cardinality estimations than MySQL 8.0 in this case (so this is a regression of a kind). I hope the bug will be re-classified and properly processed later.

Bug #98238 - "I_S.KEY_COLUMN_USAGE is very slow". I am surprised to see such a bug in MySQL 8. According to the bug reporter, Manuel Mausz, this is also a kind of regression compared to older MySQL versions, where these queries used to run faster. Surely, no "regression" tag was added in this case.

Bug #98284 - "Low sysbench score in the case of a large number of connections".
This notable performance regression of MySQL 8 vs 5.7 was reported by zanye zjy. perf profiling pointed towards ppoll(), where a lot of time is spent. There is a fix suggested by Fangxin Flou (to use poll() instead), but the bug is still "Open".

Bug #98287 - "Explanation of hash joins is inconsistent across EXPLAIN formats". This bug was reported by Saverio M and ended up marked as a duplicate of Bug #97299, fixed in the upcoming 8.0.20. Use EXPLAIN FORMAT=TREE in the meantime to see proper information about hash join usage in the plan.

Bug #98288 - "xa commit crash lead mysql replication error". This bug report from Phoenix Zhang (who also suggested a patch) was declared a duplicate of Bug #76233 - "XA prepare is logged ahead of engine prepare" (that I've already discussed among other XA transaction bugs here).

Bug #98324 - "Deadlocks more frequent since version 5.7.26". A nice regression bug report by Przemyslaw Malkowski from Percona, with an additional test provided later by Stephen Wei. Interestingly enough, test results shared by Umesh Shastry show that MySQL 8.0.19 is affected in the same way as 5.7.26+, but 8.0.19 is NOT listed as one of the affected versions. This is a mistake to fix, along with the missing regression tag.

Bug #98427 - "InnoDB FullText AUX Tables are broken in 8.0". Yet another regression in MySQL 8 was found by Satya Bodapati. The change in the default collation for the utf8mb4 character set seems to have caused this. InnoDB FULLTEXT search was far from perfect anyway...

There are clouds in the sky of MySQL bug processing. To summarize:

Too much time and effort is still sometimes spent on arguing with bug reporters instead of accepting and processing bugs properly. This is unfortunate.

Sometimes bugs are wrongly classified when verified (documentation vs code bug, wrong category, wrong severity, not all affected versions listed, ignoring a regression etc). This is also unfortunate.

Percona engineers still help to make MySQL better.
There are some fixes in upcoming MySQL 8.0.20 that I am waiting for :) XA transactions in MySQL are badly broken (they are not atomic in storage engine + binary log) and hardly safe to use in reality.
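To make the XA discussion above concrete, here is the statement sequence involved, written out as a sketch file in the style of the earlier heredoc examples (the xid 'trx1', table and column are arbitrary; in practice the statements would be run via the mysql client):

```shell
# Write the XA statement flow to a file, so the sketch is runnable
# without a server.
cat > /tmp/xa_sketch.sql << 'SQL'
XA START 'trx1';
UPDATE t SET c = c + 1 WHERE id = 1;
XA END 'trx1';
XA PREPARE 'trx1';
-- Per Bug #76233, the prepare is written to the binary log ahead of the
-- engine prepare, so a crash around this point can leave the binary log
-- and the storage engine disagreeing. The prepared transaction may then
-- be committed from ANOTHER connection (the behavior questioned in
-- Bug #98139).
XA COMMIT 'trx1';
SQL

# Four XA statements frame one ordinary DML statement.
grep -c '^XA' /tmp/xa_sketch.sql
```

The point of the sketch is the window between XA PREPARE and XA COMMIT: that is where the atomicity gap discussed in these bug reports lives.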
  4. Recently one of our clients approached Mydbops with query slowness on a MySQL environment. They had deployed new code to generate huge reports for the year-end analytics data. After the deployment the queries were extremely slow and they struggled a lot, so they approached us for a solution. Upon analysis, their OLAP database was, as expected, IO bound, with 100% of the disk IOPS utilised during report generation; the queries starving for disk IO slowed the whole process down.

Problem statement :

Reports are majorly focused on two large log tables ( emp_Report_model and emp_details ). The report generator (a procedure) uses count(*) statements to compute the aggregated data on each call; this is required for their business purposes. count(*) is terribly slow in MySQL 5.7, as it needs to count all the rows in the table ( MySQL 8.0 has InnoDB parallel read threads that can make count(*) faster ). A MySQL INDEX can’t help: as we are aggregating almost the complete data set ( 90% of the data on each call ), the queries will be Full Table Scans (FTS). On further analysis we found there is only an INSERT workload on those tables; there are no UPDATEs or DELETEs. We proposed a solution to overcome the problem with the help of ClickHouse, by migrating the data to ClickHouse.

What is Clickhouse ?

ClickHouse is an open source column-oriented database management system capable of real time generation of analytical data reports using SQL queries. Clickhouse Website

The major limitation of MySQL to ClickHouse replication is that we can only apply INSERT statements (appends) from MySQL; ClickHouse will not support UPDATEs and DELETEs, which, for a columnar database, makes sense.

Clickhouse Installation :

The installation is quite straightforward.
The steps are available on the ClickHouse official web site:

yum install rpm-build epel-release
curl -s | sudo bash
yum install -y mysql-community-devel python34-devel python34-pip gcc python-devel libevent-devel gcc-c++ kernel-devel libxslt-devel libffi-devel openssl-devel python36 python36-devel python36-libs python36-tools

Clickhouse Server:

yum install -y clickhouse-server clickhouse-client

Clickhouse MySQL replication library:

pip3 install clickhouse-mysql

Clickhouse startup :

[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is stopped
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server start
Start clickhouse-server service: /etc/init.d/clickhouse-server: line 166: ulimit: open files: cannot modify limit: Operation not permitted
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
DONE
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# /etc/init.d/clickhouse-server status
clickhouse-server service is running

[root@mydbopslabs192 ~]# clickhouse-client
ClickHouse client version
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 19.17.4 revision 54428.

mydbopslabs192 show databases;

SHOW DATABASES

┌─name────┐
│ default │
│ system  │
└─────────┘

2 rows in set. Elapsed: 0.003 sec.

Installation is all set. Next, we need to migrate the data from MySQL to ClickHouse and configure the replication for those tables.

Data Migration from MySQL to Clickhouse :

Step 1 ( Dump the Clickhouse based schema structure from MySQL ) :

First, we need to migrate the MySQL table structures to ClickHouse. MySQL and ClickHouse have different data types, so we cannot apply the same structure from MySQL to ClickHouse. The below document provides a neat comparison between MySQL and ClickHouse data types. Let us convert the table structure from MySQL to ClickHouse using the clickhouse-mysql tool.
[root@mydbopslabs192 ~]# clickhouse-mysql --src-host= --src-user=clickhouse --src-password=Click@321 --create-table-sql-template --with-create-database --src-tables=data_Analytics.emp_Report_model,data_Analytics.emp_details > data_Reports_Jan21st.sql
2020-01-21 09:03:40,150/1579597420.150730:INFO:Starting
2020-01-21 09:03:40,150/1579597420.150977:DEBUG:{'app': {'binlog_position_file': None, 'config_file': '/etc/clickhouse-mysql/clickhouse-mysql.conf', 'create_table_json_template': False,
2020-01-21 09:03:40,223/1579597420.223511:DEBUG:Connect to the database host= port=3306 user=clickhouse password=Click@321 db=data_Analytics
2020-01-21 09:03:40,264/1579597420.264610:DEBUG:Connect to the database host= port=3306 user=clickhouse password=Click@321 db=data_Analytics

Dumping the table structure:

[root@mydbopslabs192 ~]# less data_Reports_Jan12th.sql | grep CREATE
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_Report_model (

[root@mydbopslabs192 ~]# cat data_Reports_Jan12th.sql | head -n7
CREATE DATABASE IF NOT EXISTS data_Analytics;
CREATE TABLE IF NOT EXISTS data_Analytics.emp_details (
WatchID Nullable(String),
JavaEnable Nullable(Int32),
Title Nullable(String),
GoodEvent Nullable(Int32),
EventTime Nullable(DateTime),

Step 2 ( Import the schema structure into Clickhouse ) :

[root@mydbopslabs192 ~]# clickhouse-client -mn < data_Reports_Jan12th.sql
[root@mydbopslabs192 ~]#
[root@mydbopslabs192 ~]# fg
clickhouse-client

mydbopslabs192 use data_Analytics

USE data_Analytics

Ok.

0 rows in set. Elapsed: 0.001 sec.

mydbopslabs192 show tables;

SHOW TABLES

┌─name─────────────┐
│ emp_Report_model │
│ emp_details      │
└──────────────────┘

2 rows in set. Elapsed: 0.003 sec.

Step 3 ( Migrating the data and keeping the replication in sync ) :

Before configuring the replication, the MySQL server should be configured with the below variables.
Mandatory MySQL settings :

server-id = <your id>
binlog_format = ROW
binlog_row_image = FULL

Now, we can configure the replication in two ways:

i) Migrate the existing data, then configure the replication
ii) Migrate the existing data and configure the replication in one step

i) Migrate the existing data, then configure the replication :

Command to migrate the existing data:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host= --src-user=clickhouse --src-password=Click@321 --migrate-table --src-tables=data_Analytics.emp_Report_model --dst-host= --dst-schema data_Analytics --dst-table emp_Report_model --log-file=emp_Report_model.log

[root@mydbopslabs192 ~]# less emp_Report_model.log | grep -i migra
'migrate_table': True,
'table_migrator': {'clickhouse': {'connection_settings': {'host': '',
2020-01-21 11:04:57,744/1579604697.744533:INFO:List for migration:
2020-01-21 11:04:57,744/1579604697.744947:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:04:57,891/1579604697.891935:INFO:migrate_table.
sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model mydbopslabs192 select count(*) from data_Analytics.emp_Report_model; ┌─count()─┐ │ 8873898 │ └─────────┘ 1 rows in set. Elapsed: 0.005 sec. 
Configuring the replication:

[root@mydbopslabs192 ~]# clickhouse-mysql --src-host= --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host= --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --log-file=emp_Report_model_Replication.log

2020-01-21 11:22:18,974/1579605738.974186:INFO:CSVWriter() self.path=/tmp/csvpool_1579605738.9738157_d643efe5-5ae0-47df-8504-40f61f2c139f.csv
2020-01-21 11:22:18,976/1579605738.976613:INFO:CHCSWriter() connection_settings={'port': 9000, 'host': '', 'password': '', 'user': 'default'} dst_schema=data_Analytics dst_table=emp_Report_model
2020-01-21 11:22:18,976/1579605738.976936:INFO:starting clickhouse-client process
2020-01-21 11:22:19,160/1579605739.160906:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,166/1579605739.166096:INFO:['data_Analytics.emp_Report_model']
2020-01-21 11:22:19,170/1579605739.170744:INFO:['data_Analytics.emp_Report_model']
(END)

ii) Migrate the existing data and configure the replication in one step :

Here we need to add the --migrate-table flag to the replication command.
[root@mydbopslabs192 ~]# clickhouse-mysql --src-host= --src-user=clickhouse --src-password=Click@321 --src-tables=data_Analytics.emp_Report_model --dst-host= --dst-schema data_Analytics --dst-table emp_Report_model --src-resume --src-wait --nice-pause=1 --log-level=info --csvpool --mempool-max-flush-interval=60 --mempool-max-events-num=1000 --pump-data --src-server-id=100 --migrate-table --log-file=emp_Report_model_replication_mig.log

[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i mig
2020-01-21 11:27:53,263/1579606073.263505:INFO:List for migration:
2020-01-21 11:27:53,263/1579606073.263786:INFO:Start migration data_Analytics.emp_Report_model
2020-01-21 11:27:53,316/1579606073.316788:INFO:migrate_table. sql=SELECT WatchID,JavaEnable,Title,GoodEvent,EventTime,Eventdate,CounterID,ClientIP,ClientIP6,RegionID,UserID,CounterClass,OS,UserAgent,URL,Referer,URLDomain,RefererDomain,Refresh,IsRobot,RefererCategories,URLCategories,URLRegions,RefererRegions,ResolutionWidth,ResolutionHeight,ResolutionDepth,FlashMajor,FlashMinor,FlashMinor2,NetMajor,NetMinor,UserAgentMajor,UserAgentMinor,CookieEnable,JavascriptEnable,IsMobile,MobilePhone,MobilePhoneModel,Params,IPNetworkID,TraficSourceID,SearchEngineID,SearchPhrase,AdvEngineID,IsArtifical,WindowClientWidth,WindowClientHeight,ClientTimeZone,ClientEventTime,SilverlightVersion1,SilverlightVersion2,SilverlightVersion3,SilverlightVersion4,PageCharset,CodeVersion,IsLink,IsDownload,IsNotBounce,FUniqID,HID,IsOldCounter,IsEvent,IsParameter,DontCountHits,WithHash,HitColor,UTCEventTime,Age,Sex,Income,Interests,Robotness,GeneralInterests,RemoteIP,RemoteIP6,WindowName,OpenerName,HistoryLength,BrowserLanguage,BrowserCountry,SocialNetwork,SocialAction,HTTPError,SendTiming,DNSTiming,ConnectTiming,ResponseStartTiming,ResponseEndTiming,FetchTiming,RedirectTiming,DOMInteractiveTiming,DOMContentLoadedTiming,DOMCompleteTiming,LoadEventStartTiming,LoadEventEndTiming,NSToDOMContentLoadedTiming,FirstPaintTiming,RedirectCount,
SocialSourceNetworkID,SocialSourcePage,ParamPrice,ParamOrderID,ParamCurrency,ParamCurrencyID,GoalsReached,OpenstatServiceName,OpenstatCampaignID,OpenstatAdID,OpenstatSourceID,UTMSource,UTMMedium,UTMCampaign,UTMContent,UTMTerm,FromTag,HasGCLID,RefererHash,URLHash,CLID,YCLID,ShareService,ShareURL,ShareTitle,IslandID,RequestNum,RequestTry FROM data_Analytics.emp_Report_model

[root@mydbopslabs192 ~]# less emp_Report_model_replication_mig.log | grep -i process
2020-01-21 11:28:01,071/1579606081.071054:INFO:starting clickhouse-client process

Validating the count after inserting some records in MySQL (source):

mydbopslabs192 select count(*) from data_Analytics.emp_Report_model;

┌─count()─┐
│ 8873900 │
└─────────┘

MySQL to ClickHouse replication is working as expected.

Performance comparison for OLAP workload ( MySQL vs Clickhouse ) :

Count(*) in MySQL :

mysql> select count(*) from emp_Report_model;
1 row in set (32.68 sec)

Count(*) in Clickhouse :

mydbopslabs192 select count(*) from emp_Report_model;
1 rows in set. Elapsed: 0.007 sec.
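The gap in count(*) timings above comes down to storage layout: InnoDB has to scan an index to count the rows, while a ClickHouse MergeTree table stores the row count of each data part as metadata, so a plain count() barely touches the data. A toy illustration of the difference (file names and layout are made up, not ClickHouse's actual on-disk format):

```shell
# Two fake "parts", each with row data and a per-part row-count metadata file.
mkdir -p /tmp/parts/part1 /tmp/parts/part2
seq 1 1000 > /tmp/parts/part1/data; echo 1000 > /tmp/parts/part1/count.txt
seq 1 500  > /tmp/parts/part2/data; echo 500  > /tmp/parts/part2/count.txt

# "Scan-style" count: read every row.
cat /tmp/parts/*/data | wc -l

# "Metadata-style" count: sum the per-part counters; only 2 small files read.
awk '{ s += $1 } END { print s }' /tmp/parts/*/count.txt
```

Both commands print 1500, but the second reads two tiny files instead of all the rows, which is why the columnar engine answers in milliseconds.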
Aggregated query in MySQL :

mysql> select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;
292893 rows in set (1 min 2.61 sec)

Aggregated query in Clickhouse :

mydbopslabs192 select emp_Report_model.WatchID,emp_Report_model.JavaEnable,emp_Report_model.Title,emp_Report_model.RegionID from emp_Report_model inner join emp_details on emp_Report_model.WatchID=emp_details.WatchID and emp_Report_model.RegionID=emp_details.RegionID and emp_Report_model.UserAgentMajor=emp_details.UserAgentMajor where emp_Report_model.SocialSourcePage is not null and emp_details.FetchTiming != 0 order by emp_Report_model.WatchID;
292893 rows in set. Elapsed: 1.710 sec. Processed 9.37 million rows, 906.15 MB (7.75 million rows/s., 749.15 MB/s.)

Yes, ClickHouse is performing very well with COUNT(*) and analytical queries.

Query Model   MySQL        Clickhouse
count(*)      33 seconds   0.1 seconds
OLAP query    63 seconds   1.7 seconds

MySQL vs Clickhouse (the graph above is just a pictorial representation of the queries tested). Though ClickHouse excels in analytics workloads, it has its own limitations too. Now we have another happy customer at Mydbops, whose analytics dashboards are faster now. Featured image credits: Stephen Dawson on Unsplash
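After a setup like the one above, a periodic sanity check comparing row counts on both sides is cheap insurance against silent replication drift. A minimal sketch follows; the two functions are stubs (using the row count seen earlier) so the snippet runs standalone, and in practice they would wrap `mysql -N -e "SELECT COUNT(*) FROM data_Analytics.emp_Report_model"` and `clickhouse-client --query "SELECT count() FROM data_Analytics.emp_Report_model"`:

```shell
# Stubs standing in for the real client calls (illustration only).
mysql_count()      { echo 8873900; }
clickhouse_count() { echo 8873900; }

if [ "$(mysql_count)" -eq "$(clickhouse_count)" ]; then
  echo "replica in sync ($(mysql_count) rows)"
else
  echo "row-count drift: mysql=$(mysql_count) clickhouse=$(clickhouse_count)"
fi
```

Run from cron, the drift branch can be pointed at your alerting of choice; since the pipeline is INSERT-only, the counts should normally converge within the replication lag.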
  5. Galera is the best solution for High Availability; it is being used by many people worldwide. Galera does synchronous replication ( really, it is certification-based replication ) to keep the data updated on the group nodes. In this blog I have explained how the Galera replication works. For better understanding, I have made an architecture diagram to describe the replication flow. I have also provided explanations for the key words used in the architecture diagram.

Architecture flow diagram :

What is a writeset ?

A writeset contains all changes made to the database by the transaction, plus the append_key of the changed rows.

What is append_key ?

Append_key registers the keys of the data changed by the transaction. The key for a row can be represented in three parts: DATABASE NAME, TABLE NAME, PRIMARY KEY. If the table doesn’t have a PRIMARY KEY, a HASH of the modified data will be part of the writeset.

What is Certification in Galera ?

Certification in Galera is performed to detect conflicts and preserve data consistency among the group. It is performed before the transaction commit.

What is CVV ( Central Certification Vector ) ?

The CVV is used to detect conflicts. The modified keys are added into the Central Certification Vector; if an added key is already part of the vector, then conflict resolution checks are triggered.

Hope this blog helps someone who is working with Galera Cluster. I will come up with my next blog soon. Thanks !!!
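As a toy illustration of the certification idea described above (not Galera's actual implementation), conflict detection against a vector of committed keys can be sketched like this; the key names and sequence numbers are made up:

```shell
# The "vector": each committed writeset recorded its keys together with
# the seqno at which they were committed.
cat > /tmp/cvv << 'EOF'
db1.users.pk=10 3
db1.orders.pk=7 5
EOF

# certify <snapshot_seqno> <key...>: a writeset certifies only if none of
# its keys were committed after the snapshot seqno it was based on.
certify() {
  snap=$1; shift
  for key in "$@"; do
    committed=$(awk -v k="$key" '$1 == k { print $2 }' /tmp/cvv)
    if [ -n "$committed" ] && [ "$committed" -gt "$snap" ]; then
      echo "conflict on $key (committed at seqno $committed > snapshot $snap)"
      return 1
    fi
  done
  echo "certified"
}

certify 4 db1.users.pk=10            # last committed at seqno 3 <= 4: passes
certify 4 db1.orders.pk=7 || true    # committed at seqno 5 > 4: conflict
```

This is why the keys (or the hash, for tables without a PRIMARY KEY) travel with the writeset: they are what each node checks against its vector, deterministically, so every node reaches the same certify/rollback verdict.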