String Theory

String Theory: Thiago Macieira, Qt Developer Days 2014


  1. String Theory. Thiago Macieira, Qt Developer Days 2014

  2. Who am I?

  3. How many string classes does Qt have?
     • Present
       – QString
       – QLatin1String
       – QByteArray
       – QStringLiteral (not a class!)
       – QStringRef
     • Past
       – QCString / Q3CString
     • Non-Qt
       – std::string
       – std::wstring
       – std::u16string / std::u32string
       – Character literals ("", L"", u"", U"")
       – QVector<char>

  4. Character types, charsets, and codecs

  5. What's a charset?

  6. Legacy encodings
     • 6-bit encodings
     • EBCDIC
     • UTF-1

  7. Examples of modern encodings
     • Fixed width
       – US-ASCII (ANSI X3.4-1986)
       – Most DOS and Windows codepages
       – ISO-8859 family
       – KOI8-R, KOI8-U
       – UCS-2
       – UTF-32 / UCS-4
     • Variable width
       – UTF-7
       – UTF-8, CESU-8
       – UTF-16
       – GB-18030
     • Stateful
       – Shift-JIS
       – EUC-JP
       – ISO-2022
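The difference between fixed-width and variable-width encodings can be made concrete by counting how many code units one Unicode code point occupies in each UTF. The following is a minimal sketch in standard C++, not taken from the talk; the function names are hypothetical, and surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstddef>

// Number of 8-bit code units UTF-8 needs for one Unicode code point.
std::size_t utf8Units(char32_t cp)
{
    assert(cp <= 0x10FFFF);          // Unicode ends at U+10FFFF
    if (cp < 0x80)    return 1;      // US-ASCII range
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Number of 16-bit code units UTF-16 needs: one inside the Basic
// Multilingual Plane, otherwise a surrogate pair.
std::size_t utf16Units(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;
}

// UTF-32 / UCS-4 is fixed width: always one 32-bit code unit.
std::size_t utf32Units(char32_t)
{
    return 1;
}
```

For example, U+00E5 (å) takes two units in UTF-8 but one in UTF-16, while U+1F600 takes four units in UTF-8 and two in UTF-16.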

  8. Unicode & ISO/IEC 10646
     • Unicode Consortium - http://unicode.org
     • Character maps, technical reports
     • The Common Locale Data Repository

  9. Codec
     • enCOder/DECoder
     • Usually goes through UTF-32 / UCS-4
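A codec that goes through UTF-32 can be sketched as two independent steps: decode the source bytes to code points, then encode the code points in the target encoding. The sketch below is standard C++ with hypothetical function names; it converts Latin-1 to UTF-8 and does no validation of the code points it encodes.

```cpp
#include <string>

// Decode step: each Latin-1 byte is the code point of the same value.
std::u32string decodeLatin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}

// Encode step: UTF-32 code points to UTF-8 bytes (1 to 4 bytes each).
std::string encodeUtf8(const std::u32string &text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// The codec: Latin-1 in, UTF-8 out, with UTF-32 as the pivot.
std::string latin1ToUtf8(const std::string &bytes)
{
    return encodeUtf8(decodeLatin1(bytes));
}
```

With a pivot, supporting N encodings takes N decoders and N encoders rather than a converter for every pair.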

  10. Codecs in your editor / IDE
      • Qt Creator: UTF-8
      • Unix editors: locale¹
      • Visual Studio: locale² or UTF-8 with BOM
      1) modern Unix locale is usually UTF-8; it always is for OS X
      2) Windows locale is almost never UTF-8

  11. Codecs in Qt
      • Built-in
        – Unicode: UTF-8, UTF-16, UTF-32 / UCS-4
      • ICU support

  12. C++ character types
      Type               Width               Literals    Encoding
      char               1 byte              "Hello"     arbitrary
      char               1 byte              u8"Hello"   UTF-8
      wchar_t            Platform-specific   L"Hello"    Platform-specific
      char16_t (C++11)   At least 16 bits    u"Hello"    UTF-16
      char32_t (C++11)   At least 32 bits    U"Hello"    UTF-32
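The widths and literal types in the table can be checked by the compiler itself. This is a small sketch in standard C++11, not part of the talk; `wcharBits` is a hypothetical helper, and the u8 prefix is left out because its element type changed in C++20.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

// char is exactly one byte; its signedness and encoding are not fixed.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// char16_t and char32_t are distinct types with minimum widths.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "at least 32 bits");
static_assert(!std::is_same<char16_t, unsigned short>::value,
              "char16_t is not a typedef");

// The literal prefix selects the element type of the array.
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value, "");
static_assert(std::is_same<decltype(L"Hello"), const wchar_t (&)[6]>::value, "");
static_assert(std::is_same<decltype(u"Hello"), const char16_t (&)[6]>::value, "");
static_assert(std::is_same<decltype(U"Hello"), const char32_t (&)[6]>::value, "");

// wchar_t has a platform-specific width, so it is reported, not asserted.
constexpr std::size_t wcharBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}
```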

  13. Using non-basic characters in the source code
      • Often, bad idea
        – Compiler-specific behaviour
        char msg[] = "How are you?\n"
                     "¿Como estás?\n"
                     "Hvordan går det?\n"
                     "お元気ですか?\n"
                     "Как поживаешь?\n"
                     "Τι κάνεις;\n";

  14. The five C and C++ charsets
      – (Basic/Extended) Source character set
      – (Basic/Extended) Execution character set
      – (Basic/Extended) Execution wide-character set
      – Translation character set
      – Universal character set
      [Diagram relating the Source, Exec, Exec wide, Translation and Universal character sets]
      But usually:
        Wide = Translation = Universal
        Source = Exec

  15. Writing non-English
      • C++11 Unicode strings
          return QStringLiteral(u"Hvordan g\u00E5r det?\n");
      • Regular escape sequences
          return QLatin1String("Hvordan g\xE5r det?\n") +
                 QString::fromUtf8("\xC2\xBF" "Como est\xC3\xA1s?");  // literal split so \xBF does not absorb the 'C'
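The two escape styles behave differently, which the following standard C++ sketch illustrates without Qt. The variable names are hypothetical. A `\u` escape names a code point and always takes four hex digits; a `\x` escape names a raw code unit and consumes every hex digit that follows it, so a literal sometimes has to be split.

```cpp
#include <string>

// \u escape: the compiler encodes U+00E5 for the literal's type (UTF-16 here).
const std::u16string norwegianUtf16 = u"Hvordan g\u00E5r det?\n";

// \x escape: a raw byte. 0xE5 is the Latin-1 encoding of U+00E5.
const std::string norwegianLatin1 = "Hvordan g\xE5r det?\n";

// \x escapes are greedy: "\xBFComo" would be read as the out-of-range
// escape \xBFC followed by "omo". Ending the literal after \xBF and
// starting a new one avoids that; adjacent literals are concatenated.
const std::string spanishUtf8 = "\xC2\xBF" "Como est\xC3\xA1s?";
```

In this sketch the Latin-1 and UTF-16 strings both hold 17 code units, while the UTF-8 string spends two bytes on each non-ASCII character.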

  16. Qt support

  17. Recalling Qt string types
      • Main classes
        – QString
        – QLatin1String
        – QByteArray
      • Other
        – QStringLiteral
        – QStringRef

  18. Qt string classes in detail
      Type             Overhead          Stores     8-bit clean?
      QByteArray       16 / 24 bytes     char       Yes
      QString          16 / 24 bytes     QChar      No (stores 16-bit!)
      QLatin1String    Non-owning        char       N/A
      QStringLiteral   Same as QString
      QStringRef       Non-owning        QString*   No

  19. Remember your encoding
        while (file.canReadLine()) {
            QString line = file.readLine();
            doSomething(line);
        }
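The loop above turns raw bytes into text without naming an encoding, so the result depends on which decoder is applied. The sketch below uses standard C++ rather than Qt to show the same bytes decoded two ways; `fromLatin1` and `fromUtf8` are hypothetical free functions, and the UTF-8 decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Latin-1: one byte is one code point.
std::u32string fromLatin1(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Minimal UTF-8 decoder for well-formed input (no error handling).
std::u32string fromUtf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Number of continuation bytes announced by the lead byte.
        const int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Payload bits of the lead byte: 7, 5, 4 or 3 of them.
        char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
        for (int k = 1; k <= extra && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The bytes `67 C3 A5 72` decode to the three characters "går" as UTF-8, but to four characters ("gÃ¥r") as Latin-1, which is the kind of silent corruption the slide title warns about.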

  20. QString implicit casting
      • Assumes that char* are UTF-8
        – Constructor
        – operator const char*() const
      • Use QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII

  21. QByteArray
      • Any 8-bit data
      • Allocates heap, with 16/24 byte overhead
        qint64 read(char *data, qint64 maxlen);
        QByteArray read(qint64 maxlen);
        QByteArray readAll();
        qint64 readLine(char *data, qint64 maxlen);
        QByteArray readLine(qint64 maxlen = 0);
        virtual bool canReadLine() const;

  22. QLatin1String
      • Latin 1 (ISO-8859-1) content
        – Not to be confused with Windows 1252 or ISO-8859-15
      • No heap
        bool startsWith(const QString &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(const QStringRef &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(QLatin1String s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(QChar c, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool endsWith(const QString &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool endsWith(const QStringRef &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool endsWith(QLatin1String s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool endsWith(QChar c, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
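A non-owning Latin-1 wrapper needs no heap because comparing it against UTF-16 data requires no conversion buffer: every Latin-1 byte has the same value as its code point. The sketch below models the idea in standard C++; `Latin1View` and this `startsWith` are hypothetical stand-ins, not Qt's implementation, and only the case-sensitive comparison is shown.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: just a pointer and a length.
struct Latin1View {
    const char *data;
    std::size_t size;
    explicit Latin1View(const char *s) : data(s), size(std::strlen(s)) {}
};

// Case-sensitive prefix test of UTF-16 text against Latin-1 bytes.
// Widening each byte to 16 bits is the entire "conversion".
bool startsWith(const std::u16string &text, Latin1View prefix)
{
    if (prefix.size > text.size())
        return false;
    for (std::size_t i = 0; i < prefix.size; ++i) {
        if (text[i] != static_cast<unsigned char>(prefix.data[i]))
            return false;
    }
    return true;
}
```

The cast to `unsigned char` matters: on platforms where `char` is signed, a byte such as 0xE5 would otherwise widen to a negative value and never match.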

  23. QStringLiteral
      • Read-only, shareable UTF-16 data*
      • No heap, but 16/24 byte overhead
      The slide overlays two versions of the macro. Using QStaticStringData:
        # define QStringLiteral(str) \
            ([]() -> QString { \
                enum { Size = sizeof(QT_UNICODE_LITERAL(str))/2 - 1 }; \
                static const QStaticStringData<Size> qstring_literal = { \
                    Q_STATIC_STRING_DATA_HEADER_INITIALIZER(Size), \
                    QT_UNICODE_LITERAL(str) }; \
                QStringDataPtr holder = { qstring_literal.data_ptr() }; \
                const QString s(holder); \
                return s; \
            }())
      Using QStringPrivate:
        # define QStringLiteral(str) \
            ([]() -> QString { \
                QStringPrivate holder = { \
                    QArrayData::sharedStatic(), \
                    reinterpret_cast<ushort *>(const_cast<qunicodechar *>(QT_UNICODE_LITERAL(str))), \
                    sizeof(QT_UNICODE_LITERAL(str))/2 - 1 \
                }; \
                return QString(holder); \
            }())
      *) Depends on compiler support: best with C++11 Unicode strings
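The expression `sizeof(QT_UNICODE_LITERAL(str))/2 - 1` in the macro is what lets the string length be fixed at compile time: the literal is an array of 16-bit units, and the subtraction drops the terminating null. The sketch below reproduces only that arithmetic in standard C++11; `literalLength` is a hypothetical helper, not a Qt function.

```cpp
#include <cstddef>

// Length of a u"" literal in UTF-16 code units, excluding the
// terminating null, deduced from the array type at compile time.
template <std::size_t N>
constexpr std::size_t literalLength(const char16_t (&)[N])
{
    return N - 1;
}

// Both forms are constant expressions and agree with each other.
static_assert(literalLength(u"Qt Dev Days") == 11, "counted at compile time");
static_assert(sizeof(u"Qt Dev Days") / sizeof(char16_t) - 1 == 11, "same result");
```

The count is in code units, not characters: a code point outside the Basic Multilingual Plane contributes two.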

  24. Standard Library types
      • std::string
        – QString::fromStdString
        – QString::toStdString
      • std::wstring
        – QString::fromStdWString
        – QString::toStdWString
      • std::u16string (C++11)
      • std::u32string (C++11)

  25. C++11 (partial) support
        static QString fromUtf16(const char16_t *str, int size = -1)
        { return fromUtf16(reinterpret_cast<const ushort *>(str), size); }
        static QString fromUcs4(const char32_t *str, int size = -1)
        { return fromUcs4(reinterpret_cast<const uint *>(str), size); }
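Since QString stores 16-bit units, building one from UCS-4 input means re-encoding as UTF-16, while UTF-16 input can be copied as is. The following standard C++ sketch shows what that re-encoding involves; `ucs4ToUtf16` is a hypothetical name, this is not Qt's code, and invalid code points are not checked.

```cpp
#include <string>

// Re-encode UCS-4 / UTF-32 code points as UTF-16 code units.
std::u16string ucs4ToUtf16(const std::u32string &text)
{
    std::u16string out;
    for (char32_t cp : text) {
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one unit, same value.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split 20 bits across a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```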

  26. Which one is best? (1)
        bool startsWith(const QString &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(const QStringRef &s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(QLatin1String s, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;
        bool startsWith(QChar c, Qt::CaseSensitivity cs = Qt::CaseSensitive) const;

        return s.startsWith("Qt Dev Days");
        return s.startsWith(QLatin1String("Qt Dev Days"));
        return s.startsWith(QStringLiteral("Qt Dev Days"));
