indexing - How do I generate a (book) index?

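I need to create an index for a book. At first glance the task looks simple (group the words by first letter, then sort them), but this obvious solution only works for American English. Real-world text is much more complex. See http://en.wikipedia.org/wiki/Collation :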

The difference between computer-style numerical sorting and true alphabetical sorting becomes obvious in languages using an extended Latin alphabet. For example, the 29-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are now alphabetized as two-letter combinations. (The new alphabetization rule was issued by the Royal Spanish Academy in 1994.) On the other hand, the digraph rr follows rqu as expected, both with and without the 1994 alphabetization rule. A numeric sort may order ñ incorrectly following z and treat ch as c + h, also incorrect when using pre-1994 alphabetization.
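To make the difference concrete, here is a minimal Python sketch that compares a plain code-point sort with a sort driven by an explicit alphabet. The `SPANISH_ALPHABET` string, the helper names, and the sample words are assumptions made for illustration, and it follows the post-1994 rule in which ch and ll are ordinary two-letter combinations. A real index would more likely rely on an implementation of the Unicode Collation Algorithm (ICU, for example) than on a hand-written key like this one.

```python
import unicodedata

# Assumed alphabet order: modern Spanish, with ñ as a letter after n.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(SPANISH_ALPHABET)}

def strip_accents_keep_enye(word):
    """Lowercase the word and drop accents, but keep ñ as its own letter."""
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch == "ñ":
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

def spanish_key(word):
    """Sort key: the alphabet rank of each letter, in order."""
    base = strip_accents_keep_enye(word)
    # Characters outside the alphabet sort after all letters.
    return [RANK.get(ch, len(RANK)) for ch in base]

words = ["zorro", "ñandú", "nube", "chico", "cosa"]
by_codepoint = sorted(words)                   # ñ (U+00F1) lands after z
by_alphabet = sorted(words, key=spanish_key)   # ñ lands right after n

print(by_codepoint)
print(by_alphabet)
```

The code-point sort places "ñandú" after "zorro", while the alphabet-based key places it between "nube" and "zorro".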

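I tried to find an existing solution.

The DocBook stylesheets do not address this problem.

The best match I found is xindy ( http://xindy.sourceforge.net/ ), but that tool is too closely tied to LaTeX.

Any other suggestions?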

Best answer

Naively, you could go through every word in the text and build a hash, using the words as keys and an array of locations (page numbers?) as values.
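A minimal Python sketch of that naive approach might look like the following. The input shape (a dict mapping page number to page text) and the name `build_naive_index` are assumptions for illustration, not something the question specifies.

```python
import re
from collections import defaultdict

def build_naive_index(pages):
    """Map every word to the sorted list of pages it appears on.

    `pages` is assumed to be a dict of page number -> page text.
    """
    index = defaultdict(set)
    for page_number, text in pages.items():
        # \w+ is a crude tokenizer; it treats any run of word characters as a word.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in index.items()}

pages = {
    1: "Collation is sorting.",
    2: "Sorting by collation rules.",
}
index = build_naive_index(pages)
print(index["collation"])
```

Every word ends up in the hash, including ones like "is" and "by" that no reader would look up.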

But an index is usually more selective than that.
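One way to read "more selective" is to restrict the index to a hand-picked list of terms and then group the entries by initial letter. The sketch below assumes that reading. The function names, the term list, and the case-insensitive substring matching are all illustrative assumptions; a real index would also need stemming, cross-references, and proper collation.

```python
from collections import defaultdict

def build_focused_index(pages, terms):
    """Record page numbers only for the chosen terms.

    Matching is a case-insensitive substring test, which is crude:
    the term "sort" would also match "sorting".
    """
    index = defaultdict(list)
    for term in terms:
        needle = term.lower()
        for page_number in sorted(pages):
            if needle in pages[page_number].lower():
                index[term].append(page_number)
    return dict(index)

def group_by_initial(index):
    """Group index entries under the uppercase first letter of each term."""
    groups = defaultdict(list)
    for term in sorted(index, key=str.lower):
        groups[term[0].upper()].append((term, index[term]))
    return dict(groups)

pages = {
    1: "Collation is sorting.",
    2: "Sorting by collation rules.",
    3: "DocBook stylesheets.",
}
terms = ["collation", "DocBook", "xindy"]
focused = build_focused_index(pages, terms)
grouped = group_by_initial(focused)

for letter, entries in grouped.items():
    print(letter, entries)
```

Terms that never occur in the text, such as "xindy" here, produce no entry.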

https://stackoverflow.com/questions/4397533/
