Suds Library Required

Soap call using Suds library. String value is not required so i am not worry about it and siperianObjectUid is working as. The pressure on land for both new developments and in built-up areas is such that reducing the footprint of a SuDS scheme is a major factor in the design and delivery of SuDS. Components that intercept stormwater and slow the flow before it reaches a final attenuation or infiltration point reduce the amount of land required for the scheme. I’ll be using a Python SOAP library called Suds to talk to the Exchange server. There are a few catches, though. EWS doesn’t play nice with Suds, so we need a few patches and other modules to get it working well. Installing Suds. On Mac and Linux, this is trivially easy: run. Sudo pip install suds.

This web page will no longer be updated. Seethe author's web pagesfor further releases.

RLCSA [4, 3, 7]is a compressed suffix array implementation that has been optimized for highlyrepetitive text collections. Examples of such collections include version control dataand individual genomes. This implementation also serves as a testbed for many techniquesused with compressed suffix arrays.The most up-to-date description of RLCSA can be found in [7].

The current version includes experimental support for:

  • LCP information [5]
  • Distribution-aware sampling [6, 9]
  • Adaptive distribution-aware sampling
  • Space-efficient document listing [8]

See README in the package for further information.The implementation is available for download under the MIT / X11 License.

There is also a separate package containing implementations of additional document retrieval algorithms:

  • Brute-force document listing and top-k document retrieval.
  • New variants of PDL for document listing and top-k document retrieval [10].
  • Several variants of Sadakane's algorithm for document listing.

Full experimental results from [10] are also included in the document listing package.

News

  • 2014-07-23 This web page will no longer be updated. See the author's web pages for further releases.
  • 2014-04-18 An updated version of the document listing package, with a minor bug fix to RLCSA to support it.
  • 2014-01-11 A new version of the document listing package, and some new datasets from [10].
  • 2014-01-10 A new version with a number of technical changes. Required by January 2014 version of the doclist package.
  • 2013-05-21 A separate package with some additional document listing algorithms.
  • 2013-05-20 A new version with improved document listing support.
  • 2013-04-10 A new version with document listing support.
  • 2013-01-21 A new version with parallel sampling.
  • 2012-12-07 A new version with various changes accumulated over time.
  • 2012-11-05 More test data is now available online.
  • 2012-10-12 More test data is now available online.
  • 2012-02-14 More space-efficient construction. Option to use a succinct bit vector to mark sampled positions.
  • 2011-08-23 A minor update to support the August 2011 version of GCSA.
  • 2011-05-17 Support for distribution-aware sampling and an improved low-level interface for external modules.
  • 2011-01-17 A new version that supports the interface used in recent GCSA experiments.
  • 2010-11-25 Added a list of the data sets used in the experiments.
  • 2010-10-13 A new version with libstdc++ parallel mode support and performance improvements.
  • 2010-03-29 A new version with some compatibility and interface updates.
  • 2010-01-11 A new version with experimental support for LCP information.
  • 2009-11-25 A new version of RLCSA is available. This version includes bug fixes, more functionality and a bit cleaner interface.
  • 2009-06-15 The implementation of RLCSA is now available.
Library

Suds Library Required Free

Downloads

Current versions

  • RLCSA (April 2014)
  • Document listing (April 2014)

RLCSA

  • April 2014. A minor bug fix version to support the April 2014 version of the document listing package.
  • January 2014. This version includes a more consistent interface, faster SA construction for small alphabets, and some additions to the library. Required by January 2014 version of the doclist package.
  • May 2013. This version contains a simplified construction program for a single file as well as improved document listing support. Most importantly, the grammar rules in precomputed document listing are no longer run-length encoded if that increases their size.
  • April 2013. This version includes preliminary support for document listing using precomputed answers. A modified web graph compressor (Hernández, Navarro; SPIRE 2012) is required for building the document listing structure. The source code for it is available upon request. This version was used for the experiments in [8].
  • January 2013. This version includes a multi-threaded sampling algorithm for finding (almost) optimal distribution-aware samples quickly. There is also some undocumented work on document listing with highly repetitive data. This version was used in the experiments in [9].
  • December 2012. This version has many technical changes from the earlier versions. This version is required by the December 2012 version of GCSA.
  • February 2012. Construction algorithm for partial indexes changed from multiple Larsson-Sadakane algorithms to a parallel prefix-doubling algorithm. Succinct bit vectors can be used instead of gap encoded ones to e.g. mark sampled positions. This version was used in the experiments in [7].
  • August 2011. Minor updates to some library functions. This version is required by the August 2011 version of GCSA.
  • May 2011. The low-level interface has been improved in this version. Weighted / distribution-aware sampling is now also supported. The test program has also been extended.
  • January 2011. This version has a low-level interface compatible with that of GCSA. Also, the default sample rate was changed to more realistic 128, and the test program supports patterns in Pizza&Chili format.
  • October 2010. This version supports libstdc++ parallel mode as an alternative to MCSTL. There are also some speed optimizations (an alternative encoding for run-length encoded bit vectors and improved display()).
  • March 2010. Fixed some problems when compiling with a new version of g++. RLCSA now uses iterators and const operations to make parallelization easier.
  • January 2010. Includes experimental support for LCP information. This version was used in the experiments reported in [5].
  • November 2009. A cleaned up version with more functionality.
  • June 2009. This version was used in the experiments reported in [3]. There are some bugs that were introduced when cleaning up the code. You should use the November 2009 version instead.

Document listing

Suds library required
  • April 2014. A new variant of top-k PDL. Some technical updates and bug fixes. This package was used for the experiments in [10]. The package also includes full experimental results from [10]. Requires April 2014 version of RLCSA.
  • January 2014. A major update for the experiments in [10]. Requires January 2014 version of RLCSA.
  • May 2013. This is the initial release of the document listing package. The ILCP implementation is a rewrite of the one used in [8]. The package requires May 2013 version of RLCSA.

Data

Suds Library Required Pdf

The following data sets were used in experiments with RLCSA. Some of them are available fromthe Pizza&Chili Corpus (Chile,Italy).

  • DBLP archives and queries (available upon request) [6 (description), 9].
  • DNA sequences (with synthetic mutations in some cases) [1 (description), 2, 4, 5, 7, 10].
  • English language texts [5, 7, 10].
  • Finnish language words [10].
  • Human reference genome (NCBI build 34) [3].
  • Influenza genomes from NIH [8, 10].
  • Source code for OpenSSH version 4.7p1 and versions up to that [1, 2, 4].
  • Protein sequences [3].
  • Another set of protein sequences [10].
  • Saccharomyces cerevisiae and Saccharomyces paradoxus genomes from the Wellcome Trust Sanger Institute [2 (description), 4, 7].
  • Web queries (available upon request) and pages [6 (description), 9, 10].
  • Wikipedia archives: fiwiki, enwiki [3 (description), 5, 7, 8 (description), 10].

Suds Library Required

Library

The full test environment from [10] is also available upon request. Note that setting it up requires much manual work, and some of the implementations use 32-bit libraries.

References

  1. Jouni Sirén, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro: Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections.
    Proc. SPIRE 2008, Springer LNCS 5280, pp. 164-175, Melbourne, Australia, November 10-12, 2008.
  2. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki: Storage and Retrieval of Individual Genomes.
    Proc. RECOMB 2009, Springer LNCS 5541, pp. 121-137, Tucson, Arizona, USA, May 18-21, 2009.
    [Article] [Preprint]
  3. Jouni Sirén: Compressed Suffix Arrays for Massive Data.
    Proc. SPIRE 2009, Springer LNCS 5721, pp. 63-74, Saariselkä, Finland, August 25-27, 2009.
  4. Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki: Storage and Retrieval of Highly Repetitive Sequence Collections.
    Journal of Computational Biology 17(3):281-308, 2010.
    [Article] [Preprint] Conference versions: [1, 2]
  5. Jouni Sirén: Sampled Longest Common Prefix Array.
    Proc. CPM 2010, Springer LNCS 6129, pp. 227-237, New York, USA, June 21-23, 2010.
  6. Paolo Ferragina, Jouni Sirén, and Rossano Venturini: Distribution-Aware Compressed Full-Text Indexes.
    Proc. ESA 2011, Springer LNCS 6942, pp. 760-771, Saarbrücken, Germany, September 5-7, 2011.
    [Article] [Preprint]
  7. Jouni Sirén: Compressed Full-Text Indexes for Highly Repetitive Collections (PhD Thesis).
    Department of Computer Science, Series of Publications A, Report A-2012-5, University of Helsinki, June 2012.
  8. Travis Gagie, Kalle Karhu, Gonzalo Navarro, Simon J. Puglisi, and Jouni Sirén: Document Listing on Repetitive Collections.
    Proc. CPM 2013, Springer LNCS 7922, pp. 107-119, Bad Herrenalb, Germany, June 17-19, 2013.
    [Article]
  9. Paolo Ferragina, Jouni Sirén, and Rossano Venturini: Distribution-aware compressed full-text indexes.
    Algorithmica 67(4):529-546, 2013.
  10. Gonzalo Navarro, Simon J. Puglisi, and Jouni Sirén: Document Retrieval on Repetitive Collections.
    Accepted to ESA 2014.
    [Preprint]

Suds Library Required Documents

Jouni.Siren@cs.helsinki.fi