Detailed Changelog of ReadXplorer
Plugin updates (2016-09-16, 2016-08-21 and 2016-12-16):
Updates available via the integrated Update Center for modules 'readxplorer-api-1.2.1', 'readxplorer-ui-1.2.3', 'readxplorer-utils-1.1.4', 'readxplorer-tools-transcriptionanalyses-1.3.3', 'readxplorer-parser-1.2.2' and 'readxplorer-tools-readpairclassifier-1.1.3' fixing these issues:
- FIXED: TSS Detection: Manually setting min percent increase always overwritten
- FIXED: TSS Detection: Associating TSS for genomes with multiple chromosomes
- FIXED: Effective length of genomic features formula in help
- NEW: Timed progress shown for read imports (every minute at least)
- NEW: Feature type "Transcript" for references added
- FIXED: RPKM/TPM histogram IndexOutOfBounds error when only very few expressed genes
Version 2.2.3 (2016-07-15):
- Improved viewer scrolling speed (adapted to zoom level)
- Improved R integration and documentation
Version 2.2.2 (2016-05-26):
- Includes both minor plugin updates from 2016-04-06 & 2016-04-24
- CHANGED: Disabled R bundling for Mac OS X
Minor plugin updates (2016-04-06) and (2016-04-24):
A small but VERY IMPORTANT update is available via the integrated Update Center for the "readxplorer-tools-readpairclassifier" module (version 1.1.2). It fixes the read pair import bug listed below. The second update concerns the TSS analysis: "readxplorer-transcription-analyses" version 1.3 and "readxplorer-cli" version 1.0.2 (command line version).
- FIXED: Missing error message for read pair import which could lead to abortion of the import
- NEW: TSS analysis export: Enable appropriate handling of promoter sequences overlapping chromosome start/end depending on linear or circular chromosomes.
- FIXED: TSS analysis export: Downstream promoter sequence is stored in separate column now
Version 2.2 (2016-03-24):
- NEW: Read Count Normalization: Added option to switch between effective length and total feature length
- NEW: Configure number of bp to export upstream (default 70) and downstream of a TSS to analyze for promoters
- NEW: Automatically infer best number of records in RAM during mapping import to prevent crashes
- NEW: For non-strand specific data sets, coverage can be projected on forward or reverse strand only
- NEW: Different reference identifiers in BAM and reference file can be unified either automatically (where possible) or manually in a wizard
- NEW: ReadXplorer version added to each exported table for reproducibility
- NEW: Differential gene expression and correlation analysis store last parameter selection now
Differential Gene Expression:
- M/A plot added for DESeq2
- Received an icon in the toolbar
- Rserve setup simplified for Unix
Internal:
- Switch from picard to htsjdk
- Unify logging behaviour (slf4j is used everywhere now)
- Fixed several bugs (see below)
Detailed (For 2.2 only the bugs need a detailed description):
Fixed several bugs including these notable ones:
1. BUG: Fixed missing line breaks in operon detection table for locus, product and EC number.
2. BUG: Fixed that overlapping start/stop codons were not shown correctly.
3. BUG: Fixed screenshot SVG export on UNIX.
4. BUG: Fixed TSS detection result export for multi chromosome references.
5. BUG: Fixed that combined track viewers are resized when viewer height is changed.
6. BUG: Fixed that unstranded RNA-seq data led to negative read counts in read count analysis.
7. BUG: Switching chromosomes now forces a data refresh on track viewers.
8. BUG: TSS detection fix for positions where the coverage increases from 0 to x by calculating percentage increase from 1 to x.
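The zero-coverage fix in item 8 can be illustrated with a minimal sketch. The function name `percent_increase` is hypothetical, and the formula (relative increase in percent) is an assumption about how the percentage is computed, not ReadXplorer's actual code.

```python
def percent_increase(cov_before: int, cov_after: int) -> float:
    """Percent coverage increase between two neighboring positions.

    A baseline of 0 would cause a division by zero, so it is treated as 1,
    mirroring the fix described above (increase calculated from 1 to x).
    """
    baseline = max(cov_before, 1)  # assumption: 0 is replaced by 1
    return (cov_after - baseline) / baseline * 100.0
```

For example, `percent_increase(0, 50)` is evaluated as an increase from 1 to 50 instead of failing on a zero baseline.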
Minor plugin updates (2015-09-30) and (2015-11-21):
Two minor updates are available via the integrated Update Center for the "readxplorer-utils" version 1.1.1 and "readxplorer-transcription-analyses" version 1.2.2 modules. They add the following functionality:
- NEW: Read Count Normalization: Added option to switch between effective length and total feature length
- NEW: Automatically infer best number of records in RAM during mapping import to prevent crashes
- NEW: Configure number of bp to export upstream of a TSS (default 70) to analyze for promoters
Version 2.1 (2015-07-06):
Short:
- NEW: Automatic Genome Rearrangement (Structural Variant) Detection by integration of GASV (https://code.google.com/p/gasv/)
- NEW: Command line interface added for use of ReadXplorer in pipelines
Read Count & Normalization Calculation (former RPKM analysis):
- NEW: TPM (Transcripts per million) read count normalization method added besides RPKM
- NEW: Refined read assignment strategy
- NEW: Added feature start and stop offset option
- CHANGED: Improved resolution of small values in RPKM/TPM histograms
TSS analysis:
- NEW: Option to merge neighboring TSS in a configurable bp window
- NEW: Distinction of "primary" and "secondary" TSS in a configurable bp window
- NEW: Start- and Stop codon estimation for novel transcripts based on genetic code
- FIXED: Each strand option has its own read start distribution now
- FIXED: No parameter and result variation anymore with automatic parameter estimation
Differential Gene Expression:
- Improved handling by switch from RJava to RServe
Other:
- NEW: Enabled zoom and alignment height adjustment in Alignment Viewer
- NEW: Detailed Viewer supports track combination now
- NEW: Dashboard button to export track statistics of all tracks in current DB
- NEW: Configurable link to enzyme DB for features with EC-number
- NEW: Added locus, product & EC-number to Operon Detection result export
- NEW: Change temp import directory option
- NEW: Change max zoom level of Alignment and Read Pair Viewer option
- NEW: Change viewer background color option
- NEW: Last parameter configuration stored for Normalization Calculation & Operon Detection
- CHANGED: SNP Detection frequency parameter accepts double values (e.g. 0.5%)
- CHANGED: Smaller track name bar in viewers
- CHANGED: Work with SAMRecord read pair tags and remove pair tags from read names
Internal:
- Switch to Java 1.8
- Switch from Ant to Maven
- Fixed several bugs (see below)
Detailed:
NEW: Automatic Genome Rearrangement (Structural Variant) Detection by integration of GASV (https://code.google.com/p/gasv/) for read pair tracks. All GASV options are available and GASV output is directly presented in the GUI as for all other analyses.
NEW: Command line interface added for use of ReadXplorer in pipelines. It is started by using the .../readxplorer/bin/readxplorer_cli*.exe files and enables automatic import and successive analyses with a single command.
Read Count & Normalization Calculation (former RPKM analysis):
- NEW: TPM (Transcripts per million) read count normalization method added besides RPKM (see manual for related publications)
- NEW: Applied read assignment strategy similar to HTSeq count's union model (see RX manual)
- NEW: Added feature start and stop offset option
- CHANGED: Improved resolution of small values in RPKM/TPM histograms
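The difference between the two normalization methods can be sketched with their commonly used textbook definitions. This is an illustration only: the function names are hypothetical, and ReadXplorer's own calculation (for example its use of effective feature length or its read assignment strategy) may differ; see the RX manual.

```python
from typing import List


def rpkm(counts: List[int], lengths: List[int], total_mapped: int) -> List[float]:
    """Reads per kilobase of feature per million mapped reads."""
    return [c * 1e9 / (length * total_mapped)
            for c, length in zip(counts, lengths)]


def tpm(counts: List[int], lengths: List[int]) -> List[float]:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths)]
    rate_sum = sum(rates)
    return [r / rate_sum * 1e6 for r in rates]
```

A property of TPM under this definition is that the values of one data set always sum to one million, which is not the case for RPKM.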
TSS analysis (see RX manual for details):
- NEW: Option to merge neighboring TSS in a configurable bp window. Only the most significant TSS is kept, the others are displayed in the new "Associated TSS" column
- NEW: Distinction of "primary" and "secondary" TSS in a configurable bp window, where the most prominent one becomes primary.
- NEW: Start- and Stop codon estimation for novel transcripts based on genetic code
- FIXED: Each strand option has its own read start distribution now.
- FIXED: No parameter and result variation anymore when using the automatic parameter estimation more than once on the same data set.
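The merging of neighboring TSS can be sketched as follows. This is a hedged illustration, not the ReadXplorer implementation: the `Tss` type, the use of read starts as the significance measure, and the chained (neighbor to neighbor) window are all assumptions.

```python
from typing import List, NamedTuple, Tuple


class Tss(NamedTuple):
    pos: int          # genomic position of the TSS
    read_starts: int  # hypothetical significance measure


def _collapse(cluster: List[Tss]) -> Tuple[Tss, List[int]]:
    """Keep the most significant TSS; report the positions of the others."""
    best = max(cluster, key=lambda t: t.read_starts)
    associated = [t.pos for t in cluster if t is not best]
    return best, associated


def merge_tss(tss_list: List[Tss], window: int) -> List[Tuple[Tss, List[int]]]:
    """Merge TSS whose distance to the previous TSS is at most `window` bp."""
    merged: List[Tuple[Tss, List[int]]] = []
    cluster: List[Tss] = []
    for tss in sorted(tss_list, key=lambda t: t.pos):
        if cluster and tss.pos - cluster[-1].pos > window:
            merged.append(_collapse(cluster))
            cluster = []
        cluster.append(tss)
    if cluster:
        merged.append(_collapse(cluster))
    return merged
```

Each result pair holds the retained TSS and the positions that would go into an "Associated TSS" column.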
Differential Gene Expression:
- Improved handling by switch from RJava to RServe. This allows running multiple DGE analyses and comparing their results instead of only one analysis. The R installation has been simplified.
Other:
NEW: Enabled zoom and alignment height adjustment in Alignment Viewer
NEW: Detailed Viewer supports track combination now
NEW: Dashboard button to export track statistics of all tracks in current DB
NEW: Added locus, product & EC-number columns to Operon Detection result export
NEW: Configurable link to enzyme DB for features with EC-number (Change DB via Tools->Options->Miscellaneous->Locations)
NEW: Change temp import directory option (also via Tools->Options->Miscellaneous->Locations). Especially useful with a small/full system hard drive where the import of large bam files fails due to disk space requirements.
NEW: Change max zoom level of Alignment and Read Pair Viewer option (via Tools->Options->Viewer).
NEW: Change viewer background color option (via Tools->Options->Colors)
CHANGED: SNP Detection frequency parameter accepts double values (e.g. 0.5%)
CHANGED: Smaller track name bar in viewers = more space for coverage and other components.
CHANGED: Work with SAMRecord read pair tags and remove pair tags from read names if they are still present, like all major mappers do. Note: GASV only works on data sets where the pair tags have been removed from the read name.
Internal:
- Switch to Java 1.8: Java 1.8 is now required to run ReadXplorer.
- Switch from Ant to Maven (mostly interesting to know for developers, does not affect things for the user :))
Fixed several bugs including these notable ones:
1. BUG: Fixed feature start and stop offset on reverse strand in Differential Gene Expression analysis. It was accidentally subtracted from the stop on the respective strand.
2. BUG: Fixed two interchanged headers in SNP Detection result export
3. BUG: Sequence export of reverse strand features now stores reverse strand sequence. Before it was always the forward strand sequence.
4. BUG: *.fai not found exception fixed when reference and tracks were imported together.
5. BUG: Progress of import process fixed.
6. BUG: Fixed multiple mapped reads selection in analysis wizards.
7. BUG: Dashboard resize issues fixed.
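The behaviour fixed in item 3 can be sketched like this. The function name and the 1-based inclusive coordinates are hypothetical, and it is an assumption that "reverse strand sequence" means the reverse complement of the forward strand.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def feature_sequence(reference: str, start: int, stop: int,
                     is_forward: bool) -> str:
    """Sequence of a feature given 1-based, inclusive coordinates.

    Reverse strand features are returned as reverse complement
    (assumption, see lead-in) instead of the forward strand sequence.
    """
    seq = reference[start - 1:stop]
    if is_forward:
        return seq
    return seq.translate(_COMPLEMENT)[::-1]
```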
Version 2.0.1 (2014-12-10):
Short:
- NEW: TSS analysis: Parameter for max feature distance added
- NEW: TSS analysis: Stores selected analysis parameters & shows more statistics
- NEW: TSS analysis: "Leaderless" column added to exported tables
- NEW: Differential Gene Expression (DGE): Chromosome, Gene start & stop added to count tables
- NEW: Progress bar added for count table export in DGE analysis
- CHANGED: Success messages after storing DGE plots modernized
- NEW: Feature & Coverage analysis both store last parameter selection now
- NEW: Improved description of read pair track import
- Fixed several bugs
Detailed:
NEW: TSS analysis: Max distance of a genomic feature to be associated as "upstream/downstream feature" to a TSS can be configured now. This also affects the "novel transcript detection" - only TSS without a feature are treated as belonging to a novel transcript.
NEW: TSS analysis: Stores selected analysis parameters and export of TSS tables contains statistics for: 1. portion of TSS having up/downstream features, are novel, leaderless, ..., 2. number and percentage of TSS having up/downstream features in certain distance windows (e.g. 2-5 or 101-250 bp).
NEW: TSS analysis: "Leaderless" column added to exported tables: All TSS with a feature within the given "Max distance for feature for leaderless transcripts"-distance are marked with "yes" - all others do not contain an entry in this column.
NEW: Differential Gene Expression (DGE): Chromosome, Gene start & stop added to count tables
NEW: Progress bar added for count table export in DGE analysis
CHANGED: Success messages after storing DGE plots modernized
NEW: Feature & Coverage analysis both store last parameter selection now
NEW: Improved description of read pair track import: "Complete distance" of pair has been replaced by more specific "Fragment length" (total length of the sequenced fragment including the reads, not only the distance between the reads).
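The distinction between fragment length and the distance between the reads can be made concrete with a small sketch. Both function names are hypothetical, and reads are assumed to be given as 1-based, inclusive (start, stop) tuples.

```python
from typing import Tuple

Read = Tuple[int, int]  # (start, stop), 1-based and inclusive (assumption)


def fragment_length(read1: Read, read2: Read) -> int:
    """Total length of the sequenced fragment including both reads."""
    start = min(read1[0], read2[0])
    stop = max(read1[1], read2[1])
    return stop - start + 1


def inner_distance(read1: Read, read2: Read) -> int:
    """Number of bases between the two reads only (0 if they overlap)."""
    first, second = sorted([read1, read2])
    return max(second[0] - first[1] - 1, 0)
```

For two 50 bp reads at 100-149 and 301-350, the fragment length is 251 bp, while only 151 bp lie between the reads.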
Fixed several bugs including these notable ones:
1. BUG: Uniquely mapped reads parameter output in xls/csv fixed
2. BUG: TSS table: Sort positions bug fixed
3. BUG: Parameter estimation for combined TSS analysis & strand options fixed
4. BUG: Coverage statistics for multi chromosome genomes fixed
5. BUG: Classification Update message & delete file fixed
6. BUG: Fixed that create DB under MacOS via dashboard was not possible
Version 2.0 (2014-10-02):
Short:
- NEW FEATURE: Added Single Perfect & Single Best Match classification
- NEW FEATURE: Action for mapping classification update added
- CHANGED: Removed inclusiveness of mapping classes
- NEW FEATURE: Differential Gene Expression: DESeq2 integrated
- NEW FEATURE: Coverage Correlation Analysis officially supported
- NEW FEATURE: Feature Coverage Analysis: Outputs mean coverage of feature
- NEW FEATURE: CoverageAnalysis: Export of interval sequences added
- NEW FEATURE: Basic functionality to import VCF files added
- NEW FEATURE: Histogram Viewer: Added mapping class selection
- NEW FEATURE: Alignment Viewer: Added option to color alignments by base quality
- NEW FEATURE: Alignment Viewer: Fixed legend & options at top left
- NEW FEATURE: Enhanced reference feature table by: ncRNA, 5'UTR, 3'UTR, RBS, -35_signal and -10_signal
- NEW FEATURE: Added support for split read mappings
- NEW FEATURE: De/select all button for all track selections added
- NEW FEATURE: Mapping strand selection added for analyses
- NEW FEATURE: TSS detection: Combined strand analysis added
- NEW FEATURE: Viewer option menu added
- NEW FEATURE: ReadPair classification: Missing pair tags are inferred for two file import
- NEW FEATURE: Differential Gene Expression: Count table export for single track added
- NEW FEATURE: Differential Gene Expression: Plot legends added
- NEW FEATURE: Coloring options for mapping classes and Double Track Viewer added/restructured
- PERFORMANCE: Improved coverage queries for viewers
- RESTRUCTURED: Statistics table consists of key/value pairs now
- Fixed several bugs
Detailed:
NEW FEATURE: Added Single Perfect & Single Best Match classification:
In a good data set with no repetitive regions, almost all mappings should fall into these classes. A Single Perfect mapping is the ONLY perfect mapping of its read, but additional Common Match mappings (= worse mappings with mismatches/gaps) might exist for the same read. The same holds for Single Best Match mappings: only ONE Best Match mapping exists in the whole data set for that read, while additional Common Match mappings are still allowed for the same read. This classification allows a finer visual gradation from multiply mapped reads to uniquely mapped reads. The old Perfect and Best Match classes now mean that multiple perfect or best mappings exist for the corresponding read! Therefore, it is strongly recommended to update all existing databases!
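The classification rules described above can be sketched as follows. This is an illustration under assumptions: the function name and the class label strings are hypothetical, each mapping of a read is represented only by its number of differences (mismatches/gaps), and the list is assumed to be non-empty.

```python
from typing import List


def classify_mappings(differences: List[int]) -> List[str]:
    """Assign a class to each mapping of ONE read.

    `differences` holds the number of mismatches/gaps per mapping;
    0 means a perfect mapping.
    """
    best = min(differences)
    n_best = differences.count(best)
    if best == 0:
        top = "Single Perfect Match" if n_best == 1 else "Perfect Match"
    else:
        top = "Single Best Match" if n_best == 1 else "Best Match"
    # All mappings worse than the best one are Common Match mappings.
    return [top if d == best else "Common Match" for d in differences]
```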
NEW FEATURE: Added an integrated option for automatically updating the mapping classification of existing databases at "File->Update Read Classification". It conveniently updates all tracks in the DB. For each track, a new extended bam file is created next to the original bam file used by ReadXplorer. The new bam file includes the updated read mapping classification for all reads. BEFORE THIS ACTION IS INVOKED, MAKE SURE THAT ALL TRACK FILES ARE STILL AVAILABLE AND WRITE PERMISSIONS ARE CORRECTLY SET IN THE DIRECTORY WHERE THE BAM FILES ARE LOCATED. The old bam files can be deleted once the update has finished successfully, i.e. no errors occurred, you can view your data, and the mappings are also classified into the Single Perfect and Single Best Match classes.
CHANGED: Mapping classes were inclusive before 2.0. Now the coverage values of all mapping classes at a position have to be summed to get the total coverage value (the sum is shown in the tooltip of the Track Viewer and can be observed when checking the scales of each viewer). This also means that the Track Viewer now paints the mapping classes stacked on top of each other from center to top (bottom for the reverse strand) instead of overlaying them.
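The change from inclusive to exclusive classes can be pictured with a short sketch. The class names, their stacking order and the function name are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical stacking order, from the center line outwards.
CLASS_ORDER = ["Single Perfect Match", "Perfect Match",
               "Single Best Match", "Best Match", "Common Match"]


def stacked_bands(class_coverage: Dict[str, int]) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return (class, bottom, top) bands and the total coverage at one position.

    Since the classes are exclusive, the total is the sum of all classes.
    """
    bands = []
    bottom = 0
    for name in CLASS_ORDER:
        height = class_coverage.get(name, 0)
        bands.append((name, bottom, bottom + height))
        bottom += height
    return bands, bottom
```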
NEW FEATURE: Differential Gene Expression: The follow-up R-package DESeq2 has also been integrated into ReadXplorer and can be used now!
See http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html for their paper and details.
NEW FEATURE: Coverage Correlation Analysis: It allows comparing the coverage of exactly two data sets (tracks). The analysis calculates the correlation of the coverage in genomic intervals between both tracks and enables identification of both positively and negatively correlated intervals between them. Two correlation methods are available: PEARSON ( http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient ) and SPEARMAN ( http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient )
This analysis has been available for some time, but not yet officially announced. From now on it is officially supported.
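A minimal sketch of an interval-wise coverage correlation is shown below. It uses the standard definitions of the Pearson and Spearman coefficients; the function names, the fixed non-overlapping intervals and the handling of constant intervals (returning `None`) are assumptions, not ReadXplorer's implementation.

```python
from math import sqrt
from typing import Callable, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None if one series has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def interval_correlations(cov_a: Sequence[float], cov_b: Sequence[float],
                          interval_len: int,
                          method: Callable = pearson) -> List[Tuple[int, Optional[float]]]:
    """Correlate two coverage arrays in consecutive intervals."""
    out = []
    for start in range(0, len(cov_a) - interval_len + 1, interval_len):
        a = cov_a[start:start + interval_len]
        b = cov_b[start:start + interval_len]
        out.append((start, method(a, b)))
    return out
```

Intervals with a coefficient close to 1 are positively correlated between the two tracks, those close to -1 negatively.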
NEW FEATURE: Feature Coverage Analysis: The mean coverage of each feature is calculated and output now
NEW FEATURE: Coverage Analysis: The interval sequences of identified intervals can be exported in fasta format now
NEW FEATURE: VCF (Variant Call Format for SNPs & DIPs) files created with external programs can be imported and viewed in connection to a selected reference (and its tracks) from the DB
NEW FEATURE: Histogram Viewer: Added mapping class selection, like in all other viewers via the legend
NEW FEATURE: Alignment Viewer 1: Added option to color each base of each alignment by PHRED scaled base quality. The quality is shown by shading the alignment color. Bright color = good quality, dark color = worse quality.
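As a rough illustration of quality-based shading: a PHRED score Q corresponds to an error probability of 10^(-Q/10), and a base color can be darkened as Q drops. The linear scaling, the cap at Q = 40, the minimum brightness factor and the function names are assumptions chosen for this sketch, not the viewer's actual color model.

```python
from typing import Tuple


def phred_to_error_probability(quality: int) -> float:
    """Standard PHRED definition: P(error) = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def shade(color: Tuple[int, int, int], quality: int,
          max_quality: int = 40, min_factor: float = 0.3) -> Tuple[int, ...]:
    """Darken an RGB color for low base qualities (bright = good quality)."""
    q = max(0, min(quality, max_quality))
    factor = min_factor + (1 - min_factor) * q / max_quality
    return tuple(round(c * factor) for c in color)
```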
NEW FEATURE: Alignment Viewer 2: Fixed legend & options at top left of the viewport - now it never vanishes from the visible area.
NEW FEATURE: Enhanced reference feature table by: ncRNA, 5'UTR, 3'UTR, RBS, -35_signal and -10_signal. When any of these feature types is contained in a reference file, it is stored completely, like all other supported features (CDS, Gene, ...)
NEW FEATURE: Split read mappings (especially useful for eukaryotic RNA-seq) are supported now in all viewers and analyses
NEW FEATURE: De/select all button for all track selections added
NEW FEATURE: The mapping strand can be selected for all analyses where this feature is useful. For TSS analysis, for example, the analysis can be adapted to the data set and the type of sequencing library: the default option "Feature strand" assumes libraries with reads on the same strand as the genes, "Opposite strand" is for libraries where all reads map to the opposite strand of the gene, and "Combine both strands" is for non-stranded libraries where all reads falling within the boundaries of a gene shall be included.
NEW FEATURE: TSS detection: Combined strand analysis added. When "Combine both strands" is selected during the mapping strand selection (bullet point above), this option lets you define in which direction the TSS shall be detected - forward or reverse.
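The three mapping strand options can be summarized in a small sketch. The enum and function names are hypothetical; only the option labels are taken from the text above.

```python
from enum import Enum


class StrandOption(Enum):
    FEATURE = "Feature strand"
    OPPOSITE = "Opposite strand"
    BOTH = "Combine both strands"


def read_is_counted(read_is_forward: bool, feature_is_forward: bool,
                    option: StrandOption) -> bool:
    """Decide whether a read overlapping a feature is included in an analysis."""
    if option is StrandOption.FEATURE:
        return read_is_forward == feature_is_forward
    if option is StrandOption.OPPOSITE:
        return read_is_forward != feature_is_forward
    return True  # BOTH: the strand of the read is ignored
```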
NEW FEATURE: Viewer option menu added: "Tools->Options->Viewer". It allows adapting the height and de/activating automatic scaling for all TrackViewers.
NEW FEATURE: ReadPair classification: Missing pair tags are inferred for two file import
NEW FEATURE: Differential Gene Expression: Count table export for single track added. This enables extraction of count data for any track of interest.
NEW FEATURE: Differential Gene Expression: Plot legends added
NEW FEATURE: Coloring options for mapping classes and Double Track Viewer added/restructured. All these colors can be adapted via "Tools->Options->Colors".
PERFORMANCE: Improved coverage queries for viewers - increasing responsiveness especially when scrolling in the Track Viewer
RESTRUCTURED: Statistics table consists of key/value pairs now. Therefore, the new version executes an update of each existing database, preserving all data already stored in the DB and reformatting the statistics.
Fixed several bugs including these notable ones:
1. Import: Read pairs can be imported as single end without name collisions
2. Read Pair Import: Sorting bam error now aborts import correctly
3. Import: Progress calculation fixed
4. Import: Added error msg when file not readable
5. TSS table export: Column order fixed
6. DESeq: M/A plot fixed
7. AlignmentViewer: Painting bug fixed
8. Fixed that GnuR mirror could not be changed
9. Reference Editor fixed
10. Occurrence filter issues fixed
11. Navigator table click now updates parents and subfeatures in Reference Feature Panel
12. Disabled update of navigator table layout as long as same reference viewer is active
Version 1.9.2 (2014-05-10):
Short:
- NEW FEATURE: All analyses + Track & Alignment viewer: Mapping quality parameter added
- NEW FEATURE: SNP Detection: Mapping & Base quality parameters added
- NEW FEATURE: Enhanced export tables
- NEW FEATURE: Custom genetic codes with stop codons
- NEW FEATURE: Alignment Viewer shows mapping and base qualities
- RESTRUCTURED: GBK and EMBL import is less strict
- Fixed some bugs
Detailed:
NEW FEATURE: Mapping quality parameter (PHRED scale) added for all analyses and the Track and Alignment viewer. The mapping quality is set by most read mappers and contained in SAM/BAM files. Details about how a mapper sets the mapping quality should be found in the manual of the mapper.
NEW FEATURE: Mapping & Base quality parameters added to SNP Detection. This enables a better filtering of unreliable SNPs from low quality regions not only based on coverage and SNP (allele) frequency.
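How quality parameters can complement coverage and frequency thresholds is sketched below. All names, default thresholds and the pileup representation are hypothetical; the actual SNP Detection in ReadXplorer may apply its parameters differently.

```python
from collections import Counter
from typing import List, Optional, Tuple

Observation = Tuple[str, int, int]  # (base, base quality, mapping quality)


def call_snp(ref_base: str, pileup: List[Observation],
             min_base_q: int = 20, min_map_q: int = 20,
             min_cov: int = 10,
             min_freq_percent: float = 10.0) -> Optional[Tuple[str, float]]:
    """Return (variant base, frequency in percent) or None for one position."""
    # Discard observations from low quality bases or unreliable mappings.
    usable = [b for b, bq, mq in pileup
              if bq >= min_base_q and mq >= min_map_q]
    if len(usable) < min_cov:
        return None
    variants = Counter(b for b in usable if b != ref_base)
    if not variants:
        return None
    base, count = variants.most_common(1)[0]
    freq = 100.0 * count / len(usable)
    return (base, freq) if freq >= min_freq_percent else None
```

In this sketch, low quality observations are removed before coverage and frequency are evaluated, so a variant supported mainly by poor bases has less weight.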
NEW FEATURE: Added Locus, EC-Number, Product and Read class parameters to exported result tables
NEW FEATURE: Custom genetic codes now include stop codons and can be created via "Tools->Options->Genetic Code"
NEW FEATURE: Alignment Viewer tooltip shows mapping and base qualities
RESTRUCTURED: GBK and EMBL import parsers are less strict with missing entries in the locus line. In most cases the import should still work now.
Version 1.9.1 (2014-04-25):
Short:
- NEW FEATURE: Import of xls tables introduced
- NEW FEATURE: Export of csv tables introduced
- BUGFIX: Improved reference import error handling
Detailed:
NEW FEATURE: Import of xls tables introduced via File -> Import any data table.
NEW FEATURE: Export of csv tables introduced for all features (analyses) offering table export.
BUGFIX: Error handling of reference imports improved. Errors during fasta import were not redirected to the ReadXplorer console. This issue has been fixed.
Version 1.9 (2014-03-27):
Short:
- NEW FEATURE: Multiple chromosome handling introduced
- NEW FEATURE: CSV table import
- NEW FEATURE: Occurrence filter for comparative multiple track analysis
- RESTRUCTURED: Reference sequences stored in indexed fasta
- RESTRUCTURED IMPORT: Significantly reduced memory footprint, simpler, more detailed
- NEW FEATURE: Export read count tables
- NEW FEATURE: "Leaderless" TSS classification added including columns for distance to feature
- Some bugfixes
Detailed:
NEW FEATURE: Multiple Chromosome handling introduced. Reference files with multiple sequences can be imported via one click. Mapping files can contain mappings to multiple chromosomes. Reference Viewer and Navigator show a chromosome drop down list. Analyses automatically run on the complete genome and a selected result entry updates the viewers to show the correct chromosome position.
NEW FEATURE: CSV tables can be imported via "File -> Import any data table". Synchronization of genome position with selected entry is given, if the table contains a position column.
NEW FEATURE: Occurrence filter for comparative multiple track analysis. A click on a table header allows filtering results by number of occurrence in multiple analyzed data sets (e.g. allows filtering SNPs, which occur in at least 3 tracks or only in 1 track, ... - freely configurable)
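The idea behind the occurrence filter can be sketched as follows. The function name, the dictionary layout (track name to list of result keys) and the use of (position, base) tuples as keys are assumptions made for this illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def filter_by_occurrence(results_per_track: Dict[str, List[Hashable]],
                         min_tracks: int = 1,
                         max_tracks: Optional[int] = None) -> List[Hashable]:
    """Keep results that occur in between min_tracks and max_tracks tracks."""
    counts: Counter = Counter()
    for results in results_per_track.values():
        counts.update(set(results))  # count each result once per track
    upper = len(results_per_track) if max_tracks is None else max_tracks
    return sorted(key for key, n in counts.items()
                  if min_tracks <= n <= upper)
```

With three tracks, `min_tracks=3` keeps only SNPs shared by all tracks, while `max_tracks=1` keeps those found in a single track.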
RESTRUCTURED: Reference sequences stored in indexed fasta next to the input reference genome file instead of in the DB (genomic features still stored in the DB). This enables efficient handling of large references (e.g. human genome).
IMPORT 1: SAM/BAM first sorted by read name during import to significantly reduce memory footprint. ~1GB of RAM should be sufficient for most imports.
IMPORT 2: Multiple tracks can now always be selected for import on the same reference.
IMPORT 3: Number of processed reads shown in detail view of the progress bar (bottom right).
NEW FEATURE: Differential gene expression analysis has a new mode to export read count tables.
NEW FEATURE: TSS Analysis shows the distance to the next up- or downstream feature. Added max. distance parameter for "leaderless" classification of TSS.
BUGFIXES: E.g. hard-clipped bases in SAM-cigar recognized correctly and a few more.