Clean Microsoft Word Pasted Text using JavaScript

Here is the function I wound up writing that does the job fairly well (as far as I can tell, anyway). I am certainly open to suggestions for improvement if anyone has any. Thanks. function cleanWordPaste( in_word_text ) { var tmp = document.createElement("DIV"); tmp.innerHTML = in_word_text; var newString = tmp.textContent||tmp.innerText; // this next piece converts line …
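The function above is browser JavaScript and relies on the DOM (textContent/innerText) to strip the markup. For cleaning Word-pasted HTML on a server instead, a rough Python analogue using only the standard library's html.parser might look like the sketch below. The punctuation map, the set of block tags that produce line breaks, and the name clean_word_paste for the Python port are assumptions of this sketch, not part of the original answer.

```python
import re
from html.parser import HTMLParser

# Assumed mapping of Word's "smart" punctuation to plain ASCII.
WORD_CHAR_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
}
# Assumed set of tags after which a line break should appear.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <style> and <script>.

    Comments, including Word's <!--[if gte mso 9]>...<![endif]--> blocks,
    are dropped because handle_comment is not overridden.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            # Newlines in HTML source are not meaningful; collapse all
            # whitespace (including &nbsp;) to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def clean_word_paste(in_word_text: str) -> str:
    parser = _TextExtractor()
    parser.feed(in_word_text)
    parser.close()
    text = "".join(parser.parts).translate(str.maketrans(WORD_CHAR_MAP))
    # Trim each line, collapse inner runs of spaces, and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Line breaks come only from tags here, so a paragraph that Word wrapped across several source lines stays on one output line.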

Mercurial and Word or PDF documents

Yes. You will be able to do meaningful diffs of MS Word documents. If you have TortoiseHg installed and have set up a repository, right-click the file whose diffs you want to check. On the context menu, click TortoiseHg > Visual Diffs. In the Visual Diffs dialog, select docdiff instead of kdiff3. …

Replace image in word doc using OpenXML

Although the documentation for OpenXML isn't great, there is an excellent tool you can use to see how existing Word documents are built. If you install the Open XML SDK, it comes with the DocumentReflector.exe tool under the Open XML Format SDK\V2.0\tools directory. Images in Word documents consist of the image data and an ID …
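Because a .docx file is a ZIP package, one low-level way to swap an image without touching its ID (the relationship ID that the document body uses to point at the image part) is to overwrite the bytes of the image part itself. The Python sketch below does that with the standard zipfile module. It is not the OpenXML SDK approach the answer describes; the function name replace_docx_image and the part name word/media/image1.png are hypothetical, and it assumes the new image has the same format as the old one, so the package's declared content type still matches.

```python
import zipfile


def replace_docx_image(src_path: str, dst_path: str,
                       part_name: str, new_bytes: bytes) -> None:
    """Copy a .docx package to dst_path, swapping the bytes of one media part.

    The relationship that points at part_name is left unchanged, so the
    document keeps referencing the same part, which now has new content.
    """
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError(f"{part_name} not found in {src_path}")
        # Copy every entry in its original order, substituting the target.
        for item in src.infolist():
            data = new_bytes if item.filename == part_name else src.read(item.filename)
            dst.writestr(item, data)
```

For example, replace_docx_image("in.docx", "out.docx", "word/media/image1.png", Path("new.png").read_bytes()) (with pathlib.Path) writes a copy instead of editing in place, since zipfile cannot rewrite a single entry of an existing archive.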

Extracting text from MS Word files in Python

Use the python-docx package (imported as docx). Here's how to extract all the text from a document: import docx document = docx.Document(filename) docText = "\n\n".join(paragraph.text for paragraph in document.paragraphs) print(docText) See the Python DocX site. Also check out Textract, which pulls out tables etc. Parsing XML with regexes invokes Cthulhu. Don't do it!
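If installing python-docx isn't an option, a rough standard-library equivalent reads word/document.xml from the .docx ZIP package and joins the text of each w:p paragraph with blank lines, mirroring the snippet above, and it uses a real XML parser rather than regexes. This is a sketch, not python-docx's behavior: the helper name docx_text is hypothetical, it ignores tabs, line breaks, headers, footers and footnotes, and unlike document.paragraphs it also picks up paragraphs inside tables.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(filename: str) -> str:
    """Return the text of every <w:p> paragraph, separated by blank lines."""
    with zipfile.ZipFile(filename) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; concatenate its <w:t> nodes.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n\n".join(paragraphs)
```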

How to return generated file download with Django REST Framework?

Here's an example of returning a file download directly from DRF. The trick is to use a custom renderer so you can return a Response directly from the view: from django.http import FileResponse from rest_framework import viewsets, renderers from rest_framework.decorators import action class PassthroughRenderer(renderers.BaseRenderer): """ Return data as-is. View should supply a Response. """ media_type="" …

Converting docx into pdf in java

In addition to VivekRatanSinha's answer, I would like to post the full code and the required JARs for anyone who needs them in the future. Code: import java.io.File; import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream; import org.apache.poi.xwpf.converter.pdf.PdfConverter; import org.apache.poi.xwpf.converter.pdf.PdfOptions; import org.apache.poi.xwpf.usermodel.XWPFDocument; public class WordConvertPDF { public static void main(String[] args) { WordConvertPDF …

Why are words shuffled when I insert English words into Arabic/Urdu/Persian text in Notepad or MS Word?

For example: باللغة العربية “keyboard” انا أريد أن أعرف الكلمة (roughly, "I want to know the word 'keyboard' in Arabic"). Finish typing the Arabic word and add a space after it (this space separates the embedded text from the Arabic text to its right). Insert the special character U+200F, named "Right-to-Left Mark", so that the preceding space is treated as a right-to-left (Arabic) character. Insert …
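Why the marker helps can be checked with Python's unicodedata module: a plain space has the neutral bidi class WS, so its direction depends on its neighbours, whereas U+200F carries the strong right-to-left class R, so a space sitting between an Arabic letter and the mark resolves as right-to-left. The sketch below only builds the first part of the sequence described above (Arabic word, space, RLM, then the embedded English word); the remaining steps are cut off in the excerpt and are not reproduced, and the helper name embed_ltr_word is hypothetical.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def embed_ltr_word(arabic_before: str, english: str) -> str:
    """Hypothetical helper: Arabic text, a space, RLM, then the English word."""
    return arabic_before + " " + RLM + english


for ch in (" ", RLM, "\u0628", "k"):
    # Code point, bidi class and Unicode name of each character.
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
```

Run as a script, it reports the classes WS (space), R (the mark), AL (Arabic letter beh) and L (Latin k).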
